Statistical and Machine Learning Models for Predicting of Hospital Cost on Diagnosis Related Groups (DRGs) in Chronic Disease in Southern Thailand

Publisher

Prince of Songkla University

Abstract

This dissertation applied statistical methods to predict hospital cost on Diagnosis Related Groups (DRGs) for chronic disease in Southern Thailand. The study consists of two parts. The first part analyzed the determinants of cost for chronic disease patient visits to a major public hospital, using hospital claim data from Suratthani Hospital in 2016, a total of 18,342 records of hospital visit costs. The candidate determinants were age, gender, the principal and up to 12 other diagnoses, up to 12 procedures, length of stay, and discharge status. Linear regression was used to analyze the associations between the determinants and the outcome. The main determinants of hospital cost for chronic disease patients were the number of procedures (r² = 0.54) and the length of hospital stay (r² = 0.43), with an overall r² of 0.73. In conclusion, the main factors affecting hospital cost for chronic disease are the number of procedures and the length of hospital stay.

The second part compared the performance of linear regression, penalized linear models (lasso, ridge, and elastic net), and machine learning models (support vector regression (SVR), neural network (NN), random forest (RF), and Extreme Gradient Boosting (XGBoost)) in predicting hospital visit cost for chronic disease in Thailand. The original data were split into training and testing sets at a 70:30 ratio, and a double-sized dataset was produced by the bootstrap technique. The predictive performance of every model was measured with the root mean square error (RMSE) and the coefficient of determination (r²). The RF model had the best predictive performance, with the lowest prediction errors, on both the training and testing sets at every dataset size. In contrast, linear regression had the poorest predictive performance and the highest prediction errors. The RF, XGBoost, NN, and SVR models predicted better on the larger sample, whereas linear regression and the penalized linear models did not. In conclusion, linear regression and the penalized linear models performed similarly at all sample sizes, whereas the machine learning models performed better as the sample size increased.
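The evaluation protocol described in the abstract (a 70:30 split, a bootstrap-doubled dataset, and scoring by RMSE and r²) can be illustrated with a minimal sketch. It is not the dissertation's code: the data are synthetic, the one-predictor least-squares model stands in for the models that were actually compared, and the function names (`split_70_30`, `bootstrap_double`, `fit_simple_ols`) and the reading of "double-sized" as resampling with replacement to twice the original size are assumptions made for illustration.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def split_70_30(rows, rng):
    """Shuffle a copy of the rows and split it 70:30 into train and test."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def bootstrap_double(rows, rng):
    """Assumed reading of 'double-sized': resample with replacement to 2n rows."""
    return [rng.choice(rows) for _ in range(2 * len(rows))]


def fit_simple_ols(rows):
    """Least-squares fit of y = intercept + slope * x for (x, y) pairs."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate(rows, rng):
    """Fit on the 70% training part, score on the 30% testing part."""
    train, test = split_70_30(rows, rng)
    slope, intercept = fit_simple_ols(train)
    y_true = [y for _, y in test]
    y_pred = [intercept + slope * x for x, _ in test]
    return {"n_train": len(train), "n_test": len(test),
            "rmse": rmse(y_true, y_pred), "r2": r_squared(y_true, y_pred)}


rng = random.Random(0)

# Synthetic (length_of_stay, cost) pairs; the coefficients are made up.
original = []
for _ in range(1000):
    los = rng.randint(1, 30)
    original.append((los, 5000 + 3000 * los + rng.gauss(0, 2000)))

doubled = bootstrap_double(original, rng)

results = {"original": evaluate(original, rng),
           "doubled": evaluate(doubled, rng)}
for name, scores in results.items():
    print(name, scores)
```

Replacing `fit_simple_ols` with any other regressor while keeping `evaluate` unchanged gives the kind of like-for-like comparison across dataset sizes that the abstract reports.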

Description

Doctor of Philosophy in Research Methodology , 2022

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Thailand