Statistical and Machine Learning Models for Predicting of Hospital Cost on Diagnosis Related Groups (DRGs) in Chronic Disease in Southern Thailand

Wichayaporn Thongpeth

Please use this identifier to cite or link to this item: http://kb.psu.ac.th/psukb/handle/2016/17868

Title:	Statistical and Machine Learning Models for Predicting of Hospital Cost on Diagnosis Related Groups (DRGs) in Chronic Disease in Southern Thailand
Other Titles:	ตัวแบบทางสถิติและการเรียนรู้ด้วยเครื่องสำหรับทำนายต้นทุนค่ารักษาพยาบาลตามระบบกลุ่มวินิจฉัยโรคร่วมในโรคเรื้อรังในภาคใต้ของประเทศไทย
Authors:	Don McNeil;Apiradee Lim;Wichayaporn Thongpeth;Faculty of Science and Technology (Mathematics and Computer Science)
Keywords:	Chronic Disease;Diagnosis Related Groups;Machine Learning;Hospital Cost
Issue Date:	2022
Publisher:	Prince of Songkla University
Abstract:	This dissertation applied statistical methods to predict hospital cost under Diagnosis Related Groups (DRGs) for chronic disease in Southern Thailand. The study consists of two parts. The first part analyzed the determinants of cost for chronic disease patient visits to a major public hospital, using hospital claim data from Suratthani Hospital in 2016, a total of 18,342 records of hospital visit costs. The candidate determinants were age, gender, the principal diagnosis and up to 12 diagnoses, up to 12 procedures, length of stay, and discharge status. Linear regression was used to analyze the associations between the determinants and the outcome. The strongest determinants of hospital cost for chronic disease patients were the number of procedures (r2 = 0.54) and length of hospital stay (r2 = 0.43); the overall model had an r2 of 0.73. In conclusion, the main factors affecting hospital cost for chronic disease are the number of procedures and length of hospital stay. The second part compared the performance of linear regression, penalized linear models (lasso, ridge, and elastic net), and machine learning models (support vector regression (SVR), neural network (NN), random forest (RF), and Extreme Gradient Boosting (XGBoost)) in predicting hospital visit cost for chronic disease in Thailand. The original data were divided into training and testing sets in a 70:30 ratio, and a double-sized dataset was produced by the bootstrap technique. The predictive performance of all models was measured with root mean square error (RMSE) and the coefficient of determination (r2). The RF model had the best predictive performance, with the lowest prediction errors, for all dataset sizes in both the training and testing datasets. In contrast, linear regression had the poorest predictive performance and the highest prediction errors. The RF, XGBoost, NN, and SVR models predicted better with larger samples, whereas the linear regression and penalized linear models did not. In conclusion, linear regression and penalized linear models had similar predictive performance at all sample sizes, whereas the machine learning models performed better as the sample size increased.
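The evaluation setup named in the abstract (a 70:30 split, bootstrap enlargement, RMSE and r2) can be illustrated with a minimal sketch. This is not the dissertation's code: the function names, the random seed, the toy data, and the choice to resample only the training portion are all assumptions made for illustration, since the abstract does not give those details.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def split_70_30(records, seed=0):
    """Shuffle a copy of the records and split it 70:30 into train and test."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def bootstrap_enlarge(records, factor=2, seed=0):
    """Draw factor * len(records) records with replacement (bootstrap)."""
    rng = random.Random(seed)
    return [rng.choice(records) for _ in range(factor * len(records))]


# Hypothetical toy data: (length of stay in days, cost). NOT from the study.
toy = [(days, 1000.0 * days + 500.0) for days in range(1, 11)]
train, test = split_70_30(toy)

# Assumption: only the training portion is enlarged; the abstract is not explicit.
doubled = bootstrap_enlarge(train, factor=2)

# Scoring the observed costs against themselves only checks the metric
# definitions: a perfect prediction has zero error and r2 equal to one.
actual = [cost for _, cost in test]
print(rmse(actual, actual), r_squared(actual, actual))
```

Any fitted model's predictions on the test records could be passed to `rmse` and `r_squared` in place of the placeholder call at the end; lower RMSE and higher r2 indicate better predictive performance, which is how the abstract ranks the models.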
Abstract(Thai):	This dissertation applies statistical methods to build statistical models for predicting hospital cost under the Diagnosis Related Groups (DRGs) system for chronic disease in Thailand, and compares the performance of statistical models and machine learning models in predicting that cost. The study is divided into two parts. The first part aimed to examine the associations between the factors that affect hospital cost under DRGs for chronic disease and to build a model for predicting that cost, using data from the inpatient database of Suratthani Hospital that is used for reimbursement claims under the national health security scheme, a total of 18,342 admissions. The variables used to predict hospital cost under DRGs were age, sex, principal diagnosis, number of complication diagnoses, number of procedures and treatments, discharge status, length of hospital stay in days, and treatment expenses. The associations between the predictors and the outcome were analyzed with a linear regression model. The factors studied were associated with hospital cost under DRGs for chronic disease with an r2 of 0.73, and the factors most strongly associated with cost were the number of procedures and treatments (r2 = 0.54) and the length of hospital stay (r2 = 0.43). In conclusion, the main factors determining hospital cost under DRGs for chronic disease are the number of procedures and treatments and the length of hospital stay. The second part aimed to compare the performance of models in predicting hospital cost under DRGs: linear regression (LR); penalized linear regression, comprising ridge regression, lasso regression, and elastic net regression; and machine learning (ML) models, comprising support vector regression (SVR), neural network (NN), random forest (RF), and Extreme Gradient Boosting (XGBoost). The data were divided into a training set and a testing set in a 70:30 ratio, and the data size was increased to 2 times and 4 times by the bootstrap method. The predictive performance of all models was measured with root mean square error (RMSE) and the coefficient of determination (r2). Random forest gave the best predictive performance on both the original and the enlarged data, and its performance improved as the data grew larger, whereas linear regression, penalized linear regression, and support vector regression gave similar predictive performance on the data that had not been enlarged. In conclusion, linear regression and penalized linear regression had similar predictive performance that did not change between the original and the enlarged data, whereas the machine learning models performed better as the data size increased.
Description:	Doctor of Philosophy in Research Methodology, 2022
URI:	http://kb.psu.ac.th/psukb/handle/2016/17868
Appears in Collections:	746 Thesis

Files in This Item:

File	Description	Size	Format
6320330002.pdf	Main article	6.93 MB	Adobe PDF

This item is licensed under a Creative Commons License