A Comparison of Missing Value Imputation Methods for the Dependent Variable in Logistic Regression Models

Publisher

Prince of Songkla University

Abstract

Missing data is an important issue in data analysis because it can lead to erroneous conclusions. The objective of this study is to compare the performance of missing data imputation methods applied to binary logistic regression analysis and to develop an improved method. Seven imputation methods were applied: mode imputation (Mode), hot deck imputation (HD), multiple imputation (MI), k-nearest neighbor imputation (KNN), random forest imputation (RF), logistic regression imputation (LR), and modified logistic regression imputation (MLR), which is derived from the LR method by replacing the fixed cutoff point of 0.5 with an optimal cutoff point for the dataset at hand. Missing data were simulated under three mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The simulations used sample sizes of 20, 50, 100, 150, 200, 500, and 1,000 and missing percentages of 10%, 20%, 30%, and 40%. Performance was compared using the estimated mean square error (EMSE). The results showed that the developed MLR method performed best with small sample sizes, whereas the MI method performed best with large sample sizes. The performance of the imputation methods declined as the percentage of missing data increased and improved as the sample size increased.
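The difference between the LR and MLR methods can be illustrated with a minimal sketch. It assumes that predicted probabilities from an already fitted logistic regression model are available, and it picks the cutoff by maximizing Youden's J statistic on the observed cases. The abstract does not say which optimality criterion the thesis uses, so Youden's J is only an assumption here, and the function names `optimal_cutoff` and `impute_binary` are hypothetical.

```python
from typing import List, Sequence


def optimal_cutoff(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Return the cutoff that maximizes Youden's J on the observed cases.

    Assumption: J = sensitivity + specificity - 1 stands in for the
    "optimal cutoff point"; the thesis may use a different criterion.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        return 0.5  # only one class observed: keep the conventional cutoff

    best_cutoff, best_j = 0.5, float("-inf")
    # Every distinct predicted probability is a candidate cutoff.
    for c in sorted(set(probs)):
        true_pos = sum(1 for p, y in zip(probs, labels) if p >= c and y == 1)
        true_neg = sum(1 for p, y in zip(probs, labels) if p < c and y == 0)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # strict comparison keeps the smallest cutoff on ties
            best_cutoff, best_j = c, j
    return best_cutoff


def impute_binary(probs_missing: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Impute a missing binary response as 1 when its probability reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in probs_missing]


# Made-up predicted probabilities and outcomes for the observed cases.
observed_probs = [0.10, 0.20, 0.30, 0.40, 0.45, 0.60]
observed_labels = [0, 0, 1, 1, 0, 1]
# Made-up predicted probabilities for cases whose response is missing.
missing_probs = [0.25, 0.35, 0.55]

cutoff = optimal_cutoff(observed_probs, observed_labels)
lr_imputed = impute_binary(missing_probs)           # LR-style: fixed cutoff of 0.5
mlr_imputed = impute_binary(missing_probs, cutoff)  # MLR-style: data-driven cutoff
```

With these made-up values the data-driven cutoff falls below 0.5, so the second missing case is imputed as 1 under the MLR-style rule but as 0 under the fixed cutoff.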

Description

Thesis (M.Sc. (Applied Statistics))--Prince of Songkla University, 2023

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Thailand