Objective: This study aimed to assess the validity and utility of a machine learning (ML) model for predicting the 3-year incidence of neurodegenerative diseases (ND) in patients with type 2 diabetes mellitus (T2DM).
Methods: We used data from two cohorts of patients with T2DM recruited between 2008 and 2022: a discovery cohort (one hospital; n=22,311) and a validation cohort (two hospitals; n=2,915). The outcome of interest was the presence or absence of ND at 3 years. We trained several ML models with hyperparameter tuning in the discovery cohort and evaluated them in the validation cohort using the area under the receiver operating characteristic curve (AUROC).
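As a minimal illustration of the evaluation metric named in the Methods, the AUROC can be read as the probability that a randomly chosen patient who develops ND receives a higher predicted risk than a randomly chosen patient who does not. The sketch below computes it by direct pairwise comparison. The function name `auroc` and the toy labels and scores are hypothetical and are not study data, and this is not necessarily how the authors computed their values.

```python
def auroc(labels, scores):
    """AUROC by pairwise comparison: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = ND at 3 years, 0 = no ND); not from the study.
labels = [0, 0, 1, 0, 1, 0]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
toy_auroc = auroc(labels, scores)
print(toy_auroc)
```

The pairwise form is quadratic in the number of patients, so it is suitable only for illustration; library implementations typically use a rank-based computation instead.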
Results: ND was observed in 133 patients (0.6%) in the discovery cohort and 15 patients (0.5%) in the validation cohort. The AdaBoost model had a mean AUROC of 0.82 (95% CI, 0.79-0.85) in the discovery dataset. When the trained models were applied to the validation dataset, the AdaBoost model exhibited the best performance among the models, with an AUROC of 0.83 (accuracy, 78.6%; sensitivity, 78.6%; specificity, 78.6%; balanced accuracy, 78.6%). The most influential factors in the AdaBoost model were age and cardiovascular disease.
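The four threshold-based metrics reported in the Results are related by simple arithmetic: accuracy is the prevalence-weighted average of sensitivity and specificity, and balanced accuracy is their unweighted mean, so all four coincide whenever sensitivity equals specificity. The sketch below shows this with a confusion matrix. The function name `screening_metrics` and the counts are hypothetical, chosen only to give a low-prevalence example, and are not the study's confusion matrix.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy, and balanced accuracy."""
    sensitivity = tp / (tp + fn)                 # true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # prevalence-weighted average
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts with about 0.5% prevalence; not from the study.
metrics = screening_metrics(tp=8, fn=2, tn=1600, fp=400)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity and specificity are both 0.8, so accuracy and balanced accuracy are 0.8 as well.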
Conclusion: This study demonstrates the potential utility of ML in predicting ND incidence in patients with T2DM, highlighting its feasibility for patient screening.
H. Sang: None. M. Lee: None. H. Lee: None. J. Park: None. S. Kim: None. H. Woo: None. Y. Hwang: None. T. Park: None. H. Lim: None. D. Yon: None. S. Rhee: None.
National Institutes of Health Research Project of South Korea (No. 2022-ER1102-01).