Objective: This study aimed to assess the validity and utility of a machine learning (ML) model for predicting the 3-year incidence of neurodegenerative diseases (ND) in patients with type 2 diabetes mellitus (T2DM).

Methods: We used data from two cohorts of patients with T2DM recruited between 2008 and 2022: a discovery cohort (one hospital; n=22,311) and a validation cohort (two hospitals; n=2,915). The outcome of interest was the presence or absence of ND at 3 years. We developed several ML models, tuning their hyperparameters in the discovery cohort, and compared their discrimination in the validation cohort using the area under the receiver operating characteristic curve (AUROC).
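The AUROC analysis can be illustrated with a short sketch. It assumes only per-patient labels (1 = ND within 3 years, 0 = no ND) and model risk scores, and computes the AUROC through its rank interpretation: the probability that a randomly chosen patient who developed ND received a higher score than a randomly chosen patient who did not. The function name `auroc` and the toy data are hypothetical; this is not the authors' code, and in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used instead.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUROC.

    labels: 1 = developed ND within 3 years, 0 = did not (hypothetical encoding).
    scores: model-predicted risk; higher means more likely ND.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one case and one non-case")

    # Count case/non-case pairs the model orders correctly; tied scores count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical data, not from the study).
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(auroc(labels, scores))  # 0.75
```

The pairwise loop costs one comparison per case/non-case pair, which stays small when, as in both cohorts here, ND cases are rare.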

Results: ND occurred in 133 patients (0.6%) in the discovery cohort and 15 (0.5%) in the validation cohort. In the discovery cohort, the AdaBoost model had a mean AUROC of 0.82 (95% CI, 0.79-0.85). When applied to the validation cohort, the AdaBoost model performed best among the models, with an AUROC of 0.83 (accuracy 78.6%, sensitivity 78.6%, specificity 78.6%, and balanced accuracy 78.6%). The most influential predictors in the AdaBoost model were age and cardiovascular disease.
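Unlike the AUROC, accuracy, sensitivity, specificity, and balanced accuracy depend on a risk-score cutoff and follow directly from a confusion matrix. The abstract does not state how the cutoff was chosen, so the sketch below takes the threshold as an input; the function names and the example counts are hypothetical and do not reproduce the study's data. It also shows why, at an ND prevalence near 0.5%, accuracy is dominated by the non-ND majority and therefore tracks specificity, whereas balanced accuracy weights sensitivity and specificity equally.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float           # (TP + TN) / all patients
    sensitivity: float        # TP / (TP + FN): share of true ND cases flagged
    specificity: float        # TN / (TN + FP): share of non-ND patients cleared
    balanced_accuracy: float  # (sensitivity + specificity) / 2


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float) -> tuple[int, int, int, int]:
    """Predict ND (1) when score >= threshold; return (TP, FN, TN, FP)."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def metrics_from_counts(tp: int, fn: int, tn: int, fp: int) -> BinaryMetrics:
    """Derive threshold-dependent metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Equivalent to prevalence * sensitivity + (1 - prevalence) * specificity.
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return BinaryMetrics(accuracy, sensitivity, specificity,
                         (sensitivity + specificity) / 2)


# Hypothetical cohort: 10 ND cases among 1,010 patients (~1% prevalence).
m = metrics_from_counts(tp=9, fn=1, tn=800, fp=200)
print(f"acc={m.accuracy:.3f} sens={m.sensitivity:.3f} "
      f"spec={m.specificity:.3f} bal={m.balanced_accuracy:.3f}")
# acc=0.801 sens=0.900 spec=0.800 bal=0.850
```

With so few cases, accuracy lands almost exactly on specificity even though sensitivity is higher; balanced accuracy is the mean of the two.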

Conclusion: This study demonstrates the potential utility of ML for predicting ND incidence in patients with T2DM and supports its feasibility for patient screening.

Disclosure

H. Sang: None. M. Lee: None. H. Lee: None. J. Park: None. S. Kim: None. H. Woo: None. Y. Hwang: None. T. Park: None. H. Lim: None. D. Yon: None. S. Rhee: None.

Funding

National Institutes of Health Research Project of South Korea (No. 2022-ER1102-01).

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at http://www.diabetesjournals.org/content/license.