Background & Objective: Diabetic kidney disease (DKD) is a leading cause of end-stage renal disease, cardiovascular disease, and mortality in patients with diabetes. This study aimed to develop and validate a machine learning (ML)-based DKD prediction model for patients with type 2 diabetes (T2D).
Method: We conducted a retrospective study using a common data model based on the electronic health records of three tertiary referral hospitals in Korea: the Jeonbuk National University Hospital (JNUH) cohort was used to develop the ML model, and the Wonkwang University Hospital (WKUH) and Kyunghee University Hospital (KHUH) cohorts were used for external validation. Patients with T2D who had been followed for more than 3 years were included. Three ML algorithms (Random Forest (RF), XGBoost (XGB), and an Ensemble of RF and XGB) were applied, and the optimal model was selected according to the area under the receiver operating characteristic curve (AUROC) and accuracy.
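The selection criterion can be illustrated with a minimal sketch. It assumes each candidate model has already produced predicted probabilities on a held-out set. The labels, scores, and 0.5 threshold below are made-up toy values, not study data, and the helper names (`auroc`, `accuracy`, `select_model`) are hypothetical. Ranking by AUROC first and accuracy second is also an assumption, since the abstract does not state how the two metrics were combined.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def select_model(y_true, scores_by_model, threshold=0.5):
    """Return the model with the highest (AUROC, accuracy) pair.
    Assumption: AUROC is the primary criterion, accuracy breaks ties."""
    metrics = {
        name: (auroc(y_true, s), accuracy(y_true, s, threshold))
        for name, s in scores_by_model.items()
    }
    best = max(metrics, key=lambda name: metrics[name])
    return best, metrics


# Toy data only: 1 = DKD developed, 0 = no DKD.
y_true = [1, 0, 1, 0, 1, 0]
scores = {
    "RF": [0.9, 0.2, 0.7, 0.4, 0.6, 0.3],
    "XGB": [0.8, 0.6, 0.4, 0.3, 0.7, 0.5],
}
best, metrics = select_model(y_true, scores)
print(best, metrics)
```

The pairwise AUROC here is quadratic in the number of cases, which is fine for illustration; a rank-based implementation from a standard library would be the usual choice at cohort scale.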
Results: A total of 20,528 patients were included. DKD occurred in 1,361 (26.6%), 269 (26.5%), and 5,461 (37.9%) patients in the JNUH, WKUH, and KHUH cohorts, respectively. The RF model yielded the highest AUROC (0.79) and accuracy (71%) on the JNUH cohort, outperforming the XGB and Ensemble models. The five most important features were baseline estimated glomerular filtration rate, age, hemoglobin, serum creatinine, and hemoglobin A1c. Applying the RF model to the two external cohorts yielded 57-77% sensitivity, 73-79% specificity, and an AUROC of 0.76-0.85.
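For readers less familiar with the external-validation metrics, the following sketch shows how sensitivity and specificity are derived from binary predictions. The labels and predictions are toy values chosen for illustration, not cohort data, and the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data only: 1 = DKD, 0 = no DKD.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Both values depend on the classification threshold applied to the model's predicted probabilities, whereas AUROC summarizes performance across all thresholds.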
Conclusion: This study suggests that the RF model could contribute to predicting the development of DKD in patients with T2D. Further research with larger samples and different ML models is needed to determine the most effective approach.
T. Park: None. K. Lee: None.