Diabetes duration is key information for epidemiologic studies but is not routinely collected in real-world data, such as claims and electronic health records (EHRs). This study aimed to build a predictive model for diabetes duration to support future research. Using data from the National Health and Nutrition Examination Survey (2009 to 2018), we identified individuals with self-reported diabetes and extracted information that is routinely collected in EHRs. Predictors included demographics (e.g., age, sex), biomarkers (e.g., HbA1c, systolic blood pressure), diabetes-related comorbidities (e.g., retinopathy, end-stage renal disease), and glucose-lowering therapy (e.g., insulin, metformin). The outcome was diabetes duration in years. We compared ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, random forest, and extreme gradient boosting (XGBoost) models, using 10-fold cross-validation to tune hyperparameters. A total of 3,267 survey participants were included, with a median diabetes duration of 9 years (Q1: 4 years, Q3: 16 years). The LASSO regression achieved the best performance (Root Mean Square Error [RMSE]: 7.62, Mean Absolute Error [MAE]: 5.74, Average Error [AE]: 0.53), followed by random forest (RMSE: 7.63, MAE: 5.74, AE: 0.47), XGBoost (RMSE: 7.63, MAE: 5.76, AE: 0.55), and the OLS model (RMSE: 7.64, MAE: 5.76, AE: 0.59). The random forest algorithm identified age, insulin therapy, metformin monotherapy, retinopathy, and HbA1c as the most important predictors of diabetes duration. Predictions were more accurate for individuals with a diabetes duration of <10 years (RMSE: 4.43, MAE: 3.66, AE: 0.10) or 10 to 20 years (RMSE: 5.71, MAE: 4.72, AE: 0.79). Our model could predict diabetes duration using information available in EHR data. Model performance was better for individuals who had lived with diabetes for fewer than 20 years.
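The three error metrics reported above can be illustrated with a minimal Python sketch. The abstract does not define Average Error, so the sketch assumes it is the mean signed difference between predicted and observed duration; the function name `error_metrics` and the sample values are hypothetical and are not study data.

```python
import math


def error_metrics(observed, predicted):
    """Return (RMSE, MAE, AE) for paired durations in years."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Signed error per person: predicted minus observed duration.
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumption: AE is the mean signed error, so positive and
    # negative errors can cancel; the abstract gives no definition.
    ae = sum(errors) / n
    return rmse, mae, ae


# Hypothetical durations in years, for illustration only.
observed = [2.0, 8.0, 15.0, 30.0]
predicted = [4.0, 9.0, 13.0, 22.0]
rmse, mae, ae = error_metrics(observed, predicted)
print(f"RMSE={rmse:.2f}, MAE={mae:.2f}, AE={ae:.2f}")
```

Under this assumed definition, RMSE penalizes large misses (such as the 30-year case in the sample) more heavily than MAE, while AE indicates only whether predictions run high or low on average.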


D.Guan: None. T.Jiao: None. H.Shao: Consultant; Lilly Diabetes. P.Li: None. V.Fonseca: Consultant; Abbott, Corcept Therapeutics, Eli Lilly and Company, Other Relationship; BRAVO4HEALTH, LLC, Research Support; Fractyl Health, Inc., Stock/Shareholder; Amgen Inc. L.Shi: None. M.K.Ali: Advisory Panel; Bayer Inc., Eli Lilly and Company, Research Support; Merck & Co., Inc. J.Varghese: None. R.M.Carrillo-Larco: None. M.Rouhizadeh: None. A.G.Winterstein: Consultant; Bayer Inc., Genentech, Inc., Ipsen Biopharmaceuticals, Inc., Research Support; Merck Sharp & Dohme Corp.

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at http://www.diabetesjournals.org/content/license.