Medication non-adherence is one of the leading public health challenges facing the U.S. Poor medication adherence in patients with diabetes, especially type 2 diabetes mellitus (T2DM), may be associated with inadequate glycemic control and increased morbidity and mortality, and may lead to increased health services utilization and hospital admissions. Predictive models based on machine learning (ML) and “big” healthcare data can help identify a subpopulation at high risk of non-adherence, creating an opportunity to improve value in health care and reduce the cost burden. In this study, we extracted 111,180 T2DM patients initiating metformin monotherapy (index date) from the Truven database to train ML models that predict patients’ level of adherence to metformin monotherapy. Patients were required to have at least 6 months of pre-index (baseline) and at least 2 years of post-index (follow-up) data available. Adherence was measured as the proportion of days covered (PDC) for metformin in the second year after the index date, with PDC >= 0.8 labeled as high-adherence and PDC < 0.8 labeled as low-adherence. A total of 120 variables were extracted, including patients’ demographic information, T2DM-related medication and procedure use, comorbidities, and metformin usage information. In the preliminary analysis, a random forest classifier trained on a random 80% split and validated on the remaining 20% test set reached 0.73 accuracy and 0.73 sensitivity. Because the features are high-dimensional and sparse, we will apply further feature engineering and additional non-linear ML models such as XGBoost, BART, and super learner, compare them with a logistic regression model, and optimize accuracy and sensitivity. A T2DM cohort from the Optum claims database will be used for cross-validation to assess the generalizability of the model. This ML model could also be adapted for other therapeutic areas.
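
The adherence label described above can be illustrated with a minimal sketch of a PDC calculation. It is not the study's actual code. The function names, the `(fill_date, days_supply)` input format, and the choice to count each calendar day at most once (without shifting overlapping supply forward) are assumptions made for illustration. The second-year window is taken here as days 365 through 729 after the index date.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, window_start, window_end):
    """Return the fraction of days in [window_start, window_end) covered by a fill.

    fills: iterable of (fill_date, days_supply) tuples (hypothetical format).
    Each calendar day counts at most once, even if fills overlap.
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            # Keep only the days that fall inside the measurement window.
            if window_start <= day < window_end:
                covered.add(day)
    window_days = (window_end - window_start).days
    return len(covered) / window_days


def adherence_label(pdc, threshold=0.8):
    """Label a patient using the PDC >= 0.8 cut-off stated in the abstract."""
    return "high-adherence" if pdc >= threshold else "low-adherence"


# Hypothetical patient: index date plus three 90-day metformin fills in year two.
index_date = date(2015, 1, 1)
window_start = index_date + timedelta(days=365)  # first day of the second year
window_end = index_date + timedelta(days=730)    # exclusive end of the second year

fills = [
    (window_start, 90),
    (window_start + timedelta(days=90), 90),
    (window_start + timedelta(days=200), 90),    # 20-day gap before this fill
]

pdc = proportion_of_days_covered(fills, window_start, window_end)
print(round(pdc, 3), adherence_label(pdc))
```

In this example 270 of the 365 window days are covered, so the patient falls below the 0.8 threshold. Supply from a fill dispensed before the window still counts for the days that fall inside it, which is one common convention. Other PDC definitions handle overlapping fills or carry-over differently.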


X. Chen: Employee; Self; Merck & Co., Inc. G. Fernandes: None. J. Chen: None. Z. Liu: Employee; Self; Merck & Co., Inc. R. Baumgartner: None.

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at