Visual Abstract

Background: Electronic medical records (EMRs) have enabled the application of machine learning (ML) to characterize disease progression. ML requires large datasets to produce meaningful results, which disadvantages health systems with small EMRs. Applying models pre-trained on a different EMR is useful, but such models may suffer from discrepancies between medical sites, which can differ in disease codes, procedures, etc. We study survival models trained on a large EMR and applied to a smaller one to estimate the time to type 2 diabetes mellitus (T2DM) complications.

Method: We used a large US insurance claims dataset to create a right-censored cohort (C_1) that includes onset times of T2DM complications, patient profiles, and disease histories. We constructed features from age, sex, and past occurrences of the 250 most frequent Clinical Classifications Software (CCS) codes, and trained Random Survival Forest (RSF) models on this cohort. We then created a second cohort (C_2) from a smaller EMR from a large Japanese hospital, on which we both trained RSF models and applied the models trained on C_1.
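The feature construction described in the Method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the patient records, the code labels (`CCS_98` etc.), and the helper names `top_codes` and `build_features` are hypothetical, and "past occurrence" is assumed to be a binary indicator per code (the abstract does not say whether counts or indicators were used).

```python
from collections import Counter


def top_codes(histories, k):
    """Return the k most frequent codes, counting each code once per patient."""
    counts = Counter()
    for codes in histories.values():
        counts.update(set(codes))
    # Sort by descending frequency, then by code name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [code for code, _ in ranked[:k]]


def build_features(patients, histories, vocab):
    """Build one row per patient: [age, sex flag, one indicator per code in vocab]."""
    rows = {}
    for pid, (age, sex) in patients.items():
        seen = set(histories.get(pid, []))
        # Assumption: sex is encoded as 1.0 for "F" and 0.0 otherwise.
        row = [float(age), 1.0 if sex == "F" else 0.0]
        row += [1.0 if code in seen else 0.0 for code in vocab]
        rows[pid] = row
    return rows


# Toy data (hypothetical); the study used the 250 most frequent codes, here k=2.
patients = {"p1": (54, "F"), "p2": (61, "M"), "p3": (47, "F")}
histories = {
    "p1": ["CCS_98", "CCS_53"],
    "p2": ["CCS_98"],
    "p3": ["CCS_53", "CCS_98", "CCS_158"],
}
vocab = top_codes(histories, k=2)
features = build_features(patients, histories, vocab)
```

Fixing the code vocabulary on the source cohort and reusing it on the target cohort is one simple way to keep feature columns aligned when a model trained on C_1 is applied to C_2; whether the study did this is not stated in the abstract.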

Results: Table 1 shows the feature importance rankings of the trained RSF models and the resulting C-indices. When models pre-trained on C_1 are applied to C_2, C-indices are high for Nephrology, Hyperosmolar, and Neuropathy. In these cases, common features are ranked highly by both the C_1-trained and the C_2-trained models.
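For readers unfamiliar with the C-index, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is illustrative only: the function name `concordance_index` and the toy values are assumptions, pairs with tied event times are simply skipped, and the abstract does not specify which C-index variant the authors computed.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. The pair is concordant when the
    model assigns i the higher risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (hypothetical): four patients, the third is right-censored.
times = [2, 4, 6, 8]
events = [True, True, False, True]
risks = [0.9, 0.2, 0.4, 0.1]
c_index = concordance_index(times, events, risks)  # 4 of 5 comparable pairs agree: 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a high C-index on C_2 for a model trained on C_1 suggests that the learned risk ordering transfers between cohorts.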

Conclusions: This study illustrates the applicability of survival models with disease code covariates to cohorts from different geographic regions.

Disclosure

A. Koseki: Employee; Self; IBM, Research Support; Self; Astellas Pharma Inc. M. Makino: None. A. Suzuki: Research Support; Self; Chugai Pharmaceutical Co., Ltd., Ono Pharmaceutical Co., Ltd., Taisho Pharmaceutical Co., Ltd., Takeda Pharmaceutical Co., Speaker’s Bureau; Self; Eli Lilly Japan K. K. R. Tokumasu: None. P. Chakraborty: None. M. Ghalwash: None. T. Iwamori: None. M. Kudo: Employee; Self; IBM. D. Sow: None. H. Yanagisawa: None. R. Yanagiya: None.

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at http://www.diabetesjournals.org/content/license.