Maturity-onset diabetes of the young (MODY) is a rare cause of diabetes. Most of the clinical information required to diagnose MODY is difficult to extract from electronic health records (EHR) because there is no specific billing code and the diagnosis is often documented only in clinical notes. Natural language processing (NLP) can be used to extract information from clinical text and assign labels. In this study, we designed NLP algorithms to scan clinical notes, characterized structured information, and confirmed classification with chart review to develop a reusable process for identifying MODY patients in the EHR.

A de-identified version of the EHR from a large academic medical center was used. We built NLP algorithms to identify notes that mention MODY or any of 14 MODY genes. We further characterized each relevant mention with context labels (negation, affirmation, possible, or family history), then reviewed charts to confirm MODY cases. We implemented published electronic phenotype algorithms for type 1 diabetes (T1D) and type 2 diabetes (T2D) to identify alternative diagnoses and characterize overlap. Finally, we extracted structured clinical features related to diabetes, including frequency of billing codes, outpatient diabetes prescriptions, and laboratory values (A1C, C-peptide, antibody).

NLP identified a total of 6240 notes belonging to 959 unique subjects. The numbers of notes labeled negation, affirmation, possible, and family history (FH) were 799, 3407, 1408, and 372, respectively; the corresponding subject counts were reported as 122, 354, and 181. The MODY count confirmed with chart review was 354.

In the MODY cohort, 66% of subjects had T1D codes, 59% had T2D codes, 45% had both T1D and T2D codes, and 20% had no diabetes codes. The published algorithms identified 26 subjects for T2D and 6 subjects for T1D who overlapped the MODY cohort. On average, MODY subjects had more T1D codes than T2D codes (59.8 vs. 35.3). The mean A1C of MODY subjects was 7.7, lower than that of algorithm-defined T1D subjects (8.2). On average, MODY subjects received less insulin treatment than non-insulin treatment.
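The note-screening and context-labeling steps described above can be illustrated with a minimal rule-based sketch. This is not the study's algorithm: the search terms (a small subset of MODY gene symbols), the cue phrases, the priority order of the labels, and the function name `label_mentions` are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

# Illustrative search terms only; the abstract does not list the actual terms
# or the full set of 14 genes used by the authors.
MODY_TERMS = re.compile(r"\b(MODY|GCK|HNF1A|HNF4A|HNF1B)\b", re.IGNORECASE)

# Hypothetical cue phrases, checked in priority order. A real system would
# need a curated lexicon and scope rules rather than whole-sentence matching.
CONTEXT_CUES = [
    ("family_history",
     re.compile(r"\b(family history|mother|father|sister|brother|sibling)\b", re.IGNORECASE)),
    ("negation",
     re.compile(r"\b(no|not|denies|negative for|ruled out|unlikely)\b", re.IGNORECASE)),
    ("possible",
     re.compile(r"\b(possible|possibly|suspected|suspect|consider|rule out|may have)\b", re.IGNORECASE)),
]


def label_mentions(note_text):
    """Return (sentence, label) pairs for sentences that mention a MODY term.

    A sentence with no matching cue defaults to "affirmation".
    """
    labels = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", note_text):
        if not MODY_TERMS.search(sentence):
            continue  # skip sentences without a MODY or gene mention
        label = "affirmation"
        for name, pattern in CONTEXT_CUES:
            if pattern.search(sentence):
                label = name
                break
        labels.append((sentence.strip(), label))
    return labels


if __name__ == "__main__":
    note = (
        "Patient has MODY confirmed by genetic testing. "
        "Mother has MODY. "
        "No evidence of MODY. "
        "Possible MODY, consider GCK testing. "
        "A1C is 7.2."
    )
    for sentence, label in label_mentions(note):
        print(f"{label}: {sentence}")
```

Labels produced this way would only be a screening aid; as in the study, affirmed mentions would still need chart review before a subject is counted as a MODY case.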


L. Sulieman: None. A. Ramirez: None.

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at