Identifying when an incident diabetes (DM) diagnosis was made is difficult using retrospective, structured electronic health record (EHR) data alone. Unstructured clinical notes are underused but contain valuable information that could complement traditional methods; however, reviewing them manually is time-consuming. We developed and validated a simple rule-based natural language processing (NLP) method to extract incident DM timing from clinical notes.

At a single center, we used structured EHR data to identify a cohort (age <45 years as of 12/31/19) with likely type 1 (T1D) or type 2 (T2D) DM based on 2016-2019 records. Likely T1D: ≥1 T1D ICD-10 code and insulin and no other DM medication. Likely T2D: (≥2 T2D codes and no T1D codes) or (≥1 T2D code and a DM medication besides insulin or metformin).
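The cohort rule above can be read as a small decision function. The following is a minimal sketch of that reading, not the authors' query: the `PatientRecord` fields, the medication labels, and the function names are hypothetical, and the code counts are assumed to be already restricted to 2016-2019 records.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class PatientRecord:
    # Hypothetical structure; the abstract does not describe the EHR schema.
    birth_date: date
    t1d_codes: int                       # number of T1D ICD-10 codes, 2016-2019
    t2d_codes: int                       # number of T2D ICD-10 codes, 2016-2019
    medications: frozenset = frozenset() # DM medications, e.g. {"insulin"}

def age_on(birth: date, ref: date) -> int:
    """Age in whole years on the reference date."""
    return ref.year - birth.year - ((ref.month, ref.day) < (birth.month, birth.day))

def likely_dm_type(p: PatientRecord) -> Optional[str]:
    """Return 'T1D', 'T2D', or None if the patient is outside the cohort."""
    if age_on(p.birth_date, date(2019, 12, 31)) >= 45:
        return None
    on_insulin = "insulin" in p.medications
    other_than_insulin = p.medications - {"insulin"}
    beyond_insulin_metformin = p.medications - {"insulin", "metformin"}
    # >=1 T1D code and insulin and no other DM medication
    if p.t1d_codes >= 1 and on_insulin and not other_than_insulin:
        return "T1D"
    # (>=2 T2D codes and no T1D codes) or (>=1 T2D code and a non-insulin, non-metformin drug)
    if (p.t2d_codes >= 2 and p.t1d_codes == 0) or (
        p.t2d_codes >= 1 and beyond_insulin_metformin
    ):
        return "T2D"
    return None
```

Under this reading the two branches cannot both apply to one patient, so the order of the checks does not affect the result.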

This cohort had 2,654 patients (548,316 clinical notes, 2003-present). We randomly selected 58,450 clinical notes (1,465 patients) as a training set to look for relevant text patterns, and handcrafted rules from those patterns into our NLP tool. We required 3 distinct concepts at the sentence level to determine an incident DM diagnosis: DM (not, e.g., epilepsy), an onset attribute (e.g., “diagnosed in”), and a temporal component (e.g., 8/2008). We pre-defined all related keywords and date formats for these concepts based on our training notes. We then tested the NLP algorithm against manual review in an independent set of 100 randomly selected patients from the cohort. Analysis was at the patient level (a patient was a true positive if ≥1 note was a true positive).
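The three-concept, sentence-level rule can be illustrated with a few regular expressions. This is only a sketch under stated assumptions: the keyword lists, the date formats, the naive sentence splitter, and the function names are all hypothetical stand-ins, since the abstract does not list the actual keywords or formats used.

```python
import re

# Hypothetical keyword lists; the real tool's vocabularies are not given in the abstract.
DM_TERMS = re.compile(r"\b(?:diabetes|diabetic|DM|T1D|T2D|T1DM|T2DM)\b", re.IGNORECASE)
ONSET_TERMS = re.compile(r"\b(?:diagnosed|dx|onset)\b", re.IGNORECASE)
# Assumed date formats: m/d/yy(yy), m/yyyy, or a bare four-digit year.
DATE = re.compile(r"\b(?:\d{1,2}/\d{1,2}/\d{2,4}|\d{1,2}/\d{4}|(?:19|20)\d{2})\b")

def split_sentences(note: str) -> list:
    """Naive splitter: break after ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", note.strip()) if s]

def find_onset_mentions(note: str) -> list:
    """Return (sentence, date_text) pairs where all 3 concepts co-occur in one sentence."""
    hits = []
    for sentence in split_sentences(note):
        date_match = DATE.search(sentence)
        if DM_TERMS.search(sentence) and ONSET_TERMS.search(sentence) and date_match:
            hits.append((sentence, date_match.group(0)))
    return hits

def patient_has_onset(notes: list) -> bool:
    """Patient-level aggregation: positive if at least one note has a hit."""
    return any(find_onset_mentions(n) for n in notes)

note = ("Patient has epilepsy diagnosed in 2005. "
        "Type 2 diabetes diagnosed in 8/2008. Follow up in 3 months.")
print(find_onset_mentions(note))
```

In this example the epilepsy sentence is skipped because it lacks a DM term, even though it has an onset attribute and a date, which is the reason for requiring all three concepts within the same sentence.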

In the training set, the NLP tool found 1,268 patients with at least 1 of the 3 concepts and 826 patients with all 3. In the test set, we excluded 4 patients without substantive notes. The NLP tool correctly detected incident DM timing in 73 of the remaining 96 patients. It had recall (sensitivity) of 88%, specificity of 77%, precision (PPV) of 96%, and NPV of 50%.
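The abstract does not report the full confusion matrix. As a worked check, the sketch below uses one set of counts that is consistent with the reported figures, assuming the 73 patients are the true positives among the 96 analyzed; the false-positive, false-negative, and true-negative counts are inferred here for illustration and are not stated in the source.

```python
# Inferred counts (an assumption, not reported in the abstract):
# 73 true positives, 3 false positives, 10 false negatives, 10 true negatives.
tp, fp, fn, tn = 73, 3, 10, 10

total = tp + fp + fn + tn          # should equal the 96 analyzed patients
recall = tp / (tp + fn)            # sensitivity
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)               # precision
npv = tn / (tn + fn)

for name, value in [("recall", recall), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.1%}")
```

With these counts the four metrics round to 88%, 77%, 96%, and 50%. The low NPV follows from the small number of negatives: only 20 patients were NLP-negative, and half of them had a documented diagnosis date that the rules missed.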

NLP was helpful in finding incident DM timing and may complement structured EHR queries for identifying incident DM. Refinement of our NLP algorithm is ongoing.

Disclosure

A.Wong: None. V.W.Zhong: None. M.Rosenman: None.

Funding

Centers for Disease Control and Prevention (1U18DP006693-01-00)

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at http://www.diabetesjournals.org/content/license.