To determine if natural language processing (NLP) improves detection of nonsevere hypoglycemia (NSH) in patients with type 2 diabetes and no NSH documentation by diagnosis codes and to measure if NLP detection improves the prediction of future severe hypoglycemia (SH).
From 2005 to 2017, we identified NSH events by diagnosis codes and NLP. We then built an SH prediction model.
There were 204,517 patients with type 2 diabetes and no diagnosis codes for NSH. Evidence of NSH was found in 7,035 (3.4%) of patients using NLP. We reviewed 1,200 of the NLP-detected NSH notes and confirmed 93% to have NSH. The SH prediction model (C-statistic 0.806) showed increased risk with NSH (hazard ratio 4.44; P < 0.001). However, the model with NLP did not improve SH prediction compared with diagnosis code–only NSH.
Detection of NSH improved with NLP in patients with type 2 diabetes without improving SH prediction.
Introduction
The risk of hypoglycemia (glucose <70 mg/dL) with diabetes treatment is well recognized, and its prevention is considered a “critical component of diabetes management” (1). Episodes of severe hypoglycemia (SH), defined in this study as requiring hospitalization or an emergency department visit in patients with type 2 diabetes, can be identified through structured electronic medical record (EMR) or administrative data (2,3). However, nonsevere hypoglycemia (NSH), which does not require assistance for recovery (1), is often reported by patients during outpatient visits and is captured in structured data only if an EMR hypoglycemia diagnosis code is entered. Natural language processing (NLP) combines computer science and linguistics methods (4) and can be used to extract information from unstructured EMR text. Improved EMR hypoglycemia detection using NLP has been previously reported (5). We aimed to describe capture of NSH in patients with type 2 diabetes using both diagnosis codes and NLP and to create a predictive model for SH events.
Research Design and Methods
We used a modified version of the algorithm used by Kho et al. (6) to identify patients in the Cleveland Clinic Health System with type 2 diabetes during 2005 to 2017. Hypoglycemia events were identified using ICD-9/ICD-10 codes (Supplementary Appendix 1); events not associated with a hospitalization or emergency department visit were categorized as NSH. A randomly selected subset of patient records, stratified by year, was reviewed to confirm code identification of NSH. The study protocol was approved by the Institutional Review Board at Cleveland Clinic (Cleveland, OH).
Using clinical progress notes, an NLP algorithm was developed. The cTAKES program (7) was used to break down sentences into phrases to identify hypoglycemia-related Unified Medical Language System concepts. Regular expressions were written to classify polarity (event or no event) of phrases by pattern matching. Using word embeddings, the remaining phrases were classified with additional pattern matching (see Supplementary Appendix 2 for detailed descriptions). For each year, 100 events were randomly selected, with 2005 to 2006 combined given the lower number of notes. The phrases were reviewed by practicing clinicians and confirmed through record review of a subset of patients as representing true events. The weighted proportion of actual events in a set of 1,200 notes was used to define the positive predictive value.
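The polarity step described above can be illustrated with a minimal sketch. The patterns and the function name `classify_polarity` are hypothetical and chosen only for illustration; the study's actual rules are described in its Supplementary Appendix 2 and are not reproduced here.

```python
import re

# Hypothetical patterns, for illustration only (not the study's rules).
# A negation cue followed, within the same clause, by a hypoglycemia mention.
NEGATION = re.compile(
    r"\b(no|denies|denied|without|negative for)\b[^.;]{0,40}"
    r"\b(hypoglycemi\w*|low blood sugars?)\b",
    re.IGNORECASE,
)
# Any hypoglycemia mention.
MENTION = re.compile(r"\b(hypoglycemi\w*|low blood sugars?)\b", re.IGNORECASE)


def classify_polarity(phrase: str) -> str:
    """Label a phrase as 'no event', 'event', or 'unclassified'."""
    if NEGATION.search(phrase):
        return "no event"
    if MENTION.search(phrase):
        return "event"
    # Phrases without a matching pattern would go to a later step
    # (in the study, word embeddings with additional pattern matching).
    return "unclassified"


if __name__ == "__main__":
    for text in (
        "Patient denies any hypoglycemia",
        "Reports hypoglycemia twice this week",
        "Blood pressure stable",
    ):
        print(text, "->", classify_polarity(text))
```

Checking negation before checking for a plain mention keeps a phrase such as "denies any hypoglycemia" from being counted as an event. Real clinical text needs far more patterns than this sketch covers.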
We then limited our data set to include patients with at least one primary care or endocrinology department visit each year and created a Cox proportional hazards prediction model for SH (5 years’ duration) using the following variables based upon our previous work (2): sex, race, median income (8), history of comorbidities (cardiovascular disease, congestive heart failure [CHF], depression, other psychiatric disorders, dementia, cognitive impairment, chronic kidney disease [CKD], and alcohol or substance abuse), and time-dependent variables including age, insurance type (Medicare, Medicaid, commercial, or other), glycosylated hemoglobin (HbA1c), BMI, and diabetes medications (insulin, sulfonylurea, glucagon-like peptide 1 receptor agonists [GLP-1RAs], sodium–glucose cotransporter 2 inhibitors, dipeptidyl peptidase 4 inhibitors, metformin, and α-glucosidase inhibitors). We had previously identified NSH by ICD codes as a variable associated with SH; for this study, we added two variables, each detected by codes and NLP: ever-history of NSH and history of NSH within the past 3 months. We also included interaction terms between HbA1c and insulin and between HbA1c and sulfonylureas.
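As a rough illustration of the two NSH variables, the sketch below derives an ever-history flag and a past-3-months flag from a list of event dates at a given evaluation date. The function name `nsh_features`, the dictionary keys, and the 90-day approximation of "3 months" are assumptions made for this example, not details taken from the study.

```python
from datetime import date, timedelta

# Assumption: "past 3 months" is approximated as 90 days.
RECENT_WINDOW = timedelta(days=90)


def nsh_features(nsh_dates, as_of):
    """Return the two NSH indicators for one patient at date `as_of`.

    nsh_dates: dates of NSH events (from codes and/or NLP).
    Only events on or before `as_of` are counted.
    """
    prior = [d for d in nsh_dates if d <= as_of]
    ever = bool(prior)
    recent = any(as_of - d <= RECENT_WINDOW for d in prior)
    return {"ever_nsh": ever, "nsh_past_3mo": recent}


if __name__ == "__main__":
    events = [date(2015, 1, 10)]
    print(nsh_features(events, date(2015, 3, 1)))
    print(nsh_features(events, date(2016, 1, 1)))
```

Because both flags depend on the evaluation date, they would be recomputed at each time point when used as time-dependent covariates.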
Results
The number of patients with type 2 diabetes in this data set increased from 12,706 in 2005 to 176,001 in 2017 (210,191 unique patients). A total of 1,177,590 progress notes were processed. Upon clinician chart review, 1,111 of 1,200 randomly selected events classified as NSH by NLP were confirmed (93% positive predictive value). From 2005 to 2017, 10,205 NSH events were captured by codes and 14,763 events by NLP, with overlap of only 5 events. Among 204,517 patients with no codes for NSH, evidence of NSH was found in 7,035 (3.4%) using NLP.
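The proportions reported above can be reproduced with simple arithmetic, sketched here. The per-year review counts and weights are not given in the text, so the `weighted_ppv` helper is shown only with made-up strata; its name and input layout are assumptions for illustration.

```python
def proportion(numerator, denominator):
    """Plain proportion, returned as a fraction."""
    return numerator / denominator


def weighted_ppv(strata):
    """Weighted positive predictive value across review strata.

    strata: iterable of (confirmed, reviewed, weight) tuples, for example
    one tuple per year with the weight set to that year's number of
    NLP-flagged events. This layout is hypothetical.
    """
    strata = list(strata)
    total_weight = sum(weight for _, _, weight in strata)
    weighted_sum = sum(weight * confirmed / reviewed
                       for confirmed, reviewed, weight in strata)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Counts taken from the text.
    print(f"Unweighted PPV: {proportion(1111, 1200):.1%}")
    print(f"NLP-only detection: {proportion(7035, 204517):.1%}")
    # Made-up strata, only to show the call shape.
    print(f"Example weighted PPV: {weighted_ppv([(90, 100, 1), (100, 100, 3)]):.3f}")
```

The unweighted ratio of 1,111 to 1,200 rounds to the reported 93%; a weighted value can differ slightly depending on how events are distributed across years.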
The chart review confirmed NSH episodes were often documented under general diabetes diagnosis categories. The incidence proportion of patients with SH by ICD codes increased from 0.3 to 1.7% and those with NSH increased from 0.4 to 1.3% from 2005 to 2017. When NLP was added, the incidence proportion of patients with NSH increased from 0.8% (2005) to 2.6% (2017).
A total of 47,280 patients were included in the prediction model for SH (Supplementary Appendix 3). The models using NSH codes alone and codes plus NLP (Table 1) had C-statistics of 0.812 and 0.806, respectively. The codes plus NLP model showed increased risk for SH with NSH (ever-history of NSH: hazard ratio [HR] 4.44, P < 0.001; and NSH in past 3 months: HR 1.65, P < 0.001), black race (HR 1.81; P < 0.001), Medicaid insurance (HR 1.35; P = 0.008), history of cardiovascular disease (HR 1.58; P < 0.001), CHF (HR 2.35; P < 0.001), depression (HR 1.28; P = 0.03), psychiatric disorders other than depression (HR 1.55; P < 0.001), alcohol or substance abuse (HR 1.55; P = 0.04), and CKD (HR 1.86; P < 0.001). The effect of insulin use on SH was greater at lower HbA1c (HR of 3.32 when HbA1c was 6% [42 mmol/mol]; HR of 2.04 when HbA1c was 9% [75 mmol/mol]). There was decreased risk of SH with higher BMI (HR 0.69 when BMI was 30 kg/m2; HR 0.63 when BMI was 35 kg/m2) and decreased risk with metformin (HR 0.53; P < 0.001) and GLP-1RA (HR 0.36; P < 0.001). Sulfonylurea use had a mixed effect on the risk of SH (HR of 1.61 when HbA1c was 6% [42 mmol/mol]; HR of 0.69 when HbA1c was 9% [75 mmol/mol]). The effect of HbA1c varied at the extremes. With a reference HbA1c of 6% (42 mmol/mol), the HR for SH was 1.59 at an HbA1c of 5% (31 mmol/mol), 0.73 at an HbA1c of 7% (53 mmol/mol), and 1.39 at an HbA1c of 9% (75 mmol/mol). Of note, NSH within the past 3 months was a significant predictor only in the model including NLP. Receiver operating characteristic curves comparing code-detected NSH, NLP-detected NSH, and code plus NLP–detected NSH models were similar (Supplementary Appendix 4), while Kaplan-Meier curves for probability of being free from SH showed lower probability for code-detected NSH (Supplementary Appendix 5).
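For readers who want to relate a reported HR and 95% CI to its P value, the sketch below back-calculates the log-hazard coefficient, its standard error, and an approximate two-sided Wald P value. This is a generic calculation for single-coefficient terms, not the authors' code; the function name `wald_from_ci` is an assumption, and the result is only as precise as the rounded values it is given.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(hr, lower, upper):
    """Back-calculate (beta, se, z, p) from an HR and its 95% CI.

    Assumes the CI was built as exp(beta +/- Z_95 * se) on the log scale.
    """
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = beta / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


if __name__ == "__main__":
    rows = {
        "NSH past 3 months, codes only": (1.116, 0.805, 1.546),
        "NSH past 3 months, codes plus NLP": (1.647, 1.239, 2.189),
    }
    for label, (hr, lo, hi) in rows.items():
        beta, se, z, p = wald_from_ci(hr, lo, hi)
        print(f"{label}: beta={beta:.3f}, se={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Applied to the two past-3-months rows of Table 1, this approximation is in line with the reported P values (0.51 for codes only and <0.001 for codes plus NLP).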
Table 1. Prediction models for risk of SH according to method used for capturing NSH events: diagnosis codes only or diagnosis codes plus NLP
Variable | NSH using diagnosis codes (C index = 0.812): adjusted HR* | NSH using diagnosis codes: 95% CI | NSH using diagnosis codes: P value† | NSH using diagnosis codes plus NLP (C index = 0.806): adjusted HR‡ | NSH using diagnosis codes plus NLP: 95% CI | NSH using diagnosis codes plus NLP: P value† |
---|---|---|---|---|---|---|
History of NSH in past 3 months | 1.116 | 0.805, 1.546 | 0.51 | 1.647 | 1.239, 2.189 | <0.001 |
Ever-history of NSH | 8.872 | 7.464, 10.546 | <0.001 | 4.441 | 3.745, 5.268 | <0.001 |
Age | 1.004 | 0.996, 1.012 | 0.30 | 1.007 | 0.999, 1.015 | 0.08 |
Sex, male | 0.906 | 0.78, 1.054 | 0.20 | 0.866 | 0.745, 1.007 | 0.06 |
Race | | | <0.001 | | | <0.001 |
White | Reference | | | Reference | | |
Black | 1.827 | 1.527, 2.186 | | 1.813 | 1.515, 2.169 | |
Other | 1.031 | 0.741, 1.435 | | 1.060 | 0.762, 1.475 | |
Median income (per 1,000 U.S. dollars), based on patient’s zip code | 0.999 | 0.995, 1.003 | 0.57 | 0.999 | 0.996, 1.003 | 0.76 |
Insurance | | | 0.009 | | | 0.008 |
Medicare | Reference | | | Reference | | |
Medicaid | 1.359 | 1.013, 1.822 | | 1.351 | 1.007, 1.813 | |
Commercial | 0.846 | 0.676, 1.058 | | 0.846 | 0.676, 1.058 | |
Other | 0.787 | 0.582, 1.065 | | 0.772 | 0.571, 1.045 | |
HbA1c, %§ | | | <0.001 | | | <0.001 |
5 | 1.483 | 1.200, 1.832 | | 1.593 | 1.289, 1.969 | |
6 | Reference | | | Reference | | |
7 | 0.770 | 0.650, 0.912 | | 0.725 | 0.611, 0.859 | |
8 | 1.007 | 0.832, 1.219 | | 0.932 | 0.770, 1.129 | |
9 | 1.502 | 1.207, 1.870 | | 1.385 | 1.113, 1.724 | |
BMI, kg/m2‖ | | | <0.001 | | | <0.001 |
25 | Reference | | | Reference | | |
30 | 0.712 | 0.638, 0.795 | | 0.690 | 0.618, 0.770 | |
35 | 0.644 | 0.569, 0.729 | | 0.630 | 0.557, 0.713 | |
Cardiovascular disease | 1.625 | 1.359, 1.944 | <0.001 | 1.579 | 1.321, 1.889 | <0.001 |
CHF | 2.261 | 1.821, 2.807 | <0.001 | 2.349 | 1.893, 2.916 | <0.001 |
Depression | 1.264 | 1.005, 1.589 | 0.045 | 1.282 | 1.02, 1.611 | 0.03 |
Other psychiatric disorders | 1.486 | 1.253, 1.763 | <0.001 | 1.549 | 1.307, 1.835 | <0.001 |
Dementia | 1.410 | 0.905, 2.195 | 0.13 | 1.367 | 0.88, 2.124 | 0.16 |
Cognitive impairment | 1.000 | 0.633, 1.58 | 1.0 | 1.051 | 0.667, 1.655 | 0.83 |
CKD | 1.843 | 1.529, 2.222 | <0.001 | 1.858 | 1.54, 2.241 | <0.001 |
Alcohol or substance abuse | 1.458 | 0.96, 2.215 | 0.08 | 1.551 | 1.021, 2.354 | 0.04 |
Insulin§ | | | <0.001 | | | <0.001 |
HbA1c 5% | 3.108 | 1.976, 4.888 | | 2.637 | 1.675, 4.153 | |
HbA1c 6% | 3.797 | 3.063, 4.706 | | 3.323 | 2.671, 4.134 | |
HbA1c 7% | 4.238 | 3.388, 5.300 | | 3.794 | 3.024, 4.761 | |
HbA1c 8% | 3.297 | 2.638, 4.121 | | 2.921 | 2.330, 3.661 | |
HbA1c 9% | 2.344 | 1.846, 2.977 | | 2.037 | 1.600, 2.594 | |
Sulfonylurea§ | | | <0.001 | | | <0.001 |
HbA1c 5% | 2.934 | 1.765, 4.878 | | 2.484 | 1.498, 4.120 | |
HbA1c 6% | 1.806 | 1.434, 2.273 | | 1.610 | 1.280, 2.026 | |
HbA1c 7% | 1.160 | 0.907, 1.483 | | 1.085 | 0.849, 1.386 | |
HbA1c 8% | 0.885 | 0.698, 1.123 | | 0.851 | 0.671, 1.079 | |
HbA1c 9% | 0.705 | 0.533, 0.932 | | 0.694 | 0.525, 0.917 | |
GLP-1RA | 0.364 | 0.228, 0.581 | <0.001 | 0.361 | 0.227, 0.576 | <0.001 |
DPP-4 | 0.924 | 0.723, 1.181 | 0.53 | 0.869 | 0.68, 1.112 | 0.26 |
SGLT2i | 0.539 | 0.255, 1.142 | 0.11 | 0.576 | 0.272, 1.22 | 0.15 |
Metformin | 0.551 | 0.465, 0.653 | <0.001 | 0.532 | 0.449, 0.63 | <0.001 |
AGI | 1.279 | 0.527, 3.105 | 0.59 | 1.373 | 0.566, 3.335 | 0.48 |
AGI, α-glucosidase inhibitor; DPP-4, dipeptidyl peptidase 4 inhibitor; SGLT2i, sodium–glucose cotransporter 2 inhibitor.
*Adjusted for NSH based on diagnosis code.
†Wald test. For HbA1c, BMI, insulin, and sulfonylurea, it tests the linearity of the variable, all interactions, and all nonlinear terms.
‡Adjusted for NSH based on diagnosis code and NLP.
§The restricted cubic spline term was used. The knots are placed at 6%, 7%, and 8% of HbA1c.
‖The restricted cubic spline term was used. The knots are placed at 25, 30, and 35 kg/m2 of BMI.
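To make the spline footnotes concrete, here is a minimal sketch of the nonlinear basis terms of a restricted cubic spline in its common truncated-power form. With three knots there is one nonlinear term in addition to the linear term. The function name `rcs_nonlinear_terms` is an assumption, the terms are left unscaled (some software rescales them), and the exact parameterization used in the study is not stated in the text.

```python
def rcs_nonlinear_terms(x, knots):
    """Nonlinear restricted-cubic-spline terms for value x.

    knots: increasing sequence of at least three knots.
    Returns len(knots) - 2 terms. Each term is zero below the first
    knot and linear in x above the last knot.
    """
    t_last, t_prev = knots[-1], knots[-2]

    def cube_plus(u):
        # Truncated cubic: u**3 when u > 0, otherwise 0.
        return max(u, 0.0) ** 3

    terms = []
    for t_j in knots[:-2]:
        term = (
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / (t_last - t_prev)
            + cube_plus(x - t_last) * (t_prev - t_j) / (t_last - t_prev)
        )
        terms.append(term)
    return terms


if __name__ == "__main__":
    hba1c_knots = [6.0, 7.0, 8.0]  # knot locations from the table footnote
    for value in (5.0, 6.5, 7.5, 9.0, 10.0):
        print(value, rcs_nonlinear_terms(value, hba1c_knots))
```

The linear tails are what allow the fitted HbA1c effect to bend between the knots, consistent with the U-shaped pattern in Table 1, while staying stable at extreme values.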
Conclusions
Building upon previous work identifying SH in our health system, we found that applying NLP to progress notes improved NSH detection and that NSH is a significant predictor for SH. However, SH prediction using NSH diagnosis codes did not improve with the addition of NLP, possibly because of the strength of the other variables in the model and the possibly greater severity of coded NSH, as the Kaplan-Meier curves suggest. There was little overlap between NSH events detected by codes and those detected by NLP, suggesting that while NSH may be reported by patients, providers may not enter a hypoglycemia diagnosis code, highlighting the benefit of NLP.
While it is known that NSH may have significant adverse consequences (9), treatment deintensification is uncommon (10,11). Identifying patients with increased risk for hypoglycemia (12,13) and alerting providers may prompt providers to adjust treatment for at-risk patients (13). Our prediction model identifies demographic characteristics and comorbidities predicting increased SH risk. Modifiable predictors include history of NSH; sulfonylurea and insulin use, which predicted increased risk of SH at lower HbA1c levels; and metformin and GLP-1RA use, which predicted lower risk. Predicted risk was higher at very low HbA1c levels, at which treatment deintensification may be relevant, but also at very high HbA1c levels, at which fluctuating blood sugars may increase SH risk and limit the ability to attain a lower HbA1c.
While our data set is robust, we recognize these limitations: 1) the EMR from one health system does not capture NSH or SH events outside of the system, and 2) the EMR lacks diabetes duration information for prediction. Future work will focus on notifying providers at the point of care of captured NSH events so that they can consider deintensification, medication adjustments to lower SH risk, or changes to meal and activity timing, especially in patients with multiple comorbidities, to reduce the risk of future SH.
This article contains supplementary material online at https://doi.org/10.2337/figshare.12116709.
Article Information
Duality of Interest. This study was funded by Novo Nordisk, Inc. A.D.M.-H. has received research support from Merck, Novo Nordisk, Inc., Boehringer Ingelheim, and the Agency for Healthcare Research and Quality (K08-HS-024128) within the past 12 months. A.M. reports receiving research funding from Merck, Boehringer Ingelheim, Novartis, and Novo Nordisk, Inc., within the past 12 months. A.Z., X.J., J.M.B., and M.W.K. report receiving research funding from Merck and Novo Nordisk, Inc., within the past 12 months. T.D.H., W.W., M.M., and R.G. report being employees of Novo Nordisk, Inc., and holding company stock. P.P. and S.X.K. report being employees of Novo Nordisk, Inc., and holding company stock at the time of the study. K.M.P. reports receiving research funding from Novo Nordisk, Inc., and Merck; receiving consulting fees from Novo Nordisk, Inc., Sanofi, Eli Lilly and Company, and Merck; and participating in the speakers bureaus of Novo Nordisk, Inc., Merck, and AstraZeneca within the past 12 months. R.S.Z. reports receiving research funding from Novo Nordisk, Inc., and Merck and participating in the speakers bureaus of Merck and Johnson & Johnson within the past 12 months. No other potential conflicts of interest relevant to this article were reported.
Author Contributions. A.D.M.-H., A.M., T.D.H., S.X.K., and K.M.P. researched and analyzed the data, contributed to the study design/conception, and contributed to the drafting and/or critical revision of the manuscript. A.Z. researched and analyzed the data. X.J., W.W., P.P., and R.S.Z. researched and analyzed the data and contributed to the study design/conception. M.M. researched and analyzed the data and contributed to the drafting and/or critical revision of the manuscript. R.G. contributed to the study design/conception. J.M.B. oversaw the project, contributed to the study design/conception, and contributed to the drafting and/or critical revision of the manuscript. M.W.K. researched and analyzed the data and contributed to the drafting and/or critical revision of the manuscript. All authors have reviewed and approved the final version of the manuscript. A.D.M.-H. is the guarantor of this work and, as such, had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.