A variety of symptoms may be associated with type 2 diabetes and its complications. Symptoms in chronic diseases may be described in terms of prevalence, severity, and trajectory and often co-occur in groups, known as symptom clusters, which may be representative of a common etiology. The purpose of this study was to characterize type 2 diabetes–related symptoms using a large nationwide electronic health record (EHR) database.
We acquired the Cerner Health Facts, a nationwide EHR database. The type 2 diabetes cohort (n = 1,136,301 patients) was identified using a rule-based phenotyping method. A multistep procedure was then used to identify type 2 diabetes–related symptoms based on International Classification of Diseases, 9th and 10th revisions, diagnosis codes. Type 2 diabetes–related symptoms and co-occurring symptom clusters, including their temporal patterns, were characterized based on the longitudinal EHR data.
Patients had a mean age of 61.4 years, 51.2% were female, and 70.0% were White. Among 1,136,301 patients, there were 8,008,276 occurrences of 59 symptoms. The most frequently reported symptoms included pain, heartburn, shortness of breath, fatigue, and swelling, which occurred in 21–60% of the patients. We also observed over-represented type 2 diabetes symptoms, including difficulty speaking, feeling confused, trouble remembering, weakness, and drowsiness/sleepiness. Some of these symptoms are rare and would be difficult to detect in traditional patient-reported outcome studies.
To the best of our knowledge, this is the first study to use a nationwide EHR database to characterize type 2 diabetes–related symptoms and their temporal patterns. Fifty-nine symptoms, including both over-represented and rare diabetes-related symptoms, were identified.
It is estimated that 34.2 million adults in the United States (13.0% of all adults in the United States) are living with diabetes (1). Up to 90% of these adults have a diagnosis of type 2 diabetes, which is characterized by a progressive loss of adequate β-cell insulin secretion, frequently on the background of insulin resistance, and leads to microvascular and macrovascular complications and profound psychological and physical distress (2,3). Diabetes is the seventh leading cause of death in the United States, with an estimated 4 million deaths related to diabetes and complications of diabetes in 2017 (1). People with type 2 diabetes have a 15% increased risk of all-cause death compared with those who do not have diabetes and are at increased risk of developing coronary artery disease, stroke, cancer, nephropathy, and retinopathy (3,4). The risks of developing macrovascular complications and death increase 13 and 15%, respectively, for every 5 years an individual lives with diabetes (5). Emergency department visits for diabetes-related events totaled 16 million in 2016, with 7.8 million hospitalizations for diabetes-related events (6). Comorbidities associated with type 2 diabetes include cancer, cognitive impairment, nonalcoholic fatty liver disease, hepatitis C, HIV, fractures, obstructive sleep apnea, chronic kidney disease, hypertension, heart failure, peripheral vascular disease, obesity, coronary heart disease, mental illness, depression, and hearing impairment (1,7,8). The burden of comorbid conditions increases with age, and women experience more comorbid conditions than do men (7). Chronic complications of type 2 diabetes include myocardial infarction, stroke, skin infections, neuropathy, amputation, kidney disease/dialysis, and retinopathy (9). Clinical and preclinical complications of diabetes occur in up to 50% of people newly diagnosed with type 2 diabetes and may manifest with a unique set of symptoms (10).
A variety of symptoms may be associated with type 2 diabetes and its complications, although characterizations of type 2 diabetes–related symptoms and symptom clusters are lacking in the literature (11). People with type 2 diabetes experience multiple symptoms, including symptoms of hyperglycemia (such as change in appetite and polyuria), symptoms of hypoglycemia (such as dizziness, fatigue, pain, sleep disturbance, sensory symptoms, cognitive impairments, depression, and anxiety), and changes in vision as a result of their disease or its treatments (12–16). Symptoms of chronic diseases, including type 2 diabetes, may be described in terms of prevalence, severity, and trajectory and often co-occur in groups, known as symptom clusters, which may be representative of a common etiology (17).
Historically, symptom cluster research has been based on symptom data with numerous dimensions (e.g., timing, intensity or severity, distress, location, exacerbating or alleviating factors, and impact) collected using validated patient-reported outcome (PRO) measures (18–20). These prospective studies may be limited to sample sizes ranging from dozens to a few thousand participants, so only frequently occurring symptoms or symptom clusters can be identified. Cluster analysis, factor analysis, principal component analysis, path analysis, structural equation modeling, other multivariate analyses, and, more recently, network analysis are often used to characterize symptom patterns and symptom clusters. Symptom clusters have been characterized in several chronic diseases, including cancer, chronic kidney disease, cardiovascular disease, and HIV/AIDS (17,21–35), and two studies have characterized symptom clusters using prospectively captured PRO data from subpopulations of people with type 2 diabetes (36,37). Little is known about symptom patterns and symptom clusters in the general population of people with type 2 diabetes.
Characterizing symptoms, co-occurring symptoms, and symptom clusters among the general type 2 diabetes population is crucial to determine shared etiology of symptoms, including phenotypic and genotypic characteristics; identify risk factors for symptoms; and develop personalized approaches for symptom management (17,38). The ability to characterize and identify symptom clusters in people with type 2 diabetes will allow for identification of individuals who may be at higher risk for specific symptom experiences and associated poor outcomes so clinicians may target intervention to those most at risk for severe and distressing symptoms, provide symptom management education before and during treatment for diabetes, and prepare early for specific symptom management later in the course of the disease. Understanding the phenotypic profile of symptoms would allow the development and testing of new resource-conserving symptom management approaches to prevent or ameliorate complex symptoms and related outcomes.
In recent years, electronic health record (EHR) data, in conjunction with advanced statistical methods and cutting-edge machine learning algorithms, have been used for disease onset and health care outcome predictions (39–46). EHR data have also been used to predict diabetes diagnosis and complications (47–50). Using large, nationwide EHR databases for disease and health care outcome analysis and predictions has many advantages, including the ability to capture large sample sizes that allow investigators to identify rare clinical events, train more sophisticated and accurate predictive models, and report generalizable and reproducible findings.
To the best of our knowledge, large, nationwide EHR databases have not been used to characterize symptoms and symptom clusters related to chronic disease (51,52). The study of symptom clusters is evolving rapidly, with the current focus on methods for collecting symptom data and evaluating the mechanisms that underlie symptom clusters (18,20,53,54). Although symptom clusters have been successfully identified using prospectively collected PRO symptom data, it is important to explore other data sources for symptom cluster research. In particular, it remains unknown how information in large EHR databases, with rich data and extremely large sample sizes, can be used to characterize symptoms and symptom clusters in chronic diseases such as type 2 diabetes. Although EHR systems lack systematic collection of multidimensional, high-granularity symptom data, they contain rich data, including longitudinal symptom diagnoses with long-term follow-up, diagnoses of comorbidities, medications, procedures, laboratory tests, vital signs, and other clinical events, in addition to basic information such as demographics and personal characteristics. Thus, it is crucial to evaluate whether EHR data can be used to characterize symptoms and symptom clusters in people with chronic conditions such as type 2 diabetes. The purpose of this study was to characterize type 2 diabetes–related symptoms using a large, nationwide EHR database.
Research Design and Methods
Data Source and Identification of Cohort With Type 2 Diabetes
We acquired the Cerner Health Facts, a multiethnic, de-identified, Health Insurance Portability and Accountability Act–compliant, nationwide electronic medical record database for research purposes. The use of the Cerner EHR database for research was approved by the institutional review board of the University of Texas Health Science Center at Houston. The Cerner EHR database contains all health care records from 85 health care systems with 750 health care facilities across the United States from 2000 to 2018 (55), which includes longitudinal visits with detailed records of diagnoses, medications, procedures, and laboratory tests, representing a total of 69 million unique patients across the United States. In total, the database includes 939 million diagnoses coded with International Classification of Diseases, 9th revision (ICD-9) and 10th revision (ICD-10), codes; 674 million medication records; 5.3 billion clinical events; and 4.2 billion laboratory tests.
To identify patients with type 2 diabetes, we used eMERGE, a rule-based phenotyping algorithm developed by investigators at Northwestern University (56–58). To distinguish true type 2 diabetes cases from type 1 diabetes and other diabetes-related conditions, eMERGE requires an A1C laboratory test result >6.5%, at least one type 2 diabetes diagnosis code, and a prescription for at least one antihyperglycemic medication. We updated the eMERGE criteria by having three clinicians review and update the lists of diabetes-related medications and diagnosis codes (ICD-9 and ICD-10) (Supplementary Figure S1). Applying the updated algorithm to the nationwide Cerner EHR database, we identified 1,136,301 patients with confirmed type 2 diabetes from 494 hospitals and clinical facilities across nine census divisions of the United States (Supplementary Figure S2).
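The three inclusion criteria stated above can be expressed as a simple rule. The Python sketch below is a minimal illustration under stated assumptions, not the eMERGE implementation: the diagnosis-code and medication sets are hypothetical placeholders (the clinician-updated lists are in Supplementary Figure S1), and the `PatientRecord` layout is invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder lists (assumption). The study used clinician-reviewed
# diagnosis-code and medication lists (Supplementary Figure S1), not these.
T2D_DIAGNOSIS_CODES = {"250.00", "250.02", "E11.9", "E11.65"}
ANTIHYPERGLYCEMIC_MEDS = {"metformin", "glipizide", "sitagliptin"}
A1C_THRESHOLD_PCT = 6.5  # the text requires an A1C result strictly above 6.5%


@dataclass
class PatientRecord:
    patient_id: str
    diagnosis_codes: set = field(default_factory=set)  # ICD-9/ICD-10 codes
    medications: set = field(default_factory=set)      # lowercase generic names
    a1c_results: list = field(default_factory=list)    # A1C values in percent


def meets_t2d_rule(p: PatientRecord) -> bool:
    """True only if all three simplified criteria described in the text are met."""
    has_high_a1c = any(v > A1C_THRESHOLD_PCT for v in p.a1c_results)
    has_t2d_code = bool(p.diagnosis_codes & T2D_DIAGNOSIS_CODES)
    has_medication = bool(p.medications & ANTIHYPERGLYCEMIC_MEDS)
    return has_high_a1c and has_t2d_code and has_medication


records = [
    PatientRecord("p1", {"E11.9"}, {"metformin"}, [7.2]),
    PatientRecord("p2", {"E11.9"}, set(), [8.0]),           # no antihyperglycemic drug
    PatientRecord("p3", {"250.00"}, {"glipizide"}, [6.5]),  # A1C not above 6.5%
]
cohort = [r.patient_id for r in records if meets_t2d_rule(r)]
print(cohort)  # ['p1']
```

Requiring all three criteria together, rather than any one, is what screens out records that carry a diabetes code without laboratory or treatment evidence.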
Identification of Type 2 Diabetes Symptoms Based on Diagnosis Codes in the EHR System
We identified type 2 diabetes–related symptoms from the diagnosis codes in the Cerner EHR database using a six-step process, as follows. 1) We identified 51,029 unique ICD-9/ICD-10 diagnosis codes in the cohort with type 2 diabetes. 2) All ICD-10 codes were mapped to ICD-9 codes, resulting in a total of 15,880 ICD-9 codes belonging to 1,178 disease categories. 3) Three clinical experts with extensive experience in chronic disease and symptom management reviewed the descriptions of these 1,178 disease categories and identified 70 as potentially symptom-related. In addition, we reviewed 441 ICD-9 codes classified as symptoms, signs, and ill-defined conditions (ICD-9 code families 780–799, listed in chapter 16 of the ICD-9 manual) (59,60). In total, 1,590 five-digit ICD-9 codes were identified as potentially symptom-related. 4) The three clinical experts carefully reviewed the descriptions of these 1,590 ICD-9 codes and confirmed 598 five-digit ICD-9 codes as symptoms. 5) Through a keyword search of the descriptions of all ICD-9 codes, the three clinical experts identified and confirmed an additional 333 symptom-related ICD-9 codes. 6) Finally, the three clinical experts grouped the resulting 931 symptom-related ICD-9 codes (598 + 333) into 59 symptom categories, which were used for the analyses in this study. Supplementary Table S1 lists the 59 symptom categories with their corresponding ICD-9 codes.
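Steps 2, 5, and 6 can be illustrated with a small Python sketch. The dictionaries are hypothetical excerpts chosen for illustration (the full mapping and the 931-code, 59-category grouping are in Supplementary Table S1), and the function names are invented. The expert review that drives steps 3 through 6 is not something this code replaces: the keyword search only proposes candidates for review.

```python
# Hypothetical excerpts (assumption): the study's full ICD-10 -> ICD-9 mapping and
# its 59-category grouping of 931 ICD-9 codes are not reproduced here.
ICD10_TO_ICD9 = {
    "R06.02": "786.05",  # shortness of breath
    "R53.83": "780.79",  # other fatigue -> other malaise and fatigue
    "R12": "787.1",      # heartburn
}
ICD9_TO_SYMPTOM = {
    "786.05": "Shortness of breath",
    "780.79": "Fatigue",
    "787.1": "Heartburn",
}


def to_icd9(code: str) -> str:
    """Step 2: map an ICD-10 code to ICD-9; codes not in the crosswalk pass through unchanged."""
    return ICD10_TO_ICD9.get(code, code)


def symptom_category(code: str):
    """Step 6: return the symptom category for a code, or None if it is not a symptom code."""
    return ICD9_TO_SYMPTOM.get(to_icd9(code))


def keyword_candidates(descriptions: dict, keywords: list) -> list:
    """Step 5: ICD-9 codes whose description contains any keyword, for expert review."""
    kws = [k.lower() for k in keywords]
    return sorted(code for code, text in descriptions.items()
                  if any(k in text.lower() for k in kws))


print([symptom_category(c) for c in ["R06.02", "786.05", "R12", "E11.9"]])
# ['Shortness of breath', 'Shortness of breath', 'Heartburn', None]
```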
Data Analysis Methods
Traditional symptom cluster research is often based on data collected prospectively with validated PRO questionnaires, which may allow collection of multidimensional symptom data, including timing, severity, location, distress, and other influential factors (18). Cluster analysis, factor analysis, principal component analysis, path analysis, structural equation modeling, other multivariate analyses, and, more recently, network analysis are then used to characterize symptom clusters and patterns (18–20,53,61–63). In our EHR data, multiple symptoms were often diagnosed during the same clinic or hospital visit, which allowed us to identify co-occurring symptoms from the diagnosis codes recorded at admission or discharge for each encounter. Because of our large sample size, we could reliably characterize symptoms and symptom clusters by the number of occurrences or by the number of unique patients with each symptom or symptom cluster (Supplementary Table S2). We could also characterize the temporal patterns of symptom occurrence from the time of type 2 diabetes diagnosis.
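A minimal sketch of how encounter-level co-occurrence and the two counting schemes (occurrences vs. unique patients) might be computed, assuming diagnosis codes have already been mapped to symptom categories and arrive as hypothetical `(patient_id, encounter_id, symptom)` rows. As a further assumption, a symptom coded at both admission and discharge of the same encounter is counted once for that encounter. All names are illustrative.

```python
from collections import Counter, defaultdict


def encounter_symptom_sets(rows):
    """Group (patient_id, encounter_id, symptom) rows, taken from admission and
    discharge diagnosis codes, into one symptom set per encounter."""
    sets = defaultdict(set)
    for patient_id, encounter_id, symptom in rows:
        sets[(patient_id, encounter_id)].add(symptom)
    return dict(sets)


def symptom_counts(enc_sets):
    """Count each symptom two ways: encounters in which it occurs, and unique patients."""
    occurrences = Counter()
    patients = defaultdict(set)
    for (patient_id, _encounter_id), symptoms in enc_sets.items():
        for symptom in symptoms:
            occurrences[symptom] += 1
            patients[symptom].add(patient_id)
    return occurrences, {s: len(ids) for s, ids in patients.items()}


rows = [
    ("p1", "e1", "Pain"), ("p1", "e1", "Fatigue"),  # two symptoms at one encounter
    ("p1", "e2", "Pain"),
    ("p2", "e3", "Pain"), ("p2", "e3", "Pain"),     # admission + discharge code, same encounter
]
enc_sets = encounter_symptom_sets(rows)
occurrences, unique_patients = symptom_counts(enc_sets)
co_occurring = {k: v for k, v in enc_sets.items() if len(v) >= 2}
print(occurrences["Pain"], unique_patients["Pain"])  # 3 2
print(sorted(co_occurring))                          # [('p1', 'e1')]
```

Keeping the encounter as the unit makes co-occurrence a property of a single visit, which matches the definition of a cluster used in the Results.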
Statistical Methods
The hypergeometric test (Fisher exact test) (64,65) was used to identify significantly over-represented and under-represented symptoms and to calculate the risk ratios (RRs) for the symptoms in the cohort with type 2 diabetes compared with patients without diabetes in the Cerner EHR database (see details in the Supplementary Material). Over-represented and under-represented symptoms were ranked by their RRs (Supplementary Table S3).
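The test and the RR can be sketched in pure Python. This assumes one common setup for an over-representation test, which the text does not spell out: the population is all patients (with and without type 2 diabetes), patients with the symptom are the "successes," and the type 2 diabetes cohort is the sample drawn. Only the upper tail is shown; under-representation would use the lower tail analogously. Function names are hypothetical. With cohort sizes in the millions, tail probabilities below the double-precision range evaluate to 0.0 in this sketch.

```python
import math


def log_comb(n: int, k: int) -> float:
    """log C(n, k) via lgamma, so population sizes in the millions stay tractable."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_upper_tail(x: int, total: int, with_symptom: int, cohort: int) -> float:
    """P(X >= x) for X ~ Hypergeometric: `cohort` patients drawn without replacement
    from `total` patients, `with_symptom` of whom have the symptom."""
    lo = max(0, cohort - (total - with_symptom))  # smallest feasible count
    hi = min(with_symptom, cohort)                # largest feasible count
    start = max(x, lo)
    if start > hi:
        return 0.0
    log_denom = log_comb(total, cohort)
    terms = (math.exp(log_comb(with_symptom, i)
                      + log_comb(total - with_symptom, cohort - i) - log_denom)
             for i in range(start, hi + 1))
    return min(1.0, math.fsum(terms))


def risk_ratio(cases_t2d: float, n_t2d: float, cases_other: float, n_other: float) -> float:
    """Prevalence in the type 2 diabetes cohort divided by prevalence in the comparison group."""
    return (cases_t2d / n_t2d) / (cases_other / n_other)


def percent_difference(rr: float) -> float:
    """Relative difference implied by an RR, e.g. RR 0.59 -> about -41%."""
    return (rr - 1.0) * 100.0


# Difficulty speaking, using the prevalences reported in Table 3 (2.74% vs. 0.639%).
print(round(risk_ratio(2.74, 100, 0.639, 100), 2))  # 4.29
```

Working in log space with `lgamma` avoids forming binomial coefficients with millions of digits; `fsum` then adds the small probabilities with minimal rounding error.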
Type 2 diabetes symptom data were then extracted and organized in a longitudinal format. For each patient, we extracted the symptoms co-occurring at admission and at discharge for all encounters. We then characterized co-occurring symptoms by the number of occurrences and by the number of unique patients. Significantly co-occurring symptoms were identified using the Fisher exact test and χ2 test (65) (see details in the Supplementary Material).
Methods to Estimate the Temporal Trajectory of Prevalence Rates and Average Occurrences of Type 2 Diabetes–Related Symptoms
We defined the estimated time of type 2 diabetes onset (t0) as the earliest of the first report of a type 2 diabetes diagnosis, the first report of a type 2 diabetes–related medication, and the first abnormal type 2 diabetes laboratory test result. The first occurrence time (Tk) of symptom k was defined as the time of admission or discharge of the earliest encounter at which the symptom code was recorded as a diagnosis code at admission or discharge.
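The two time points can be sketched as below, assuming dates have already been extracted per patient and that "abnormal" laboratory results have been pre-filtered (the threshold is not restated here). The input layout and function names are hypothetical.

```python
from datetime import date


def estimated_onset(dx_dates, med_dates, abnormal_lab_dates):
    """t0: earliest of the first T2D diagnosis, first T2D medication, and first abnormal lab."""
    candidates = [*dx_dates, *med_dates, *abnormal_lab_dates]
    return min(candidates) if candidates else None


def first_symptom_time(coded_events, symptom_codes):
    """Tk: earliest admission/discharge date whose diagnosis codes include a code of symptom k.

    coded_events: iterable of (event_date, set_of_codes), one entry per admission or
    discharge diagnosis list (hypothetical input layout)."""
    hits = [d for d, codes in coded_events if codes & symptom_codes]
    return min(hits) if hits else None


t0 = estimated_onset([date(2015, 3, 1)], [date(2014, 12, 5)], [date(2015, 1, 10)])
tk = first_symptom_time(
    [(date(2016, 1, 1), {"786.05"}), (date(2015, 6, 1), {"787.1", "786.05"})],
    {"786.05"},  # illustrative ICD-9 code for shortness of breath
)
print(t0, tk, (tk - t0).days)  # 2014-12-05 2015-06-01 178
```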
We calculated the temporal trajectory of prevalence rates for each of the 59 symptoms as follows. For each year (365 days) after the time of type 2 diabetes onset (t0), we identified the number of patients with type 2 diabetes whose follow-up time covered the full 365 days of that year. Those whose follow-up covered <1 year (<365 days) of that year were down-weighted by T/365, where T was the duration (days) of the partial follow-up. The total number of patients with type 2 diabetes for follow-up year i = 1, 2, . . . , 10 after t0, denoted by $N_i$ and used as the denominator in the prevalence rate calculation, was the sum of the number of full-follow-up patients and the down-weighted partial-follow-up patients. The total number of patients with type 2 diabetes in whom the k-th symptom (among 59 symptoms) occurred for the first time during follow-up year i was denoted by $n_{ki}$. The prevalence rate for symptom k in year i = 1, 2, . . . , 10 after type 2 diabetes onset was then calculated as

$$P_{ki} = \frac{n_{ki}}{N_i}, \qquad N_i = F_i + \sum_{j \in \mathcal{P}_i} \frac{T_j}{365}, \qquad i = 1, 2, \ldots, 10,$$

where $F_i$ is the number of patients with full follow-up in year i, $\mathcal{P}_i$ is the set of patients with partial follow-up in year i, and $T_j$ is the number of days of follow-up of patient j in that year.
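A minimal Python sketch of the prevalence calculation for one symptom k, assuming follow-up and first-occurrence times are expressed in days after t0 and that follow-up year i spans days 365(i - 1) up to, but not including, 365i. `PatientFollowUp` and the helper names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional

DAYS_PER_YEAR = 365


@dataclass
class PatientFollowUp:
    followup_days: float                # days of follow-up after t0
    first_symptom_day: Optional[float]  # days after t0 of the first occurrence of symptom k


def year_weight(followup_days: float, year: int) -> float:
    """Covered fraction of follow-up year `year` (1-based): 1 if full, T/365 if partial, 0 if none."""
    covered = followup_days - (year - 1) * DAYS_PER_YEAR
    return min(max(covered, 0.0), DAYS_PER_YEAR) / DAYS_PER_YEAR


def prevalence_trajectory(patients, years: int = 10):
    """For each year i: first occurrences of symptom k in year i / weighted patient count."""
    rates = []
    for i in range(1, years + 1):
        start, end = (i - 1) * DAYS_PER_YEAR, i * DAYS_PER_YEAR
        denominator = sum(year_weight(p.followup_days, i) for p in patients)
        numerator = sum(1 for p in patients
                        if p.first_symptom_day is not None
                        and start <= p.first_symptom_day < end)
        rates.append(numerator / denominator if denominator > 0 else math.nan)
    return rates


cohort = [PatientFollowUp(800, 100), PatientFollowUp(800, None),
          PatientFollowUp(800, 400), PatientFollowUp(182.5, None)]
print(prevalence_trajectory(cohort, years=3))
# [0.2857142857142857, 0.3333333333333333, 0.0]
```

In year 1 the fourth patient covers only half the year, so the denominator is 3 + 0.5 = 3.5; that patient drops out of the denominator entirely from year 2 onward.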
We also calculated the temporal trajectory of occurrences per patient for each of the 59 symptoms for year i = 1, 2, . . . , 10 after type 2 diabetes onset (t0). We denoted by $O_{ki}$ the total number of occurrences of symptom k in follow-up year i and by $m_{ki}$ the total number of patients with type 2 diabetes who experienced symptom k in follow-up year i; patients whose follow-up did not cover the full year i were down-weighted as described above. The average number of occurrences of symptom k per patient in year i was then calculated as

$$\bar{O}_{ki} = \frac{O_{ki}}{m_{ki}}, \qquad i = 1, 2, \ldots, 10.$$

Note that this average considers only patients who experienced at least one occurrence of symptom k during year i.
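The per-patient average can be sketched the same way. How the down-weighting enters this average is not fully specified in the text; as an assumption, each patient with at least one occurrence in year i contributes the covered fraction of year i to the denominator. The input layout and names are hypothetical.

```python
import math

DAYS_PER_YEAR = 365


def year_weight(followup_days: float, year: int) -> float:
    """Fraction of follow-up year `year` (1-based) that the patient's follow-up covers."""
    covered = followup_days - (year - 1) * DAYS_PER_YEAR
    return min(max(covered, 0.0), DAYS_PER_YEAR) / DAYS_PER_YEAR


def mean_occurrences_trajectory(patients, years: int = 10):
    """patients: (followup_days, [days after t0 of each occurrence of symptom k]).

    Only patients with >= 1 occurrence in year i enter the denominator, each weighted
    by the fraction of year i covered (assumed form of the down-weighting)."""
    means = []
    for i in range(1, years + 1):
        start, end = (i - 1) * DAYS_PER_YEAR, i * DAYS_PER_YEAR
        total_occurrences, weighted_patients = 0, 0.0
        for followup_days, occurrence_days in patients:
            n = sum(1 for d in occurrence_days if start <= d < end)
            if n > 0:
                total_occurrences += n
                weighted_patients += year_weight(followup_days, i)
        means.append(total_occurrences / weighted_patients if weighted_patients > 0 else math.nan)
    return means


patients = [(800, [10, 20, 400]), (800, [30]), (182.5, [5, 6])]
print(mean_occurrences_trajectory(patients, years=3))  # [2.0, 1.0, nan]
```

Year 1 has 5 occurrences over a weighted 1 + 1 + 0.5 = 2.5 patients, giving 2.0; no patient has an occurrence in year 3, so the average is undefined there.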
Results
Patient Characteristics
Demographic and clinical characteristics of the cohort (n = 1,136,301) with type 2 diabetes are presented in Table 1. Patients had a mean age of 61.4 years (SD 14.8 years), 51.2% were female, and 70.0% were White. The average follow-up time was 60 months (5 years), with an average of >30 encounters per patient. Twenty-five percent of patients in our cohort had a follow-up time of >92 months (>7 years). Within our cohort, there were a total of 110,845,535 disease diagnosis (ICD-9/ICD-10) codes, 89,829,676 medications (by generic names), 1,012,646,718 laboratory test results, 233,795,322 clinical events (including vital signs), 11,035,132 procedures, and 54,364 surgical procedures.
Table 1. Demographics and Clinical Characteristics of the Type 2 Diabetes Cohort (N = 1,136,301)
Characteristic | Value |
---|---|
Sex (122 removed for unknown sex) | |
Female | 581,276 (51.2) |
Male | 554,903 (48.8) |
Race | |
Caucasian | 795,957 (70.0) |
African American | 201,051 (17.7) |
Asian | 25,039 (2.2) |
Hispanic | 16,090 (1.4) |
Native American | 14,138 (1.2) |
Others/unknown race | 84,026 (7.4) |
Age, years | 61.4 ± 14.8 |
Major comorbid conditions | |
Kidney disease | 365,409 (32.2) |
Heart failure | 247,463 (21.8) |
Liver disease | 107,447 (9.5) |
Stroke | 122,665 (10.8) |
Retinopathy | 42,393 (3.7) |
Myocardial infarction | 133,101 (11.7) |
Cancer | 231,648 (20.4) |
Data are n (%) except for age, which is mean ± SD.
Type 2 Diabetes–Related Symptoms
Type 2 diabetes–related symptoms were identified and grouped into 59 symptom categories (Supplementary Table S1). Among the 1,136,301 patients with type 2 diabetes, there were a total of 8,008,276 occurrences of the 59 symptoms, and 936,364 patients (82.4%) had at least one symptom. No symptoms were reported for the remaining 199,937 patients (17.6%) in our cohort.
The most frequently reported type 2 diabetes–related symptoms in our EHR database (by number of unique patients) are presented in Table 2 (a more comprehensive list is given in Supplementary Table S2). The five most frequently reported symptoms were pain, heartburn, shortness of breath, fatigue, and swelling, which occurred in 21–60% of patients with type 2 diabetes. We also observed some rare symptoms, including other gastrointestinal symptoms, urethral discharge, pallor, change in taste or smell, and flushing, each of which occurred in fewer than 1,000 to a few thousand of the 1,136,301 patients with type 2 diabetes (Supplementary Table S3). Some symptoms recurred many times: pain, heartburn, shortness of breath, disturbed sleep, skin changes, depression, and anxiety each occurred five or more times per patient on average.
Table 2. Top 20 Most Frequent Type 2 Diabetes Symptoms
Symptom Category | Unique Patients, n* | Proportion of Patients With Type 2 Diabetes, % | Occurrences, n | Mean Occurrences per Patient, n† |
---|---|---|---|---|
Pain | 681,453 | 59.97 | 5,852,082 | 8.6 |
Heartburn | 287,649 | 25.31 | 2,068,549 | 7.2 |
Shortness of breath | 272,033 | 23.94 | 1,551,141 | 5.7 |
Fatigue | 263,633 | 23.20 | 746,994 | 2.8 |
Swelling | 243,227 | 21.41 | 938,008 | 3.9 |
Change in bowel patterns | 229,048 | 20.16 | 610,547 | 2.7 |
Disturbed sleep | 205,653 | 18.10 | 1,289,701 | 6.3 |
Skin changes | 199,848 | 17.59 | 1,071,183 | 5.4 |
Depression | 192,015 | 16.90 | 1,456,498 | 7.6 |
Dizziness | 187,476 | 16.50 | 563,754 | 3.0 |
Anxiety | 170,474 | 15.00 | 891,975 | 5.2 |
Nausea/vomiting | 168,059 | 14.79 | 496,848 | 3.0 |
Cough | 158,571 | 13.96 | 601,311 | 3.8 |
Headache | 137,240 | 12.08 | 434,469 | 3.2 |
Feeling confused | 102,742 | 9.04 | 378,187 | 3.7 |
Fever | 93,625 | 8.24 | 181,823 | 1.9 |
Racing heartbeat | 90,220 | 7.94 | 290,642 | 3.2 |
Difficulty swallowing | 70,151 | 6.17 | 163,708 | 2.3 |
Other musculoskeletal symptoms | 68,589 | 6.04 | 169,194 | 2.5 |
Difficulty walking | 63,759 | 5.61 | 218,658 | 3.4 |
*Unique patients indicates the number of patients with at least one diagnosis code for the symptom.
†Mean occurrences per patient is the ratio of occurrences to the number of unique patients.
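The two derived columns of Table 2 follow directly from the footnotes. This short Python check recomputes them for a few rows, using the cohort size of 1,136,301 and the counts reported in the table; the function name is illustrative.

```python
COHORT_SIZE = 1_136_301  # patients with type 2 diabetes in the study cohort


def table2_columns(unique_patients: int, occurrences: int):
    """Proportion of the cohort (%) and mean occurrences per patient, rounded as in Table 2."""
    proportion_pct = round(100 * unique_patients / COHORT_SIZE, 2)
    mean_per_patient = round(occurrences / unique_patients, 1)
    return proportion_pct, mean_per_patient


print(table2_columns(681_453, 5_852_082))  # Pain -> (59.97, 8.6)
print(table2_columns(263_633, 746_994))    # Fatigue -> (23.2, 2.8)
```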
Because other comorbid conditions may also cause the symptoms reported in our cohort of patients with type 2 diabetes, the most frequent symptoms (by number of patients) may not reflect symptoms specific to type 2 diabetes. Table 3 lists the top 20 type 2 diabetes–related symptoms over-represented relative to the general population in the EHR database, identified using a hypergeometric test. The most over-represented symptoms were difficulty speaking, feeling confused, trouble remembering, weakness, and drowsiness/sleepiness, which differ from the most frequent symptoms in Table 2. Some of these over-represented symptoms (e.g., drowsiness/sleepiness, chills, pallor, and other gastrointestinal symptoms) were rare, occurring in <1% of the 1,136,301 patients with type 2 diabetes, and therefore may not be easily identified in traditional symptom studies that use validated PRO measures with limited sample sizes; nevertheless, they were captured in the EHR database. The likelihood of these rare symptoms (their RRs) was ∼2.6- to 3.3-fold higher among patients with type 2 diabetes than in the general population of the EHR database (Table 3). Interestingly, three of the most frequent symptoms (heartburn, fatigue, and shortness of breath) were also among the over-represented type 2 diabetes symptoms, with RRs of 2.7–3.1 relative to the general population in the EHR database. Table 3 also shows the under-represented type 2 diabetes symptoms: the most frequent type 2 diabetes symptom, pain, was among the most under-represented, with an RR of 0.59 relative to the general population in the EHR database.
This finding indicates that the likelihood of a pain diagnosis among patients with type 2 diabetes was approximately 40% lower than that among patients in the EHR database who were not diagnosed with type 2 diabetes.
Table 3. Top 20 Over-Represented Type 2 Diabetes Symptoms and Under-Represented Type 2 Diabetes Symptoms, Ranked by RRs
Symptom Category | Rank by RR | Frequency in Patients With Type 2 Diabetes, n (Prevalence [%]) | Frequency in Patients Without Type 2 Diabetes, n (Prevalence [%]) | RR | P |
---|---|---|---|---|---|
Over-represented symptoms | | | | | |
Difficulty speaking | 1 | 31,158 (2.74) | 243,468 (0.639) | 4.29 | 0 |
Feeling confused | 2 | 102,742 (9.04) | 843,558 (2.214) | 4.08 | 0 |
Trouble remembering | 3 | 22,834 (2.01) | 192,746 (0.506) | 3.97 | 0 |
Weakness | 4 | 49,214 (4.33) | 491,100 (1.289) | 3.36 | 0 |
Drowsiness/sleepiness | 5 | 6,973 (0.61) | 71,245 (0.187) | 3.28 | 0 |
Heartburn | 6 | 287,649 (25.32) | 3,066,940 (8.049) | 3.14 | 0 |
Chills | 7 | 8,612 (0.76) | 92,055 (0.242) | 3.14 | 0 |
Change in vision | 8 | 38,505 (3.39) | 449,179 (1.179) | 2.87 | 0 |
Trouble with coordination | 9 | 14,943 (1.32) | 177,719 (0.466) | 2.82 | 0 |
Feeling thirsty | 10 | 17,571 (1.55) | 210,252 (0.552) | 2.80 | 0 |
Fatigue | 11 | 263,633 (23.20) | 3,226,020 (8.467) | 2.74 | 0 |
Change in appetite | 12 | 17,495 (1.54) | 214,156 (0.562) | 2.74 | 0 |
Shortness of breath | 13 | 272,038 (23.94) | 3,352,931 (8.8) | 2.72 | 0 |
Pallor | 14 | 1,070 (0.09) | 13,272 (0.035) | 2.70 | 0 |
Other respiratory symptoms | 15 | 19,392 (1.71) | 248,513 (0.652) | 2.62 | 0 |
Other gastrointestinal symptoms | 16 | 73 (0.01) | 942 (0.002) | 2.60 | 0 |
Urinary frequency | 17 | 58,367 (5.14) | 773,193 (2.029) | 2.53 | 0 |
Disturbed sleep | 18 | 205,653 (18.10) | 2,732,175 (7.171) | 2.52 | 0 |
Change in weight | 19 | 48,401 (4.26) | 652,075 (1.711) | 2.49 | 0 |
Nosebleeds | 20 | 18,279 (1.61) | 249,690 (0.655) | 2.45 | 0 |
Under-represented symptoms | | | | | |
Sore throat | 1 | 13,114 (1.15) | 819,741 (2.151) | 0.54 | 0 |
Fever | 2 | 93,625 (8.24) | 5,585,192 (14.658) | 0.56 | 0 |
Pain | 3 | 681,455 (59.97) | 38,513,305 (101.078) | 0.59 | 0 |
Change in menstruation | 4 | 30,014 (2.64) | 1,601,526 (4.203) | 0.63 | 0 |
Change in voice | 5 | 10,160 (0.89) | 500,471 (1.313) | 0.68 | 0 |
Headache | 6 | 137,240 (12.08) | 6,064,562 (15.916) | 0.76 | 0 |
Urethral discharge | 7 | 787 (0.07) | 32,208 (0.085) | 0.82 | 0 |
Co-Occurring Type 2 Diabetes–Related Symptoms
Co-occurring symptom clusters were defined as two or more related symptoms noted at the same encounter (66). Here, we present the co-occurring symptoms diagnosed at the time of admission or discharge for each encounter for all patients with type 2 diabetes in the EHR system. The size of a symptom cluster is defined as the number of symptoms co-occurring at a given time point.
We identified 19 different sizes of co-occurring symptoms (sizes 2–19 and 27) (Supplementary Table S4). As expected, the smaller sizes of co-occurring symptoms occurred more frequently. For example, sizes 2, 3, and 4 of co-occurring symptoms occurred in 48, 25, and 11% of patients with type 2 diabetes, respectively, while sizes 5 and 6 occurred among 2 and 5% of patients with type 2 diabetes, respectively, and sizes ≥7 occurred in <1% of patients with type 2 diabetes. Some large sizes of co-occurring symptoms only occurred among a few patients with type 2 diabetes. We also noticed that repeated occurrences of co-occurring symptoms were rare, ranging between one and three occurrences per patient. More details on all sizes of co-occurring symptoms are summarized in Supplementary Table S4.
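A sketch of how the share of patients per cluster size might be tallied, assuming encounter-level symptom sets keyed by `(patient_id, encounter_id)` and counting a patient toward size s if any of their encounters had exactly s co-occurring symptoms. This definition is an assumption; under it a patient can contribute to several sizes, so the shares need not sum to 100%. Names are hypothetical.

```python
from collections import defaultdict


def share_of_patients_by_cluster_size(enc_sets, cohort_size):
    """Fraction of the cohort with at least one encounter whose co-occurring
    symptom set has exactly `size` symptoms (size >= 2), keyed by size."""
    patients_by_size = defaultdict(set)
    for (patient_id, _encounter_id), symptoms in enc_sets.items():
        if len(symptoms) >= 2:
            patients_by_size[len(symptoms)].add(patient_id)
    return {size: len(ids) / cohort_size for size, ids in sorted(patients_by_size.items())}


enc_sets = {
    ("p1", "e1"): {"Pain", "Fatigue"},
    ("p1", "e2"): {"Pain", "Cough", "Fever"},
    ("p2", "e3"): {"Pain", "Heartburn"},
    ("p3", "e4"): {"Pain"},  # a single symptom is not a cluster
}
print(share_of_patients_by_cluster_size(enc_sets, cohort_size=4))  # {2: 0.5, 3: 0.25}
```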
The hypergeometric test was used to identify significant co-occurring symptoms for size 2, and the χ2 test was used for co-occurring symptoms for sizes ≥3 (see details in methods). Many of the co-occurring symptoms were identified as statistically significant because of the extremely large sample size of our study. For example, 631 of 1,516 size 2 co-occurring symptoms were significant (P <0.05 after Bonferroni multiple testing adjustment). We ranked the significant size 2 co-occurring symptoms based on the average of the two RRs for symptom 1 and symptom 2 (indicating the association strength between the two symptoms) and reported the top 20 of them in Table 4. We also reported the top 20 most frequent size 2 co-occurring symptoms in Supplementary Table S5. We observed that the most frequent size 2 co-occurring symptoms were different from the top-ranked co-occurring symptoms by RR (i.e., the most strongly related symptom pairs).
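Enumerating size-2 pairs and applying the Bonferroni adjustment can be sketched as follows. The P values in the example are hypothetical, and a pair-level RR is omitted because its exact definition is given only in the Supplementary Material. Comparing Bonferroni-adjusted P values with 0.05 is equivalent to comparing raw P values with 0.05 divided by the number of pairs tested. Names are illustrative.

```python
from collections import Counter
from itertools import combinations


def pair_counts(enc_sets):
    """Count encounters in which each unordered symptom pair co-occurs."""
    counts = Counter()
    for symptoms in enc_sets.values():
        for pair in combinations(sorted(symptoms), 2):
            counts[pair] += 1
    return counts


def bonferroni_adjust(p_values: dict) -> dict:
    """Bonferroni-adjusted P values: raw P times the number of tests, capped at 1."""
    m = len(p_values)
    return {key: min(1.0, p * m) for key, p in p_values.items()}


enc_sets = {
    ("p1", "e1"): {"Pain", "Fatigue"},
    ("p1", "e2"): {"Pain", "Fatigue", "Fever"},
    ("p2", "e3"): {"Pain", "Heartburn"},
}
print(pair_counts(enc_sets)[("Fatigue", "Pain")])  # 2

raw = {("Fatigue", "Pain"): 0.001, ("Heartburn", "Pain"): 0.03}  # hypothetical P values
adjusted = bonferroni_adjust(raw)
significant = {pair for pair, q in adjusted.items() if q < 0.05}
print(significant)  # {('Fatigue', 'Pain')}
```

Sorting each symptom set before calling `combinations` ensures a pair such as (Fatigue, Pain) is always keyed in the same order, regardless of set iteration order.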
Table 4. Top 20 Significant Co-Occurring Symptoms for Size 2, Ranked by RRs
Symptom 1* | Symptom 2* | Occurrence for All Sizes, n† | Occurrence for Symptom 1, n‡ | Occurrence for Symptom 2, n‡ | RR§ | P‖ |
---|---|---|---|---|---|---|
Other abdominal symptoms | Other gastrointestinal symptoms | 1,768 | 4,171 | 17,318 | 1,596.96 | 0 |
Other head and neck symptoms | Other respiratory symptoms | 7,461 | 12,703 | 38,636 | 1,383.97 | 0 |
Feeling thirsty | Pallor | 116 | 28,627 | 1,716 | 95.17 | 0 |
Change in hair | Other general symptoms | 2,902 | 49,074 | 31,949 | 76.46 | 0 |
Feeling thirsty | Flushing | 121 | 28,627 | 3,177 | 51.97 | 0 |
Change in appetite | Change in weight | 3,169 | 27,346 | 106,325 | 46.23 | 0 |
Other head and neck symptoms | Shortness of breath | 7,785 | 12,703 | 1,551,141 | 36.80 | 0 |
Chills | Feeling thirsty | 301 | 11,258 | 28,627 | 36.06 | 0 |
Difficulty speaking | Weakness | 6,328 | 95,999 | 92,708 | 28.62 | 0 |
Hearing loss | Tinnitus | 2,721 | 120,657 | 34,168 | 26.88 | 0 |
Trouble with coordination | Weakness | 2,416 | 41,362 | 92,708 | 25.10 | 0 |
Change in appetite | Change in taste or smell | 43 | 27,346 | 2,405 | 25.02 | 0 |
Difficulty walking | Weakness | 10,717 | 218,658 | 92,708 | 22.35 | 0 |
Change in taste or smell | Feeling thirsty | 35 | 2,405 | 28,627 | 19.39 | 0 |
Other musculoskeletal symptoms | Weakness | 6,685 | 169,194 | 92,708 | 17.20 | 0 |
Other respiratory symptoms | Pallor | 29 | 38,636 | 1,716 | 16.72 | 0 |
Difficulty walking | Trouble with coordination | 3,569 | 218,658 | 41,362 | 16.15 | 0 |
Change in sexual interest or activity | Flushing | 7 | 5,310 | 3,177 | 15.64 | 0.000737 |
Chills | Nausea/vomiting | 1,916 | 11,258 | 496,848 | 15.32 | 0 |
Change in voice | Sore throat | 273 | 31,729 | 22,363 | 14.64 | 0 |
*Columns 1 and 2 show the two co-occurring symptoms.
†Number of co-occurrences of the two symptoms, including co-occurrences together with other symptoms.
‡Occurrence for symptom 1 and occurrence for symptom 2 are the numbers of occurrences of symptom 1 and symptom 2, respectively, among all patients with type 2 diabetes.
§RR indicates the maximum RR of the two symptoms co-occurring versus one of the symptoms occurring alone.
‖P value from the hypergeometric test after Bonferroni multiple testing adjustment.
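The pairwise test and RR ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: counts are treated as numbers of patients (Table 4 reports occurrences), the hypergeometric test is applied one-sided (more co-occurrence than expected), each directional RR is assumed to be the risk of one symptom among patients with versus without the other, and the pair score is the average of the two directional RRs, following the ranking described in the text. The Bonferroni factor is the number of pairs tested (1,516 in the text). All function names are hypothetical.

```python
from math import comb, inf


def hypergeom_upper_tail(k, population, successes, draws):
    """Exact P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    # math.comb(a, b) returns 0 when b > a, so impossible terms add nothing.
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return hits / total


def directional_rr(n_both, n_given, n_other, n_patients):
    """Risk of the other symptom among patients with vs. without the given one."""
    other_without_given = n_other - n_both
    if other_without_given == 0:
        return inf  # the other symptom never occurs without the given one
    risk_with = n_both / n_given
    risk_without = other_without_given / (n_patients - n_given)
    return risk_with / risk_without


def score_pair(n_both, n_s1, n_s2, n_patients, n_tests):
    """Bonferroni-adjusted P value, both directional RRs, and their average."""
    p_raw = hypergeom_upper_tail(n_both, n_patients, n_s1, n_s2)
    p_adj = min(1.0, p_raw * n_tests)
    rr_2_given_1 = directional_rr(n_both, n_s1, n_s2, n_patients)
    rr_1_given_2 = directional_rr(n_both, n_s2, n_s1, n_patients)
    return p_adj, rr_2_given_1, rr_1_given_2, (rr_2_given_1 + rr_1_given_2) / 2


# Hypothetical toy counts: 1,000 patients, 50 with symptom 1, 40 with
# symptom 2, 20 with both; 1,516 pairs tested as in the text.
p_adj, rr21, rr12, avg_rr = score_pair(20, 50, 40, 1000, n_tests=1516)
print(round(rr21, 2), round(rr12, 2), round(avg_rr, 2))  # 19.0 16.0 17.5
```

Exact integer arithmetic in `math.comb` avoids underflow for small toy counts; for cohort-scale counts, a library implementation such as SciPy's hypergeometric distribution would be the more practical choice.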
Supplementary Tables S5–S10 report the top 20 most frequent co-occurring symptom clusters of sizes 2–6, ranked by number of patients. The five most frequent size 2 co-occurring symptoms (by number of patients) were pain and heartburn, pain and swelling, pain and shortness of breath, pain and nausea/vomiting, and pain and change in bowel patterns, each occurring in 4–5% of all patients with type 2 diabetes.
For size 3 co-occurring symptoms, 2,388 of 11,202 were statistically significant (P <0.05 after multiple testing adjustment) based on the χ2 test (65), and 2,484 of 25,839 were significant for size 4 co-occurring symptoms. We report the most significant size 3 and size 4 co-occurring symptoms in Supplementary Tables S11 and S12 based on the χ2 statistic values.
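The χ2 screening of larger clusters can be sketched as below. The exact test is described in the study's Methods and is not reproduced here. As an assumption, this sketch uses a one-degree-of-freedom goodness-of-fit test that compares the observed number of patients with all symptoms in a cluster against the number expected if the symptoms occurred independently. It then applies a Bonferroni correction and ranks the significant clusters by χ2 value. The cluster names and counts are hypothetical.

```python
from math import erfc, sqrt


def cluster_chi2(n_all, marginal_counts, n_patients):
    """One-df chi-square comparing a cluster's count with independence."""
    # Expected share of patients with every symptom if symptoms were independent.
    p_independent = 1.0
    for count in marginal_counts:
        p_independent *= count / n_patients
    expected_in = n_patients * p_independent
    expected_out = n_patients - expected_in
    if expected_in <= 0 or expected_out <= 0:
        raise ValueError("expected counts must be positive")
    observed_out = n_patients - n_all
    chi2 = ((n_all - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # Upper tail of the chi-square distribution with 1 df equals erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


def rank_clusters(clusters, n_patients, n_tests, alpha=0.05):
    """Names of Bonferroni-significant clusters, largest chi-square first."""
    scored = []
    for name, n_all, marginals in clusters:
        chi2, p_value = cluster_chi2(n_all, marginals, n_patients)
        if min(1.0, p_value * n_tests) < alpha:
            scored.append((chi2, name))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical toy clusters: (name, patients with all three, marginal counts).
clusters = [
    ("A+B+C", 11, [100, 100, 100]),
    ("A+B+D", 1, [100, 100, 100]),
    ("A+C+E", 30, [100, 100, 100]),
]
print(rank_clusters(clusters, n_patients=1000, n_tests=11202))  # ['A+C+E', 'A+B+C']
```

With an extremely large cohort, even modest departures from independence produce large χ2 values. This is consistent with the high number of significant clusters reported above.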
Temporal Trajectory of Prevalence Rates and Average Occurrences per Patient of Type 2 Diabetes–Related Symptoms
The temporal trajectories of prevalence rates for the 20 most frequent type 2 diabetes symptoms are shown in Figure 1, where the x-axis is the time from type 2 diabetes onset (t0) to the first occurrence of the symptom. The prevalence rates of all symptoms increased slowly after type 2 diabetes onset. For the most prevalent symptom, pain, the prevalence rate increased gradually from 65 to 70% among patients with type 2 diabetes from year 1 to year 10 after diagnosis. The prevalence rates of all other symptoms showed a similar gradually increasing trend (Supplementary Figure S3). The temporal trajectories of average occurrences per patient for the 20 most frequent symptoms from year 1 to year 10 after diagnosis are shown in Figure 2. Interestingly, the average number of occurrences per patient decreased for all symptoms from year 1 to year 10 (Supplementary Figure S4). For example, the average number of occurrences of pain decreased from eight per patient at year 1 to six per patient at year 10 after type 2 diabetes onset.
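The two trajectories can be computed along the following lines. This sketch uses hypothetical definitions that the section does not spell out. Prevalence in year y is taken as the share of patients followed for at least y years whose first occurrence of the symptom fell within the first y years after onset. Average occurrence in year y is taken over patients with at least one occurrence of the symptom that year. Input shapes and names are illustrative.

```python
from collections import defaultdict
from math import floor, inf


def yearly_trajectories(events, followup_years, max_year=10):
    """Prevalence and average occurrences per patient by year since onset.

    events: (patient_id, years_since_onset) pairs for ONE symptom.
    followup_years: patient_id -> years of follow-up after onset.
    """
    first_seen = {}
    per_year = defaultdict(lambda: defaultdict(int))  # year -> patient -> count
    for patient_id, t in events:
        if t < 0:
            continue  # ignore symptoms documented before onset (t0)
        first_seen[patient_id] = min(t, first_seen.get(patient_id, t))
        year = floor(t) + 1  # year 1 covers [0, 1) years after onset
        if year <= max_year:
            per_year[year][patient_id] += 1

    prevalence, avg_occ = {}, {}
    for year in range(1, max_year + 1):
        # Only patients still under follow-up in this year form the denominator.
        followed = [p for p, fu in followup_years.items() if fu >= year]
        if followed:
            cases = sum(1 for p in followed if first_seen.get(p, inf) < year)
            prevalence[year] = cases / len(followed)
        counts = per_year.get(year)  # .get avoids creating empty entries
        if counts:
            avg_occ[year] = sum(counts.values()) / len(counts)
    return prevalence, avg_occ


# Hypothetical toy cohort, not study data.
followup = {"a": 10, "b": 3, "c": 10}
events = [("a", 0.5), ("a", 0.7), ("a", 2.2), ("b", 1.5)]
prev, avg = yearly_trajectories(events, followup)
print(avg)  # {1: 2.0, 2: 1.0, 3: 1.0}
```

Restricting the denominator to patients still under follow-up keeps later years from being diluted by patients whose records end early.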
Figure 1. Temporal trajectory of prevalence rates for the top 20 most frequent symptoms.
Figure 2. Temporal trajectory of average occurrences per patient for the top 20 most frequent symptoms.
Discussion
We present prevalent symptoms and co-occurring symptoms identified from the medical records of a large cohort of patients with type 2 diabetes using ICD-9/ICD-10 codes. Our cohort of patients with type 2 diabetes was 70% non-Hispanic White, 17.7% non-Hispanic Black, and 1.4% Hispanic. Based on data from the 2019 U.S. Census Bureau report, this cohort differed from the general population, which was 76.3% non-Hispanic White, 13.4% non-Hispanic Black, and 18.5% Hispanic (67). With respect to sex, our cohort (51.2% female, 48.8% male) mirrored the U.S. population (50.8% female, 49.2% male) (68). The under-representation of Hispanic patients in our cohort warrants further investigation. Although the average age of onset for type 2 diabetes is 45 years (2), the average age of onset in our EHR diabetes cohort was 61.4 years. Patients with type 2 diabetes are generally advised to have follow-up visits every 3–6 months (two to four times per year) (2). In our cohort, the average number of encounters of all types was six per year, suggesting that these patients visited a health care provider frequently. The average follow-up time was 5 years, with 25% of patients followed for >7 years. Although up to 50% of patients may have clinical or preclinical complications at the time of diagnosis, complications of diabetes most often occur after ∼5 years of disease duration (10). The major serious comorbid conditions in our cohort were kidney disease, heart failure, cancer, and myocardial infarction (Table 1). These findings differ somewhat from those of Nowakowska et al. (7), who reported a higher prevalence of hypertension, depression, kidney disease, and asthma in their cohort of patients with type 2 diabetes; however, they reported all prevalent comorbid conditions, whereas we report only the major serious comorbid conditions.
Symptoms related to type 2 diabetes are well documented. The most common symptoms at the time of diagnosis are polyuria, polyphagia, polydipsia, fatigue, blurred vision, weight loss, slow-healing wounds, tingling, and pain/numbness of the hands and feet (2). As the disease progresses, additional symptoms may appear as type 2 diabetes–related complications develop and/or as patients are advised to initiate antihyperglycemic agents, which can cause hypoglycemic episodes. The most common symptoms of hypoglycemia are headache, dizziness, blurred vision, palpitations, sweating, shakiness, anxiety, lightheadedness, chills, clamminess, confusion, hunger, nausea, pallor, weakness, clumsiness, nightmares, irritability/impatience, tingling or numbness of the lips/tongue/cheek, and fatigue (2). In our cohort, the symptoms documented in the highest percentage of patients were pain, heartburn, shortness of breath, fatigue, and swelling. In terms of repeated documentation, pain, depression, heartburn, and disturbed sleep were recorded on average six to nine times per patient. Although pain and fatigue are commonly reported among people with type 2 diabetes in the general population, heartburn, shortness of breath, and swelling have not been well documented (69). In contrast, gastrointestinal symptoms were rarely reported in our cohort, which differs from previous reports (37,70). We also found several over-represented symptoms (difficulty speaking, feeling confused, trouble remembering, weakness, and drowsiness/sleepiness) among patients with type 2 diabetes. All of these symptoms can result from hypoglycemia, which may occur during treatment with antihyperglycemic agents (2). Patients in our cohort were four times more likely than the general patient population in the EHR database to have documented difficulty speaking, feeling confused, and trouble remembering.
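The "four times more likely" comparison can be read as a ratio of two proportions. The formula below is a sketch under our own assumption. It divides the share of patients with type 2 diabetes who have symptom s documented by the share of all patients in the EHR database who have it. The symbols are hypothetical, and the study's exact over-representation metric is defined in its Methods.

```latex
\mathrm{Ratio}_s
  = \frac{n_s^{\mathrm{T2D}} / N^{\mathrm{T2D}}}{n_s^{\mathrm{EHR}} / N^{\mathrm{EHR}}},
\qquad
\text{e.g.}\quad \frac{0.02}{0.005} = 4
```

Here n_s counts patients with symptom s documented and N counts all patients in each group. The 2% and 0.5% shares in the example are illustrative values, not study data.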
We examined co-occurring symptoms in groups of two and three. Among pairs, the most common co-occurring symptoms were pain and heartburn, pain and swelling, and pain and shortness of breath. Although these are not established symptom clusters, pain, heartburn, swelling, and shortness of breath can all occur with heart disease (34). This is noteworthy because people with type 2 diabetes develop heart disease at younger ages and are twice as likely to die of heart disease or stroke as people without diabetes (2). Documentation of these symptoms could therefore indicate underlying or established heart disease. The pairing of depression and anxiety, a well-established co-occurrence among people with diabetes, was the most frequently documented symptom pairing in our cohort, and patients in our cohort were 4.5 times more likely to report co-occurring anxiety and depression. Among three-symptom groups, the most common were change in bowel patterns, nausea/vomiting, and pain; disturbed sleep, heartburn, and pain; and anxiety, depression, and pain. A review by Miaskowski et al. (53) of 158 symptom cluster articles found that the most common symptom clusters were 1) depression, pain, fatigue, and sleep disturbance; 2) nausea and vomiting; and 3) depression and anxiety. Our findings suggest that patients with type 2 diabetes may have unique symptom clusters.
Prevalence rates of type 2 diabetes symptoms increased slowly after type 2 diabetes onset (Figure 1), whereas the average number of recurrences of each symptom per patient showed a decreasing trend (Figure 2). As with the trends in type 2 diabetes–related complications reported by Gregg et al. (71), this decline in average occurrences per patient may reflect improved care, increased health promotion efforts, and patient education regarding disease management. It may also reflect a decrease in the reporting of symptoms over time.
Limitations
Using EHR data and symptoms identified through ICD-9/ICD-10 codes in a large cohort of patients with type 2 diabetes, we report prevalent, over-represented, under-represented, and co-occurring symptoms. We recognize the limitations of EHR data for scientific research. In particular, EHR data are collected to support clinical practice, documentation, and billing, not research analyses. Patients in commercial EHR databases are usually older and less healthy than the general population, and uninsured people are under-represented. EHR data are also “noisy,” with high uncertainty compared with data from well-designed observational and clinical studies. Although EHR systems may not capture all symptoms and other clinical events, they likely capture the most important symptoms and clinical events that require clinical attention, particularly those associated with patient care charges. Although we applied a rule-based phenotyping approach (56–58) to identify patients with type 2 diabetes from the EHR database, we did not apply similarly rigorous phenotyping approaches to type 2 diabetes–related complications and comorbid conditions and therefore may not have identified all of them. Finally, only symptom occurrence data were available in our dataset, because symptom severity and distress were not recorded in the structured EHR database. Although such information may be contained in unstructured clinical notes, retrieving it was beyond the scope of this project. We expect that the symptoms and symptom cluster patterns derived from this large EHR database can be confirmed by future studies using patient-reported outcome (PRO) measures.
We have characterized type 2 diabetes–related symptoms and symptom clusters, as well as the temporal trajectories of their prevalence and average recurrence, using a large EHR database. It is also important to identify the risk factors associated with these symptoms and symptom patterns. Potential risk factors may include demographic factors, comorbidities, medications, clinical events, laboratory tests, and procedures, all of which are also available in the EHR database. This is the focus of an ongoing follow-up project, the results of which will be reported in a separate article.
V.B., M.W., and X.W. contributed equally to this study.
Article Information
Funding
This project is partially supported by the Center for Big Data in Health Sciences (CBD-HS) at the School of Public Health, University of Texas Health Science Center at Houston.
Duality of Interest
No potential conflicts of interest relevant to this article were reported.
Author Contributions
V.B., M.W., and X.W. contributed equally and should be considered co-first authors. V.B. and M.W. proposed and designed the scientific questions, performed diagnosis code review for symptom clustering, and drafted the manuscript. X.W. performed the main data analyses and drafted the manuscript. V.K.L. and G.Z. performed the initial data extraction, data cleaning, and phenotyping. D.A. provided clinical insights and interpretation of the results. H.W. conceived, initiated, and directed this research project and drafted the manuscript. All coauthors read and approved the manuscript. H.W. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Supplementary Material
This article contains supplementary material online at https://doi.org/10.2337/figshare.17180819.