Using the newly created University of California (UC) Health Data Warehouse, we present the first study to analyze antihyperglycemic treatment utilization across the five large UC academic health systems (Davis, Irvine, Los Angeles, San Diego, and San Francisco).
This retrospective analysis used deidentified electronic health records (EHRs; 2014–2019) including 97,231 patients with type 2 diabetes from 1,003 UC-affiliated clinical settings. Significant differences between health systems and individual providers were identified using binomial probabilities with cohort matching.
Our analysis reveals statistically different treatment utilization patterns not only between health systems but also among individual providers within health systems. We identified 21 differences among health systems and 29 differences among individual providers within these health systems, with respect to treatment intensifications within existing guidelines on top of either metformin monotherapy or dual therapy with metformin and a sulfonylurea. Next, we identified variation for medications within the same class (e.g., glipizide vs. glyburide among sulfonylureas), with 33 differences among health systems and 86 among individual providers. Finally, we identified 2 health systems and 55 individual providers who more frequently used medications with known cardioprotective benefits for patients with high cardiovascular disease risk, but also 1 health system and 8 providers who prescribed such medications less frequently for these patients.
Our study used cohort-matching techniques to highlight real-world variation in care between health systems and individual providers. This demonstrates the power of EHRs to quantify differences in treatment utilization, a necessary step toward standardizing precision care for large populations.
Many existing guidelines for treating type 2 diabetes (1) involve reducing known risk factors, avoiding drugs that can aggravate insulin resistance or dyslipidemia, and considering factors such as age, life expectancy, comorbidities, and insurance (2). These guidelines offer flexibility in therapeutic choice at many steps. However, without guideline implementation and tools for decision support, clinicians often make complex pharmacologic treatment decisions in isolation, leading to heterogeneity in practice patterns. While several studies have analyzed type 2 diabetes prescribing pattern diversity (3–8), most focused on monotherapies or on specific drugs. Moreover, few studies attempted to correct for factors that may influence medication choice, such as cardiovascular risk, comorbidities, or disease severity.
With the rise of electronic health records (EHRs), an opportunity exists for using real-world data to understand and quantify existing heterogeneity in practice. However, utility of EHR data is often limited by lack of expertise or institutional interest in harmonizing and analyzing billions of data points from disparate sources.
In this study, we analyzed type 2 diabetes treatments across the University of California (UC) using the UC Health Data Warehouse (UCHDW), including five UC health systems: Davis, Irvine, Los Angeles, San Diego, and San Francisco. The UCHDW is suitable for this analysis, comprising real-world EHR data spanning multiple large university-affiliated health systems and ambulatory care clinics.
Rather than focusing on monotherapy or a small selection of drugs, our study analyzes treatment utilization for patients with established type 2 diabetes intensifying an existing treatment, as this is the step at which current guidelines enable much of the pharmaceutical variation. Our study presents a framework to help correct for potential reasons underlying a given medication choice by using three cohort-matching techniques to compare patients with similar HbA1c, Framingham Cardiovascular Risk Score (FCRS), or propensity score. This framework enables not only comparisons between health systems, but also between providers within these institutions. Our findings demonstrate the power of EHRs to automatically identify statistically different patterns of treatment utilization, a necessary step toward optimizing or standardizing precision care for large populations being actively managed. We expect this framework to be useful in examining prescribing patterns for any health system, for other clinical decisions made in the context of diabetes, and for other diseases in the future.
Research Design and Methods
Extracting Patients With Type 2 Diabetes From the EHR
Structured data elements were extracted for 5,374,136 patients from the UCHDW spanning 5 health systems encompassing 1,003 UC-affiliated inpatient and outpatient clinical settings. Encounter, medication, vital sign, and laboratory test data were available from June 2014 to June 2019. These elements were deidentified prior to receipt by the authors, and no clinical notes or images were used in this study, per the institutional review guidelines (Institutional Review Board #19-28263).
To build cohorts of patients being treated for type 2 diabetes, three filters were applied (Fig. 1). First, we required patients to have hyperglycemia or prediabetes at least once in their record by including patients with HbA1c >5.7% or an ICD-10 code for type 2 diabetes (E11*) or prediabetes (R73.03), resulting in 531,258 patients. This initial search identified patients who may be receiving treatment for hyperglycemia. Second, we removed patients with ICD-10 codes for type 1 diabetes or pregnancy any time (E10 and O00-O9A). Finally, to identify only patients being actively managed, we required that the outpatient medical history include at least one UC-initiated biguanide, sulfonylurea, dipeptidyl peptidase 4 inhibitor (DPP-4i), glucagon-like peptide 1 receptor agonist (GLP-1RA), sodium–glucose cotransporter 2 inhibitor (SGLT-2i), thiazolidinedione (TZD), meglitinide, or α-glucosidase inhibitor (AGi), resulting in 97,231 patients with 1,259,189 antihyperglycemic prescription records (243,179 of these prescriptions being UC-initiated).
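The three filters above can be sketched as a single predicate over a patient record. This is a minimal illustration, not the pipeline used in the study: the `Patient` layout, field names, and drug-class labels are hypothetical, and matching every code that begins with "O" is an assumed shorthand for the O00-O9A pregnancy range.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Drug classes named in the text; the short labels are illustrative.
ANTIHYPERGLYCEMIC_CLASSES = {
    "biguanide", "sulfonylurea", "DPP-4i", "GLP-1RA",
    "SGLT-2i", "TZD", "meglitinide", "AGi",
}


@dataclass
class Patient:
    # Hypothetical record layout; the real UCHDW schema is not described here.
    max_hba1c: Optional[float] = None                     # highest HbA1c (%) on record
    icd10_codes: Set[str] = field(default_factory=set)    # all diagnosis codes
    uc_initiated_classes: Set[str] = field(default_factory=set)


def has_code(codes, prefixes):
    """True if any diagnosis code starts with any of the given prefixes."""
    return any(code.startswith(p) for code in codes for p in prefixes)


def passes_filters(p):
    # Filter 1: hyperglycemia or prediabetes at least once in the record.
    hyperglycemic = (
        (p.max_hba1c is not None and p.max_hba1c > 5.7)
        or has_code(p.icd10_codes, ("E11", "R73.03"))
    )
    # Filter 2: exclude type 1 diabetes (E10) or pregnancy (O00-O9A) at any time.
    # Assumption: every ICD-10 code beginning with "O" falls in O00-O9A.
    excluded = has_code(p.icd10_codes, ("E10", "O"))
    # Filter 3: at least one UC-initiated antihyperglycemic medication.
    treated = bool(p.uc_initiated_classes & ANTIHYPERGLYCEMIC_CLASSES)
    return hyperglycemic and not excluded and treated
```

Expressing the filters as one predicate keeps the inclusion logic in a single place, so each filter can be relaxed or tightened independently when the cohort definition changes.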
Within the larger landscape of UC-wide type 2 diabetes treatments, we identified subcohorts of patients intensifying treatment to analyze treatment utilization. First, we defined two “out-class” cohorts representing a choice between two different medication classes (e.g., DPP-4i or SGLT-2i) (Supplementary Table 1). The first out-class cohort included patients intensifying treatment “post-metformin monotherapy” (either intensifying metformin monotherapy to dual therapy or to insulin) for the first time. The second included patients for whom existing dual therapy with metformin and sulfonylurea was intensified either by addition of another agent (triple therapy) or insulin for the first time (“post-metformin and sulfonylurea dual therapy”).
Next, we defined six “in-class” cohorts representing a choice among medications within a given class (e.g., sitagliptin or linagliptin among DPP-4is) (Supplementary Table 2). In this study, the six cohorts included patients being prescribed sulfonylurea, DPP-4i, GLP-1RA, SGLT-2i, meglitinide, or AGi for the first time. We did not analyze in-class TZDs since there was only one dominant option (pioglitazone). In-class analysis for sulfonylureas was limited to glimepiride, glipizide, and glyburide, as other agents were too infrequently used.
Finally, to study utilization of cardioprotective medications, we included only patients with an FCRS (9) in the top quintile of each health system (Supplementary Table 3). Among the treatment decisions made for these individuals, we studied the frequency of choosing canagliflozin (10), dapagliflozin (11), empagliflozin (12), semaglutide (12), or liraglutide (13), which have known cardioprotective effects. For this subcohort, only the most recent medication change for each patient was used in order to reflect a current population of UC patients being actively managed who could benefit today from cardioprotective effects (i.e., patients who could be targeted for interventions).
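As a rough sketch of the site-specific quintile selection described above, the following groups patients by health system and keeps the top fifth by FCRS within each group. The tuple layout and the rounding rule (floor of 20%, at least one patient per site) are assumptions made for illustration; the text does not specify how ties or small sites were handled.

```python
from collections import defaultdict


def top_quintile_by_site(patients):
    """patients: iterable of (patient_id, site, fcrs) tuples.

    Returns the set of patient ids whose FCRS is in the top 20% of
    their own health system.
    """
    by_site = defaultdict(list)
    for patient_id, site, fcrs in patients:
        by_site[site].append((fcrs, patient_id))

    selected = set()
    for rows in by_site.values():
        rows.sort(reverse=True)              # highest FCRS first
        n_top = max(1, len(rows) // 5)       # assumed rounding rule
        selected.update(patient_id for _, patient_id in rows[:n_top])
    return selected
```

Because the cutoff is computed per site, a patient can be selected at one health system with an FCRS that would fall below the cutoff at another.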
For all subcohorts, we required that the patients had BMI, blood pressure, HbA1c, and plasma lipid (LDL-cholesterol, HDL-cholesterol, and triglycerides) measurements prior to treatment and that the prescribed treatment was not canceled or replaced for at least 30 days.
Significance Testing and Cohort Matching
Throughout this study, we compared the frequency of treatment utilization between health systems and, separately, between individual providers. For each health system or provider, the frequency of out-class or in-class treatment utilization was compared with that of all other health systems or providers using binomial probabilities in R v3.4.1. Multiple hypothesis testing was controlled using a false discovery rate threshold of 0.05. To be considered statistically significant, a difference was further required to reach binomial significance in three separate tests using three cohorts “matched” by HbA1c, FCRS, or propensity score. These three matched cohorts were created by selecting the closest patient by each metric (HbA1c, FCRS, or propensity score) for each patient, resulting in cohorts of similar patients of equal size.
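The testing procedure above can be illustrated with a short sketch. The original analysis was run in R; the Python below is only an approximation under stated assumptions: the binomial test is taken to be an exact two-sided test, the false discovery rate control is taken to be the Benjamini-Hochberg procedure (the text does not name the method), and all function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def binom_test_two_sided(k, n, p):
    """Exact two-sided p-value: total mass of outcomes no more likely than k."""
    observed = binom_pmf(k, n, p)
    masses = [binom_pmf(i, n, p) for i in range(n + 1)]
    # Small relative tolerance so ties are not lost to floating-point noise.
    return min(1.0, sum(m for m in masses if m <= observed * (1 + 1e-7)))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank               # largest rank passing its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff_rank
    return rejected


def significant_in_all_matched(flags_by_matching):
    """Keep a comparison only if it is significant in every matched cohort."""
    return [all(flags) for flags in zip(*flags_by_matching.values())]
```

In this sketch, `k` is the number of times a site or provider chose a given treatment out of `n` decisions, and `p` is the frequency of that treatment among all other sites or providers. The final helper encodes the requirement that a difference survive the HbA1c-, FCRS-, and propensity-matched tests together.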
Matching cohorts by HbA1c compares patients with similar disease severity, while matching by FCRS compares patients with similar cardiovascular risk. We also performed matching using a propensity score, obtained by training a logistic regression model to distinguish between patients in the cohort of interest versus patients not in that cohort. In addition to HbA1c, the propensity scoring model included age, systolic blood pressure, sex/gender, BMI, plasma lipid levels (LDL-cholesterol, HDL-cholesterol, and triglycerides), smoking status, and ICD-10–based diagnoses made on or before the day of treatment, including chronic kidney disease (I12*, D63.1*, N17*, N18*, N19*, and E11.21*), chronic obstructive pulmonary disease (J44*), depression (F33* and F32*), disorder of lipoprotein or lipidemia (E78*), gastroesophageal reflux disease (K21.9), insomnia or sleep apnea (G47.00 and G47.33), osteoporosis (M80* and M81*), rheumatic disease (M05*, M06*, and M79.0), cardiovascular disease (I2*, I3*, I4*, I5*, I6*, I7*, I8*, and I9*), and, separately, heart failure (I50*), heart attack (I21*, I22*, and I23.0), heart block (I44*), peripheral artery disease (I73*), stroke (I63), arrhythmia (I48*, I49*, R00.1, and R94.31), cardiomyopathy (I42.9), and myocarditis (I51.4).
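To make the matching step concrete, the sketch below fits a small logistic regression by gradient descent to obtain propensity scores, then pairs each cohort patient with the closest available comparison patient on a single score. It is an illustration only: the study does not state the fitting routine, whether matching was done with or without replacement, or how ties were broken, so greedy 1:1 matching without replacement is an assumption, and all names are hypothetical. In practice features such as HbA1c, age, and lipid levels would be standardized before fitting.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def propensity(x, w, b):
    """Estimated probability that a patient belongs to the cohort of interest."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def match_nearest(cohort_scores, pool_scores):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns one pool index per cohort member, so both groups end up the
    same size. Requires len(pool_scores) >= len(cohort_scores).
    """
    available = set(range(len(pool_scores)))
    matches = []
    for score in cohort_scores:
        best = min(available, key=lambda j: (abs(pool_scores[j] - score), j))
        matches.append(best)
        available.remove(best)
    return matches
```

The same `match_nearest` helper applies to any of the three metrics: pass HbA1c values, FCRS values, or propensity scores as the two score lists.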
Data and Resource Availability
Our study uses EHR data, which were deidentified prior to receipt by the authors. Accessing the deidentified EHR data is only possible through institutional review by UC.
Diversity of Type 2 Diabetes Treatment Trajectories Across Five UC Health Systems
Among 5,374,136 patients in the UCHDW (Fig. 1), we found 97,231 with type 2 diabetes being prescribed type 2 diabetes treatments by 8,518 providers at any of the five large UC Academic Health Systems, for which the names have been masked: UC-A (12,914), UC-B (24,454), UC-C (32,298), UC-D (11,026), and UC-E (16,539).
Tracking medication changes for these 97,231 patients over time revealed an exceptionally diverse array of treatment trajectories. In Fig. 2, a visualization inspired by previous work (4), the first four medication changes for each patient are depicted as a series of circular rings, with each radius representing treatments for a single patient and each ring representing a treatment change. Colors that stay the same between rings represent changes in dosage, but not in medication class, whereas white rings represent no change.
Even tracking only the first four medication changes for 97,231 patients, we identified 12,134 unique treatment trajectories. Remarkably, 8,988 of these trajectories were unique to only one patient each. When analyzing the most expensive options for dual therapies, we found 894 instances in which patients started dual therapy with a GLP-1RA, SGLT-2i, or DPP-4i and were later switched to a different, cheaper dual therapy, which may be indicative of a more cost-effective treatment pathway.
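Counting trajectories of this kind amounts to tallying each patient's ordered sequence of treatment changes. The sketch below shows one way to do so, assuming each patient's history is already available as an ordered list of treatment labels; the input layout and function name are hypothetical.

```python
from collections import Counter


def summarize_trajectories(changes_by_patient, max_changes=4):
    """changes_by_patient: dict of patient_id -> ordered list of treatment labels.

    Returns (number of distinct trajectories, number followed by exactly
    one patient), considering only the first `max_changes` changes.
    """
    trajectories = Counter(
        tuple(changes[:max_changes]) for changes in changes_by_patient.values()
    )
    n_unique = len(trajectories)
    n_singletons = sum(1 for count in trajectories.values() if count == 1)
    return n_unique, n_singletons
```

Truncating each history to the first four changes mirrors the restriction used in the text and keeps long histories from inflating the number of distinct trajectories.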
Out-Class Treatment Patterns Post-Metformin Monotherapy
To analyze treatment utilization, we created two “out-class” cohorts to compare treatment utilization between different medication classes (e.g., DPP-4i or SGLT-2i; see Research Design and Methods). For the first out-class cohort, we first identified 8,449 patients who are intensifying treatment after being treated with metformin monotherapy (referred to as “post-metformin monotherapy”). Unexpectedly, we found that each of the UC health systems was quite different with respect to prescribing patterns post-metformin monotherapy, even when patients were matched by HbA1c, FCRS, or propensity score (Fig. 3A and enumerated in Supplementary Table 4) (see Research Design and Methods). Assessing the heat maps and dendrograms post-metformin monotherapy revealed that UC-C was the most unique, preferring DPP-4i, GLP-1RA, meglitinide, or SGLT-2i compared with other health systems (odds ratios: 1.53, 2.72, 3.04, and 4.78, respectively) while using significantly less insulin (odds ratio: 0.61). Notably, post-metformin monotherapy meglitinide utilization was the most variable among health systems, with UC-A, UC-C, UC-D, and UC-E each being significantly different from one another in this regard (odds ratios: 0.21, 3.04, 2.39, and 0.17, respectively).
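For readers less familiar with odds ratios, the sketch below shows one conventional way to compute them from a 2 × 2 table. The text does not give the exact formula used, so comparing one site against all other sites pooled is an assumption, and the counts in the usage note are invented for illustration only.

```python
def odds_ratio(k_site, n_site, k_other, n_other):
    """Odds of choosing a treatment at one site versus all other sites pooled.

    k_site / n_site:   times the treatment was chosen / total decisions at the site
    k_other / n_other: the same counts summed over every other site
    """
    chose_site, not_site = k_site, n_site - k_site
    chose_other, not_other = k_other, n_other - k_other
    if not_site == 0 or chose_other == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (chose_site * not_other) / (not_site * chose_other)
```

With made-up counts of 30 of 100 decisions at one site and 20 of 100 elsewhere, `odds_ratio(30, 100, 20, 100)` gives about 1.71. Values above 1 indicate more frequent use at the site, and values below 1 indicate less frequent use.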
Initial binomial probabilities revealed significant differences across each of the UC health systems with respect to post-metformin sulfonylurea usage (Supplementary Table 4). However, cohort matching nullified these differences, since binomial probabilities were no longer significant when comparisons occurred in the context of HbA1c-matched (UC-A, UC-B, and UC-D) or propensity-matched (UC-E) cohorts. This indicates that disease severity and comorbidities play major roles in decision-making regarding sulfonylurea usage and that cohort-matching techniques can be useful for identifying and removing these effects.
Separately, treatment utilization was also examined among individual providers. Comparing individual health providers making at least three post-metformin monotherapy treatment decisions, we identified 23 with significantly different prescribing patterns: 4 at UC-A, 2 at UC-B, 13 at UC-C, 2 at UC-D, and 2 at UC-E (Supplementary Fig. 1 and Supplementary Table 5). This indicates that significant differences in treatment utilization exist among individual providers even after correcting for differences in HbA1c, FCRS, or propensity score.
Overall, when intensifying metformin monotherapy to dual therapy or insulin, we identified 15 significant differences among health systems and, separately, 23 among individual providers (Fig. 4A).
Out-Class Treatment Utilization Post-Metformin and Sulfonylurea Dual Therapy
For the second out-class cohort, 3,364 patients underwent treatment intensification beyond metformin and sulfonylurea dual therapy (referred to as post-metformin and sulfonylurea dual therapy) (see Research Design and Methods and Supplementary Table 6). We found several significant differences among health systems (Fig. 3B and Supplementary Table 6), with GLP-1RA prescription frequency as triple therapy on top of metformin and sulfonylurea being significantly different at all five UC health systems (odds ratios: 0.21 for UC-A, 0.24 for UC-B, 1.60 for UC-C, 2.29 for UC-D, and 1.79 for UC-E). In the heatmap dendrogram, UC-A and UC-C were the most different from each other in several ways. UC-A used insulin (odds ratio: 1.84) or triple therapy including TZD (odds ratio: 1.98) more so than UC-C, while UC-C more frequently prescribed triple therapy including GLP-1RA, meglitinide, or SGLT-2i (odds ratios: 1.60, 4.34, and 4.10, respectively) than UC-A.
Comparing individual providers who intensified the treatment of at least three individual patients receiving dual therapy with metformin and a sulfonylurea, we found six who were statistically different: one at UC-A, two at UC-C, one at UC-D, and two at UC-E (Supplementary Fig. 2 and Supplementary Table 7).
In total, we identified 16 significant differences among health systems and 6 among individual providers, with respect to their treatment choice when adding either insulin or a third agent to intensify existing dual therapy with metformin and a sulfonylurea (Fig. 4B).
In-Class Treatment Utilization
Our tracking also revealed a striking number of differences in in-class treatment utilization, including 10,584 for sulfonylureas, 7,709 for DPP-4is, 3,932 for GLP-1RAs, 3,461 for SGLT-2is, 1,515 for meglitinides, and 361 for AGis prescribed for the first time in any context.
Among sulfonylureas, UC-A prescribed more glipizide (odds ratio: 1.24) and less glimepiride and glyburide (odds ratios: 0.53 and 0.46, respectively) compared with other health systems (Supplementary Fig. 3A and Supplementary Table 8). Conversely, UC-C prescribed more glimepiride and less glipizide than other sites (odds ratios: 1.77 and 0.89, respectively). Among individual providers prescribing sulfonylureas (Supplementary Fig. 3B and Supplementary Table 9), we identified 6 significantly different providers at UC-A, 8 at UC-B, 20 at UC-C, 2 at UC-D, and 9 at UC-E.
Examining DPP-4i usage across each site (Supplementary Fig. 3C and Supplementary Table 10) showed that providers at UC-A prescribed comparatively more alogliptin and linagliptin and less saxagliptin or sitagliptin (odds ratios: 17.28, 2.32, 0.26, and 0.78, respectively), whereas patients at UC-B and UC-D received significantly more sitagliptin (odds ratios: 1.18 and 1.09, respectively) and less alogliptin (odds ratios: 0.08 and 0.07, respectively) or linagliptin (odds ratios: 0.01 and 0.51, respectively). For individual providers, we identified 15 providers with unique prescribing patterns using DPP-4i: 11 at UC-A, 2 at UC-C, and 2 at UC-E (Supplementary Fig. 3D and Supplementary Table 11).
Analyzing first usage of SGLT-2i (Supplementary Fig. 3E and Supplementary Table 12) showed that UC-A used more empagliflozin (odds ratio: 1.15), whereas UC-D used less (odds ratio: 0.66). Individual provider analysis (Supplementary Fig. 3F and Supplementary Table 13) revealed nine providers with significant differences among SGLT-2is (six at UC-C, two at UC-D, and one at UC-E). We could not assess individual provider treatment utilization for SGLT-2is at UC-B, since only 1 provider out of 60 had prescribed an SGLT-2i at least 3 times, and this provider prescribed different SGLT-2is at similar rates.
The heat map dendrogram for GLP-1RA (Supplementary Fig. 3G and Supplementary Table 14) indicated that UC-C was the most unique compared with the other health systems, likely because UC-C was the only site to use more dulaglutide and lixisenatide (odds ratios: 1.92 and 6.83, respectively) and less albiglutide, exenatide, and liraglutide (odds ratios: 0.21, 0.33, and 0.82, respectively). UC-D and UC-E used less dulaglutide (odds ratios: 0.79 and 0.50, respectively), and both UC-B and UC-E used more exenatide (odds ratios: 4.70 and 1.50, respectively). Interestingly, liraglutide was used significantly more at UC-A and UC-D, but it was never used at UC-B (odds ratios: 1.64, 1.56, and <0.01, respectively). For individual providers (Supplementary Fig. 3H and Supplementary Table 15), we identified 13 providers with significantly different treatment utilization for GLP-1RAs: 1 at UC-A, 1 at UC-B, 3 at UC-C, 2 at UC-D, and 6 at UC-E.
For meglitinide (Supplementary Fig. 3I and Supplementary Table 16), UC-B and UC-E were the most different, with UC-B using significantly less nateglinide and more repaglinide (odds ratios: 0.52 and 1.19, respectively), and UC-E conversely using more nateglinide and less repaglinide (odds ratios: 2.13 and 0.59, respectively). For individual providers (Supplementary Fig. 3J and Supplementary Table 17), only four had significantly different meglitinide utilization patterns, and they were at UC-C.
Thus, for in-class patterns of treatment utilization overall, we identified 33 significant differences among health systems and 86 among individual providers (Fig. 4C).
Utilization Patterns for Cardioprotective Diabetes Agents in High-Risk Patients
When analyzing the utilization of cardioprotective diabetes drugs for patients in the highest quintile of FCRS at each site, we identified 3 sites and 63 individual providers with significantly different utilization patterns (Fig. 3C and Supplementary Table 20). Specifically, UC-A and UC-C used more cardioprotective medications for such patients while UC-B used fewer (odds ratios: 1.29, 1.91, and 0.01, respectively). Although UC-D also appeared to use fewer cardioprotective medications for their high-risk patients, propensity matching negated the significance of this observation, indicating that differences in the patient population may be the cause of the differences for UC-D. In addition, our method revealed that the relative use of cardioprotective medications for these high-risk patients was increased by 55 individual providers and decreased by 8 providers (Fig. 4D and Supplementary Table 21).
Although EHRs are becoming more ubiquitous, lack of expertise or institutional interest often prevents the consolidation of their data across health systems, hindering analyses of differences among health systems or providers. However, the newly created UCHDW is ideal for this analysis, being a central repository with real-world medical records from >5.3 million patients across five major UC health systems: Davis, Irvine, Los Angeles, San Diego, and San Francisco. In this study, we present the first use of the UCHDW to analyze type 2 diabetes treatment utilization patterns within and across these health systems. In this work, we define an important statistical framework using binomial probabilities and cohort-matching techniques to find differences in treatment utilization while helping to correct for factors that may impact treatment decisions, such as cardiovascular risk, comorbidities, and disease severity. This novel framework can statistically identify different utilization patterns automatically in any EHR, which is an essential first step to standardizing or optimizing clinical practices within established guidelines at large institutions.
The results in this study highlight and quantify sharp differences in medication utilization patterns among the five UC health systems we studied, a finding we believe will extend to the vast majority of health systems and could be used to enable more standardized care. This finding is likely due directly to the exceptional diversity of prescribing patterns evident among individual providers, even, at times, within a single health system.
Our findings do not suggest that a highly diverse approach to the pharmacologic management of type 2 diabetes is associated with inadequate care. On the contrary, diverse practice patterns may represent decisions by experienced providers, including highly trained diabetes specialists, designed to improve type 2 diabetes care for individual patients. For example, any given treatment decision might be the result of specific expertise in that medication or the result of cutting-edge treatment methodology by endocrinologists or diabetologists. Such decisions are also informed by the provider’s direct knowledge of the individual being treated. Despite this, the ability to rigorously analyze medication usage patterns across the UC Health landscape provides an opportunity to offer a data-driven approach to guideline adherence, which may foster a more organized approach to providing effective, simplified, cost-conscious, or otherwise optimized medication treatment trajectories that we believe would easily translate to other health systems.
There are important limitations in our work. The prevalence of individuals with diabetes and prediabetes in the UCHDW (9.9%) is lower than population expectations, since the American Diabetes Association estimates 34.5% of the U.S. to have at least prediabetes (14). However, such differences are expected since our population consists of patients seeing a variety of specialties and sometimes only for brief consultations, so their diabetes status might not be coded or might be available only via clinical notes. Indeed, we find a similar discrepancy in prevalence in a separate academic medical system, with a 5.2% prevalence of patients with type 2 diabetes (SNOMED-CT code 44054006) at Columbia University. The difference between EHR-based cohorts and population-wide cohorts is important, and our study helps enumerate this fact. Despite differences in population prevalence, our results are not affected since our study focused on defining the capacity of the EHR to follow the prescribing trajectories at the level of specific subsets of patients, specific health systems or medical centers, and even specific providers. Additionally, our deidentified EHR currently does not include information about health insurance or payer, which are key components in type 2 diabetes treatment decision-making and should be included in future work. Nonetheless, this study identifies several distinct prescribing patterns that could not have resulted simply from differences in insurance coverage or other external factors. For example, among sulfonylureas, all providers used glipizide, but some had an obvious preference for glyburide versus glimepiride. Thus, while it is important to consider individual providers’ reasoning for preferring one drug to another, this study represents an important step forward in the ability to identify significant differences in treatment utilization, which could be used to optimize and standardize care strategies in the future.
Beyond the context of treatment intensification, insights obtained using our novel statistical framework hold promise to organize, standardize, and thus improve institution-wide care management for type 2 diabetes and other diseases in which there exists a wide diversity of pharmacologic treatment choices and guidelines. As EHRs spread in use and data availability, these types of analyses will become more powerful and useful to clinicians, providers, and payers. In the future, we expect this EHR-based framework will be used to aid in the elucidation of prescribing patterns and the standardization or optimization of care via the identification of statistical differences for a variety of medical decisions.
This article contains supplementary material online at https://doi.org/10.2337/figshare.13549142.
Funding. Some of the authors were partially supported by National Institute of General Medical Sciences grant R01 GM079719 (to T.A.P. and A.J.B.), the National Institute of Diabetes and Digestive and Kidney Diseases (P30 DK098722 and R01 DK112304 to S.K.K.), the National Heart, Lung, and Blood Institute (K23 HL136899 to V.F.), and the Leon Lowenstein Foundation (T.A.P. and A.J.B.). Methods used to deidentify EHR data were supported in part by funding from the National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1 TR001872.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Duality of Interest. S.K.K. is a consultant and equity holder for Suggestic, Inc. and Yes Health Inc. A.J.B. is a cofounder and consultant to Personalis and NuMedii, Inc.; has served as consultant to Samsung, Mango Tree Corporation, and, in the recent past, 10x Genomics, Helix Biopharma, Pathway Genomics, and Verinata (Illumina); has served on paid advisory panels or boards for Geisinger Health, Regenstrief Institute, Gerson Lehrman Group, AlphaSights, Covance, Novartis, Genentech, Merck, and Roche; is a shareholder in Personalis and NuMedii, Inc.; is a minor shareholder in Apple, Facebook, Google, Microsoft, Sarepta Therapeutics, Regeneron Pharmaceuticals, Moderna, AstraZeneca, 10x Genomics, Amazon, Biogen Inc., CVS Pharmacy, Illumina, Snap Medical Industries, Nuna Health, Assay Depot, Vet24seven, Inc., and Sutro Biopharma, Inc., and several other mutual funds and non–health-related companies and mutual funds; has received honoraria and travel reimbursement for invited talks from Genentech, Takeda Pharmaceutical Company, Varian Medical Systems, Roche, Pfizer, Merck, Eli Lilly and Company, Mars Therapeutics, Siemens, Optum, Abbott Laboratories, Celgene, AstraZeneca, AbbVie Inc., Johnson & Johnson, Westat, and many academic institutions, medical or disease-specific foundations and associations, and health systems; receives royalty payments through Stanford University for several patents and other disclosures licensed to NuMedii, Inc. and Personalis, Northrup Grumman (as the prime on a National Institutes of Health contract), Genentech, Johnson & Johnson, L’Oreal, and Progenity. No other potential conflicts of interest relevant to this article were reported.
Author Contributions. All authors were responsible for the development of the methodology and the manuscript. A.P. was responsible for obtaining the consolidated EHR records across institutions. T.A.P. performed the analysis. T.A.P. is the guarantor of this work and, as such, had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.