The past decade of population research for diabetes has seen a dramatic proliferation in the use of real-world data (RWD) and the generation of real-world evidence (RWE) from non-research settings, including both health and non-health sources, to influence decisions related to optimal diabetes care. A common attribute of these new data is that they were not collected for research purposes yet have the potential to enrich the information around the characteristics of individuals, risk factors, interventions, and health effects. This has expanded the role of subdisciplines like comparative effectiveness research and precision medicine, new quasi-experimental study designs, new research platforms like distributed data networks, and new analytic approaches for clinical prediction of prognosis or treatment response. The result of these developments is a greater potential to advance diabetes treatment and prevention through the increasing range of populations, interventions, outcomes, and settings that can be efficiently examined. However, this proliferation also carries an increased threat of bias and misleading findings. The level of evidence that may be derived from RWD is ultimately a function of the data quality and the rigorous application of study design and analysis. This report reviews the current landscape and applications of RWD in clinical effectiveness and population health research for diabetes and summarizes opportunities and best practices in the conduct, reporting, and dissemination of RWD to optimize its value and limit its drawbacks.
Introduction
Background: The Rationale and Landscape for Real-World Data and Evidence Generation
A multidecade evolution of epidemiologic and intervention effectiveness studies for diabetes has led to diverse options in prevention and care that have increased life spans, reduced complications, and improved quality of life for people with diabetes (1–4). Despite these advances, diabetes remains an enormous global burden. Adults with diabetes have 10 times the risk of amputation, 5 times the risk of renal failure, and double the risk of cardiovascular disease and premature death, and they account for most cases of preventable blindness (3). These risks will likely be magnified by stagnation in delivery of high-quality care and increases in early-onset type 2 diabetes with a more aggressive and less responsive course of disease (5–7). Reducing the preventable burden and economic cost of diabetes and its complications will depend on new efforts across diverse research disciplines to better prioritize patients, populations, interventions, and policies.
Alongside these challenges, there has been a proliferation of systems for accumulating real-world data (RWD) and using them to generate real-world evidence (RWE) from non-research settings, including both health-related and non-health–related sources. The resulting collection and analysis of big data are increasing the options for evidence-based guidance of management and prevention (8,9). These new data often are not collected for research purposes yet have the potential to enrich the body of health-related information regarding the characteristics of individuals, risk factors, interventions, and health effects. Accordingly, the role of subdisciplines in handling large collections of data has expanded. These areas and methods of inquiry include comparative effectiveness research (CER) and precision medicine, new quasi-experimental study designs, new research platforms like distributed data networks, and new analytic approaches for clinical prediction of prognosis or treatment response (10–14). These developments have the potential to improve diabetes prevention and treatment by increasing the range of populations, interventions, outcomes, and settings that can efficiently be examined. However, these changes also carry an increased threat of bias and misleading findings. The level of evidence that may be derived from RWD is ultimately a function of the quality of data and the rigor of study design and analysis.
In this report, we review the current landscape and applications of RWD in clinical effectiveness and population health research for diabetes. We then summarize opportunities and best practices in the conduct, reporting, and dissemination of information derived from RWD to optimize its value and limit its drawbacks.
Applications of RWD: Purposes, Opportunities, and Needs
Progress in diabetes care and prevention has benefited from diverse population-based scientific disciplines that are being transformed by RWD. These disciplines include clinical effectiveness research to prioritize and individualize interventions, health services research to shape treatment delivery and models of care, and health policy research to guide population-wide approaches to promote health. They also include observational epidemiological studies to quantify risk factors and individual risk as well as surveillance to monitor and target subpopulations for health interventions (Table 1). To date, these disciplines have relied upon a traditional set of study designs and approaches for data collection, with randomized controlled trials (RCTs) serving as the gold standard to prioritize treatments, health services, and policies; cohort studies for risk factor identification or risk prediction; and regular surveys for monitoring trends. These disciplines are already benefiting from RWD, but all require a new evidence assessment paradigm (15).
Discipline/application | Core purpose | Dominant design | New data sources being integrated | Dominant RWE designs |
---|---|---|---|---|
Clinical epidemiology | Test treatments | RCT | Primary care visits; hospital claims | Parallel groups |
Health services research | Test health services | Cluster RCT | Laboratory data; pharmacy data | Difference-in-differences |
Health policy research | Test health policies | Cluster RCT | Health monitoring devices; behavioral dietary data | Interrupted time series |
Observational epidemiology | Identify and prioritize risk factors | Cohort | Physical activity data; social media | Dynamic electronic cohorts |
Population surveillance | Monitor population health | Population survey | Geographic information; product registries | Electronic registries |
Implementation research | Determine reach, adoption, implementation, and sustainability | Pragmatic RCT | Disease registries | Diverse quasi-experiments |
The pathway to evidence generation and synthesis has traditionally been linear, starting with a research question and study design and followed by deliberate data collection, analysis, inference, and translation (15). RWE assessment may also start with a research question, but it expands the scope of research objectives because RWD often include a broader range of variables than studies based on direct research measurements. Both the addressability of a research question and the specifics of study design depend upon the kind of data that can be collected and the limitations imposed by the real-world settings in which the data are collected. Intensive analytic approaches are often required to identify comparable conditions or patient groups and, in the inference phase, to determine effectiveness or identify risk groups. These parallel and diverse inputs of data, design, and analytic elements can require complex decisions in the research process for which guidelines are lacking. Ideally, RWD should be deployed using newer and more suitable methods in all aspects of the research process.
This evolution of data collection and analysis can complement traditional approaches in at least four fundamental ways, which arise through variation in design, population characteristics, intervention, and outcomes (Fig. 1). First, RWD research seeks to understand the effectiveness of evidence-based practices as they are modified for delivery outside the research setting, whether by changes in delivery mode, intensity, or setting, including clinical settings, organizational settings, and communities. Second, it extends evaluation to population subgroups underrepresented in trials, including people at lower and higher risk, at younger and older ages, and at different levels of socioeconomic status. Third, it aims to assess the effectiveness of interventions and policies for which conventional experimentation or randomization approaches are impractical, costly, or unethical to execute. Fourth, it permits examination of either beneficial or adverse outcomes that are beyond the practical scope of traditional survey, trial, cohort, or other measurement methods. In the sections that follow, we elaborate on these scenarios in relation to contemporary efforts to prevent and control diabetes and its morbidity.
Clinical Research and CER to Improve Diabetes Care
Studies of the effectiveness of clinical interventions, particularly pharmacological therapies, have garnered the most attention in RWD studies (8). RCTs are widely recognized as the gold standard for determining treatment efficacy and establishing recommendations for clinical guidelines. However, the tendency to study effects under ideal circumstances of health care practice and patient behavior, often without representative populations, results in findings that are not always equivalent or generalizable to what occurs in practice.
Several shortcomings can limit the utility of RCTs. They are typically costly, take a long time to plan and execute, expose participants to experimentation, and are usually designed to answer a narrow hypothesis. They include a highly selected population managed in tightly controlled settings. This has often led to an exclusion of key populations, such as higher-risk elderly people with multiple comorbidities and polypharmacy, even though these populations are frequent candidates for treatment in primary care (16,17). For example, many of the trials of new agents and continuous glucose monitoring have excluded populations at high risk of hypoglycemia (18). The PROactive (Prospective Pioglitazone Clinical Trial in Macrovascular Events), ADVANCE (Action in Diabetes and Vascular Disease: Preterax and Diamicron MR Controlled Evaluation), and EMPA-REG OUTCOME [BI 10773 (Empagliflozin) Cardiovascular Outcome Event Trial in Type 2 Diabetes Mellitus Patients] trials included samples with clinical characteristics that would have represented only 3.5%, 15.7%, and 35%, respectively, of the real-world population; thus, their findings may not be generalizable to the wider population (18,19). Similarly, RCTs, with protocols intentionally standardizing treatment dosage and timing and incentivizing adherence (or including patients likely to be adherent), may not reflect management in a broader range of settings. They also may not address questions that arise in daily practice with respect to relevant comparator agents, concurrent illness and medications, and other common aspects of care. Collectively, these factors may lead to a “voltage drop” or “scale-up penalty” wherein the effectiveness and risks in the real world may be quite different from those of trial settings (20,21). Furthermore, RCTs often lack the statistical power or duration to examine effects on less common outcomes or adverse drug events (ADEs) or to consider benefits and risks beyond the periods of time assessed in the trials.
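As a rough illustration of how a representativeness figure of the kind quoted above might be computed, the following sketch applies trial-style eligibility criteria to a real-world cohort and reports the share of patients who would have qualified. The record fields, thresholds, and sample patients are hypothetical and do not correspond to the criteria of any trial named in this report.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int          # years
    hba1c: float      # percent
    egfr: float       # mL/min/1.73 m^2
    prior_cvd: bool   # established cardiovascular disease


def is_eligible(p: Patient) -> bool:
    """Hypothetical trial-style inclusion criteria (illustrative only)."""
    return (
        40 <= p.age <= 75
        and 7.0 <= p.hba1c <= 10.0
        and p.egfr >= 30
        and p.prior_cvd
    )


def eligible_share(cohort: list[Patient]) -> float:
    """Fraction of a real-world cohort that meets the trial criteria."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    return sum(is_eligible(p) for p in cohort) / len(cohort)


# Made-up real-world cohort; only the first patient meets every criterion.
cohort = [
    Patient(age=68, hba1c=8.1, egfr=55, prior_cvd=True),
    Patient(age=82, hba1c=7.4, egfr=48, prior_cvd=True),   # above age limit
    Patient(age=59, hba1c=6.5, egfr=90, prior_cvd=False),  # HbA1c too low, no CVD
    Patient(age=71, hba1c=9.2, egfr=25, prior_cvd=True),   # eGFR too low
]

print(f"Eligible share: {eligible_share(cohort):.1%}")  # Eligible share: 25.0%
```

In practice, the cohort would come from EHR or claims data, and each criterion would need a validated operational definition before such a share could be interpreted.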
These limitations result in large gaps in knowledge and call for RWE as a tool to complement evidence from RCTs. Current health care systems produce a large amount of longitudinal patient-level electronic data that can be used to assess exposures of interest and associated health outcomes in clinical practice. Data sources include (but are not limited to) insurance claims with information based on diagnostic and procedure codes and pharmacy records, electronic health records (EHR) with substantial clinical details, registries with information on patients with a particular disease or treatment type, and vital records with information on the date and cause of death. These RWD can be linked to create large databases that can be used for clinical research and CER to improve care of patients with diabetes.
Most published uses of RWD have compared the effectiveness of pharmacological interventions by relying on data electronically generated by the health care system to emulate target RCTs (22). For example, recent cardiovascular outcome trials of sodium–glucose cotransporter 2 (SGLT2) inhibitors have demonstrated substantial reduction in cardiorenal outcomes in patients with type 2 diabetes (23–27). These findings have been confirmed and extended by noninterventional studies based on RWD through the inclusion of unrestricted populations as treated in actual clinical settings. These studies have resulted in effectiveness estimates for previously unstudied populations in the context of clinically varying treatments, dosages, comorbidities, and dilutions of effect that can occur due to suboptimal adherence. In this setting, RWD may offer more useful evidence for health care policymakers and delivery system administrators when weighing the tradeoffs (i.e., the business case) for implementation of a treatment strategy (28,29). In another example, comparative effectiveness analyses based on community-acquired data have provided support for the broader relevance of the Cardiovascular Outcome Study of Linagliptin Versus Glimepiride in Patients With Type 2 Diabetes (CAROLINA) study, a randomized trial that showed similar efficacy and cardiovascular safety of a dipeptidyl peptidase 4 inhibitor and a sulfonylurea (30,31). In other cases, as in the long-term benefits versus risks of insulin therapy, both randomized trials and analyses based on RWD have faced similar difficulties. These include the need to collect data over long periods and to account for residual confounding by factors related to the longer exposure to chronic hyperglycemia that is characteristic of people who require insulin. In such cases there clearly is further work to do using the newer tools for RWE generation.
Thanks to the very large populations represented by RWD collected during the provision of routine care, RWE has also been increasingly used by regulatory agencies to assess the safety of medical products, particularly with respect to rare ADEs that RCTs are not powered to examine. As an example of the importance of RWE, diabetic ketoacidosis among users of SGLT2 inhibitors was identified as a safety concern postmarketing by RWD studies (32,33). Finally, RWD studies can provide important insights into the natural history of diabetes, as occurs in studies that assess the association of glucose and outcomes in older adults (34), risk factors for hypoglycemia, and the association of severe hypoglycemia with cardiovascular events and mortality. These studies have extended our understanding of the risks associated with hypoglycemia beyond what was found in RCTs. As an example, the Hypoglycemia Assessment Tool (HAT) study showed rates of overall, nocturnal, and severe hypoglycemia that were much higher than those seen in RCTs. Whether the association between hypoglycemia and risk in unselected populations is a causal one or instead involves other factors, such as concurrent illness and frailty, this information may allow early identification of individuals at risk and requires focused attention. In general, studies have also shown similarities in cardiovascular risk factors for people with type 1 and type 2 diabetes (34–37).
RWE and Regulatory Decision-Making
Regulatory agencies have traditionally relied on RWE to assess the safety of regulated medications (38). Although real-world databases of health care information are increasingly available, only a few countries in North America and Europe are actively involved in drug surveillance (39). The U.S. Food and Drug Administration (FDA) Sentinel Initiative, launched in 2008, created a national system of large electronic health care databases that can be used for rapid assessment of drug safety signals (40,41). Similarly, the Network for Observational Drug Effect Studies in Canada and the Research Network of Pharmacovigilance and Pharmacoepidemiology in Europe have been developed over the past 15 years. In the U.K., spontaneous reporting of suspected adverse reactions by health care professionals (since 1964) and by patients (since 2005) has been available through the Yellow Card Scheme operated by the Medicines and Healthcare products Regulatory Agency (42). RWE safety studies can respond to postmarketing or postauthorization requirements that arise as part of the risk management strategy at the time of medication approval or as part of a rapid regulatory response to a new safety signal that arises at any time after marketing. Nevertheless, some postmarketing safety assessments still rely on RCTs. In 2008, the FDA issued guidance for the pharmaceutical industry that mandated trials to demonstrate the cardiovascular safety of newly approved medications for type 2 diabetes (43), which has resulted in more than 20 completed or ongoing cardiovascular outcome trials, with some demonstrating unexpected cardiovascular and renal benefits (44). Based on these findings, several subsequent RCTs have been initiated on the most promising medications with the secondary aim of demonstrating efficacy in preventing cardiorenal events in order to receive supplemental approval for that indication (26,27). A key reason that regulators are comfortable using RWD to answer questions on medication safety is that, because most ADEs are not anticipated by the prescriber, there is thought to be little selection associated with the risk of these events at the time of prescribing (45).
More recently, regulatory agencies have reconsidered the value of RWE to support effectiveness claims for drug approval decisions, label expansion, coverage decisions, clinical guideline writing, and prescribing decisions in clinical practice (45,46). Although RCTs generally provide the strongest level of evidence of efficacy, there are circumstances under which a nonrandomized study may be acceptable. To improve transparency, prespecification, and reproducibility of RWE, professional societies have agreed on the study parameters that need to be disclosed in detail to make an RWE study reproducible and fully reviewable (46). In addition, a consortium of representatives from academia, regulatory agencies, and industry has prepared a structured template that helps investigators implement and report on RWE studies (47). An integral component of these efforts is the preregistration of RWE studies on specialized sites (e.g., encepp.eu) or generic study registration sites (e.g., ClinicalTrials.gov) before an RWE study is undertaken.
Applications of RWD to Research of Primary Prevention of Diabetes
There is a need for population-based evidence to advance the primary prevention of type 2 diabetes because of the efficacy-to-effectiveness gap seen with population-level interventions. RCTs have shown that both lifestyle and pharmacological interventions can substantially reduce the risk of progression to type 2 diabetes in high-risk populations (48–50). These studies were followed by implementation trials across diverse communities and health care settings. These trials showed that lifestyle change interventions delivered by nutritionists, exercise specialists, or specifically trained community health workers (or allied health professionals) result in about 3–4% weight loss over 1–2 years, about two-thirds of the effect size seen in the primary RCTs (51).
Despite the large number of primary prevention trials demonstrating efficacy, translational research and implementation studies of their effectiveness in actual practice or public programs are relatively few. Nevertheless, RWE is important because when these primary prevention programs are implemented, they have often diverged from or extended beyond the characteristics of the original trials in terms of setting (e.g., community fitness or gathering centers, churches, and virtual/online approaches) and target population (e.g., risk level and age). These differences in approach could result in a different magnitude of effect or affect different outcomes in ways that depend upon the characteristics of the participants.
The few available RWE studies have generally examined the impact of large-scale employer-based or health system–based policies to support diabetes prevention. Ackermann et al. (52) examined the effectiveness and costs of a program that screened and referred adults with prediabetes from work settings to a YMCA-delivered structured counseling–based lifestyle program. The study matched participants based on propensity scores to identify a comparable reference group, and it found a modest screening yield, with 29% achieving the weight loss goal and neutral effects on health care costs. Moin et al. (53) used a similar design to compare members of a health plan with an insurance benefit offering free lifestyle counseling for prediabetes with a group lacking such benefits. The study compared 563 employer groups receiving the benefit to 554 propensity score–matched control groups and found a significant but modest (8%) reduction in absolute incidence of diabetes.
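Both studies above relied on propensity score matching to construct a comparable reference group. The sketch below shows one common variant, greedy 1:1 nearest-neighbor matching within a caliper, and assumes the propensity scores have already been estimated (for example, by logistic regression). The cited studies do not necessarily use this algorithm; the function name, caliper width, and scores are hypothetical.

```python
def match_on_propensity(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping a unit id to its estimated propensity score.
    Returns a list of (treated_id, control_id) pairs whose scores differ by
    no more than `caliper`; treated units with no close control stay unmatched.
    """
    available = dict(controls)  # copy so the caller's dict is not mutated
    pairs = []
    # Process treated units in score order so the result is reproducible.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Hypothetical propensity scores for program participants and non-participants.
treated = {"T1": 0.30, "T2": 0.62, "T3": 0.90}
controls = {"C1": 0.28, "C2": 0.33, "C3": 0.60, "C4": 0.45}

pairs = match_on_propensity(treated, controls)
print(pairs)  # [('T1', 'C1'), ('T2', 'C3')]
```

Here T3 remains unmatched because no control falls within the caliper. Dropping such units trades sample size for comparability, and a real analysis would check covariate balance after matching.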
The lack of data that limits interpretation of studies in this area stems from several factors. First, the partnerships between health systems and community programs that now provide services have not yet resulted in broadly accessible or usable data for research and have rarely been used to report on long-term health outcomes. Further, health system data sets for population research generally have lacked information on health behaviors or referrals and uptake of behavioral or community-based interventions. Where such data exist in EHR, the amount of missing data and data quality raise concerns about bias. The rapid expansion of data from wearable devices and personal phone applications expands the potential for assessment of free-living physical activity and dietary behavior. However, the integration of such data into both epidemiologic and intervention effectiveness data is limited by a lack of validation, data access and security challenges, burdensome data processing, and a lack of linkage with administrative health outcomes data.
The scale-up of national prevention programs in the U.S. and U.K. provides new opportunities and an imperative to understand and quantify the gap between trial and real-world effectiveness (54–57). In the English diabetes prevention program, of the more than 300,000 people referred, 53% attended the initial assessment, 36% attended at least one group education session, and 19% attended more than 60% of the sessions (56). The evaluation showed a small reduction in HbA1c (0.12%) and a mean weight loss of 3.3 kg among those who attended >60% of sessions. To date, most evaluations of these translation programs have been limited to effects on risk factors and have not assessed either long-term sustainability or effects on diabetes incidence. However, a recent analysis of the National Health Service Diabetes Prevention Programme used a difference-in-differences design with a staggered rollout to compare practices and found that Diabetes Prevention Programme practice populations experienced a 7% reduced incidence relative to expected trends (58). Such analyses have rarely examined the effect of pharmacological-based practice for diabetes prevention except to show that metformin prescription for diabetes prevention is rare (59).
Policies around testing and screening for diabetes and prediabetes are closely linked to prevention policy. They have been controversial because few clinical trials have tested their effectiveness and, where trials exist, benefits on long-term health outcomes have been modest at best. The value and benefit of screening further depend on the setting, context, and health system capacity. To date, the most influential studies of screening effectiveness have been long-term follow-up studies of RCTs of screen-and-treat interventions. Observational studies have also contributed information on the sensitivity, specificity, and positive predictive value of screening criteria, risk scores, and diagnostic thresholds as well as yield and attrition (60,61). However, as screening is frequently a population-wide policy, often carrying political pressure, it is an important potential area for natural experimental studies (61,62). Albu et al. quantified the impact of installation of a health system–wide EHR and decision support system on testing, screening, and yield (62), while others have used health systems data to examine the implications of current recommendations on yield and accuracy. To date, however, observational studies have not been used for direct assessment of the health impact of screening (59,63).
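The screening accuracy measures mentioned above follow directly from a two-by-two table of screening result against true disease status. The sketch below computes them from cell counts; the counts are invented for illustration and are not drawn from any cited study.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy measures for a screening rule from 2x2 cell counts.

    tp: screen-positive with disease     fp: screen-positive without disease
    fn: screen-negative with disease     tn: screen-negative without disease
    """
    return {
        # Share of people with the disease whom the screen detects.
        "sensitivity": tp / (tp + fn),
        # Share of people without the disease whom the screen clears.
        "specificity": tn / (tn + fp),
        # Share of screen-positive people who truly have the disease.
        "ppv": tp / (tp + fp),
    }


# Hypothetical screen of 1,000 adults, 100 of whom have undiagnosed diabetes.
metrics = screening_metrics(tp=80, fp=180, fn=20, tn=720)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, sensitivity and specificity are both 0.80, yet fewer than one in three screen-positive adults has the disease, which illustrates why positive predictive value depends heavily on prevalence in the screened population.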
Applications of Natural Experiments and Population-Wide Interventions
Determining the effectiveness of health policies or population-wide interventions is one of the frontiers in population research for diabetes and is considered crucial to reducing incidence and improving care (64–66). In contrast to clinical or behavioral interventions applied to individuals, a population-targeted approach implements a policy, new structure, or new guidelines uniformly across population segments as a collective unit. This makes for challenging assessment of effectiveness, because the most appropriate control condition is not a specific individual but rather other population segments not receiving the policy or intervention.
There are ongoing debates surrounding policy-level and population-level interventions that lack effectiveness evidence (66–68). For example, U.S. health care reform has attempted to improve insurance access and mandate selected preventive services. Clinical associations issue diagnostic and screening policies as well as guidelines and recommendations to improve the delivery and organization of care. Further, public policy and food policy, such as taxation of sugared beverages, community-targeted incentives for healthier foods or supportive programs, or approaches targeting social determinants of health, could alter the course of diabetes risk and outcomes (69). However, it is usually impractical to apply experimental, randomized designs to such large-scale interventions, leaving most policy-level action either untested or underevaluated. The proliferation of administrative health data sets as well as non-health data, such as geographic, marketing, or social networking data, is creating new possibilities for evaluation and fueling interest in natural experiments to study the effectiveness of policies on whole populations.
Natural experimental studies are designed to assess effects of naturally occurring events, interventions, or policies on health-related outcomes and have the advantage of assessing effectiveness on larger, less selected populations than can be assessed using controlled experimental methods. They capitalize on unplanned variation in exposure to determine the impact of the interventions or policies under study. However, natural experiments have many challenges in common with comparative effectiveness research. The typical lack of randomization in natural experiments brings similar threats due to confounding, selection, and measurement biases and can make causal inference more challenging. Thus, designs need to account for these biases, background trends in the outcome in a comparable population, and cointerventions. These studies often use analytic features to improve the strength of causal inference, including multilevel matching of comparable controls, a difference-in-differences framework or time series with multiple preintervention and postintervention data points, concurrent subgroups, and inclusion of both intermediate and long-term (and theoretically unrelated) outcomes as additional perspectives on hypothesized outcomes.
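The difference-in-differences framework named above can be illustrated in its simplest two-group, two-period form: the change in the outcome among exposed units minus the change among comparison units over the same period. The sketch below assumes that the two groups would have followed parallel trends without the policy. The function name and the incidence figures are hypothetical.

```python
from statistics import mean


def did_estimate(exposed_pre, exposed_post, comparison_pre, comparison_post):
    """Two-group, two-period difference-in-differences estimate.

    Each argument is a list of outcome values (e.g., incidence per 1,000
    person-years) for the units in that group and period.
    """
    change_exposed = mean(exposed_post) - mean(exposed_pre)
    change_comparison = mean(comparison_post) - mean(comparison_pre)
    # The comparison group's change stands in for the background trend.
    return change_exposed - change_comparison


# Hypothetical diabetes incidence per 1,000 person-years in two sets of practices.
effect = did_estimate(
    exposed_pre=[10.0, 12.0],
    exposed_post=[9.0, 11.0],
    comparison_pre=[10.0, 11.0],
    comparison_post=[10.5, 11.5],
)
print(effect)  # -1.5
```

In this made-up example, incidence fell by 1.0 in exposed practices while rising by 0.5 in comparison practices, giving an estimated effect of -1.5 per 1,000 person-years. Published analyses typically fit a regression version of this contrast with covariates, multiple time points, and adjusted standard errors.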
Several natural experiments have been conducted for diabetes that used different forms of RWD (70,71). The NEXT-D (Natural Experiments in Translation for Diabetes) consortium has used aggregated health system data to assess the effect of changing benefit designs, health care coverage, and health care reform on diabetes care and outcomes. A second stream of natural experimental research assessed the effects of changes in the built environment, food deserts, crime, and community planning on behaviors and health (70,72,73). A third form of natural experiments studies the effect of changes in health care benefits, government policies, and legislative mandates for reimbursement (e.g., value-based insurance designs and changes in government policy regarding reimbursement for testing supplies for self-monitoring) (74–76). A fourth form of natural experiments has studied the impact of external events, ranging from economic crises to disruption of health care systems due to events like the ongoing coronavirus disease 2019 (COVID-19) epidemic (77). These studies typically assess large multicomponent health systems data sets with extensive individual-level data and use multilevel modeling to identify comparable controls.
Progress and Needs in Population-Level Surveillance and Monitoring
Surveillance programs can involve structured, ongoing data collection to identify trends and differences between subgroups and regions, allowing generation of new hypotheses and analyses. Surveillance estimates of risk factors, burden (such as incidence and prevalence), complications, and death among people with diabetes inform public health efforts and policy, resource allocation, and clinical decision-making (67). Surveillance also has the important function of monitoring levels of appropriate and inappropriate care, social disparities in care access, and adverse events. Although traditionally based on surveys and registries, surveillance can now leverage disparate sources of RWE, including diagnoses from claims and/or EHR in well-defined health care delivery system databases and diabetes registries (e.g., Veterans Administration and Kaiser Permanente) (78) or pooling projects that combine RWD from multiple health systems (e.g., SUPREME-DM [Surveillance, Prevention, and Management of Diabetes Mellitus]) (79,80). Other important sources of RWD include claims data from U.S. federal insurance programs (e.g., Medicare and Medicaid), the U.S. Renal Data System (USRDS), and commercial health insurance programs (e.g., Optum Clinformatics). These data, collected during the routine provision of care to large populations, can provide passive surveillance estimates that offer a valuable complement to traditional active surveillance using probability-based samples, such as the U.S. Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance System (BRFSS), National Health Interview Survey (NHIS), National Hospital Discharge Survey (NHDS), and the National Health and Nutrition Examination Survey (NHANES) (67,81).
Because electronic health care data, such as EHR and health insurance claims, reflect real-world encounters, they allow identification of outcomes of interest as the numerator and the at-risk population as the denominator. For example, these data can facilitate the identification of diabetes type using clinical characteristics (e.g., clinical diagnoses, age of onset, and early initiation of insulin monotherapy) or biomarkers (e.g., autoantibodies or C-peptide) and fill important surveillance gaps not addressed by traditional sources. Furthermore, health insurance claims data (e.g., Medicare) contain payments issued for each encounter, which are important in assessing health care utilization and costs alongside disease surveillance. Because ADE estimates (e.g., rate of diabetic ketoacidosis among patients with type 2 diabetes treated with SGLT2 inhibitors) need to be linked to cohorts receiving a given pharmacotherapy, they benefit from cohorts constructed from longitudinal health care databases with well-characterized inpatient, outpatient, and pharmacy dispensing records. Recently, concerns about certain glucose-lowering therapies in the context of COVID-19 have been evaluated using national population-level data (82). Use of census-based indicators of socioeconomic status (e.g., percent living below the poverty level), as well as indices that combine several of these contextual factors (e.g., the neighborhood deprivation index) (83), has supported investigation of social disparities in health when individual markers of socioeconomic status are not available.
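The numerator and denominator logic described above can be illustrated with a minimal sketch of an incidence rate calculation from enrollment-style records. The field names (`enrolled`, `disenrolled`, `event`), the three-person cohort, and the study end date are all hypothetical and chosen only for illustration; real claims or EHR data would also require validated outcome definitions and handling of enrollment gaps.

```python
from datetime import date


def person_years(start, end):
    """Years of follow-up between two dates, using 365.25-day years."""
    return (end - start).days / 365.25


def incidence_rate(cohort, study_end, per=1000):
    """Return (events, person-years, rate per `per` person-years).

    Each member contributes time at risk from enrollment until the first
    of: the outcome event, disenrollment, or the end of the study.
    """
    events = 0
    total_py = 0.0
    for member in cohort:
        candidates = [member["event"], member["disenrolled"], study_end]
        stop = min(d for d in candidates if d is not None)
        if stop <= member["enrolled"]:
            continue  # no time at risk observed for this member
        total_py += person_years(member["enrolled"], stop)
        # Count the event only if it occurred while the member was observed.
        if member["event"] is not None and member["event"] == stop:
            events += 1
    return events, total_py, per * events / total_py


# Hypothetical three-person cohort (illustrative values only).
example_cohort = [
    {"enrolled": date(2019, 1, 1), "disenrolled": None, "event": date(2019, 7, 2)},
    {"enrolled": date(2019, 1, 1), "disenrolled": date(2020, 1, 1), "event": None},
    {"enrolled": date(2019, 1, 1), "disenrolled": None, "event": None},
]
example_events, example_py, example_rate = incidence_rate(
    example_cohort, study_end=date(2021, 1, 1)
)
print(f"{example_events} event(s) over {example_py:.2f} person-years")
print(f"rate per 1,000 person-years: {example_rate:.1f}")
```

Censoring follow-up at disenrollment is what ties the denominator to the population actually observable in the data source; events that occur after a member leaves the system are not counted.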
Often, the ideal data set is a linkage between data from health systems (i.e., EHR and claims data) and one or more additional sources. For example, end-stage renal disease can be identified through the USRDS. Death records are available from the National Vital Statistics System under the National Center for Health Statistics. Linkage currently requires the agreement of the health care systems that hold the required data and the approval and oversight of an institutional review board that judges the research to be important and not achievable without a waiver of individual patient informed consent. Major complications, such as myocardial infarction, stroke, and amputation, based on diagnoses or procedures are captured in hospital discharge surveys such as the National Inpatient Sample (NIS) and the National Hospital Ambulatory Medical Care Survey (NHAMCS). Hypoglycemia is an example wherein RWD from administrative claims and EHR have filled important surveillance gaps not addressed by traditional sources. Data from Medicare fee-for-service have shown that hospitalizations for severe hypoglycemia surpassed those for acute hyperglycemia in 2011, with particular risk of severe hypoglycemia among older patients and substantially elevated risk in African American individuals (47,84).
Outcome surveillance based on RWD, however, requires a deep understanding of the data source at hand and its challenges with respect to outcome measurement. For example, surveillance based on electronic health care information works well for outcomes that always require care, particularly in emergent settings such as the emergency department (ED) or the hospital (e.g., acute myocardial infarction or diabetic ketoacidosis), but may not work as well for outcomes that may not always come to medical attention, such as those being managed by pharmacists. For instance, surveillance of severe hypoglycemia based only on ED visits or hospitalizations was estimated to miss about 95% of hypoglycemia episodes because they were cared for outside of the health care system (e.g., by a spouse) (83) or treated by ambulance personnel but not transported to the ED (85). Furthermore, less severe outcomes are likely underestimated among people with reduced access to care. Changes in diagnostic coding (e.g., the switch to ICD-10-CM coding), diagnostic criteria (e.g., inclusion of HbA1c ≥6.5% for diabetes diagnosis), and screening recommendations (e.g., who and how often to screen) will also impact outcome ascertainment when algorithms rely on diagnostic codes. Finally, despite the expansion of the range of RWD sources for diabetes surveillance, the sources remain limited in their ability to assess behaviors, patient-centered outcomes, and public health outcomes.
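The scale of the under-ascertainment described above can be made concrete with a small sensitivity calculation. The sketch below assumes, purely for illustration, that a single uniform capture fraction applies to all episodes; the observed count of 50 is hypothetical. Missing about 95% of episodes corresponds to a capture fraction of about 0.05, so 50 observed episodes would imply roughly 1,000 in total.

```python
def estimated_total_episodes(observed, capture_fraction):
    """Scale an observed count by the assumed share of episodes captured."""
    if not 0.0 < capture_fraction <= 1.0:
        raise ValueError("capture_fraction must be in (0, 1]")
    return observed / capture_fraction


# Hypothetical count of severe hypoglycemia episodes seen in ED or hospital data.
observed_ed_or_hospital = 50

# Vary the assumed capture fraction to see how strongly the estimate depends on it.
for capture in (0.05, 0.10, 0.25):
    total = estimated_total_episodes(observed_ed_or_hospital, capture)
    print(f"capture fraction {capture:.2f}: about {total:.0f} total episodes")
```

In practice the capture fraction is itself uncertain and is likely to differ by age, access to care, and setting, so a range of values is more informative than any single correction.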
Whereas much of the strength of U.S. surveillance lies in publicly supported surveys, much of the precedent and proof of concept for national health systems–based registries stems from early work in Scotland, the U.K., and Scandinavia (86–90). The presence of single-payer systems in these and other countries has facilitated comprehensive, linked registries encompassing primary care encounters and laboratory, pharmacy, hospitalization, and mortality data. These data systems are regularly used to track trends in risk factors, care, and outcomes while also providing a basis for studies of effectiveness of interventions. This progress stands in stark contrast, however, to the status of low- and middle-income countries, as well as of high-income countries that lack single-payer systems, where national or population-level health systems data are rare.
Challenges in the Use of Evidence Based on RWD
Challenges in Inference of Effectiveness
Since RWD generally are not collected for research purposes, their use for evidence generation presents challenges. As an initial step, implementing a correctly framed question may be more difficult in the context of an RWD study compared with studies with primary data collection, because the population characteristics and measurements are limited by the available data sources and are less under the control of the investigator. For example, to study a causal effect, the following features will need to be measurable in an RWD source: population inclusion and exclusion criteria to characterize the target population and its generalizability, exposure status, outcome, and key confounders and potential effect modifiers (91). Often, RWD sources will also need to capture these elements with clear temporality to permit longitudinal assessment. The choice of study design presents an initial challenge that depends on the study question, data structure, and availability of data. For comparative effectiveness studies and health services research, the goal is often to emulate a target trial such as an RCT (8,22,92). In this scenario, key design and analytic decisions rest on selecting the appropriate analytic denominator (e.g., prevalent or incident cases; new or prevalent users), identifying appropriate comparison groups (choosing among comparators and sampling, matching, and weighting approaches), and specifying exposures, outcomes, and basic temporality. For example, for comparative effectiveness studies, the new-user, active-comparator cohort design, which focuses on new users of alternative treatments with similar indications for use, is a frequent choice. This study design is often superior to other designs and is easier to explain, as it is analogous to an RCT but without baseline randomization (8). Similar design options often apply to health policy research.
With the active and persistent cooperation of health systems, cluster randomized trials or stepped-wedge designs are powerful alternatives to individual randomization and may avoid many of the pitfalls of individual RCTs. In other cases, interrupted time series with or without comparison groups is often the strongest design available.
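A single-group interrupted time series is commonly analyzed with segmented regression, in which the outcome is modeled as a baseline level and trend plus a change in level and a change in trend after the intervention. The sketch below fits that model by ordinary least squares using only the standard library. The monthly series is synthetic and noise free, and the function and variable names are hypothetical; a real analysis would also need to address autocorrelation, seasonality, and, where available, a comparison series.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_its(y, change_point):
    """Least-squares fit of y_t = b0 + b1*t + b2*post_t + b3*post_t*(t - change_point)."""
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= change_point else 0.0
        rows.append([1.0, float(t), post, post * (t - change_point)])
    k = 4
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    names = ["baseline_level", "baseline_trend", "level_change", "trend_change"]
    return dict(zip(names, solve_linear(xtx, xty)))


# Synthetic monthly outcome: 12 months before and 12 months after a policy change.
change_point = 12
series = []
for t in range(24):
    value = 50.0 + 0.5 * t                          # pre-existing level and trend
    if t >= change_point:
        value += -4.0 - 0.3 * (t - change_point)    # built-in level drop and slope change
    series.append(value)

fit = fit_its(series, change_point)
for name, coefficient in fit.items():
    print(f"{name}: {coefficient:.3f}")
```

Because the series is constructed without noise, the fitted level and trend changes recover the values built into the data; with real data they would be estimates with uncertainty that depends on the number of time points before and after the change.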
In the absence of randomization, differences in risk and health status affect the tendency to receive treatment (e.g., confounding by indication or severity or by structural features of the health care system). In addition to study design strategies, analytic approaches such as propensity score adjustment, inverse probability weighting, or negative outcome controls are used to improve causal inference and emulate trials with observational data (8,22,93,94). The most commonly used of these for matching, stratification, and weighting is a propensity score (i.e., the estimated probability of an exposure versus a comparator, conditional on measured covariates) or a multivariable summary score that can balance large numbers of potential confounders and, indirectly, their proxies, even when the outcome is rare. The propensity score mimics the randomization process based on observed data with the goal of creating exchangeability and thus emulates the target trial paradigm (22). Once a propensity score is estimated, typical strategies to reduce confounding include adjustment via matching, fine stratification, or weighting by propensity score. Recent advances in statistical methods for observational CER allow researchers to achieve almost complete balance in observed covariates, as expected with a randomized exposure, and thus emulate a target randomized pragmatic trial. In some cases, a large RWD analysis can achieve a better covariate balance across treatment arms than the small RCTs being emulated, although residual confounding in nonrandomized studies cannot be completely eliminated. In this setting, the use of robustness checks can help gauge the likely reliability of the findings from RWD analyses (8,31,95,96). Despite the progress in analytic methods to emulate trials, the strength of these methods still relies upon and cannot overcome limitations in the measurement of potential confounders (8,22,93).
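The propensity score and weighting steps described above can be sketched with a deliberately small example. The code below estimates a propensity score by logistic regression (fit by Newton-Raphson), forms inverse probability of treatment weights, and compares the standardized mean difference of a single binary confounder before and after weighting. The covariate name `prior_cvd`, the cohort counts, and all function names are hypothetical; a real analysis would involve many covariates, diagnostics for extreme weights, and an outcome model.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def predict(beta, row):
    """Logistic model probability for one covariate row."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))


def fit_logistic(X, y, iterations=25):
    """Maximum likelihood logistic regression via Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        p = [predict(beta, row) for row in X]
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        info = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        beta = [b + s for b, s in zip(beta, solve_linear(info, grad))]
    return beta


def standardized_difference(x, treated, weights):
    """Weighted standardized mean difference of a binary covariate."""
    def prevalence(arm):
        num = sum(v * w for v, t, w in zip(x, treated, weights) if t == arm)
        den = sum(w for t, w in zip(treated, weights) if t == arm)
        return num / den
    p1, p0 = prevalence(1), prevalence(0)
    pooled_sd = math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    return (p1 - p0) / pooled_sd


# Hypothetical cohort of (prior_cvd, treated) pairs: treatment is more common
# among people with prior cardiovascular disease (confounding by indication).
cohort = [(0, 1)] * 20 + [(0, 0)] * 80 + [(1, 1)] * 60 + [(1, 0)] * 40
prior_cvd = [c for c, _ in cohort]
treated = [t for _, t in cohort]

X = [[1.0, float(c)] for c in prior_cvd]          # intercept + covariate
beta = fit_logistic(X, treated)
ps = [predict(beta, row) for row in X]            # estimated propensity scores
weights = [1 / p if t == 1 else 1 / (1 - p) for p, t in zip(ps, treated)]

smd_before = standardized_difference(prior_cvd, treated, [1.0] * len(cohort))
smd_after = standardized_difference(prior_cvd, treated, weights)
print(f"SMD before weighting: {smd_before:.3f}")
print(f"SMD after weighting:  {smd_after:.3f}")
```

Weighting balances only what the propensity model includes: the imbalance in `prior_cvd` is removed here because it is measured, whereas an unmeasured confounder would remain imbalanced, which is the limitation noted in the text.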
Challenges in Applications for Population Monitoring
When selecting RWD for epidemiologic and surveillance studies, the goals are accuracy, completeness, and population representativeness. Compared with cohort studies based on primary data collection, RWD-derived cohorts often have favorable representativeness and strong ability to assess care, control, and outcomes. As such studies often have large numbers of participants, they permit the estimation of condition-to-condition transition incidence in ways that other studies cannot. However, they are frequently hampered by missing data, nonstandardized exposures, and lack of linked behavioral, genetic, or patient-centered outcomes. Single-payer health systems, such as those in Scandinavia and the U.K., tend to have an advantage in constructing representative population-based data sets relative to other countries. RWD are still heavily dominated by diagnostic and service data related to processing of payments and utilization and generally lack patient-reported data on behavior, risk, function, and quality of life that could dramatically improve the utility of RWD. Despite the broad literature on standardized metrics of patient-reported outcomes and quality-of-life measures, their use has tended to lie in traditional survey and trial formats rather than in routinely collected, real-world health care (97,98).
Several important individual-level social determinants of health (e.g., income, education, social care, and voluntary sector interventions) can strongly impact surveillance outcomes. These may not be routinely collected in RWD sources, although contextual variables collected at the neighborhood level offer a viable alternative, albeit with notable shortcomings due to within-neighborhood heterogeneity. As a result, there are large gaps in understanding the levels of, drivers of, and progress in achieving health equity. Data harmonization across different sources can be challenging due to differences in definitions and missing or inconsistent data collection. For example, some settings merge race and ethnicity while others follow the census protocol for identifying race and ethnicity as separate variables. Validated instruments are needed to facilitate collection of patient-reported outcomes in health care settings, which would facilitate future surveillance efforts (99,100). Other emerging RWD sources are on the horizon. For example, some health care settings are initiating remote glucose monitoring programs and cloud-based continuous glucose monitoring data electronically downloaded to the EHR; this facilitates new models of care and oversight as well as research opportunities.
Recommendations for Standardized Reporting
Final challenges lie in the reporting of RWD to ensure transparency and credibility. The ISPOR and the International Society for Pharmacoepidemiology created a task force to make recommendations regarding good practices that would boost confidence in RWE (46,91,93,101,102). In particular, they aimed to tackle concerns of biased RWE, such as data mining that results in overexamination of a data set with unplanned statistical testing, with a tendency for only positive results to be published, resulting in misleading conclusions (93). In the report the authors address the planning, implementation, and dissemination of hypotheses evaluating treatment effectiveness (HETE) in RWD studies, particularly those examining pharmaceutical or biologic therapies. Key recommendations of this report include the following:
A priori, declare that the study is an HETE (or an exploratory study) that intends to test a particular hypothesis in a specific population
Post study protocol and analysis plans on a public study registration site prior to conducting the analysis
Perform full and complete reporting of HETE studies (including initial hypothesis and registration of the study protocol) in a medical journal or on a public website
Permit replication of HETE studies (i.e., allow other researchers to reproduce findings using the same data and approach)
Perform HETE studies on a different data source and population than those used to generate the hypothesis, if possible
Acknowledge and address any methodological criticisms
Include key stakeholders in designing, conducting, and disseminating the research
In a companion article, the task force outlined the specific parameters that should be reported to increase reproducibility and permit assessment of validity (103). If these proposals are implemented, there will be increased confidence in and use of RWE for decision-making in health care.
Conclusions
RCTs provide the necessary causal evidence of efficacy; however, they are typically costly, take a long time to conduct, expose participants to experimentation, and are usually designed to test a narrow hypothesis in highly selected populations. Thus, they cannot fully ensure that, once launched, the intervention will be effective and safe in practice for a wider population. RWE can provide complementary evidence of effectiveness and safety. For a chronic nontransmissible disease, such as diabetes, RWE derived from analysis of more representative RWD is becoming increasingly relevant and important, with its role having expanded in the space of a few short years. It is commonly used in clinical effectiveness research, complementing RCT results, with the major advantages being lower cost and greater external validity. RWD are now considered crucial sources of evidence for regulatory purposes, influence clinical decision-making and guidelines, and even contribute to broadening clinical indications for drugs. Furthermore, RWD have been used in primary prevention research, health economics, and health services and population-level research, including natural experiments and long-term surveillance and monitoring.
However, there continue to be concerns around real-world studies, in particular the risk of confounding, poor data quality, and inconsistent methodology affecting the reliability of findings. Standardizing the structure, registration, and reporting of such research is essential to improve transparency and reproducibility.
Numerous other challenges exist beyond the scope of this report, including how to facilitate easier data linkage and broader data access across more diverse settings without compromising security, as well as a need for analytical methods to evolve to keep up with the difficulties of newer RWD sources (e.g., patient apps). Nevertheless, RWE has the potential to revolutionize patient care in diabetes. Such an ambition is only possible if acceptance and credibility of this type of research can be achieved.
A.T.M. is currently affiliated with the Office of Physician-Scientist Development, Duke University School of Medicine, Durham, NC.
This article is featured in a podcast available at diabetesjournals.org/care/pages/diabetes_care_on_air.
Article Information
Funding and Duality of Interest. This article was originally conceptualized in conjunction with a workshop, “Use of Real-World Data to Improve the Prevention and Care of Diabetes-Related Outcomes,” sponsored by the American Diabetes Association via support from Sanofi, 16–18 November 2018, in Washington, DC. However, the article does not represent proceedings of the symposium and contains additional perspectives and studies that developed after the symposium. E.W.G. is supported by Science Foundation Ireland, the U.K. National Institute of Health Research (NIHR), and the U.K. Royal Society. K.K. is supported by the NIHR Applied Research Collaboration East Midlands and the NIHR Leicester Biomedical Research Centre. E.P. was supported by career development grant K08AG055670 from the National Institute on Aging and research grants from the Patient-Centered Outcomes Research Institute (DB-2020C2-20326) and the U.S. Food and Drug Administration (5U01FD007213). K.K. also reports other financial or nonfinancial interests as director of the Leicester Real World Evidence Unit, University of Leicester. E.P. is an investigator on a research grant to the Brigham and Women’s Hospital from Boehringer Ingelheim that is not related to the topic of this work. M.W. is supported by a program grant from the U.K. Medical Research Council to the Medical Research Council Epidemiology Unit, University of Cambridge [MC/UU/00006/7], and project grants from the U.K. Biology & Biotechnology Research Council, the U.K. Economic & Social Research Council, and the U.K. National Institute of Health & Care Research. A.J.K. is supported by the National Institutes of Health (R01-AG063391, P30-DK092924, and R56 AG074986). Effort for W.T.C. for this work was initiated and conducted while employed by the American Diabetes Association. K.K. reports receiving consultancy fees, speaker fees, or grants from AstraZeneca, Novartis, Novo Nordisk, Sanofi, Lilly, Merck Sharp & Dohme, Boehringer Ingelheim, Bayer, Abbott, Amgen, Napp Pharmaceuticals, Oramed Pharmaceuticals, and Applied Therapeutics. No other potential conflicts of interest relevant to this article were reported.
Author Contributions. E.W.G. and K.K. conceptualized the manuscript. E.W.G. led the research and writing of the manuscript. E.P., A.J.K., and K.K. each wrote sections of the manuscript. R.M., E.H., M.W., C.P., A.T.M., W.C., J.S., and M.C.R. all contributed to researching, editing, and revising key content. E.W.G. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.