The emerging availability of genomic and electronic health data in large populations is a powerful tool for research that has drawn interest in bringing precision medicine to diabetes. In this article, we discuss the potential application of genomics to the prediction, prevention, and treatment of diabetes, and we use examples from other areas of medicine to illustrate some of the challenges involved in conducting genomics research in human populations and implementing findings in practice. At this time, a major barrier to the application of genomics in diabetes care is the lack of actionable genomic findings. Determining whether genomic information should be used in clinical practice requires a framework for evaluating the validity and clinical utility of this approach, improved integration of genomic data into electronic health records, and clinical decision support and educational resources that enable clinicians to use these data. Efforts to identify optimal approaches in all of these domains are in progress and may help to bring diabetes into the era of genomic medicine.
Introduction
Many anticipated that the completion of the Human Genome Project over 10 years ago would mark the beginning of a new era of genomic medicine, in which new approaches to discovery research, disease prediction, and treatment would develop from an improved understanding of the genetic causes of human disease. In some areas of medicine, genomic discoveries have led to important new treatments. Genetic association studies have demonstrated that loss-of-function mutations in PCSK9 result in low levels of LDL cholesterol and a reduced risk of coronary heart disease (1,2). This discovery led to a new class of drugs with dramatic lipid-lowering effects (3,4). In oncology, there has been a shift from using older drugs with broad cytotoxic effects to therapies that target specific mutations in driver genes (5), resulting in impressive reductions in mortality for some cancers (6).
Beyond the discovery of new drug targets, genomic information can be used to predict the occurrence of disease and to identify subgroups of patients for whom existing therapies or interventions will have the greatest efficacy or the least adverse effects. These are key elements of an approach that is now called precision medicine (7). Successes in oncology and other technological developments—the rapidly decreasing cost of whole-genome sequencing (8), improvements in informatics, and the widespread adoption of electronic health records (9–11)—have galvanized interest in applying various forms of big data, including genomics, to diseases such as diabetes (12). In this article, we discuss the application of genomics to diabetes, with a focus on some of the challenges involved in conducting genomics research in human populations and implementing findings in practice.
Genomics in the Prediction, Prevention, and Diagnosis of Diabetes
The incidence and prevalence of diabetes have doubled over the past two decades (13), and there are now about 30 million adults in the U.S. living with this condition, 95% of whom have type 2 diabetes (14). Genome-wide association (GWA) studies test hundreds of thousands or even millions of common (minor allele frequency [MAF] >5%) and low-frequency (MAF 1–5%) variants across both protein coding (exonic) and noncoding (intronic) regions of the genome. Large GWA studies have identified more than 50 genetic loci associated with various glycemic traits and at least 90 loci associated with type 2 diabetes (15–18). These genetic variants, which may explain as much as 10% of the variance in disease susceptibility, have advanced our understanding of the biology of diabetes, but each genetic locus confers only a small increase in risk. For example, the common variant from these GWA studies most strongly associated with type 2 diabetes, an intronic variant in TCF7L2 (rs7903146), is associated with a 37% increased relative risk per copy of the variant allele (19). Rare variants (MAF <1%) and variants that are common only in specific ancestral populations have been associated with a greater increase in diabetes risk, but they account for less of the overall burden of diabetes (20–22).
Genetic risk scores (GRSs) that combine information from multiple genetic variants have been evaluated as a tool for the prediction of type 2 diabetes. Meigs et al. (23) found that a GRS with 18 variants was significantly associated with the risk of developing type 2 diabetes in the Framingham Heart Study (FHS) (odds ratio [OR] 1.12 per variant allele) and that persons in the highest out of three risk categories had an OR of 2.6 for developing type 2 diabetes compared with persons in the lowest risk category. However, this GRS did not improve the prediction of diabetes beyond traditional nongenetic risk factors (23), and the same was true for an updated GRS that included 65 variants (24). To put this into perspective, a prognostic marker with an OR of 3.0 that correctly identifies 80% of persons who will develop diabetes would incorrectly classify 60% of persons who will not develop diabetes (25); this degree of discrimination is not useful clinically (26).
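The arithmetic behind the last example can be sketched in a few lines of Python. This is a minimal illustration, assuming the OR is the ratio of the odds of a positive marker among persons who develop the disease to the odds among persons who do not; the function name is hypothetical and is not taken from the cited work.

```python
def implied_false_positive_rate(odds_ratio: float, true_positive_rate: float) -> float:
    """False-positive rate implied by a marker's odds ratio and true-positive rate.

    Assumes OR = [TPR / (1 - TPR)] / [FPR / (1 - FPR)], i.e., the odds of a
    positive marker in future cases divided by the odds in non-cases.
    """
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    if not 0 < true_positive_rate < 1:
        raise ValueError("true_positive_rate must lie strictly between 0 and 1")
    odds_in_cases = true_positive_rate / (1 - true_positive_rate)
    odds_in_noncases = odds_in_cases / odds_ratio
    # Convert odds back to a probability.
    return odds_in_noncases / (1 + odds_in_noncases)


if __name__ == "__main__":
    # OR of 3.0 with 80% of future cases correctly identified.
    print(round(implied_false_positive_rate(3.0, 0.8), 3))
```

Under this assumption, an OR of 3.0 with a true-positive rate of 80% implies a false-positive rate of 4/7, or about 57%, in line with the roughly 60% quoted above.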
Biologic pathways other than heritable changes in DNA sequence may also be important predictors of diabetes and account for some of the variability in diabetes susceptibility not explained by genetics or traditional environmental factors. For example, DNA methylation at CpG sites, a key epigenetic mechanism for the regulation of gene expression, has been associated with the risk of type 2 diabetes (27,28). Metabolomic profiles of amino acids and other small molecules may also play a role (29), particularly among younger adults (30). However, these new types of “omics” suffer from the traditional epidemiologic limitations of confounding and reverse causality and will require rigorous evaluation before their clinical validity and utility are understood.
Risk prediction tools are most useful when there are effective and safe prevention measures, which may include behavioral interventions or drug therapies. For high-risk adults, the Diabetes Prevention Program (DPP) lifestyle intervention reduces the risk of type 2 diabetes by more than half (31), and this intervention is now offered throughout the country at programs recognized by the Centers for Disease Control and Prevention. Although the highly effective DPP lifestyle intervention has few or no adverse effects, the identification of persons who benefit the most from the intervention could help to prioritize its deployment in resource-limited settings. Florez and colleagues (32,33) have evaluated whether several genetic variants associated with diabetes risk modified the effectiveness of the lifestyle intervention in the original DPP study, and they found little evidence of effect modification based on genetic risk. Some have argued that communicating genetic information on disease risk might help to motivate healthy behaviors, but current evidence does not support this claim (34). For example, in a small randomized controlled trial (RCT) of participants with type 2 diabetes who all underwent an intensive lifestyle intervention directed at weight loss, those who received information on their genetic risk for diabetes had the same self-reported motivation and adherence to the intervention as those who did not (35).
Even if genetic tests are not helpful in the prediction and prevention of diabetes, they could still have a role in discriminating between type 1 and type 2 diabetes. The epidemic of obesity (36) has made it more difficult to distinguish between diabetes types because many children and young adults with type 1 diabetes are also obese (37). Misclassification poses significant risks; an incorrect diagnosis of type 2 diabetes may lead to inappropriate treatment with oral glucose-lowering drugs, and an incorrect diagnosis of type 1 diabetes may lead to unnecessary insulin treatment. In a recent cross-sectional study of type 1 and type 2 diabetes, Oram et al. (38) evaluated a GRS that included high-risk HLA genotypes and 31 genetic loci for type 2 diabetes. They found that this GRS improved the discrimination between strictly defined type 1 and type 2 diabetes when added to clinical factors and autoimmune antibody tests, and it also helped to predict who would require insulin treatment within 3 years of diagnosis (38). One advantage of a diagnostic tool based on genotype is that, unlike islet cell antibodies, the result does not change over time. However, before this type of genetic testing can be recommended for routine use in the clinic, further evaluation in prospective studies will be necessary to demonstrate not only accurate discrimination between type 1 and type 2 diabetes but also improved use of appropriate glucose-lowering treatment.
Most cases of diabetes have multiple genetic and environmental causes and are classified according to the presumed pathophysiologic defect—autoimmune destruction of β-cells leading to insulin deficiency for type 1 diabetes and varying degrees of insulin resistance and deficiency for type 2 diabetes. In other words, the vast majority of diabetes is polygenic, and despite the growth in knowledge about the various genetic causes of diabetes in recent years, classification of individual cases into meaningful subtypes based on the underlying genetics has been difficult. On the other hand, genetic testing may be useful for the diagnosis of certain forms of diabetes caused by defects in a single gene, such as HNF1A mutations for maturity-onset diabetes of the young (MODY) (39) and activating KCNJ11 mutations for neonatal diabetes (40), both of which are highly responsive to sulfonylurea therapy. These monogenic forms of diabetes account for ∼1–2% of diabetes cases (41,42), and they typically present at a young age (<25 years) and follow an autosomal dominant pattern of inheritance. Targeted genotyping could also play a role in the diagnosis of type 2 diabetes in specific populations. For example, a rare missense variant in HNF1A (p.E508K) that increased the risk of diabetes fivefold was present among 2% in a study of Latinos in the southern U.S. with type 2 diabetes (20); additional studies are needed to determine whether this functional variant shares the sulfonylurea-responsiveness of the HNF1A variants that cause MODY.
Pharmacogenomics of Therapies for Type 2 Diabetes
Although the rates of microvascular (retinopathy, neuropathy, kidney disease) and cardiovascular complications among persons with diabetes have decreased by about half over the past two decades, these complications still occur more often among persons with diabetes than among individuals without diabetes (43). Reducing these risks is the major goal of glucose-lowering therapy. For type 1 diabetes, the long-term benefits of intensive insulin therapy are well established (44,45). For type 2 diabetes, intensive glucose-lowering therapy prevents microvascular complications, and postrandomization follow-up data from several RCTs suggest there may be long-term cardiovascular benefits as well (46–48).
There are now several classes of medications approved for the treatment of diabetes (Table 1). Most oral therapies have similar average effects on hemoglobin A1c (HbA1c), but they differ in their contraindications and side effects (49,50). There is surprisingly little information about the comparative benefits and harms for different drugs (51), and treatment guidelines for type 2 diabetes (52) permit the use of most approved drug classes as second-line therapy after metformin, which is the recommended first-line therapy for most patients because of its good safety profile and potential cardiovascular benefits (53,54). There is also substantial interindividual variability in drug response (55), and many patients eventually fail to achieve recommended levels of glycemic control with their initial therapy (56,57). For example, in the U.K. General Practice Research Database, only half of patients who initiated therapy with metformin or a sulfonylurea achieved an HbA1c level of <7% (58). The factors that account for this variation are not well understood (59). Because of the wide range of side effects from different therapies and because of the person-to-person variability in treatment response and adverse effects, pharmacogenomic testing for genetic variants that define subgroups of patients who are most likely to benefit from or least likely to be harmed by specific drugs is an attractive potential application of genomics in diabetes.
| Drug classes | Oral | Average HbA1c reduction (%) | Other benefits | Adverse effects | Cost |
|---|---|---|---|---|---|
| Metformin | Yes | ∼1.0–1.5 | Weight loss, possible cardiovascular benefit | Lactic acidosis (rare), gastrointestinal side effects | Low |
| Sulfonylureas | Yes | ∼1.0–1.5 | | Hypoglycemia, weight gain, potential cardiovascular risk | Low |
| Meglitinides | Yes | ∼1.0 | | Hypoglycemia, weight gain | Moderate |
| Thiazolidinediones | Yes | ∼1.0 | | Heart failure, myocardial infarction (rosiglitazone), bone loss and fractures | Moderate |
| α-Glucosidase inhibitors | Yes | ∼0.8 | | Flatulence, diarrhea | Moderate |
| Amylin analog | No | ∼0.6 | | Hypoglycemia, nausea | High |
| DPP-4 inhibitors | Yes | ∼0.6–0.8 | | Potential heart failure (saxagliptin) | High |
| GLP-1 receptor agonists | No | ∼1.0 | Weight loss | Nausea, vomiting, diarrhea | High |
| SGLT2 inhibitors | Yes | ∼0.6–0.8 | Possible cardiovascular benefit | Genitourinary infections, ketoacidosis (rare) | High |
| Insulin | No | Unlimited | Most potent treatment | Hypoglycemia, weight gain | Low-high |
Pharmacogenomic studies have typically focused on candidate genes involved in pharmacokinetics (absorption, distribution, metabolism, and elimination) or pharmacodynamics (the biologic effect of a drug on its target) (60). The pharmacogenomics of oral diabetes therapies has been reviewed in detail elsewhere (61–63); selected findings for the most extensively studied drugs, sulfonylureas and metformin, are listed in Table 2. Sulfonylureas undergo metabolism by cytochrome P450 (CYP) enzyme 2C9, and loss-of-function variants in the CYP2C9 gene have been associated with greater glucose-lowering effects and an increased risk of hypoglycemia (64,65). The genes encoding the sulfonylurea receptor, KCNJ11 and ABCC8, have also been associated with increased sulfonylurea response in some studies (66,67) but not others (68,69), and in one study the association was in the opposite direction (70). Unlike sulfonylureas, metformin does not undergo metabolism by CYP enzymes; it is excreted intact by the kidneys (71). Genes encoding several transporters that facilitate the movement of metformin into the bloodstream, into target tissues, and into renal tubular cells have been associated with effects on serum levels of metformin, glucose-lowering effect, and drug intolerance (72,73).
| Drug | Locus | Phenotype | N | Effect** | Refs. |
|---|---|---|---|---|---|
| Sulfonylureas | | | | | |
| Pharmacokinetic | CYP2C9 | HbA1c response | 1,073 | 0.5% absolute greater reduction in HbA1c (homozygous for variant alleles) | 64 |
| | | FBG response | 475 | No association | 75 |
| | | Hypoglycemia | 357 | OR 5.2 for hypoglycemia (homozygous for variant alleles) | 65 |
| Pharmacodynamic | KCNJ11/ABCC8* | HbA1c and FBG response | 1,268 | 3.5% relative greater reduction in HbA1c and 7.7% relative greater reduction in FBG (homozygous for variant alleles) | 66 |
| | | HbA1c response | 101 | 0.2% absolute greater reduction in HbA1c (per variant allele) | 67 |
| | | Insulin treatment | 525 | No association | 68 |
| | | FBG response | 228 | No association | 69 |
| | | HbA1c response | 97 | Less reduction in HbA1c | 70 |
| | TCF7L2 | On treatment HbA1c <7% | 901 | OR 1.9 for treatment failure (homozygous for variant allele) | 164 |
| | | On treatment HbA1c <7% | 189 | OR 1.6 for treatment failure (per variant allele) | 165 |
| Metformin | | | | | |
| Pharmacokinetic | SLC22A1 | HbA1c response | 102 | 0.3% absolute lower reduction in HbA1c (per variant allele) | 166 |
| | | HbA1c response | 371 | 1.1% absolute lower reduction in HbA1c (per variant allele) | 77 |
| | | On treatment HbA1c <7% | 1,531 | No association | 76 |
| | | Drug intolerance | 2,166 | OR 2.4 for discontinuation (homozygous for variant alleles) | 73 |
| | SLC47A1 | HbA1c response | 116 | 0.3% absolute lower reduction in HbA1c (per variant allele) | 167 |
| | | HbA1c response | 371 | No association | 77 |
| | | Risk of type 2 diabetes | 2,994 | Less reduction in diabetes risk | |
| | SLC47A2 | HbA1c response | 253 | 0.1% absolute lower reduction in HbA1c (any variant allele) | 168 |
| | | HbA1c response | 371 | No association | 77 |
| GWA studies | ATM | HbA1c response, on treatment HbA1c <7% | 2,896 | 0.1% absolute greater reduction in HbA1c and OR 1.4 for treatment success (per variant allele) | 80 |
| | | HbA1c response, on treatment HbA1c <7% | 1,366 | No association with HbA1c response, OR 1.2 for treatment success (per variant allele) | 81 |
FBG, fasting blood glucose.
*Loss-of-function variants rs757110 in ABCC8 and rs5219 in KCNJ11 are in near-complete linkage disequilibrium.
**Effect listed only if reported as statistically significant in cited article.
Candidate gene studies of sulfonylureas and metformin have not typically accounted for false positives from multiple comparisons, and most findings from these studies have failed to replicate (68,74–77). Unfortunately, this low replication rate is consistent with the rate in other candidate gene studies, which may be as low as 1–3% (78,79). The potential reasons include small sample sizes, heterogeneity of genetic architecture across different populations, and publication bias. In contrast, large GWA studies that use rigorous, prespecified statistical methods have produced more valid and reproducible results. For example, the only GWA study of treatment response to a diabetes therapy, conducted in a Scottish observational cohort with nearly 3,000 participants, identified a locus for glucose-lowering response to metformin near the ATM gene that met a stringent threshold for statistical significance (80), and this finding has been replicated in other populations (81). Well-powered pharmacogenomic studies that use rigorous statistical methods may identify genetic variants that result in greater glucose-lowering effects or fewer adverse effects from other diabetes therapies.
GWA studies rely on information about the nonrandom association of alleles (linkage disequilibrium) to “tag” common and low-frequency variation throughout the genome in a subset of directly genotyped variants. Because these studies have identified only a small portion of the known heritability of most complex traits (82) and because theory predicts that rare variants are more likely to be damaging to gene function than common variants (83,84), there has been growing interest in studying rare variants. Advances in technology and informatics have made possible the high-throughput sequencing of whole exomes and whole genomes, which provide a complete assessment of both common and rare variation (85). A recent study that sequenced 202 genes encoding drug targets in 14,000 individuals identified on average one rare variant per 17 base pairs, and 90% of these variants were newly discovered (86). Another exome-sequencing study of 12 CYP genes, which are responsible for about 75% of all known oxidative drug metabolism, identified 1,006 variants in these genes and found that 73% were rare, a third were predicted to affect protein structure, and 9% of individuals had at least one newly discovered functional variant (87). Whether this abundance of rare variation explains some of the interindividual variability in drug response is unknown at this time, but the topic merits further evaluation.
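The strength of the linkage disequilibrium that allows one genotyped variant to tag another is commonly summarized by the squared correlation r² between alleles at the two sites. The following Python sketch computes it from haplotype and allele frequencies; the function name and the example frequencies are illustrative assumptions, not values from the cited studies.

```python
def ld_r_squared(freq_ab: float, freq_a: float, freq_b: float) -> float:
    """Squared correlation (r^2) between alleles A and B at two biallelic sites.

    freq_ab: frequency of haplotypes carrying both allele A and allele B
    freq_a:  frequency of allele A at the first site
    freq_b:  frequency of allele B at the second site
    """
    variance_product = freq_a * (1 - freq_a) * freq_b * (1 - freq_b)
    if variance_product <= 0:
        raise ValueError("both sites must be polymorphic (0 < frequency < 1)")
    # D is the excess of the observed haplotype frequency over the frequency
    # expected if the two alleles were inherited independently.
    d = freq_ab - freq_a * freq_b
    return d * d / variance_product


if __name__ == "__main__":
    # Hypothetical frequencies: allele A at 30%, allele B at 30%,
    # and the A-B haplotype at 25%.
    print(round(ld_r_squared(0.25, 0.30, 0.30), 3))
```

An r² near 1 means that the genotyped variant is a nearly perfect proxy for the untyped one, whereas an r² near 0 means that it carries almost no information about it. Rare variants are generally tagged poorly by common variants, which is one reason direct sequencing is needed to assess them.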
Challenges in Conducting Genomic Research
Study Power and Sample Size
One major barrier to genomic discovery in diabetes research has been the limited power to detect associations, which is a function of the frequency of the genetic variant, the magnitude of the effect to be detected, and the sample size (88). Large GWA studies with study populations of tens of thousands or more have helped to unravel the biology underlying many complex diseases, but most genetic loci identified by these studies have small effects. For example, the largest GWA study of type 2 diabetes, which included nearly 35,000 case and 115,000 control subjects, identified 65 genetic loci; all but the TCF7L2 locus had ORs of 1.2 or lower per copy of the variant allele (15).
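The dependence of power on allele frequency, effect size, and sample size can be illustrated with a rough calculation. The Python sketch below uses a normal approximation for a two-sided test that compares allele frequencies between case and control subjects at a genome-wide significance threshold of 5 × 10⁻⁸. The function names, the threshold, and the choice of test are assumptions made for illustration; the sketch is not the method used in the cited studies, and dedicated power calculators should be preferred for study planning.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_frequency(control_freq: float, odds_ratio: float) -> float:
    """Allele frequency in cases implied by the control frequency and per-allele OR."""
    return odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)


def allelic_test_power(control_freq: float, odds_ratio: float,
                       n_cases: int, n_controls: int,
                       alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test (normal approximation)."""
    if not 0 < control_freq < 1 or odds_ratio <= 0:
        raise ValueError("need 0 < control_freq < 1 and odds_ratio > 0")
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("sample sizes must be positive")
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    # Each person contributes two alleles.
    se = sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_effect = abs(p1 - p0) / se
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


if __name__ == "__main__":
    # Hypothetical variant: 30% allele frequency in controls, OR 1.2 per allele.
    for n in (1_000, 5_000, 35_000):
        print(n, round(allelic_test_power(0.30, 1.2, n, n), 3))
```

With these hypothetical inputs, a study of 1,000 case and 1,000 control subjects has very little power to detect an OR of 1.2, whereas a study the size of the largest type 2 diabetes GWA study has power close to 1.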
In pharmacogenomic studies, restriction to users of a particular drug further limits the available study population (89). Rare immune-mediated adverse drug reactions (ADRs) such as drug-induced liver injury or severe skin reactions may be caused by pathogenic variants with ORs of 100 or greater, and these associations can be detected with fewer than 50 cases (90–92). For diabetes therapies, rare adverse effects such as metformin-associated lactic acidosis or ketoacidosis related to sodium–glucose cotransporter 2 inhibitors may be amenable to pharmacogenomic discovery in similarly small studies.
It has been more difficult to identify pharmacogenomic associations with complex phenotypes such as myocardial infarction or stroke (93) or with quantitative traits such as QT interval prolongation or cholesterol lowering. For example, a large GWA study with over 30,000 participants from 10 observational cohorts evaluated QT interval prolongation from various drugs and failed to identify any pharmacogenomic loci at genome-wide levels of significance (94), and a GWA study with over 40,000 statin-using participants discovered and replicated two new loci for LDL cholesterol lowering, but each variant allele resulted in a relative change in LDL lowering effect of less than 2% (95). While these sorts of findings may reveal new information about human population biology, these effect sizes, which cannot be distinguished from intraindividual variation or measurement error in individual patients, are so small that the effort to genotype these variants can be safely omitted from clinical practice. As pharmacogenomic efforts move to whole-exome and whole-genome sequencing, even larger sample sizes may be required to identify associations with rare variants.
Individual pharmacogenomic variants often have small effects or fail to reach stringent thresholds of statistical significance, but they can be combined within a gene or within a pathway of several genes to identify clinically important effects. As an example, Dujic et al. (73) evaluated the relationship between metformin intolerance and four reduced-function variants in the gene SLC22A1, which transports metformin into the intestine and may mediate some of the gastrointestinal side effects from this drug. They found that the presence of any two reduced-function alleles increased the risk of metformin discontinuation by 2.4-fold (73).
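Combining variants within a gene can be as simple as counting reduced-function alleles per person. The Python sketch below shows one way to encode the "any two reduced-function alleles" exposure described above; the variant labels are placeholders rather than the actual SLC22A1 variants, and the encoding is an assumption, not the analysis code of the cited study.

```python
from typing import Mapping


def reduced_function_allele_count(genotype: Mapping[str, int]) -> int:
    """Total number of reduced-function alleles across the variants in one gene.

    genotype maps a variant label to the number of reduced-function alleles
    carried at that variant (0, 1, or 2).
    """
    for variant, copies in genotype.items():
        if copies not in (0, 1, 2):
            raise ValueError(f"{variant}: allele count must be 0, 1, or 2")
    return sum(genotype.values())


def has_two_or_more_reduced_function_alleles(genotype: Mapping[str, int]) -> bool:
    """True if the person carries at least two reduced-function alleles in total."""
    return reduced_function_allele_count(genotype) >= 2


if __name__ == "__main__":
    # Placeholder labels for four variants in the same gene.
    person = {"variant_1": 1, "variant_2": 0, "variant_3": 1, "variant_4": 0}
    print(has_two_or_more_reduced_function_alleles(person))
```

A person who is heterozygous at two different variants and a person who is homozygous at one variant are treated identically here, which is the sense in which alleles are pooled across the gene.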
To assess associations with the large number of variants identified by whole-exome and whole-genome sequencing, more innovative approaches are necessary. Annotation tools have been used to restrict analyses to variants that are likely to be functional, on the basis of expected protein structure, associations with gene expression levels (96,97), and information from the Encyclopedia of DNA Elements (ENCODE) project, which has systematically mapped nonprotein coding regulatory function (including transcription factor binding sites, chromatin structure, and histone modifications) throughout the genome (98). Various statistical methods have been developed that aggregate rare functional variants for gene-based association tests, which can improve the power to detect an association if there are multiple damaging variants within a gene (99,100). Data from whole-exome and whole-genome sequencing have not yet been systematically evaluated in pharmacogenomic studies, but for some complex traits these methods have been used to identify rare variants with large effects, both in new and previously identified loci (101).
The studies of statin response and drug-induced QT prolongation described in the preceding paragraphs represent one model for conducting genomic research in large populations: local analysis of deeply phenotyped cohorts followed by a meta-analysis of summary results in large research consortia (15,102). The increasing availability of electronic health data and a recognition of the large sample sizes required for genomic discovery research have led to the emergence of another model: biobank studies that genotype tens or hundreds of thousands of individuals and link these genetic data with participants’ electronic health data to create large data repositories. Some of these studies include a baseline visit for physical measurements, the collection of specimens, and imaging tests (103,104), similar to the traditional cohort studies, and they all rely on electronic health databases for longitudinal information on health care encounters, laboratory tests, vital status, and medication use. Two of the largest biobank studies, the Million Veteran Program (105) and the UK Biobank (104), have recruited close to 500,000 individuals each.
Phenotyping With Electronic Health Data
Electronic health data have been immensely useful for research, but they have important limitations. Results from laboratory tests that are measured in the course of clinical care, such as HbA1c or cholesterol levels, are likely to be recorded accurately and completely, but they may be related to the clinical indications for the tests and lack standardization across sites. Diagnosis codes associated with health care encounters, an important source of information about disease status, are assigned for clinical and billing rather than research purposes; geographic location (106), changes in reimbursement (107,108), and other factors can influence the assignment of these codes. For a given study design, different databases can produce different estimates of association between a drug exposure and a health outcome, sometimes in opposite directions (109). Nonetheless, diagnosis code–based algorithms have been used to identify some acute diabetes complications, such as hypoglycemia (110,111) and myocardial infarction (112,113), with a reasonably high degree of accuracy (positive predictive value [PPV] 80–90%).
For other diabetes complications, such as heart failure, the accuracy of a diagnostic algorithm may vary substantially by the diagnosis code used and even the position of the code (primary vs. secondary) (114,115). The use of low-PPV algorithms can attenuate estimates of association toward the null (116), sometimes dramatically. When a new diagnosis code was introduced for rhabdomyolysis, a severe ADR related to statin use, we evaluated the accuracy of this code in a large health care system by reviewing electronic medical records to validate potential cases, and we estimated the risk of rhabdomyolysis associated with the 80-mg dose of simvastatin, which was known to be high from a recent RCT (117). The PPV of the rhabdomyolysis code for the statin-related ADR was only 8%. Moreover, the relative risk for rhabdomyolysis associated with high-dose versus low-dose simvastatin was 12.2 when validated cases were evaluated, replicating the RCT estimate, but only 1.8 when the diagnosis code was used without validation (118). This marked attenuation of a genuine association is not an isolated finding; the quality (119) and severity (120) of disease phenotypes have also been shown to impact the magnitude of pharmacogenomic associations. For studies that evaluate genetic associations with disease outcomes, including ADRs, it may sometimes be necessary to conduct validation studies to assess the accuracy of claims-based data (121). Despite these limitations, the linkage of genomic data with electronic health data in large study populations holds promise as a powerful tool for research.
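Why a low-PPV code attenuates a relative risk can be seen with a simple misclassification model. The Python sketch below assumes that the code captures every true case and, in addition, is wrongly assigned to the same small fraction of persons without the outcome in both exposure groups. The function names and all input values are hypothetical, and the sketch is not intended to reproduce the rhabdomyolysis estimates.

```python
def observed_risk(true_risk: float, false_positive_fraction: float) -> float:
    """Risk measured by a diagnosis code that adds false positives to true cases."""
    return true_risk + (1 - true_risk) * false_positive_fraction


def observed_relative_risk(baseline_risk: float, true_relative_risk: float,
                           false_positive_fraction: float) -> float:
    """Relative risk seen when the outcome is defined by an imperfect code.

    Assumes perfect sensitivity and the same false-positive fraction in the
    exposed and unexposed groups (nondifferential misclassification).
    """
    exposed = observed_risk(baseline_risk * true_relative_risk, false_positive_fraction)
    unexposed = observed_risk(baseline_risk, false_positive_fraction)
    return exposed / unexposed


def positive_predictive_value(true_risk: float, false_positive_fraction: float) -> float:
    """Share of coded cases that are true cases."""
    return true_risk / observed_risk(true_risk, false_positive_fraction)


if __name__ == "__main__":
    # Hypothetical inputs: true risk of 1 per 10,000 in the unexposed group,
    # a true relative risk of 10, and a code wrongly assigned to 1 per 1,000 non-cases.
    print(round(positive_predictive_value(1e-4, 1e-3), 2))
    print(round(observed_relative_risk(1e-4, 10.0, 1e-3), 2))
```

Because the false positives are added equally to the numerator and denominator groups, the observed relative risk is pulled toward 1, and the attenuation grows as true cases become a smaller share of coded cases.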
Framework for Evaluating a Genomic Test
For the regulatory approval of new therapies directed at specific genetic defects, such as ivacaftor for the treatment of cystic fibrosis caused by a specific CFTR mutation (122), the U.S. Food and Drug Administration (FDA) requires the same standard used to evaluate all new drugs: substantial evidence of efficacy and safety from well-controlled studies, typically rigorously conducted RCTs. For therapies already in use that have demonstrated efficacy and safety in nongenetically determined populations, the evidence standard for using genomic information to select the best drug or dose is less certain. The FDA currently includes pharmacogenomic information in the prescribing information for over 160 drugs (123), but its oversight of laboratory-developed genetic tests is concerned primarily with analytic validity (i.e., ability to reliably measure a genetic variant or a biomarker) (124), which is inadequate to guide treatment decisions. A recent study of pharmacogenomic information in drug labels found convincing evidence of clinical validity (i.e., how accurately and consistently genetic variation predicts a phenotype) for only 36% of these drugs, and evidence of clinical utility (i.e., ability to improve outcomes for patients) for only 15% (125). The Centers for Disease Control and Prevention Office of Public Health Genomics has developed a more comprehensive framework for the evaluation of a genetic test that includes questions about analytic validity, clinical validity, and clinical utility, as well as ethical, legal, and social implications (ACCE), which are summarized in Table 3 (126,127). These standards are important to protect the health of the public.
| | Key questions | Example applications to diabetes |
|---|---|---|
| Analytic validity | How often does the genetic test fail to give a useable result? | For a GRS for type 2 diabetes risk, how often does the test result in a genotype at each locus? |
| | What is the sensitivity and specificity of the test for a genetic variant? | For the rare HNF1A missense variant associated with diabetes risk, how often does the test detect the variant when it is present and how often does the test result in a false positive when the variant is not present? |
| | What is the within- and between-laboratory precision? | Are standardized methods used in different laboratories? How does the accuracy of the test vary from laboratory to laboratory? |
| | Is confirmatory testing required? | When a pathogenic variant in a MODY gene is identified from high-throughput sequencing, is confirmatory genotyping by another method required? |
| Clinical validity | What is the quality of the disease phenotype or treatment response measurement? What is the quality of the study designs used to evaluate these outcomes? | For a GRS that discriminates type 1 from type 2 diabetes, are standardized definitions of diabetes type used? Was the study design cross-sectional or longitudinal? |
| | What is the prevalence of the phenotype or the distribution of treatment response in the studied populations? | For a pharmacogenomic test for metformin treatment response, how is treatment failure/success defined and what is the rate of treatment failure/success in the studied population? |
| | What is the sensitivity and specificity of the test for the disease phenotype or treatment response? What is the magnitude and precision of the genotype-phenotype relationship? | What is the relative change in HbA1c lowering associated with the ATM locus for metformin treatment response? |
| | What are the genetic or environmental modifiers of the genotype-phenotype relationship? | Do diabetes severity, obesity, the use of additional drug therapies, or other factors modify the pharmacogenomic associations for metformin treatment response? |
| | Has the test been adequately validated on all populations in which it may be offered? | For a GRS for type 2 diabetes risk, has the test been validated among obese individuals and among persons from different racial/ethnic populations? |
| Clinical utility | What is the impact of a positive or negative test on patient care in terms of health outcomes? | What is the absolute change in HbA1c lowering associated with the ATM locus for metformin treatment response? Does this pharmacogenomic test result in projected or tangible health benefits in terms of clinical complications, such as a reduced risk of blindness or cardiovascular disease? |
| | What are the financial costs associated with testing and the economic benefits associated with actions resulting from testing? | What is the cost to test for the common TCF7L2 variant for type 2 diabetes risk, including laboratory, reporting, educational, and counseling costs? What is the cost per case of diabetes predicted? Does this test result in cost-effective screening or prevention of diabetes later in life? |
| | What educational materials for patients have been developed and validated? | Do educational materials clearly explain the magnitude of the increased risk of type 2 diabetes associated with the common TCF7L2 variant in relation to other known risk factors? |
| Ethical, legal, and social implications | What is known about how this test could lead to stigmatization, discrimination, privacy/confidentiality, and personal/family social issues? | Does the identification of a pathogenic variant responsible for a hereditary form of early-onset diabetes have implications for family planning? |
| | Are there legal issues regarding consent, ownership of data and/or samples, patents, licensing, proprietary testing, disclosure, or reporting requirements? | Are physicians providing sufficient information about the potential risks and benefits from genetic testing so that patients can make informed decisions? |
| | What safeguards have been described and are these safeguards effective? | |
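The sensitivity and specificity questions in the ACCE framework reduce to counts in a two-by-two table of test result against true status. The sketch below is a minimal illustration; the class name `TwoByTwo` and the counts in the usage example are hypothetical and do not come from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of test result (positive/negative) against true status."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int

    def sensitivity(self) -> float:
        # Proportion of true carriers (or true cases) that test positive.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Proportion of noncarriers (or noncases) that test negative.
        return self.true_neg / (self.true_neg + self.false_pos)

    def ppv(self) -> float:
        # Proportion of positive tests that are true positives.
        return self.true_pos / (self.true_pos + self.false_pos)

    def npv(self) -> float:
        # Proportion of negative tests that are true negatives.
        return self.true_neg / (self.true_neg + self.false_neg)


if __name__ == "__main__":
    # Hypothetical counts for a rare variant in 10,000 tested persons.
    table = TwoByTwo(true_pos=90, false_pos=50, false_neg=10, true_neg=9850)
    for name in ("sensitivity", "specificity", "ppv", "npv"):
        print(name, round(getattr(table, name)(), 3))
```

With these hypothetical counts, sensitivity and specificity are both high, yet the PPV is considerably lower because the variant is rare, which is one reason the framework asks about the prevalence of the phenotype in the studied population.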
As described earlier, most reported pharmacogenomic findings fail to replicate and therefore lack clinical validity. Similarly, rare variants identified by sequencing and initially classified as “pathogenic” are often downgraded to “uncertain significance” or “benign” after further study (128,129). For example, in the FHS and the Jackson Heart Study (JHS), seven genes responsible for MODY were sequenced, and rare variants were found that had previously been reported as causal for MODY or were predicted by annotation tools to be damaging to protein function. These variants, present in 2% of persons in these population-based cohorts, were not associated with the risk of diabetes, and only 1 of 68 variant carriers met the criteria for MODY (130).
Of the genomic findings for diabetes that have clinical validity, most have effects on disease outcomes or surrogate end points that are uncertain or too small to be clinically meaningful. Genetic risk panels for the prediction of type 2 diabetes are one example (131). Another example is the ATM locus for metformin treatment response, the most widely replicated diabetes pharmacogenomic finding; each copy of the variant allele was associated with only a 0.1% greater reduction in HbA1c (81). Even if patients had whole-genome sequence data already available in their medical records, this treatment effect is still too small to be clinically useful. For MODY, which often has a clear genetic cause, most cases appear to be undiagnosed, in part because of uncertainty from clinicians about the clinical benefits of genetic testing (132). Making treatment decisions based on genetic tests that lack clinical validity, clinical utility, or both can have unintended consequences, including withholding beneficial treatments, an unnecessary increase in costs, or the use of drugs with harmful effects.
For examples of actionable genetic tests that meet the criteria for clinical validity and clinical utility, areas of medicine other than diabetes are instructive. Abacavir is a first-line treatment for HIV that is effective and well tolerated in most people, but about 5% of persons exposed to this drug develop a serious hypersensitivity reaction characterized by rash, fever, and damage to multiple organ systems (133). Initial studies of men of mostly European ancestry identified the cause of this ADR to be the HLA-B*57:01 variant (91,134), with an OR of well over 100 (119), and this finding was later extended to other populations (135). A double-blind RCT demonstrated that genetic testing for the HLA-B*57:01 variant prevented all cases of abacavir-induced hypersensitivity (136), and genetic testing is now routinely conducted before initiating treatment with this drug.
For another severe immune-mediated ADR, the translation of genetic testing into practice followed a different course. Stevens-Johnson syndrome/toxic epidermal necrolysis (SJS/TEN) related to carbamazepine use was found to be strongly associated with the HLA variant HLA-B*15:02 in several Asian populations (92,137), and screening for this variant can effectively prevent the ADR (137). In 2008, Hong Kong instituted a policy of routine genetic screening before initiating carbamazepine treatment, but providers simply avoided prescribing this medication and instead prescribed other antiepileptic drugs that can also cause severe skin reactions. As a result, the overall rate of SJS/TEN in Hong Kong did not change (138). This unintended result of a policy decision on a genetic test with clinical validity and clinical utility demonstrates that other concerns, such as convenience or cost, may influence the harms or benefits for a population.
Even for well-replicated pharmacogenomic findings with large effects, there can be disagreement about what constitutes clinical utility. Warfarin is an anticoagulant drug that is notoriously difficult to dose because of wide interindividual variability in the anticoagulant effect, much of which is explained by variation in the pharmacokinetic genes CYP2C9 and VKORC1. The benefits of testing for variants in these genes have been evaluated in three large well-designed RCTs, two of which demonstrated that genetic testing resulted in a shorter time to therapeutic effect but no reduction in thrombotic or bleeding complications (139–141). Opinions differ about whether pharmacogenomic testing has a role in warfarin treatment (142–144), and there has been a broader debate about whether RCTs are required to demonstrate the utility of a genetic test (79,145,146). In practice, genetic testing for warfarin treatment has been minimal. In the absence of consistent recommendations about when to use genetic information in prescribing and other treatment decisions, health care providers, patients, payers, and health care systems will all have an important voice in this discussion.
Additional Barriers to the Implementation of Genomics
The implementation of genomics in practice has generally involved a decision about whether to order a genetic test before starting a new treatment. For patients who have already undergone preemptive genotyping or sequencing, which is not yet common but is likely to increase in the future (147–149), the decision may be about whether to use a patient’s existing genetic information to guide treatment. With either approach, the main barrier to the implementation of genomics for diabetes care at this time is a lack of actionable findings in the areas of diabetes and cardiovascular medicine. There are also practical issues that must be addressed, which include the storage and integration of complex genomic data into the electronic health record, the interpretation of these data in an accessible format, and the creation of clinical decision support (CDS) tools that aid health care providers in using this information to make treatment decisions at the point of care (150). Various groups have made important contributions in developing model systems that address these problems (147,148,151,152).
CDS tools, which have been in use for many years for diabetes management, provide automated testing or treatment recommendations based on information in the electronic health record (153). The CDS tools for genomics that are under development rely on externally curated data resources, such as PharmGKB (154) and ClinVar (155), to determine which potentially actionable variants to incorporate and how to translate genotypes into expected phenotypes. PharmGKB currently includes information about pharmacogenomic associations for metformin and sulfonylureas; however, none of the associations meet the criteria for the highest level of evidence, and there is no recommendation for pharmacogenomic testing for these drugs (156). In drug-specific evidence summaries and guidelines, the Clinical Pharmacogenetics Implementation Consortium has begun to provide standardized terms, sample text for electronic health record documentation and point of care alerts, and clinical implementation workflows, which will be a valuable resource as the role of pharmacogenomics expands in clinical practice (157,158).
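The core of a genomic CDS rule, translating a stored genotype into a point-of-care alert, can be sketched as a simple lookup. The example below is a minimal illustration under stated assumptions: the rule table, function name, and alert wording are hypothetical, the two rules are drawn only from the abacavir and carbamazepine examples discussed in this article, and nothing here reflects the actual data model or interfaces of PharmGKB, ClinVar, or the Clinical Pharmacogenetics Implementation Consortium.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical rule table keyed by (drug, allele). A real system would load
# curated, versioned rules from an external knowledge base instead.
RULES: Dict[Tuple[str, str], str] = {
    ("abacavir", "HLA-B*57:01"):
        "Patient carries HLA-B*57:01: increased risk of abacavir "
        "hypersensitivity reaction.",
    ("carbamazepine", "HLA-B*15:02"):
        "Patient carries HLA-B*15:02: increased risk of SJS/TEN with "
        "carbamazepine.",
}


def cds_alerts(drug: str, patient_alleles: Iterable[str]) -> List[str]:
    """Return alert messages for a drug order given a patient's known alleles."""
    drug_key = drug.strip().lower()
    alleles = set(patient_alleles)
    return [
        message
        for (rule_drug, allele), message in RULES.items()
        if rule_drug == drug_key and allele in alleles
    ]


if __name__ == "__main__":
    patient = {"HLA-B*57:01"}
    print(cds_alerts("Abacavir", patient))   # one alert for this carrier
    print(cds_alerts("metformin", patient))  # no rule defined, so no alert
```

The empty result for metformin mirrors the current state of evidence described above: without an actionable association, there is no rule for a CDS tool to apply.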
The increasing availability of CDS tools and online databases with information on actionable genomic findings will facilitate the use of genomic testing for diabetes by clinicians. However, because most clinicians are not yet comfortable using genomic information to make clinical decisions (159,160), improved education for both trainees and practicing clinicians is needed, and opportunities for both are improving (161,162). Professional societies and specialty boards will play an important role in the integration of genomics in physician training and continuing medical education; credentialing requirements for ordering genomic tests could help to ensure that such tests are used appropriately (163). If the use of genomic information ultimately proves useful in the care of patients with diabetes, even with adequate educational and training opportunities for clinicians, financial incentives may be required to spur widespread implementation, as was the case with the increased adoption of electronic health records after legislation was passed that provided reimbursements to hospitals and providers for this purpose (9–11).
Conclusions
At this time, there are few if any actionable genomic findings for diabetes that are ready for implementation. However, the increasing availability of genomic data in large populations linked with electronic health data may become a powerful resource for genomic discovery, and examples from other areas of medicine offer lessons about the limitations of these data that can help guide the direction of future research. Whether genomic information should be used in clinical practice requires a framework for evaluating the validity and clinical utility of this approach, an improved integration of genomic data into electronic health records, and the clinical decision support and educational resources for clinicians to use these data. Efforts to identify optimal approaches in all of these domains are creating a growing body of evidence that may help to bring diabetes into the era of genomic medicine.
Article Information
Funding. J.S.F. was supported by National Heart, Lung, and Blood Institute grant K08HL116640.
Funding agencies did not influence the design and conduct of the study; the collection, management, analysis, and interpretation of the data; or the preparation, review, or approval of the manuscript.
Duality of Interest. B.M.P. serves on the Data and Safety Monitoring Board of a clinical trial of a device funded by the manufacturer (Zoll LifeCor) and on the Steering Committee of the Yale University Open Data Access Project funded by Johnson & Johnson. No other potential conflicts of interest relevant to this article were reported.