OBJECTIVES—Genome-wide association studies have dramatically increased the number of common genetic variants that are robustly associated with type 2 diabetes. A possible clinical use of this information is to identify individuals at high risk of developing the disease, so that preventative measures may be more effectively targeted. Here, we assess the ability of 18 confirmed type 2 diabetes variants to differentiate between type 2 diabetic case and control subjects.
RESEARCH DESIGN AND METHODS—We assessed index single nucleotide polymorphisms (SNPs) for the 18 independent loci in 2,598 control subjects and 2,309 case subjects from the Genetics of Diabetes Audit and Research Tayside Study. The discriminatory ability of the combined SNP information was assessed by grouping individuals based on number of risk alleles carried and determining relative odds of type 2 diabetes and by calculating the area under the receiver-operator characteristic curve (AUC).
RESULTS—Individuals carrying more risk alleles had a higher risk of type 2 diabetes. For example, the 1.2% of individuals with >24 risk alleles had an odds ratio of 4.2 (95% CI 2.11–8.56) compared with the 1.8% with 10–12 risk alleles. The AUC (a measure of discriminative accuracy) for these variants was 0.60. The AUC for age, BMI, and sex was 0.78, and adding the genetic risk variants only marginally increased this to 0.80.
CONCLUSIONS—Currently, common risk variants for type 2 diabetes do not provide strong predictive value at a population level. However, the joint effect of risk variants identified subgroups of the population at substantially different risk of disease. Further studies are needed to assess whether individuals with extreme numbers of risk alleles may benefit from genetic testing.
Recent genome-wide association (GWA) studies, which assay >300,000 single nucleotide polymorphisms (SNPs) across many thousands of individuals, have led to the discoveries of variants predisposing to many common complex diseases, including type 2 diabetes (1–6), coronary artery disease (7–9), prostate cancer (10,11), Crohn's disease (12–14), and many others (see http://www.genome.gov/26525384 for an up to date list of all GWA studies). The variants identified by these GWA studies are common in the general population (minor allele frequency >1%), but most have, individually, only small effects on disease risk, with odds ratios (ORs) typically <1.3.
Despite the relatively small predisposing effects conferred, these variants provide important, novel insights into disease biology. For example, variants of a number of genes, such as HHEX, CDKN2A/B, and CDKAL1, implicate defects in pancreatic β-cell development and function as important in type 2 diabetes etiology (4,15,16), whereas the discovery that variants in FTO are associated with BMI opened up novel areas of investigation for obesity biology (17–19). By gaining further knowledge of the underlying biology, and promoting potential therapeutic and preventative approaches, these insights are likely to be the most important outcome from these GWA studies.
A more immediate clinical utility may be to use the identified risk variants to aid the determination of an individual's risk of developing a particular disease. Several companies, such as deCODE genetics and 23andme, have begun to use SNPs identified from these GWA studies, offering up to 1 million SNP GWA scans (http://www.decodeme.com; https://www.23andme.com) or individual disease-associated SNP tests (http://www.decodediagnostics.com). It is, however, unclear how useful the currently identified variants will be in predicting disease.
One of the disease traits for which the GWA approach has been most successful is type 2 diabetes. Together with candidate gene approaches, 18 common variants, including FTO and two independent signals in the CDKN2A/B region, have now been convincingly shown to associate with the disease (1–6,20–26). In this study, we aimed to assess the combined discriminatory power of these common, modest effect variants, using >4,900 individuals from the Genetics of Diabetes Audit and Research Tayside Study (GoDARTS).
RESEARCH DESIGN AND METHODS
SNP selection and genotyping.
We only included variants that have been convincingly shown to associate with type 2 diabetes. We used the variants reviewed by Frayling (27) and those described by Zeggini et al. (5,6), with the following exceptions. For the E23K variant (rs5219; r2 with GWA-SNP rs5215 = 0.89) of KCNJ11 (22) and rs7903146 (r2 with GWA-SNP rs7901695 = 0.80) of TCF7L2 (23,28), we genotyped SNPs that have shown stronger association with type 2 diabetes but were not genotyped on the genome-wide association chips. At the TCF2 locus, we used rs757210 (26) instead of rs4430796 (24) (r2 = 0.61). At the ADAM30/NOTCH2 locus, we used rs2641348 in ADAM30 as a proxy for rs2934381 (r2 = 0.92).
Genotyping was performed by KBioscience (Hertfordshire, U.K.), which designed and used assays based on either their proprietary competitive allele-specific PCR (KASPar) method or a modified TaqMan-based assay, details of which are available on the company website (www.kbioscience.co.uk/chemistry/index.htm). Genotyping quality control measures for the SNPs are as described previously (5,6,25).
GoDARTS study and participants.
The GoDARTS study is a substudy of the Diabetes Audit and Research Tayside Study (DARTS) (29), which aims to identify all known diabetes patients in the Tayside region of Scotland using electronic database retrieval. The samples used in this study are a subsample of the type 2 diabetes patients identified and have been described in detail previously (6). Briefly, the GoDARTS study includes individuals of white European descent, living in the Tayside region when recruited. The diagnosis of diabetes in case subjects was based on either current treatment with diabetes-specific medication or laboratory evidence of hyperglycemia if treated with diet alone. Patients with confirmed diagnosis of monogenic diabetes and those treated with regular insulin therapy within 1 year of diagnosis were excluded. Case subjects in this study had an age at diagnosis between 35 and 70 years, inclusive. Control subjects had not been diagnosed with diabetes at the time of recruitment or subsequently and were excluded if there was evidence of hyperglycemia during recruitment (fasting glucose >7.0 mmol/l, A1C >6.4%) or if they were >80 years old. The study was approved by the Tayside Medical Ethics Committee. Informed consent was obtained from all study participants. Table 1 presents the clinical characteristics of subjects used in this study.
Statistical analysis.
All statistical analyses were performed in StataSE v10.0 for Windows (StataCorp, College Station, TX). We used logistic regression for all individual SNP analyses. To test for deviation from a within-locus additive model, we performed a likelihood ratio test of an additive model against a general 2 degrees of freedom model. To test for gene-gene interaction across all pairs of loci, we used likelihood ratio tests to compare an additive model with a model that included an interaction term. We combined information from multiple SNPs by using an allele count model, in which we summed the number of risk alleles carried by each individual. This assumes that each of the alleles has an equal and additive effect on type 2 diabetes risk.
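The allele count model and the likelihood ratio tests described above can be illustrated with a minimal Python sketch. This is not the Stata code used in the study: the function names, the 0/1/2 genotype coding, and the example genotypes are assumptions made for illustration. The sketch also assumes that the alternative model adds exactly one parameter to the additive model (the extra genotype term in the general model, or a single interaction term), so the test statistic is compared with a chi-square distribution with 1 degree of freedom.

```python
import math


def risk_allele_count(genotypes):
    """Total risk alleles for one individual.

    Each entry is the number of risk alleles (0, 1, or 2) carried at one SNP.
    """
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("each genotype must be 0, 1, or 2 risk alleles")
    return sum(genotypes)


def lrt_p_value_1df(loglik_null, loglik_alt):
    """P value for a likelihood ratio test where the alternative adds one parameter."""
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # Chi-square upper tail with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical individual genotyped at 18 loci (illustrative values only).
example_genotypes = [1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 0, 1, 2, 1, 1, 0, 1, 2]
example_count = risk_allele_count(example_genotypes)
```

With 18 loci the count can range from 0 to 36. The log-likelihoods passed to `lrt_p_value_1df` would come from whatever software fits the two nested logistic models.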
We used logistic regression on the general model (i.e., individual SNP genotypes as indicator variables) to construct the receiver-operating characteristic (ROC) curves and calculate the areas under the curve (AUCs). We also performed these ROC analyses on the allele count model for comparison with the general model.
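As a rough illustration of what the AUC measures, the sketch below computes it directly from two lists of scores using the rank-based (Mann-Whitney) definition: the probability that a randomly chosen case subject has a higher score than a randomly chosen control subject, with ties counted as one half. In the study the scores would be fitted probabilities from the logistic model or risk allele counts; the function name and the toy scores here are assumptions, not study data.

```python
def auc_from_scores(case_scores, control_scores):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case ranked above control
            elif case == control:
                wins += 0.5   # ties count as half
    return wins / (len(case_scores) * len(control_scores))


# Toy risk allele counts (hypothetical, for illustration only).
toy_auc = auc_from_scores([20, 19, 18], [18, 17, 19])
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of case and control subjects. The pairwise loop is quadratic in sample size, so a rank-based implementation would be preferable for thousands of subjects.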
RESULTS
Genotyping data on all of the variants were available for 2,309 type 2 diabetic case subjects and 2,598 control subjects. Characteristics of these participants are shown in Table 1. Supplementary Table 1, available in an online appendix at http://dx.doi.org/10.2337/db08-0504, presents a comparison of clinical characteristics for these subjects against the 1,739 who were not successfully genotyped across all SNPs. Individually, the variants have similar effect sizes in this study compared with those reported in other large studies (Table 2) (1–6,20–26), and the range of ORs from 1.00 to 1.36 most likely reflects stochastic variation. Several variants are not associated at P < 0.05 in the sample used here but are still included in the analyses because they are confirmed type 2 diabetes risk variants, and the lack of significance is the result of relatively low power in this number of subjects. Based on these and larger datasets, all of the variants appear to have an additive mode of inheritance (1–6,20–26). The CDKAL1 locus was reported by Steinthorsdottir et al. (4) to fit a recessive model, but other large studies do not support this. There is no evidence of interaction between any of the SNPs based on these data (supplementary Table 2) or on the larger analyses previously published. Therefore, we assumed an additive genetic model. We found no evidence of any interaction between the individual variants and BMI or age (lowest interaction P values = 0.14 and 0.02, respectively). We performed the analysis with and without the FTO variant, the one variant shown to predispose to type 2 diabetes through a primary effect on BMI (18).
Comparing extremes.
The proportion of case and control subjects grouped according to the number of risk alleles that they carry is shown in Fig. 1. The number of risk alleles is approximately normally distributed in both case and control subjects, with a shift toward a higher number of risk alleles in the case subjects. The OR for type 2 diabetes increases with the number of risk alleles, relative to the baseline group of the 1.8% of individuals carrying 10–12 risk alleles. The 1.2% of individuals with ≥25 risk alleles have an OR of 4.2 (95% CI 2.11–8.56) against this baseline reference group. Similarly, the 11.5% of the study population carrying ≥22 risk alleles had an OR of 2.3 (1.73–2.93) for type 2 diabetes compared with the 8.2% of individuals with ≤14 risk alleles.
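The group comparisons above reduce to an odds ratio between a high-allele-count group and a reference group. A minimal sketch of that calculation follows, using the cross-product ratio and a Wald-type 95% CI on the log odds scale, which should match a logistic regression with a single binary group indicator. The function name and the counts are hypothetical; the per-group case and control counts from the study are not reproduced here.

```python
import math


def odds_ratio_with_ci(cases_high, controls_high, cases_ref, controls_ref, z=1.96):
    """Odds ratio for the high-risk group versus the reference group, with a Wald CI."""
    counts = (cases_high, controls_high, cases_ref, controls_ref)
    if any(c <= 0 for c in counts):
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (cases_high * controls_ref) / (controls_high * cases_ref)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 cases and 20 controls in the high-count group,
# 30 cases and 60 controls in the reference group.
example_or, example_lower, example_upper = odds_ratio_with_ci(40, 20, 30, 60)
```

Because the extreme groups each contain only 1 to 2% of the sample, the cell counts are small and the resulting intervals are wide, as seen in the CI reported for the ≥25 allele group.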
Figure 2 plots the ORs relative to the median number of 18 risk alleles. Those with ≥25 risk alleles were more than twice as likely to have type 2 diabetes (OR 2.18 [95% CI 1.24–3.81]) compared with those with the median number of risk alleles. The TCF7L2 variant had a stronger effect than the other variants (OR 1.36 compared with 1.00–1.25 for the rest), so these results may be slight underestimates, because the additive model used for the allele counting assumes equal effects across all SNPs.
We performed the same analyses for two subgroups of the cohort, one including only obese individuals (with BMI of ≥30 kg/m2, n = 1,803), the other nonobese individuals (BMI <30 kg/m2, n = 3,083). The results were similar across these subgroups. For example, the 1.4% of obese individuals with >24 risk alleles had an OR of 5.5 (95% CI 2.11–14.36) compared with the 1.9% of obese individuals with <13 risk alleles. The corresponding OR for the nonobese subjects was 3.31 (1.34–8.16), for the 1.8 and 1.1% of individuals with <13 and >24 risk alleles, respectively.
ROC curve.
We evaluated the discriminatory power of a genetic test based on the 18 type 2 diabetes variants by calculating the area under the ROC curve. Using the general model (as opposed to the allele count model, which assumes equal and additive effects), the AUC for the 18 type 2 diabetes variants studied here is 0.60 (Fig. 3). We performed the same analysis for the obese and nonobese subgroups of the cohort. The AUCs for the obese and nonobese groups were 0.58 and 0.60, respectively. A similar result was obtained when we removed the FTO variant (obese, 0.58; nonobese, 0.59). We also tested whether the risk variants would add to the discriminatory power of BMI, age, and sex alone (AUC 0.78 in our study). A model that includes BMI, age, sex, and the 18 variants has an AUC of 0.80 (Fig. 3); although marginal, the increase in the AUC was statistically significant (P = 2.88 × 10−12). The AUC remained virtually the same (AUC = 0.80) when the FTO variant was removed from the model.
The effect of BMI and age.
Supplementary Table 3 presents the individual SNP type 2 diabetes associations adjusted for BMI. As expected, the FTO association is weakened on adjusting for BMI (OR 1.00 [95% CI 0.92–1.10]), and the TCF7L2 association is strengthened (1.46 [1.32–1.61]). Testing the combined effect of the risk variants on clinical features of the type 2 diabetes patients, we found that the number of risk alleles was associated with an earlier age at diagnosis of 0.15 years per risk allele (95% CI −0.29 to −0.01, P = 0.038). We also observed an overall modifying effect on BMI (−0.14 BMI units per risk allele [−0.23 to −0.05], P = 3.41 × 10−3), but this finding is mainly explained by the known association of the TCF7L2 variant alone with BMI in type 2 diabetic case subjects (30,31). Here, each TCF7L2 risk allele was associated with a difference in BMI of −0.69 kg/m2 (−1.06 to −0.31, P = 3.18 × 10−4), whereas the combined effect of all other variants without TCF7L2 could just be detected (−0.10 kg/m2 per risk allele [−0.20 to 0.01], P = 0.036). The difference in BMI and age at diagnosis was more noticeable when we compared individuals with low and high numbers of risk alleles. For example, carriers of ≥23 risk alleles (11.8%) were, on average, diagnosed 4.2 years earlier (−6.45 to −1.87, P = 4.21 × 10−4) and had 1.60 kg/m2 lower BMI (−3.35 to 0.08, P = 0.062) than those carrying <15 (8.6%) risk alleles.
DISCUSSION
Recent success in identifying common variants predisposing to type 2 diabetes has led to suggestions that they may be useful in predicting an individual's risk of the disease. In this study, we evaluated the ability of 18 confirmed predisposing variants to discriminate between individuals with and without type 2 diabetes, using the GoDARTS study. The samples used in this study were not enriched for family history or low BMI, factors that may inflate effect sizes. Although the GoDARTS cohort was a part of the Wellcome Trust Case Control Consortium-Type 2 Diabetes GWA Study (5,6), it was only used as a stage 2 replication set for the follow-up of the initial hits. This means that there should be a minimal effect of the “winner's curse” (32), the upward bias of the effect size in the discovery samples compared with subsequent replication studies.
The combined information identifies individuals at different risks of disease.
By combining genetic information and comparing individuals carrying the fewest type 2 diabetes risk alleles with those carrying the most, we identified subgroups of the population at distinctly different risk of disease. For example, we were able to distinguish the ∼1% of the population carrying >24 risk alleles, who had more than four times the odds of diabetes compared with the ∼2% with 10–12 risk alleles. The high-risk group also had over twice the odds of type 2 diabetes compared with those with the median number of risk alleles. These figures were similar in obese and nonobese individuals; obesity is a major and easily measurable risk factor for type 2 diabetes. Obese individuals carrying large numbers of type 2 diabetes risk alleles may therefore be a particular group worth studying to test potential intervention strategies. This may be important given that the escalating rates of obesity and type 2 diabetes suggest that efforts aimed at the whole population are not effective and that intensive, but expensive, lifestyle interventions aimed at increasing exercise and improving diet can result in weight loss and a reduced risk of type 2 diabetes (33–36).
The current variants are not particularly discriminative but explain only a small amount of the heritability of type 2 diabetes.
Rather than focusing on individuals with “extreme” numbers of risk alleles, at a population level, the utility of genetic tests may be better classified by ROC curves. One of the most important factors in the validity of a genetic test in clinical practice is its ability to discriminate between individuals who will and will not develop the disease. A clinically relevant AUC threshold clearly depends on a whole range of factors (for example, the cost of the test and the availability of preventative measures), but as an example from current clinical practice, oxidized-LDL cholesterol has an AUC of ∼0.80 for coronary artery disease (37), making it a good discriminator between patients and healthy control subjects. The 18 type 2 diabetes variants had an inadequate discriminatory ability with an AUC of 0.60, a slight improvement on the AUC of 0.55 based on TCF7L2 alone. These data imply that genetic tests for type 2 diabetes (and many other complex diseases) that are offered by several commercial companies currently have limited predictive value. However, there are many more variants to be identified, because these 18 variants only explain a small amount of the heritability of type 2 diabetes: the sibling relative risk for type 2 diabetes is ∼3 (38), and the combination of these variants would only account for a sibling relative risk of ∼1.07. As more susceptibility variants are found for type 2 diabetes, genetic testing that uses the inexpensive and rapid genotyping technologies may eventually become more clinically useful.
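The text does not state how the sibling relative risk of ∼1.07 was derived. One common approach, shown here only as a hedged sketch, uses the Risch and Merikangas expression for a single locus under a multiplicative model, λs = (1 + w/2)^2 with w = pq(γ − 1)^2 / (pγ + q)^2, where p is the risk allele frequency, q = 1 − p, and γ is the per-allele relative risk, and then multiplies the locus-specific values across independent loci. The choice of this model, the function names, and the example frequency are assumptions.

```python
def sibling_relative_risk(risk_allele_freq, allelic_rr):
    """Locus-specific sibling relative risk under a multiplicative model (assumed)."""
    p = risk_allele_freq
    q = 1.0 - p
    w = p * q * (allelic_rr - 1.0) ** 2 / (p * allelic_rr + q) ** 2
    return (1.0 + w / 2.0) ** 2


def combined_sibling_relative_risk(loci):
    """Product of locus-specific values, assuming independent loci acting multiplicatively."""
    total = 1.0
    for risk_allele_freq, allelic_rr in loci:
        total *= sibling_relative_risk(risk_allele_freq, allelic_rr)
    return total


# Illustrative locus: hypothetical risk allele frequency 0.30, per-allele OR 1.36
# (the OR is used as an approximation to the relative risk).
example_lambda = sibling_relative_risk(0.30, 1.36)
```

Even the strongest single variant contributes only a few percent to the sibling relative risk under these assumptions, which is consistent with the point that the known variants explain a small share of the familial clustering.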
The use of genetic information in addition to age, sex, and BMI.
For many complex diseases, there are already well-established risk factors that can be used to predict someone's chances of developing the disease. Incorporating genetic information may be justified on the basis that current preventative measures are expensive and that prevention at a population level is not effective, so the more selective we can be the better. In type 2 diabetes, family history, age, BMI, ethnicity, and lifestyle all contribute to an individual's risk of the disease. In our study, the AUC for BMI, age, and sex (we did not have family history data) combined was 0.78, a moderate diagnostic value. The genetic risk variants had a poor discriminatory ability alone (AUC = 0.60) and only marginally increased the discriminatory power of the test when combined with BMI, age, and sex (AUC = 0.80), suggesting that they add little to the already known predictive factors.
Risk variants modify clinical characteristics of individuals with type 2 diabetes.
Type 2 diabetes often occurs in individuals who are not overweight or obese, and can be diagnosed at a relatively young age. This may be because these individuals have a stronger genetic risk component than more “typical” type 2 diabetes patients. Therefore, we tested the extent to which patients with the stronger genetic predisposition tended to be leaner, and how much younger they were at diagnosis. There were notable differences between the 11.8 and 8.6% of the population carrying either high or low numbers of disease-predisposing alleles, respectively. Patients with high genetic risk had an average BMI of 30.3 kg/m2 compared with 31.9 kg/m2 in those with low genetic risk and were diagnosed at an average age of 55.2 years, compared with 59.3 years for patients with relatively low genetic risk. These results support an important role for genetic predisposition to type 2 diabetes in nonobese, young-onset case subjects.
Weighting variants and the optimal ROC curve.
The simple allele count model we used for some of our analyses of “extremes” assumes that each risk allele has the same effect size and that the effects are additive both within and between loci. Although we found no strong evidence for deviation from additivity, clearly some SNPs have stronger effects than others. This is most evident for TCF7L2, where the allelic OR is 1.36, significantly larger than that of any of the other variants. One way to overcome this is to weight SNPs differently; however, we decided not to do this in this study for a number of reasons. First, all of our AUC analyses are based on a general model, in which the assumption of equal effects is not made. Second, as Janssens et al. (39) previously showed, when the ORs of the individual variants are relatively low (as here), there is little difference in the discriminative accuracy of the test based on the simple allele count model and a model that allows each variant to have a different effect size (the AUCs here are 0.583 and 0.603, respectively, although this difference was statistically significant [P = 0.001]). Third, it is unclear what the most appropriate weights to use would be. Fourth, an allele count model provides important advantages for simplicity and visualization of the results.
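For readers who want to see what weighting would involve, the sketch below contrasts the unweighted count with a score that weights each risk allele by the natural log of its OR. This weighting scheme is only one possibility and was not used in the study; the function names and the example ORs for the second and third SNPs are assumptions.

```python
import math


def unweighted_score(genotypes):
    """Simple allele count: every risk allele contributes equally."""
    return sum(genotypes)


def log_or_weighted_score(genotypes, odds_ratios):
    """Each risk allele contributes log(OR) for its SNP (one possible weighting)."""
    if len(genotypes) != len(odds_ratios):
        raise ValueError("need one odds ratio per genotype")
    return sum(g * math.log(o) for g, o in zip(genotypes, odds_ratios))


# Three hypothetical SNPs; 1.36 mirrors the TCF7L2 OR quoted in the Results,
# the other two ORs are illustrative.
example_ors = [1.36, 1.10, 1.05]
carrier_a = [2, 0, 0]   # two copies of the strongest risk allele
carrier_b = [0, 1, 1]   # one copy each at the two weaker SNPs
```

Both hypothetical carriers have an unweighted score of 2, but the weighted score ranks `carrier_a` higher because both alleles sit at the strongest locus. When all ORs are close to 1, as for most variants here, the two scores rank individuals similarly, which is in line with the small AUC difference reported above.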
Recently, Lu and Elston (40) proposed using an optimal ROC analysis approach rather than the standard approach that we have used. Although the authors proved theoretically that their method is more powerful, the results presented by Lu and Elston (40) showed that the two methods produce the same results when there are few loci and no interactive effects. Because we still have only relatively few loci, there is no evidence of any nonadditive effects within or between loci, and the ROC curve is concave (40), the two methods should produce the same results. We tested this using the 10 SNPs that were significant (at P < 0.05) in our study. Using these variants, the results were the same for both methods (AUC for the Lu and Elston method, 0.596; AUC for the standard method, 0.596).
Strengths and limitations of our study.
Our study was relatively large in terms of the number of samples, and the number of common variants used. We had >2,000 case subjects and >2,000 control subjects after excluding individuals who were not successfully genotyped for all of the variants included in the study. The 18 variants we used had all been convincingly shown in previous studies to associate with type 2 diabetes.
One of the main limitations of our study is that it was not prospective, and therefore, we are unable to truly determine the predictive power of these variants. Although the results of this study only apply to the Tayside population, it is likely, based on previous data (41–43), that our prediction estimates are reasonably accurate and that the effect sizes observed are likely to be representative of those in similar populations. A second limitation is that although the results are applicable to the Tayside and similar populations, they may not apply to populations of substantially different ethnic origin or those exposed to different social and environmental circumstances. A third limitation concerns the caveat that the majority of the type 2 diabetes–associated SNPs identified to date and used in this study are not the causal variants. This means that the predictive power of these susceptibility loci is likely to be an underestimate. Fine mapping and sequencing approaches are needed to identify the variants causal to these associations, which often have stronger effects than the currently identified variants. These follow-up studies may also reveal additional causal variants at these loci that cannot be detected by GWA methods because of, for example, low frequency, but that may have higher penetrance and therefore would be much more powerful predictors.
In conclusion, the combined information from the currently known susceptibility variants allows us to identify subgroups of the population at substantially increased odds of getting type 2 diabetes. These individuals could be targeted with more effective preventative measures. On a population level, these variants appear to be of limited use in discriminating between individuals who will and will not develop type 2 diabetes. As more variants are identified, tests with better predictive performance should become available and could eventually become a valuable addition to clinical practice.
Published ahead of print at http://diabetes.diabetesjournals.org on 30 June 2008.
Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. See http://creativecommons.org/licenses/by-nc-nd/3.0/ for details.
Article Information
M.N.W. is a Vandervell Foundation Research Fellow. E.Z. is a Wellcome Trust Research Career Development Fellow. This work has received funding from Wellcome Trust.
We thank all study participants.