Diabetes, and particularly type 2 diabetes, is one of the most significant public health problems facing Western civilization. Every day, 4,100 people in the U.S. are diagnosed with diabetes. Of those with diabetes, each day, 230 have their legs amputated, 120 are newly placed on kidney dialysis, and 55 become blind. Overall, in the U.S., almost 7% of the entire population (∼21 million people) has diabetes, including ∼1 in 5 individuals over the age of 60 years. Diabetes is the sixth leading cause of death (and is likely underreported), and people with diabetes have twice the risk of death of people of the same age without diabetes. The primary cause of death in individuals with diabetes is not diabetes itself but the complications of the disease. In all, the cost of diabetes to the U.S. economy is thought to be nearly 132 billion USD in lost productivity and direct medical care costs (1). To predict those at risk of diabetes, design efficacious behavioral and clinical interventions, and identify critical targets for pharmacologic therapy, a better understanding of the pathways and mechanisms leading to diabetes is required. It is this need for a more complete understanding of etiology that makes genetic studies fundamentally important.
There is extensive and consistent evidence that genetic factors play an important role in modifying an individual's risk for type 2 diabetes (2–6). A major research focus for several decades has been the identification of genes contributing to diabetes risk. Unlike the many forms of maturity-onset diabetes of the young (MODY), which are transmitted as single-gene defects and appear “Mendelian” in nature (7), typical type 2 diabetes is multifactorial in its transmission (8). Unlike type 1 diabetes, which is multifactorial but has a major genetic contribution from genes in the human major histocompatibility complex (MHC) (including the HLA and other genes), the genetic risk of type 2 diabetes involves few genes of major effect (10). Thus, the search for genes contributing to risk of type 2 diabetes has been difficult, and the genes themselves have been elusive.
The methods for gene discovery for complex human diseases such as type 2 diabetes have been rapidly evolving. Previous research focused either on evaluating families with multiple cases of diabetes to detect linkage to a gene in a hypothetical causal pathway or on testing functional variants in candidate genes using a case-control approach (11,12). Historically, these studies were limited both by small population sizes, which reduced statistical power to detect all but major gene effects, and by low genomic coverage, with only a few variants within each candidate gene being tested. A problem with these earlier studies was that positive findings were often not replicated, which generated confusion and concern about the application of genetic methods to diabetes risk assessment. With the advent of the International HapMap Project (13,14), the limitation of genomic coverage was effectively resolved. Reagents are now available to cover the human genome at a 5-kb resolution, and the structure of the individual candidate genes can now be characterized, making population size (and replication of novel findings) the primary requirement for gene discovery.
Recently, a series of genome-wide association scans for type 2 diabetes was published (15–25). These scans have taken an unbiased (agnostic) view of the genome with respect to type 2 diabetes genetic risk. Hundreds of thousands of single-nucleotide polymorphisms (SNPs) across the genome have been assayed in samples from populations of almost exclusively northern European ancestry, and novel genes (TCF7L2, SLC30A8, IDE-KIF11-HHEX, CDKAL1, CDKN2A-CDKN2B, IGF2BP2, FTO, etc.) with uncertain function pertaining to risk of type 2 diabetes have been identified. Despite the increase in the number of genes from the few (PPARG, KCNJ11, CAPN10) to the many (now over a dozen), the contribution of these genes to both the overall and the genetic risk remains small (27). Hence, there are likely many more genes to be identified, any one of which could identify a key pathway involved in disease.
The article in this issue of Diabetes by Gaulton et al. (28) presents a variation on the candidate gene approach in the whole genome era. Using the framework of FUSION (Finland-U.S. Investigation of Type 2 Diabetes Genetics) that was applied to the genome-wide association scans, the investigators have characterized a battery of 222 candidate genes associated with type 2 diabetes risk. Unrelated individuals were abstracted from the FUSION families, and these 1,161 case subjects with type 2 diabetes and 1,174 normoglycemic control subjects were assayed for 3,531 SNPs in the candidate genes. The candidate genes were selected using a number of strategies, including bioinformatics and text/vocabulary processing, and, given the distribution of these genes across the human genome, the study can be considered a candidate-wide association scan (CWAS). Using additional HapMap data, the investigators were able to increase the number of SNPs used in the analyses by imputing genotypes of ∼7,500 additional SNPs within or near the candidate genes, thereby capturing nearly all of the variants in the candidates. Using this CWAS approach, the FUSION team replicated associations between numerous genes and type 2 diabetes risk and identified two additional genes (RAPGEF1 and TP53) worthy of further study. The authors suggest that RAPGEF1 represents a strong candidate because of its role in insulin signaling. In addition, the RAPGEF1 pathway may be involved in regulation of proglucagon gene expression in intestinal endocrine L-cells (29), providing another mechanism for its effect on risk of type 2 diabetes. TP53, although primarily used as a target of breast cancer prognosis (30), is proposed here to be an indicator of apoptosis in the insulin-producing β-cells of the pancreas.
This study demonstrates that not only are there likely additional genes to be discovered that affect an individual's risk of diabetes, but there are multiple approaches beyond genome-wide association scans alone that can be used for gene discovery.
There are both strengths and weaknesses associated with the contribution by Gaulton et al. (28). The strengths include the two-stage CWAS approach, which uses novel text mining approaches to identify candidate genes and pathways that could be associated with risk of type 2 diabetes. In addition, the sample of ∼1,000 case subjects and ∼1,000 control subjects, while not particularly large in the context of genome-wide coverage, provides extensive genomic coverage of the candidates in a homogeneous population. Interestingly, this strength can also be viewed as a weakness. The sample contains only Finns, so there is little knowledge of whether these same genes/pathways would be observed in other ethnic groups. In addition, the large battery of candidate genes and SNPs generates concern over multiple testing of associations and over the power of the study. This potential limitation can be addressed, in part, by both the conduct of additional studies in different ethnic groups and replication in other populations of the same ethnicity. There is also an issue of confounding by BMI: it remains uncertain whether the genes identified are related to diabetes risk or to obesity risk, and whether any of the significant associations with diabetes would be lost after adjustment for BMI in the analytical models. While candidate gene studies and coding region analyses have the advantage of being “hypothesis-driven,” they are also “hypothesis-limited”; not all novel pathways and molecular mechanisms can be identified and interrogated. Finally, the question remains from ages past: can negative results from the CWAS be ignored?
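The multiple-testing concern can be made concrete with simple arithmetic. The following Python sketch is illustrative only and is not the correction used by Gaulton et al.: it assumes a family-wise error rate of 0.05, treats every SNP as an independent test when computing a Bonferroni threshold (conservative, because SNPs within a gene are correlated through linkage disequilibrium), and uses the SNP counts quoted above. The function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def expected_null_hits(p_cutoff: float, n_tests: int) -> float:
    """Expected number of tests with P below p_cutoff if no SNP is truly associated.

    This follows from linearity of expectation, so it holds even when the
    tests are correlated.
    """
    return p_cutoff * n_tests


GENOTYPED_SNPS = 3531   # SNPs assayed directly (from the text)
IMPUTED_SNPS = 7500     # approximate number of imputed SNPs (from the text)
ALPHA = 0.05            # assumed family-wise error rate, for illustration

for label, n_tests in (
    ("genotyped only", GENOTYPED_SNPS),
    ("genotyped + imputed", GENOTYPED_SNPS + IMPUTED_SNPS),
):
    cutoff = bonferroni_threshold(ALPHA, n_tests)
    null_hits = expected_null_hits(ALPHA, n_tests)
    print(f"{label}: {n_tests} tests, Bonferroni cutoff {cutoff:.2e}, "
          f"expected null hits at P < {ALPHA}: {null_hits:.0f}")
```

Under these assumptions, an uncorrected nominal threshold would be expected to flag well over a hundred SNPs by chance alone, which is why replication in independent samples carries so much weight.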
Within the context of studies with modest population sizes, the thresholds used in the literature to declare significance vary. In this study, the authors have used a different threshold for “biologically relevant candidate genes” (P < 0.10) than the one implied by standard statistical corrections. Whereas this may protect against false-negative results, it also carries more SNPs within candidate genes forward for follow-up. Other aspects of the study that could lead to false-negative results include the low overall power for detection of associations, poor coverage of the candidate genes (SNP selection), gene-gene and gene-environment interaction (or correlation), and phenotype definition.
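The trade-off between the significance threshold and power can be sketched numerically. The Python example below is a rough illustration, not the power analysis performed by the authors: it uses a normal approximation to a two-sided allelic (two-proportion) test, takes the case and control counts from the study, and assumes a hypothetical control allele frequency of 0.30 and hypothetical allelic odds ratios. It ignores linkage disequilibrium, genotyping error, and covariates such as BMI.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Allele frequency in cases implied by the control frequency and allelic odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_power(n_cases: int, n_controls: int, control_freq: float,
                  odds_ratio: float, alpha: float) -> float:
    """Approximate power of a two-sided allelic test (normal approximation)."""
    n1, n0 = 2 * n_cases, 2 * n_controls          # two alleles per subject
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    pooled = (n1 * p1 + n0 * p0) / (n1 + n0)
    se_null = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n0))
    se_alt = sqrt(p1 * (1.0 - p1) / n1 + p0 * (1.0 - p0) / n0)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Only the rejection tail in the direction of the effect is counted.
    return NormalDist().cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


N_CASES, N_CONTROLS = 1161, 1174   # sample sizes from the text
CONTROL_FREQ = 0.30                # hypothetical allele frequency
THRESHOLDS = {"nominal 0.05": 0.05, "Bonferroni, 3,531 SNPs": 0.05 / 3531}

for odds_ratio in (1.1, 1.2, 1.4):  # hypothetical effect sizes
    for label, alpha in THRESHOLDS.items():
        power = allelic_power(N_CASES, N_CONTROLS, CONTROL_FREQ, odds_ratio, alpha)
        print(f"OR {odds_ratio:.1f}, {label}: power ~ {power:.2f}")
```

Under these assumptions, power falls as the threshold is tightened and as the odds ratio shrinks, which illustrates why a lenient threshold reduces false negatives at the cost of carrying more chance findings into follow-up.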
The genetic risk factors for both type 1 and type 2 diabetes are being identified and the etiologic pathways are being dissected. Currently, there are at least 10 genes that appear to influence risk of type 1 diabetes (31) and 18 that appear to influence risk of type 2 diabetes (32), yet much needs to be done to further understand how, in a pathophysiological sense, variation in each of the genes modifies risk, how we can predict who is at risk, and how we can intervene to reduce the risk to an individual. Three specific areas of future research are easily identified. First, and as noted in the current article, the pathway from gene to clinical outcome (diabetes) passes through “intermediate” protein products, that is, quantitative phenotypes that are closer to the functional defect. Examination of these diabetes-related quantitative traits may provide important insight into the transition in disease risk from normal glucose tolerance to type 2 diabetes. Second, resequencing of coding regions can offer an efficient way to focus the search for causal variants. It is estimated that coding regions make up ∼1% of the genome sequence, yet they likely contain a much larger fraction of all causal variants. Studies that are limited to coding regions, however, will not identify regulatory variants that influence disease. Third, evolutionary conservation across relevant species can provide a means to identify functional sites in the human genome. Evolutionary analysis of sequence data suggests that altering sequence in regulatory regions may be as deleterious as altering sequence in coding regions. These future areas of research will need to be integrated with the ongoing epidemiologic studies that identify the important modifiable risk factors that interact with the host genotype.
With these current and future approaches to understanding the human genome, the potential to modify the current, almost inexorable, natural history that runs from genetic risk, through quantitative trait (intermediate phenotype) abnormalities, to clinical diabetes and its complications is becoming more realistic. Understanding the pathophysiological mechanisms that these genes identify should lead to significant advances in the next decade. Thus, further identification of genes for diabetes and its complications will provide a better understanding of etiopathogenesis and a clear delineation of targets for intervention.