Known genetic susceptibility loci for type 2 diabetes (T2D) explain only a small proportion of heritable T2D risk. We hypothesize that DNA methylation patterns may contribute to variation in diabetes-related risk factors, and this epigenetic variation across the genome can contribute to the missing heritability in T2D and related metabolic traits. We conducted an epigenome-wide association study for fasting glucose, insulin, and homeostasis model assessment of insulin resistance (HOMA-IR) among 837 nondiabetic participants in the Genetics of Lipid Lowering Drugs and Diet Network study, divided into discovery (N = 544) and replication (N = 293) stages. Cytosine guanine dinucleotide (CpG) methylation at ∼470,000 CpG sites was assayed in CD4+ T cells using the Illumina Infinium HumanMethylation 450 Beadchip. We fit a mixed model with the methylation status of each CpG as the dependent variable, adjusting for age, sex, study site, and T-cell purity as fixed-effects and family structure as a random-effect. A Bonferroni corrected P value of 1.1 × 10−7 was considered significant in the discovery stage. Significant associations were tested in the replication stage using identical models. Methylation of a CpG site in ABCG1 on chromosome 21 was significantly associated with insulin (P = 1.83 × 10−7) and HOMA-IR (P = 1.60 × 10−9). Another site in the same gene was significant for HOMA-IR and of borderline significance for insulin (P = 1.29 × 10−7 and P = 3.36 × 10−6, respectively). Associations with the top two signals replicated for insulin and HOMA-IR (P = 5.75 × 10−3 and P = 3.35 × 10−2, respectively). Our findings suggest that methylation of a CpG site within ABCG1 is associated with fasting insulin and merits further evaluation as a novel disease risk marker.
Candidate gene and genome-wide association studies (GWAS) have identified a number of sequence variants that explain some of the interindividual variation in the susceptibility for type 2 diabetes (T2D) (1,2). However, a large component of heritable T2D risk remains poorly understood, with less than half of total genetic variation explained by known single nucleotide polymorphisms (SNPs), a problem known as missing heritability (3–6). T2D is a complex and heterogeneous disease, in which disease onset and development are dependent on interactions between different genetic and environmental factors (7). Epigenetic variation, whether intrinsic or altered by environmental exposure, contributes to variation in gene expression and risk for metabolic disease (8). Furthermore, Liu et al. (9) and others (10,11) have suggested that genetic and epigenetic modifications could interact biologically and that methylation analysis might uncover heritable genetic variants contributing to disease that are difficult to tease out of conventional GWAS results. Ultimately, T2D susceptibility in humans is likely to depend on both genetic and epigenetic mechanisms (8,12–14).
Studies of diabetes-related traits (e.g., fasting glucose and fasting insulin) in healthy individuals have been shown to be a fruitful approach for identifying T2D risk loci (15). Hence, for the current analysis, we examined epigenome-wide methylation associations (EWAS) for fasting glucose, insulin, and homeostasis model assessment of insulin resistance (HOMA-IR) in nondiabetic subjects from the Genetics of Lipid Lowering Drugs and Diet Network Study (GOLDN). We were also interested in the role of SNP variation in significant EWAS regions. Additionally, as previous studies have identified correlations between genetic variation and the quantitative trait of DNA methylation enriched at nearby loci (cis-meQTLs) (15), we integrated regional GWAS data with significant epigenetic findings for a uniquely comprehensive analysis of variation in metabolic traits.
Research Design and Methods
GOLDN, described in detail in prior publications (16–18), recruited families with at least two siblings from the participants of the National Heart, Lung, and Blood Institute Family Heart Study in Minneapolis, MN, and Salt Lake City, UT. The trial aimed to identify genetic factors that mediated response to lipid-raising (i.e., postprandial lipemia challenge) or lipid-lowering (fenofibrate therapy) among metabolically healthy individuals. Participants were asked to discontinue the use of lipid-lowering agents for at least 4 weeks, to fast for at least 8 h, and to abstain from alcohol for at least 24 h prior to study visits. The study protocol was approved by the Institutional Review Boards at the University of Minnesota, University of Utah, Tufts University/New England Medical Center, and the University of Alabama at Birmingham, and written informed consent was obtained from all participants. For the current study, we evaluated diabetes-related traits (glucose, insulin, and HOMA-IR) among 837 GOLDN participants without physician-diagnosed or -treated diabetes and available EWAS data. We used a split sample approach to set up internal replication in which N = 544 participants were included in the discovery stage and 293 were included in the replication stage.
Measurement of T2D-Related Traits
Briefly, blood samples were drawn after an overnight fast. Plasma glucose was determined by a hexokinase-mediated reaction on the Hitachi commercial kit (Roche Diagnostics). Plasma insulin was measured by a commercial kit using competitive radioimmunoassay (Linco Research, St. Charles, MO), with intralaboratory reliability of glucose and insulin measurements of 0.984 and 0.975, respectively. HOMA-IR was calculated as fasting glucose × fasting insulin/22.5 (19).
CD4+ T cells were isolated from frozen buffy coat samples using positive selection by antigen-specific magnetic beads (Invitrogen, Carlsbad, CA). DNA was isolated from the CD4+ T cells using DNeasy kits (Qiagen, Venlo, the Netherlands) (20). We used the Infinium Human Methylation 450 array (Illumina, San Diego, CA) to quantify genome-wide DNA methylation, as described in further detail in a manuscript from our group (20). Briefly, prior to the standard manufacturer protocol steps of amplification, hybridization, and imaging steps, we treated 500 ng of each DNA sample with sodium bisulfite (Zymo Research, Irvine, CA). We used Illumina GenomeStudio software to estimate β scores, defined as the proportion of total signal from the methylation-specific probe or color channel, and detection P values, defined as the probability that the total intensity for a given probe falls within the background signal intensity. During the quality-control (QC) stage, we removed any β scores with an associated detection P value >0.01 and samples with >1.5% missing data points across ∼470,000 autosomal cytosine guanine dinucleotides (CpGs). Additionally, we excluded any CpG probes for which >10% of samples failed to yield adequate intensity (20). We normalized the filtered β scores using the ComBat package for R software (21). We performed the normalization on random subsets of 10,000 CpGs per run, in which each array of 12 samples was used as a batch. We separately normalized probes from the Infinium I and II chemistries and subsequently adjusted the β scores for Infinium II probes using the equation derived from fitting a second-order polynomial to the observed methylation values across all pairs of probes located <50 base pairs (BPs) apart (within-chemistry correlations >0.99), where one probe was Infinium I, and one was Infinium II (20). Finally, we eliminated any CpGs in which the probe sequence mapped either to a location that did not match the annotation file or to more than one locus. We identified such markers by realigning all probes (with unconverted Cs) to the human reference genome (20). After QC, we had data for 461,281 CpGs. Principal components based on the β scores of all autosomal CpGs passing QC were generated using the prcomp function in R (v2.12.1).
Detailed descriptions of DNA extraction and purification procedures as well as the genotyping approach are available in prior publications from our group (17,18). Briefly, we used a hybrid data set of 2,543,887 SNPs, of which 484,029 were typed using the Affymetrix Genome-Wide Human 6.0 Array (Affymetrix, Santa Clara, CA), and the rest were imputed using MaCH software (version 1.0.16; Ann Arbor, MI) with Human Genome Build 36 as a reference. Prior to imputation, we excluded SNPs if they were monomorphic, had a call rate of <96%, exhibited Mendelian errors, had a minor allele frequency of <1%, or failed the Hardy-Weinberg equilibrium test at the P value threshold of <10−6.
Baseline characteristics, including fasting glucose, insulin, and HOMA-IR levels, were compared between the discovery and replication data sets using t tests and χ2 tests for continuous and categorical variables, respectively. In the discovery stage, we modeled associations between methylation β score at each CpG and log-transformed measure of insulin, glucose, and HOMA-IR using linear mixed models adjusted for age, sex, study site, and the first four principal components generated to capture T-cell impurity as fixed-effects and pedigree as a random-effect using the lmekin function of the kinship package in R (22). We used the Bonferroni correction to adjust for multiple comparisons, with the genome-wide significance level of 1.1 × 10−7 (18). In the replication stage, we evaluated markers that reached statistical significance in the discovery set for association with the trait of interest.
Integration of EWAS and Existing SNP Data
For CpG sites that showed a statistically significant association in the discovery and replication data sets, we extracted GWAS SNPs within 20 kb. Regional SNPs were then tested for association with the phenotypic trait of interest, adjusting for age, sex, and center as fixed-effects and family as a random-effect. Similarly to other GWAS published in the GOLDN data, we did not adjust for substructure (18) due to the homogeneity of the study population. Next, regional SNPs were associated with the methylation β score at the significant CpG sites (cis-meQTL) adjusting for age, sex, study site, and the first four principal components for cell purity. Regional plots were assembled to evaluate a potential role of underlying SNP variation in highlighted EWAS regions.
Demographic and Clinical Characteristics
Table 1 summarizes the population characteristics for all 837 participants considered in this study. Discovery and replication data sets did not differ significantly with regards to levels of fasting glucose, fasting insulin, or HOMA-IR (P > 0.05 for all variables). In the discovery sample, the mean age of participants was 48 years; 52.5% of the participants were female, and less than half were recruited at the Minnesota center. The replication sample was demographically comparable except that a larger proportion of participants was recruited from the Minnesota field center. All participants were of self-reported European ancestry.
Tables 2 and 3 show the top hits from the EWAS analyses for both insulin and HOMA-IR. Manhattan plots for the association of insulin and HOMA-IR with markers of methylation are shown in Figs. 1 and 2. In adjusted analyses, a CpG site in ABCG1 (CpG06500161) was significantly associated with insulin (P = 1.83 × 10−7). In addition, CpG06500161 was also significantly associated (P = 1.60 × 10−9) with HOMA-IR, while CpG1881899, also a CpG site in ABCG1, was marginally significant for that phenotype (P = 3.36 × 10−6). None of the CpG sites investigated was associated with fasting glucose after correction for multiple testing (data not shown).
We replicated findings from our discovery set in a replication data set (N = 293) in which we removed prevalent patients with diabetes. Table 4 shows the replication of the top insulin CpG signal (CpG06500161), as well as a second insulin CpG signal that was marginally significant in the discovery set (CpG01881899) in ABCG1 (P = 5.75 × 10−3 and P = 3.35 × 10−2, respectively). HOMA-IR replication analyses are also shown in Table 4 for both CpG signals on ABCG1 (CpG06500161, P = 1.3 × 10−3; and CpG01881899, P = 4.3 × 10−2).
Integration of Regional SNP and cis-meQTL Signals
Finally, we examined SNP-phenotype associations within 20 kb from CpG06500161 in ABCG1. We identified 37 regional SNPs in the region on chromosome 21 (i.e., 3638855 to 43676586, HG19 build), of which 14 were located 20 kb downstream of CpG06500161 and 23 located 20 kb upstream. The level of significance for cis-meQTLs ranged from P = 2.26 × 10−4 to P = 0.84. Level of significance for the same SNPs associated with insulin ranged from P = 1 × 10−2 to P = 0.05. Fig. 3 demonstrates the regional overlap of the EWAS, cis-meQTL, and SNP association signals. Similarly, the same SNPs were associated with HOMA-IR with P = 1.3 × 10−3 to P = 0.95. The main CpG signal of interest was also plotted on the ABCG1 genomic region using ENCODE to visualize proximity of CpG06500161 to a CpG island, shore, or shelf (Fig. 4).
Our findings suggest that differential methylation of a CpG site within ABCG1 is associated with fasting insulin and HOMA-IR in GOLDN. Additionally, our results provide evidence in support of cross talk between genetic and epigenetic modifications in complex traits (6–8). Specifically, our data show that underlying SNP variation correlates with methylation of a CpG site in ABCG1 and is marginally associated with the metabolic traits of interest. This study highlights a novel epigenetic finding for diabetes-related traits and demonstrates an example of SNP variation tagging important functional variation.
The protein encoded by ABCG1 is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. More specifically, ABCG1 is involved in macrophage cholesterol and phospholipid transport and may regulate cellular lipid homeostasis in other cell types. Our results contribute to the body of evidence in support of the role of lipid metabolism, specifically of ABC transporters, in insulin resistance. In mice, Sturek et al. (23) have shown that ABCG1 regulates subcellular cholesterol distribution in β-cells, suggesting that islet ABCG1 expression may play a role in T2D pathogenesis and treatment. Others have also shown that ABCG1 expression and cholesterol efflux are reduced in patients with T2D and that this impaired ABCG1-mediated cholesterol efflux significantly correlates with increased intracellular cholesterol accumulation (24). ABCG1 has also been implicated in modulating islet inflammatory responses, indicating possible pleiotropic effects of ABC transporters in T2D etiology (25). Additionally, decreased function of ABC transporters has been associated with impaired insulin secretion (26). Therefore, our research adds considerable support to evidence linking ABCG1 to T2D and related traits and suggests that methylation-driven gene expression changes may mediate some of the observed effects.
Few genetics studies have investigated a role for ABCG1 in T2D, most with null findings. For instance, Schou et al. (27) showed that coding SNPs in ABCA1 and ABCG1 were not associated with T2D in the Copenhagen General Population Study of nearly 40,000 individuals. Recent GWAS have also reported negative findings with regard to ABCG1 and risk for T2D (15,28,29). However, previous analyses did not consider epigenetic modifications. Importantly, our findings suggest that not only is methylation variation in ABCG1 associated with both insulin and HOMA-IR, but also that underlying SNP variation may mediate some of the epigenetic effects. SNPs in the regions were marginally associated with our traits of interest, but not at a significance threshold that would have been picked up in GWAS, confirming the importance of expanding research in the field of T2D to epigenetics.
The results of this study must be interpreted in light of several limitations. First, CD4+ T cells were harvested from stored lymphocytes in which methylation could have been altered due to the freezing and thawing process. Further, this may have resulted in some contamination of CD4+ T cells with other cell types (e.g., CD15+ neutrophils and CD14+ monocytes). Still, CD4 T cells are a reasonable choice for global determination of CpG island methylation (30). CD4+ T cells were selected for three reasons. First, DNA methylation patterns are often tissue-specific (31). For instance, studies of whole-blood samples reflect methylation variations within each blood cell type that may act to confound genomic association results. Second, there is evidence that key genes involved in glucose metabolism are expressed in lymphocytes (e.g., GLUT1) (32). Therefore, we hypothesized this cell type should reflect underlying epigenetic variation influencing our key diabetes-related phenotypes, while minimizing potential confounding (33). Third, blood collection is the most viable tissue collection method among healthy individuals.
Finally, there were some differences in the discovery and replication populations within GOLDN due to the order in which samples were selected for processing. The split-sample design, however, did allow for increased design sensitivity, as well as decreased potential for confounding. Despite these limitations, however, one major strength of this study includes genome-wide epigenetic testing with a dense, genome-wide panel of CpG markers on a large sample of healthy adults and an internal replication data set.
In summary, we have identified two plausible methylation signals for T2D-related factors: insulin and HOMA-IR. The results of this integrated study implicate DNA methylation as an intermediary in T2D pathogenesis. In sum, this information contributes to the understanding of epigenetic and genetic mechanisms governing glycemic traits and ultimately the risk of T2D.
Acknowledgments. The authors thank Steve Class for help with the figures.
Funding. This work was funded by National Heart, Lung, and Blood Institute grant U01-HL072524-04. B.H. was also funded by the National Heart, Lung, and Blood Institute, University of Alabama at Birmingham Statistical Genetics Postdoctoral Training Program, Grant 5T32-HL072757.
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
Author Contributions. B.H. wrote the manuscript and researched and analyzed the data. M.R.I., S.A., H.K.T., and D.K.A. contributed to discussion and reviewed and edited the manuscript. J.S. assisted with data analysis. D.Z. contributed to discussion, reviewed and edited the manuscript, and assisted with data analysis. D.A., E.K.K., and J.M.O. reviewed and edited the manuscript. B.H. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Prior Presentation. This study was presented in poster form at the 73rd Scientific Sessions of the American Diabetes Association, Chicago, IL, 21–25 June 2013.