Investigators of previous cross-sectional epigenome-wide association studies (EWAS) in adults have reported hundreds of 5′-cytosine-phosphate-guanine-3′ (CpG) sites associated with type 2 diabetes mellitus (T2DM) and glycemic traits. However, the results from EWAS have been inconsistent, and longitudinal observations of these associations are scarce. Furthermore, few studies have investigated whether DNA methylation (DNAm) could be modified by smoking, drinking, and glycemic traits, which have broad impacts on genome-wide DNAm and result in altering the risk of T2DM. Twin studies provide a valuable tool for epigenetic studies, as twins are naturally matched for genetic information. In this study, we conducted a systematic literature search in PubMed and Embase for EWAS, and 214, 33, and 117 candidate CpG sites were selected for T2DM, HbA1c, and fasting blood glucose (FBG). Based on 1,070 twins from the Chinese National Twin Registry, 67, 17, and 16 CpG sites from previous studies were validated for T2DM, HbA1c, and FBG. Longitudinal review and blood sampling for phenotypic information and DNAm were conducted twice in 2013 and 2018 for 308 twins. A cross-lagged analysis was performed to examine the temporal relationship between DNAm and T2DM or glycemic traits in the longitudinal data. A total of 11 significant paths from T2DM to subsequent DNAm and 15 paths from DNAm to subsequent T2DM were detected, suggesting both directions of associations. For glycemic traits, we detected 17 cross-lagged associations from baseline glycemic traits to subsequent DNAm, and none were from the other cross-lagged direction, indicating that CpG sites may be the consequences, not the causes, of glycemic traits. Finally, a longitudinal mediation analysis was performed to explore the mediation effects of DNAm on the associations of smoking, drinking, and glycemic traits with T2DM. No significant mediations of DNAm in the associations linking smoking and drinking with T2DM were found. In contrast, our study suggested a potential role of DNAm of cg19693031, cg00574958, and cg04816311 in mediating the effect of altered glycemic traits on T2DM.
Introduction
Type 2 diabetes mellitus (T2DM) is a global public health burden and a leading cause of disability and mortality (1). In 2019, approximately 463 million people worldwide were affected by diabetes, and the global prevalence is expected to increase by 50% by 2045 (2). High fasting plasma glucose was estimated to be the third and fifth leading global level 2 risk factor for attributable deaths in females and males (3.09 million and 3.14 million), respectively (3). Risk factors for T2DM include environmental factors such as unhealthy lifestyles, physical inactivity, and genetic contributors that may interact with each other. However, only 10% of the total heritability of T2DM can be attributed to genetic variants (4), suggesting that gene-environment interactions, including epigenetic regulation, may reveal the pathogenesis of T2DM (5). DNA methylation (DNAm), as a heritable epigenetic regulation, occurs almost exclusively at 5′-cytosine-phosphate-guanine-3′ (CpG) in DNA (6,7). To date, the association of DNAm with T2DM and glycemic traits has been analyzed in multiple studies, and hundreds of associated CpG sites have been reported (8–11). However, the related CpG sites from previous studies have mainly been inconsistent. Inadequate adjustment for potential confounders, including genetic background, might lead to variability between studies (12,13). Monozygotic (MZ) twins share identical age, sex, genetic information, and intrauterine environment (14). Twin studies, which allow the analysis the DNAm variation under the premise of controlling genetic and early environmental factors, are of great value for revealing disease-related epigenetic changes (15).
Another hypothesis for inconsistent findings among DNAm and T2DM is that temporality remains uncertain due to the plasticity of DNAm in response to environmental changes (16). Previous Mendelian randomization and methylation quantitative trait loci (mQTL) analysis suggested that the methylation of several CpG sites is likely to be the cause of T2DM (17,18). On the other hand, high blood glucose has been reported to cause abnormal DNAm, which may be involved in the early stage(s) of diabetes (19). The role of DNAm in the development of T2DM and glucose metabolism has been of increasing interest; however, to our knowledge, few literatures have explored the temporal relationship of DNAm with T2DM and glycemic traits and the mediating effects of DNAm on the progression of T2DM (20). Furthermore, lifestyle factors such as smoking and drinking have also been demonstrated to induce broad methylation changes (21,22). Whether DNAm could be regulated through modification of smoking/drinking status and glucose changes to affect the development of T2DM ultimately remains to be investigated in the longitudinal design. The longitudinal study will help provide valuable evidence for establishing causality of DNAm and T2DM/glycemic traits and an understanding of the etiology of T2DM.
The objective of the current study was to 1) validate T2DM- and glycemic trait–associated CpG sites in the twin population based on CpG sites that have been previously reported, 2) examine the temporal relationship of DNAm with T2DM and glycemic traits, and 3) explore the mediation path from lifestyle factors to DNAm and T2DM or glycemic traits in a longitudinal setting.
Research Design and Methods
Study Population
Data for this study came from Chinese National Twin Registry (CNTR), established in 2001. The study design, population characteristics, and data collection procedures have previously been described (23). After two rounds of large-scale registration and follow-up, CNTR recruited 61,566 twin pairs from 11 provinces/cities in China. The information retrieved from CNTR in this study included that from questionnaires (demographic, lifestyle factors, and medical history), blood sample testing (DNAm and biochemical tests), and physical examinations (blood pressure, height, and weight). The study was approved by the Biomedical Ethics Committee at Peking University. All participants provided written informed consent (reference nos. IRB00001052-13022, IRB00001052-14021).
Eligibility criteria for participants were as follows: 1) questionnaire investigations and physical examinations were complete for both twins, 2) twins had both donated blood samples, and 3) twins completed at least one detailed survey in 2013 or 2018. Pregnant twins and their cotwins were excluded. If one twin was excluded in the following DNAm measurement process or statistical analysis, the cotwin was then removed to ensure that the twins were in pairs. A total of 1,088 participants were enrolled in this study, with mean age 49.9 years (range 19–82), 318 of whom had longitudinal data (available in both surveys).
Definition and Measurement of T2DM, Glycemic Traits, and Covariates
T2DM in this study was defined as diabetes diagnosed after 30 years of age, according to American Diabetes Association criteria (24). After participants provided informed consent, peripheral blood (15 mL) was collected by trained investigators, and qualified testing companies detected blood biochemical traits. The glycemic traits of this study consisted of HbA1c and fasting blood glucose (FBG). HbA1c level in the whole blood sample was determined with high-performance liquid chromatography. FBG level was measured with a hexokinase method. Twin zygosity was determined using 59 single nucleotide polymorphism (SNP) sites on Illumina Infinium Methylation 450K or EPIC BeadChips. With this approach the zygosities of twin pairs are defined by calculating the correlation of SNP sites within twin pairs, and coefficients <0.8 were considered to indicate dizygotic twins (25). A total of 776 twins were determined to be MZ in our studies. BMI was calculated as weight in kilograms divided by the square of height in meters. Smoking status was grouped into current, former, and never smokers (26). Alcohol consumption was categorized as current, former, and never drinkers (27).
DNAm Measurements
DNA samples were extracted from peripheral blood with a BioTeke Whole Blood DNA Extraction Kit and subjected to bisulfite conversion using the EZ DNA Methylation Kit (Zymo Research, Orange, CA). Whole-genome DNAm was assessed on Illumina Infinium Methylation 450K or EPIC BeadChip arrays (Illumina, San Diego, CA). The assay reproducibility of the two BeadChips for the samples was 98%, and >90% of probes in 450K were replicated in EPIC BeadChip (28). Illumina EPIC and 450k arrays were combined as a data set in the study with the “combined arrays” function from the R package minfi through selecting the intersection of probes on both BeadChips and processed together for the subsequent data preprocessing and analysis.
Quality control for DNAm data was conducted to filter out unqualified probes and samples: 1) type I and type II probes that detected no significant differences between the CpG site signals and the blank control (P > 0.01), 2) probes with minor allele frequency >0.05 in the 1000 Genomes Project for Asian populations or probes that had SNPs overlapping the CpG site, 3) cross-reactive probes, 4) probes having 1% of samples with a detection P value >0.05, and 5) samples having ≥1% of CpG sites with a detection P value >0.05 were removed. After ineligible probes and samples and their cotwins were eliminated, 378,654 CpG sites and 1,070 twin samples (374,786 CpG and 308 twin samples for longitudinal information) finally passed the quality control.
The DNAm intensity signal values for each locus were extracted with the minfi package. Average β-values were used to describe the DNAm level of each CpG site, calculated with the following formula: β = M / (M + U), where M and U indicate the mean methylated and unmethylated signal intensities of each CpG site, respectively. Next, β-values were normalized and adjusted for cell counts and batch effects in the following steps: 1) quantile normalization was performed with minfi to normalize the methylation β-values at each CpG site across individuals and to reduce the influence from different probe types and background noise, 2) adjustments for cell counts were processed with the R package ChAMP to reduce the bias caused by blood cell counts, 3) correction for batch effects was conducted with surrogate variable analysis (ComBat approach was used for longitudinal data due to the relatively small sample size) in the sva package to minimize the impact of confounders generated in the DNAm detection process, including heterogeneity in assay plates, experimental batches, and other experimental factors. A total of 18, 7, and 11 surrogate variables were generated for T2DM, HbA1c, and FBG and used as covariables in the association study.
Quality controls, normalization, and cell count adjustment were conducted for all samples and longitudinal data.
Statistical Analysis
All statistical analyses were performed with R software, version 4.0.3. The significance level was set at a P value <0.05. The statistical analysis was performed in three steps (Fig. 1).
Step 1: Identification of Candidate CpG Sites for T2DM and Glycemic Traits
Candidate CpG sites were selected primarily from prior epigenome-wide association studies (EWAS) for T2DM, HbA1c, and FBG. We searched PubMed and Embase for relevant studies up to 7 May 2022. The search strategy, including detailed inclusion and exclusion criteria, is provided in Supplementary Material.
Linear mixed-effects models were used to assess the associations of DNAm levels of the candidate CpG sites with T2DM and glycemic traits, where DNAm level was the dependent variable. The primary independent variable was T2DM, entered as a binary independent variable, and HbA1c and FBG levels were analyzed continuously. Twin identifier, shared by the two twin pair members, was entered as a random effect in all models. Age, sex, smoking status, alcohol consumption, BMI, and surrogate variables were covariates as fixed effects. The choice of the covariates was based on a priori knowledge and literature (29).
To further control for confounding genetic factors, we also performed discordant twin (i.e., within-pair) analyses in MZ twins. For T2DM, a paired t test was used to find the methylation difference in MZ twins who were discordant for the disease. For HbA1c and FBG, we conducted linear regression models to examine the associations between the difference in HbA1c/FBG and the difference in DNAm levels between MZ twin pairs. We calculated differences within twin pairs by adopting birth order: traits of the firstborn twin − traits of their cotwin). Since age and sex are shared in MZ twins, and environmental factors are the primary sources underlying the discordance, only batch effects were adjusted for in this analysis (ComBat approach).
The association study was conducted on all 1,070 twin participants included in the study. The discordant twin study for T2DM and HbA1c/FBG was based on 68 T2DM discordant MZ twin pairs and 758 total MZ twins, respectively. If the subjects participated in both investigations in 2013 and 2018, we used their data from 2018. P values of the association studies were corrected with a false discovery rate (FDR) adjustment (Benjamin-Hochberg) for multiple testing. CpG sites with the direction of the β-values consistent with the previous study and FDR <0.05 were considered as significant replication and kept for further analysis.
Step 2: Cross-Lagged Model Analysis
In the cross-lagged model analysis we investigated the temporal relationship between DNAm and phenotypes. The cross-lagged study is a sort of path analysis. By adopting the structural equation modeling approach, it assesses reciprocal, longitudinal relationships among variables. Details are provided in Fig. 2.
The significance of path ρ1 indicates whether baseline phenotypes predict within-person changes in DNAm levels at the follow-up time point; ρ2 indicates the prediction from baseline DNAm to phenotypes at follow-up. In cross-lagged models, prospective relationships between variables are assumed to be stable over time, and the variables at each time point were measured synchronously. These assumptions were modeled through imposing constraining paths (stability effects rX and rY and synchronous correlations rX1Y1 and rX2Y2). Before cross-lagged analysis, DNAm data were standardized with z transformation (mean = 0, SD = 1). Baseline age, sex, smoking status, alcohol consumption, and BMI were adjusted for as covariates in the cross-lagged models.
All parameters and statistics of the cross-lagged models were carried out with the lavaan package in R, and the correlations within twin pairs were processed with the “cluster” option in the lavaan package. The validity of model fitting was estimated with the comparative fit index (CFI) and standard root mean square residual (SRMR): a CFI >0.90 and an SRMR <0.08 suggested that the model adequately fit (30,31). We adjusted P values using the FDR, and the cutoff of FDR was set at 0.05.
Step 3: Longitudinal Mediation Analysis
We further studied the mediating roles of DNAm in the associations of lifestyle factors and glycemic traits with T2DM. The mediation analysis was conducted with use of the half-longitudinal design (32), a variant of the cross-lagged model. Details of the parameters and path settings of the half-longitudinal design are shown in Fig. 3.
The effect of baseline lifestyle factors and glycemic traits on subsequent DNAm levels is multiplied by effect of baseline DNAm levels on subsequent T2DM to yield the indirect effect (a * b) of mediation analysis (32). The total effect was calculated as the sum of the indirect effect (a * b) and direct effect (c′). According to the assumptions of mediation analysis, we considered the mediation significant if the total indirect effect (a * b) and total effect were statistically significant (33). The extent of mediation was estimated as the proportion of indirect effect to the total effect.
The covariates, implementation method, and validity of model fitting were the same as those in the cross-lagged model analysis described above. The cross-lagged analysis and longitudinal mediation analysis were both implemented with the R package sem and performed on longitudinal data from 308 participants. CpG sites associated with T2DM or glycemic traits were included in the two longitudinal analyses.
Ethics
The study protocol was approved by the Biomedical Ethics Committee at Peking University (approval nos. IRB00001052-13022, IRB00001052-14021). Written informed consents were provided by all participants at the time of investigations. All methods were carried out in accord with all relevant guidelines and regulations.
Data and Resource Availability
Data Sharing
The data sets generated during or analyzed during the current study are not publicly available due to the informed consent involved in this study stating that the data of the participants could not be disclosed to a third party. But the data verification and cooperative data analysis are available on reasonable request from the corresponding author.
Resource Sharing
No applicable resources were generated or analyzed during the current study.
Results
Characteristics of the Study Population
The demographics and characteristics of the study participants are shown in Table 1. Among the 1,070 participants analyzed in the CpG site association study, there were 758 MZ twins and 339 females. Of 308 participants who participated in longitudinal analysis, 186 were MZ twins and 121 were female. A total of 144 participants (13.5%) met the case definition of T2DM. There were 36 (11.7%) and 54 (17.5%) T2DM cases at baseline and 5-year follow-up for the longitudinal analysis, respectively.
. | Total (N = 1,070) . | Participants with longitudinal data (N = 308) . | |
---|---|---|---|
Baseline . | Follow-up . | ||
Age, years | 49.9 ± 12.1 | 50.2 ± 10.2 | 54.9 ± 10.2 |
Female, n (%) | 339 (31.7) | 121 (39.3) | |
MZ, n (%) | 758 (70.8) | 186 (60.4) | |
T2DM, n (%) | 144 (13.5) | 36 (11.7) | 54 (17.5) |
HbA1c (mmol/mol) | 42.1 ± 14.3 | 41.8 ± 13.2 | 45.3 ± 16.7 |
FBG (mmol/L) | 6.1 ± 2.5 | 6.0 ± 2.1 | 6.5 ± 2.9 |
Smoking status, n (%) | |||
Current smoker | 352 (32.9) | 95 (30.8) | 87 (28.2) |
Former smoker | 139 (13.0) | 25 (8.1) | 48 (15.6) |
Nonsmoker | 579 (54.1) | 188 (61.0) | 173 (56.2) |
Alcohol consumption, n (%) | |||
Current drinker | 435 (42.2) | 156 (50.6) | 85 (27.6) |
Former drinker | 73 (7.1) | 10 (3.2) | 92 (29.9) |
Nondrinker | 522 (50.7) | 142 (46.1) | 131 (42.5) |
BMI, kg/m2 | 24.8 ± 3.9 | 24.2 ± 3.6 | 24.3 ± 3.5 |
. | Total (N = 1,070) . | Participants with longitudinal data (N = 308) . | |
---|---|---|---|
Baseline . | Follow-up . | ||
Age, years | 49.9 ± 12.1 | 50.2 ± 10.2 | 54.9 ± 10.2 |
Female, n (%) | 339 (31.7) | 121 (39.3) | |
MZ, n (%) | 758 (70.8) | 186 (60.4) | |
T2DM, n (%) | 144 (13.5) | 36 (11.7) | 54 (17.5) |
HbA1c (mmol/mol) | 42.1 ± 14.3 | 41.8 ± 13.2 | 45.3 ± 16.7 |
FBG (mmol/L) | 6.1 ± 2.5 | 6.0 ± 2.1 | 6.5 ± 2.9 |
Smoking status, n (%) | |||
Current smoker | 352 (32.9) | 95 (30.8) | 87 (28.2) |
Former smoker | 139 (13.0) | 25 (8.1) | 48 (15.6) |
Nonsmoker | 579 (54.1) | 188 (61.0) | 173 (56.2) |
Alcohol consumption, n (%) | |||
Current drinker | 435 (42.2) | 156 (50.6) | 85 (27.6) |
Former drinker | 73 (7.1) | 10 (3.2) | 92 (29.9) |
Nondrinker | 522 (50.7) | 142 (46.1) | 131 (42.5) |
BMI, kg/m2 | 24.8 ± 3.9 | 24.2 ± 3.6 | 24.3 ± 3.5 |
Identification of Candidate CpG Sites Associated With T2DM and Glycemic Traits
The literature search process for candidate CpG sites is shown in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 flowchart in Figure 4. A total of 16 studies including provision of candidate sites associated with T2DM and glycemic traits were retrieved from the literature search after inclusion and exclusion criteria were applied, as listed in Supplementary Table 1. In all 16 studies investigators used an Illumina Methylation BeadChip to perform DNAm analysis. The included studies were conducted in different ethnic groups: Europeans, Indian Asians, Africans, East Asians, etc.
The numbers of candidate CpG sites are presented in Figure 4. Of them, 74 were validated to be significantly associated with T2DM, 18 with HbA1c, and 17 with FBG. Among these associations, 67, 17, and 16 CpG sites revealed the same directions of the effect for T2DM, HbA1c, and FBG as in previous studies and were retained in subsequent analysis (Table 2 and Supplementary Table 2). In the discordant twin study, 14 and 12 CpG sites with PFDR <0.05 and the consistent directions as previous studies were validated in MZ twins for HbA1c and FBG, respectively, including 6 CpG sites that were newly identified in discordant MZ analysis. For T2DM, although 14 CpG sites showed significance with P < 0.05 and consistent directions, none of the t tests were significant after correction for multiple comparisons (Supplementary Tables 3 and 4).
Probe ID . | Chr. . | Position . | Gene . | Direction* . | β . | P . | Direction# . |
---|---|---|---|---|---|---|---|
cg19693031 | 1 | 145,441,552 | TXNIP | − | −3.56E−02 | 1.57E−16 | − |
cg04816311 | 7 | 1,066,650 | C7orf50 | + | 2.01E−02 | 7.88E−13 | + |
cg06500161 | 21 | 43,656,587 | ABCG1 | + | 1.33E−02 | 2.59E−09 | + |
cg08309687 | 21 | 35,320,596 | Unannotated | − | −1.93E−02 | 8.56E−09 | − |
cg05778424 | 17 | 55,169,508 | AKAP1 | + | 1.24E−02 | 2.97E−08 | + |
cg19750657 | 13 | 38,935,967 | UFM1 | + | 1.39E−02 | 1.73E−07 | + |
cg01676795 | 7 | 75,586,348 | POR | + | 1.48E−02 | 6.63E−07 | + |
cg14597545 | 15 | 73,074,210 | ADPGK | + | 1.17E−02 | 2.00E−06 | + |
cg11024682 | 17 | 17,730,094 | SREBF1 | + | 1.02E−02 | 2.00E−06 | + |
cg26546155 | 16 | 57,269,635 | RSPRY1 | + | 1.12E−02 | 2.04E−06 | + |
cg16809457 | 6 | 90,399,677 | MDN1 | + | 1.42E−02 | 1.03E−05 | + |
cg09249494 | 6 | 6,588,846 | LY86, LOC285780 | + | 9.47E−03 | 1.03E−05 | + |
cg13640297 | 6 | 170,099,238 | WDR27 | + | 9.78E−03 | 3.15E−05 | + |
cg03078690 | 6 | 33,235,504 | VPS52 | + | 9.48E−03 | 3.15E−05 | + |
cg08788930 | 8 | 142,201,685 | DENND3 | + | 1.08E−02 | 3.59E−05 | + |
cg26804423 | 7 | 8,201,134 | − | + | 9.20E−03 | 3.82E−04 | + |
cg00277397 | 7 | 71,800,412 | CALN1 | − | −1.13E−02 | 3.82E−04 | − |
cg03497652 | 16 | 4,751,569 | ANKS3 | + | 9.14E−03 | 3.89E−04 | + |
cg00076653 | 4 | 15,341,878 | C1QTNF7 | + | 1.16E−02 | 5.69E−04 | + |
cg17066943 | 6 | 34,192,320 | Unannotated | − | −6.13E−03 | 6.03E−04 | − |
cg03699074 | 16 | 88,849,875 | FAM38A | − | −6.66E−03 | 6.56E−04 | − |
cg06235429 | 11 | 67,373,114 | NDUFV1 | + | 7.70E−03 | 7.29E−04 | + |
cg26371731 | 1 | 156,390,240 | MIR9–1, C1orf61 | + | 1.35E−02 | 8.12E−04 | − |
cg17666418 | 6 | 106,612,002 | Unannotated | − | −5.68E−03 | 1.03E−03 | − |
cg23939642 | 17 | 79,225,757 | SLC38A10 | + | 1.05E−02 | 1.03E−03 | + |
cg11183227 | 15 | 91,455,407 | MAN2A2 | + | 8.60E−03 | 1.03E−03 | + |
cg22540135 | 19 | 51,898,904 | Unannotated | − | −4.23E−03 | 1.03E−03 | − |
cg22627753 | 1 | 988,623 | AGRN | − | −9.09E−03 | 1.51E−03 | − |
cg22909677 | 6 | 109,172,312 | ARMC2 | + | 7.79E−03 | 1.72E−03 | + |
cg00055726 | 9 | 136,445,425 | FAM163B | + | 8.19E−03 | 1.85E−03 | + |
cg01973676 | 7 | 101,596,404 | CUX1 | − | −6.30E−03 | 2.12E−03 | − |
cg04973995 | 10 | 74,057,977 | Unannotated | − | −6.48E−03 | 2.30E−03 | − |
cg06007201 | 16 | 88,850,218 | FAM38A | − | −4.31E−03 | 2.30E−03 | − |
cg07960624 | 8 | 119,208,486 | SAMD12 | − | −1.02E−02 | 2.30E−03 | − |
cg00574958 | 11 | 68,607,622 | CPT1A | − | −4.03E−03 | 2.35E−03 | − |
cg17315426 | 14 | 23,527,629 | CDH24 | − | −4.42E−03 | 2.60E−03 | − |
cg04344749 | 1 | 25,871,814 | LDLRAP1 | + | 6.17E−03 | 2.60E−03 | + |
cg03539717 | 19 | 13,065,086 | GADD45GIP1 | + | 5.57E−03 | 2.60E−03 | + |
cg03318904 | 22 | 39,801,522 | MAP3K7IP1 | + | 9.19E−03 | 4.23E−03 | + |
cg13520715 | 21 | 43,135,854 | C21orf129, NCRNA00112 | − | −5.34E−03 | 4.38E−03 | − |
cg21766592 | 19 | 47,288,066 | SLC1A5 | − | −3.66E−03 | 4.46E−03 | − |
cg24704287 | 19 | 13,951,481 | Unannotated | − | −7.85E−03 | 5.41E−03 | − |
cg19266329 | 1 | 145,456,128 | Unannotated | − | −7.75E−03 | 7.67E−03 | − |
cg13065169 | 3 | 145,878,979 | PLOD2 | + | 1.09E−02 | 8.30E−03 | − |
cg04567334 | 10 | 73,408,185 | CDH23 | − | −4.55E−03 | 8.30E−03 | − |
cg11439821 | 15 | 48,483,834 | CTXN2 | + | 1.14E−02 | 9.65E−03 | + |
cg02519286 | 12 | 6,642,354 | GAPDH | − | −3.59E−03 | 9.66E−03 | − |
cg04645070 | 11 | 34,393,106 | Unannotated | − | −4.08E−03 | 9.66E−03 | − |
cg01657995 | 6 | 31,804,883 | C6orf48, SNORD52 | − | −7.37E−03 | 1.07E−02 | − |
cg09247619 | 1 | 198,648,849 | PTPRC | − | −4.60E−03 | 1.10E−02 | − |
cg13514042 | 7 | 1,192,202 | Unannotated | + | 5.97E−03 | 1.13E−02 | + |
cg10919522 | 14 | 74,227,441 | C14orf43 | − | −5.88E−03 | 1.21E−02 | − |
cg16765088 | 15 | 99,640,471 | Unannotated | − | −4.70E−03 | 1.31E−02 | − |
cg11061603 | 6 | 34,393,330 | RPS10 | + | 1.69E−03 | 1.43E-02 | + |
cg27425511 | 19 | 11,039,385 | YIPF2, C19orf52 | + | 4.41E−03 | 1.45E−02 | + |
cg00398764 | 15 | 37,402,637 | Unannotated | + | 4.05E−03 | 1.51E−02 | + |
cg02026141 | 10 | 23,492,470 | Unannotated | − | −1.62E−02 | 1.57E-02 | + |
cg11387705 | 7 | 22,122,473 | Unannotated | + | 5.85E−03 | 1.64E−02 | − |
cg02560388 | 2 | 11,969,958 | Unannotated | − | −4.21E−03 | 1.83E−02 | − |
cg20785560 | 1 | 78,511,235 | GIPC2 | + | 8.33E−03 | 1.89E−02 | − |
cg04727071 | 19 | 4,061,359 | ZBTB7A | − | −4.60E−03 | 1.89E−02 | − |
cg03725309 | 1 | 109,757,585 | SARS | − | −3.62E−03 | 1.89E−02 | − |
cg17058475 | 11 | 68,607,737 | CPT1A | − | −4.39E−03 | 1.98E−02 | − |
cg08994060 | 10 | 6,214,026 | PFKFB3 | − | −6.21E−03 | 2.47E−02 | − |
cg14159963 | 7 | 116,139,849 | CAV2 | + | 6.38E−03 | 2.47E−02 | − |
cg15962267 | 5 | 138,612,986 | SNHG4 | − | −4.51E−03 | 2.96E−02 | − |
cg06284479 | 3 | 37,173,546 | LRRFIP2 | + | 5.69E−03 | 3.15E−02 | + |
cg01665118 | 13 | 77,502,805 | BTF3L1 | − | −5.60E−03 | 3.61E−02 | − |
cg18998860 | 5 | 176,522,719 | FGFR4 | − | −4.42E−03 | 3.61E−02 | − |
cg05424199 | 16 | 30,759,608 | PHKG2 | + | 3.54E−03 | 3.63E−02 | + |
cg05165935 | 12 | 113,957,941 | Unannotated | + | 5.37E−03 | 4.10E−02 | + |
cg21699330 | 7 | 26,193,032 | NFE2L3 | − | −3.99E−03 | 4.10E−02 | − |
cg19766489 | 1 | 78,511,344 | GIPC2 | + | 8.46E−03 | 4.10E−02 | − |
cg15188939 | 15 | 72,809,154 | ARIH1 | + | 5.69E−03 | 4.40E−02 | + |
Probe ID . | Chr. . | Position . | Gene . | Direction* . | β . | P . | Direction# . |
---|---|---|---|---|---|---|---|
cg19693031 | 1 | 145,441,552 | TXNIP | − | −3.56E−02 | 1.57E−16 | − |
cg04816311 | 7 | 1,066,650 | C7orf50 | + | 2.01E−02 | 7.88E−13 | + |
cg06500161 | 21 | 43,656,587 | ABCG1 | + | 1.33E−02 | 2.59E−09 | + |
cg08309687 | 21 | 35,320,596 | Unannotated | − | −1.93E−02 | 8.56E−09 | − |
cg05778424 | 17 | 55,169,508 | AKAP1 | + | 1.24E−02 | 2.97E−08 | + |
cg19750657 | 13 | 38,935,967 | UFM1 | + | 1.39E−02 | 1.73E−07 | + |
cg01676795 | 7 | 75,586,348 | POR | + | 1.48E−02 | 6.63E−07 | + |
cg14597545 | 15 | 73,074,210 | ADPGK | + | 1.17E−02 | 2.00E−06 | + |
cg11024682 | 17 | 17,730,094 | SREBF1 | + | 1.02E−02 | 2.00E−06 | + |
cg26546155 | 16 | 57,269,635 | RSPRY1 | + | 1.12E−02 | 2.04E−06 | + |
cg16809457 | 6 | 90,399,677 | MDN1 | + | 1.42E−02 | 1.03E−05 | + |
cg09249494 | 6 | 6,588,846 | LY86, LOC285780 | + | 9.47E−03 | 1.03E−05 | + |
cg13640297 | 6 | 170,099,238 | WDR27 | + | 9.78E−03 | 3.15E−05 | + |
cg03078690 | 6 | 33,235,504 | VPS52 | + | 9.48E−03 | 3.15E−05 | + |
cg08788930 | 8 | 142,201,685 | DENND3 | + | 1.08E−02 | 3.59E−05 | + |
cg26804423 | 7 | 8,201,134 | − | + | 9.20E−03 | 3.82E−04 | + |
cg00277397 | 7 | 71,800,412 | CALN1 | − | −1.13E−02 | 3.82E−04 | − |
cg03497652 | 16 | 4,751,569 | ANKS3 | + | 9.14E−03 | 3.89E−04 | + |
cg00076653 | 4 | 15,341,878 | C1QTNF7 | + | 1.16E−02 | 5.69E−04 | + |
cg17066943 | 6 | 34,192,320 | Unannotated | − | −6.13E−03 | 6.03E−04 | − |
cg03699074 | 16 | 88,849,875 | FAM38A | − | −6.66E−03 | 6.56E−04 | − |
cg06235429 | 11 | 67,373,114 | NDUFV1 | + | 7.70E−03 | 7.29E−04 | + |
cg26371731 | 1 | 156,390,240 | MIR9–1, C1orf61 | + | 1.35E−02 | 8.12E−04 | − |
cg17666418 | 6 | 106,612,002 | Unannotated | − | −5.68E−03 | 1.03E−03 | − |
cg23939642 | 17 | 79,225,757 | SLC38A10 | + | 1.05E−02 | 1.03E−03 | + |
cg11183227 | 15 | 91,455,407 | MAN2A2 | + | 8.60E−03 | 1.03E−03 | + |
cg22540135 | 19 | 51,898,904 | Unannotated | − | −4.23E−03 | 1.03E−03 | − |
cg22627753 | 1 | 988,623 | AGRN | − | −9.09E−03 | 1.51E−03 | − |
cg22909677 | 6 | 109,172,312 | ARMC2 | + | 7.79E−03 | 1.72E−03 | + |
cg00055726 | 9 | 136,445,425 | FAM163B | + | 8.19E−03 | 1.85E−03 | + |
cg01973676 | 7 | 101,596,404 | CUX1 | − | −6.30E−03 | 2.12E−03 | − |
cg04973995 | 10 | 74,057,977 | Unannotated | − | −6.48E−03 | 2.30E−03 | − |
cg06007201 | 16 | 88,850,218 | FAM38A | − | −4.31E−03 | 2.30E−03 | − |
cg07960624 | 8 | 119,208,486 | SAMD12 | − | −1.02E−02 | 2.30E−03 | − |
cg00574958 | 11 | 68,607,622 | CPT1A | − | −4.03E−03 | 2.35E−03 | − |
cg17315426 | 14 | 23,527,629 | CDH24 | − | −4.42E−03 | 2.60E−03 | − |
cg04344749 | 1 | 25,871,814 | LDLRAP1 | + | 6.17E−03 | 2.60E−03 | + |
cg03539717 | 19 | 13,065,086 | GADD45GIP1 | + | 5.57E−03 | 2.60E−03 | + |
cg03318904 | 22 | 39,801,522 | MAP3K7IP1 | + | 9.19E−03 | 4.23E−03 | + |
cg13520715 | 21 | 43,135,854 | C21orf129, NCRNA00112 | − | −5.34E−03 | 4.38E−03 | − |
cg21766592 | 19 | 47,288,066 | SLC1A5 | − | −3.66E−03 | 4.46E−03 | − |
cg24704287 | 19 | 13,951,481 | Unannotated | − | −7.85E−03 | 5.41E−03 | − |
cg19266329 | 1 | 145,456,128 | Unannotated | − | −7.75E−03 | 7.67E−03 | − |
cg13065169 | 3 | 145,878,979 | PLOD2 | + | 1.09E−02 | 8.30E−03 | − |
cg04567334 | 10 | 73,408,185 | CDH23 | − | −4.55E−03 | 8.30E−03 | − |
cg11439821 | 15 | 48,483,834 | CTXN2 | + | 1.14E−02 | 9.65E−03 | + |
cg02519286 | 12 | 6,642,354 | GAPDH | − | −3.59E−03 | 9.66E−03 | − |
cg04645070 | 11 | 34,393,106 | Unannotated | − | −4.08E−03 | 9.66E−03 | − |
cg01657995 | 6 | 31,804,883 | C6orf48, SNORD52 | − | −7.37E−03 | 1.07E−02 | − |
cg09247619 | 1 | 198,648,849 | PTPRC | − | −4.60E−03 | 1.10E−02 | − |
cg13514042 | 7 | 1,192,202 | Unannotated | + | 5.97E−03 | 1.13E−02 | + |
cg10919522 | 14 | 74,227,441 | C14orf43 | − | −5.88E−03 | 1.21E−02 | − |
cg16765088 | 15 | 99,640,471 | Unannotated | − | −4.70E−03 | 1.31E−02 | − |
cg11061603 | 6 | 34,393,330 | RPS10 | + | 1.69E−03 | 1.43E-02 | + |
cg27425511 | 19 | 11,039,385 | YIPF2, C19orf52 | + | 4.41E−03 | 1.45E−02 | + |
cg00398764 | 15 | 37,402,637 | Unannotated | + | 4.05E−03 | 1.51E−02 | + |
cg02026141 | 10 | 23,492,470 | Unannotated | − | −1.62E−02 | 1.57E-02 | + |
cg11387705 | 7 | 22,122,473 | Unannotated | + | 5.85E−03 | 1.64E−02 | − |
cg02560388 | 2 | 11,969,958 | Unannotated | − | −4.21E−03 | 1.83E−02 | − |
cg20785560 | 1 | 78,511,235 | GIPC2 | + | 8.33E−03 | 1.89E−02 | − |
cg04727071 | 19 | 4,061,359 | ZBTB7A | − | −4.60E−03 | 1.89E−02 | − |
cg03725309 | 1 | 109,757,585 | SARS | − | −3.62E−03 | 1.89E−02 | − |
cg17058475 | 11 | 68,607,737 | CPT1A | − | −4.39E−03 | 1.98E−02 | − |
cg08994060 | 10 | 6,214,026 | PFKFB3 | − | −6.21E−03 | 2.47E−02 | − |
cg14159963 | 7 | 116,139,849 | CAV2 | + | 6.38E−03 | 2.47E−02 | − |
cg15962267 | 5 | 138,612,986 | SNHG4 | − | −4.51E−03 | 2.96E−02 | − |
cg06284479 | 3 | 37,173,546 | LRRFIP2 | + | 5.69E−03 | 3.15E−02 | + |
cg01665118 | 13 | 77,502,805 | BTF3L1 | − | −5.60E−03 | 3.61E−02 | − |
cg18998860 | 5 | 176,522,719 | FGFR4 | − | −4.42E−03 | 3.61E−02 | − |
cg05424199 | 16 | 30,759,608 | PHKG2 | + | 3.54E−03 | 3.63E−02 | + |
cg05165935 | 12 | 113,957,941 | Unannotated | + | 5.37E−03 | 4.10E−02 | + |
cg21699330 | 7 | 26,193,032 | NFE2L3 | − | −3.99E−03 | 4.10E−02 | − |
cg19766489 | 1 | 78,511,344 | GIPC2 | + | 8.46E−03 | 4.10E−02 | − |
cg15188939 | 15 | 72,809,154 | ARIH1 | + | 5.69E−03 | 4.40E−02 | + |
+, positive correlation; −, negative correlation. Chr., chromosome; ID, identifier.
Direction of the effect for CpG sites in our study;
direction of the effect for CpG sites from previous studies.
Cross-Lagged Analysis for the CpG Sites Associated With T2DM and Glycemic Traits
The cross-lagged analysis was conducted on the longitudinal data (n = 308). Among the 67 CpG sites associated with T2DM, 11 cross-lagged effects from baseline T2DM levels to follow-up DNAm and 15 effects from baseline DNAm to follow-up T2DM were found to be significant (Table 3). For the glycemic traits, we observed 17 significant cross-lagged associations (8 for HbA1c and 9 for FBG), and these associations were all from glycemic traits to the subsequent DNAm levels (Supplementary Table 5).
Probe ID . | T2DMbase→CpGfollow . | CpGbase→T2DMfollow . | Goodness of fit . | |||
---|---|---|---|---|---|---|
ρ1 . | PFDR . | ρ2 . | PFDR . | SRMR . | CFI . | |
cg19750657 | 0.47 | <0.01 | 0.02 | 0.18 | 0.02 | 0.98 |
cg16809457 | 0.57 | <0.01 | 0.03 | 0.12 | 0.02 | 0.98 |
cg26804423 | 0.45 | <0.01 | <0.01 | 0.95 | 0.02 | 0.98 |
cg00277397 | −0.34 | <0.01 | −0.02 | 0.33 | 0.02 | 0.99 |
cg04973995 | −0.41 | <0.01 | <0.01 | 0.34 | 0.03 | 0.98 |
cg24704287 | −0.46 | 0.01 | −0.02 | 0.12 | 0.03 | 0.98 |
cg10919522 | −0.48 | <0.01 | 0.01 | 0.87 | 0.03 | 0.96 |
cg05778424 | 0.44 | 0.04 | 0.05 | 0.02 | 0.02 | 0.98 |
cg14597545 | 0.57 | 0.01 | 0.01 | 0.04 | 0.02 | 1.00 |
cg26546155 | 0.44 | 0.01 | 0.05 | 0.02 | 0.02 | 0.98 |
cg08788930 | 0.26 | 0.05 | 0.05 | 0.02 | 0.02 | 1.00 |
cg19693031 | −0.25 | 0.31 | 0.02 | 0.02 | 0.04 | 0.96 |
cg04816311 | 0.24 | 0.31 | 0.02 | 0.02 | 0.03 | 0.99 |
cg01676795 | 0.25 | 0.23 | 0.02 | 0.04 | 0.03 | 0.98 |
cg03078690 | 0.31 | 0.15 | 0.02 | 0.04 | 0.02 | 0.99 |
cg03497652 | 0.15 | 0.37 | 0.01 | 0.04 | 0.02 | 0.99 |
cg00076653 | 0.21 | 0.23 | 0.01 | 0.04 | 0.02 | 1.00 |
cg23939642 | −0.04 | 0.76 | 0.01 | 0.04 | 0.02 | 1.00 |
cg03539717 | <0.01 | 1.00 | 0.02 | 0.02 | 0.02 | 1.00 |
cg21766592 | −0.11 | 0.54 | 0.01 | 0.04 | 0.03 | 0.98 |
cg19266329 | −0.16 | 0.42 | 0.01 | 0.04 | 0.03 | 0.97 |
cg06284479 | 0.13 | 0.53 | 0.01 | 0.04 | 0.02 | 1.00 |
Probe ID . | T2DMbase→CpGfollow . | CpGbase→T2DMfollow . | Goodness of fit . | |||
---|---|---|---|---|---|---|
ρ1 . | PFDR . | ρ2 . | PFDR . | SRMR . | CFI . | |
cg19750657 | 0.47 | <0.01 | 0.02 | 0.18 | 0.02 | 0.98 |
cg16809457 | 0.57 | <0.01 | 0.03 | 0.12 | 0.02 | 0.98 |
cg26804423 | 0.45 | <0.01 | <0.01 | 0.95 | 0.02 | 0.98 |
cg00277397 | −0.34 | <0.01 | −0.02 | 0.33 | 0.02 | 0.99 |
cg04973995 | −0.41 | <0.01 | <0.01 | 0.34 | 0.03 | 0.98 |
cg24704287 | −0.46 | 0.01 | −0.02 | 0.12 | 0.03 | 0.98 |
cg10919522 | −0.48 | <0.01 | 0.01 | 0.87 | 0.03 | 0.96 |
cg05778424 | 0.44 | 0.04 | 0.05 | 0.02 | 0.02 | 0.98 |
cg14597545 | 0.57 | 0.01 | 0.01 | 0.04 | 0.02 | 1.00 |
cg26546155 | 0.44 | 0.01 | 0.05 | 0.02 | 0.02 | 0.98 |
cg08788930 | 0.26 | 0.05 | 0.05 | 0.02 | 0.02 | 1.00 |
cg19693031 | −0.25 | 0.31 | 0.02 | 0.02 | 0.04 | 0.96 |
cg04816311 | 0.24 | 0.31 | 0.02 | 0.02 | 0.03 | 0.99 |
cg01676795 | 0.25 | 0.23 | 0.02 | 0.04 | 0.03 | 0.98 |
cg03078690 | 0.31 | 0.15 | 0.02 | 0.04 | 0.02 | 0.99 |
cg03497652 | 0.15 | 0.37 | 0.01 | 0.04 | 0.02 | 0.99 |
cg00076653 | 0.21 | 0.23 | 0.01 | 0.04 | 0.02 | 1.00 |
cg23939642 | −0.04 | 0.76 | 0.01 | 0.04 | 0.02 | 1.00 |
cg03539717 | <0.01 | 1.00 | 0.02 | 0.02 | 0.02 | 1.00 |
cg21766592 | −0.11 | 0.54 | 0.01 | 0.04 | 0.03 | 0.98 |
cg19266329 | −0.16 | 0.42 | 0.01 | 0.04 | 0.03 | 0.97 |
cg06284479 | 0.13 | 0.53 | 0.01 | 0.04 | 0.02 | 1.00 |
ρ1 indicates the pathway from baseline phenotypes to subsequent DNAm levels at follow-up; ρ2 represents the pathway from baseline DNAm to subsequent phenotypes at follow-up. base, baseline; follow, follow-up; ID, identifier.
Longitudinal Mediation Effects of DNAm
We established a longitudinal mediation model based on the CpG sites associated with T2DM and glycemic traits. First, the DNAm of these CpG sites was set as the mediation variable in the path from lifestyle factors to T2DM. We found adequately fitted indirect effects of smoking status on T2DM via DNAm of cg19693031 and cg04816311; however, the total effects of these two associations were not significant (Table 4). No significant mediation effects were detected for alcohol consumption.
Phenotype1 . | Probe ID . | Phenotype 2 . | Path a . | P . | Path b . | P . | a * b . | P . | Path c′ . | P . | Total . | P . | Mediated (%) . | SRMR . | CFI . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Lifestyle→CpG→T2DM | |||||||||||||||
Smoking status | cg19693031 | T2DM | −0.18 | <0.01 | −0.06 | 0.01 | 0.01 | 0.02 | −0.03 | 0.07 | −0.02 | 0.18 | a | 0.03 | 0.97 |
Smoking status | cg04816311 | T2DM | 0.15 | 0.01 | 0.05 | <0.01 | 0.01 | 0.05 | −0.02 | 0.21 | −0.01 | 0.49 | a | 0.03 | 0.97 |
HbA1c→CpG→T2DM | |||||||||||||||
HbA1c | cg19693031 | T2DM | −0.13 | 0.01 | −0.04 | 0.01 | 0.01 | 0.04 | 0.08 | 0.01 | 0.08 | 0.01 | 6.2 | 0.04 | 0.95 |
HbA1c | cg04816311 | T2DM | 0.19 | <0.01 | 0.04 | 0.02 | 0.01 | 0.05 | 0.08 | 0.02 | 0.09 | 0.01 | 9.1 | 0.03 | 0.97 |
FBG→CpG→T2DM | |||||||||||||||
FBG | cg00574958 | T2DM | −0.21 | <0.01 | −0.02 | 0.03 | 0.01 | 0.04 | 0.11 | <0.01 | 0.11 | <0.01 | 4.2 | 0.05 | 0.94 |
Phenotype1 . | Probe ID . | Phenotype 2 . | Path a . | P . | Path b . | P . | a * b . | P . | Path c′ . | P . | Total . | P . | Mediated (%) . | SRMR . | CFI . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Lifestyle→CpG→T2DM | |||||||||||||||
Smoking status | cg19693031 | T2DM | −0.18 | <0.01 | −0.06 | 0.01 | 0.01 | 0.02 | −0.03 | 0.07 | −0.02 | 0.18 | a | 0.03 | 0.97 |
Smoking status | cg04816311 | T2DM | 0.15 | 0.01 | 0.05 | <0.01 | 0.01 | 0.05 | −0.02 | 0.21 | −0.01 | 0.49 | a | 0.03 | 0.97 |
HbA1c→CpG→T2DM | |||||||||||||||
HbA1c | cg19693031 | T2DM | −0.13 | 0.01 | −0.04 | 0.01 | 0.01 | 0.04 | 0.08 | 0.01 | 0.08 | 0.01 | 6.2 | 0.04 | 0.95 |
HbA1c | cg04816311 | T2DM | 0.19 | <0.01 | 0.04 | 0.02 | 0.01 | 0.05 | 0.08 | 0.02 | 0.09 | 0.01 | 9.1 | 0.03 | 0.97 |
FBG→CpG→T2DM | |||||||||||||||
FBG | cg00574958 | T2DM | −0.21 | <0.01 | −0.02 | 0.03 | 0.01 | 0.04 | 0.11 | <0.01 | 0.11 | <0.01 | 4.2 | 0.05 | 0.94 |
Path a indicates the pathway from baseline phenotype 1 to subsequent DNAm levels of CpG sites at follow-up. Path b shows the path from baseline DNAm levels to subsequent phenotype 2 at follow-up. a * b refers to the indirect effect of the mediation. Path c′, direct effect, donates the path from baseline phenotype 1 to subsequent phenotype 2 at follow-up in controlling for the DNAm; total, total effects of the mediation. ID, identifier.
Not applicable, since the total effect was not significant.
On the other hand, the results from the cross-lagged analysis indicate that several CpGs, such as cg19693031, appear to be affected by HbA1c and FBG while influencing T2DM. Based on this fact, we explored whether CpG sites mediate the effects of alterations in glycemic traits on T2DM. Three significant mediation effects were revealed for HbA1c and FBG: cg19693031 and cg04816311 may be mediators of the association between HbA1c and T2DM, with cg19693031 and cg04816311 accounting for 6.2% and 9.1% of the association, respectively, and the association between FBG and T2DM was partially mediated by cg00574958 (mediation effect = 4.2%).
Discussion
For T2DM, HbA1c, and FBG, we validated 67, 17, and 16 CpG sites from previous studies in the Chinese twin population, respectively. In discordant MZ twin studies, 14 and 12 CpG sites were identified for HbA1c and FBG, respectively. Longitudinal research detected 15 cross-lagged effects from DNAm prior to T2DM and 11 effects from baseline T2DM to DNAm, including four sites that may have bidirectional effects. Cross-lagged effects for glycemic traits were all from glycemic traits to subsequent DNAm (eight for HbA1c and nine for FBG). Mediation analysis suggested no significant mediation effects of DNAm on the associations between lifestyle factors and T2DM. In contrast, DNAm of cg19693031 and cg04816311 might be mediators linking HbA1c with T2DM, and DNAm of cg00574958 might mediate the association between FBG and T2DM.
CpG Sites Associated With T2DM and Glycemic Traits
Candidate CpG sites were used for a DNAm association study rather than an epigenome-wide association study to validate previously reported CpG sites in twin populations and maximize the probability of detecting the associated CpG sites for the subsequent analysis. While much research has been conducted to investigate associations between DNAm and T2DM or glycemic traits, with identification of hundreds of CpG sites, these studies yielded inconsistent results. Many of them have been based on small populations, lenient for multiple testing, or unable to adjust for cell heterogeneity, which is an essential factor leading to variability in DNAm, raising questions about their robustness and reproducibility. In addition, the differences in ethnicity and genetic diversity between studies account for the variation in DNAm, which might be an explanation for the inconsistency of results. There is ample evidence showing that DNAm differs by race (34–36), which has similarly been reported at several CpG sites in studies of T2DM (37). Therefore, to confirm the authenticity of the associations, here we conducted an association study based on CpG sites that have previously been reported in the Chinese twin population with adjustment for blood cell composition and correction for multiple testing. Furthermore, we conducted a discordant analysis in MZ twins who are genetically identical. The reproducible results from discordant MZ analysis provide valuable and reliable evidence for these associations.
Our findings support the correlations between DNAm and T2DM among the Chinese population. Many CpG sites that we identified, particularly the most significant CpG sites (e.g., cg19693031, cg06500161, and cg00574958), have been demonstrated in several studies (9,38,39), validating the results from our study and indicating that the effects of several T2DM- and glycemic trait–associated CpG sites might be robust and homogeneous across different cohorts and racial/ethnic groups, such as cg19693031, cg00574958, and cg06500161.
Temporality Between DNAm and T2DM and Glycemic Traits
The cause-effect relationship between DNAm and T2DM has been a focus of research (16), and temporal sequence is a necessary prerequisite for causality. Our findings showed that 15 CpG sites are likely to predict subsequent T2DM and glycemic traits. Most of these temporal relationships were reported for the first time. In previous Mendelian randomization and longitudinal studies, investigators have similarly concluded that the DNAm of several CpG sites may affect T2DM and glycemic traits. However, in these studies investigators mostly reported the impact of DNAm on specific genes, including G6PC2, Fgf21, KCNQ1, and MSI2 (17,18,40,41). In other studies several CpG sites were reported as mediators of the association between features such as drug use or age and the risk of T2DM (42,43).
The 15 CpG sites identified to affect T2DM were annotated to 14 genes. Among these, TXNIP, GADD45GIP1, SLC1A5, AKAP1, and ADPGK were previously implicated in T2DM. TXNIP regulates intra- and extracellular oxidation–reduction processes and plays a vital role in glucose metabolism and defective glucose homeostasis preceding T2DM (44–46). An animal experiment study demonstrated that GADD45GIP1 reduces insulin secretion in response to glucose, and the secretory defects in glucose-stimulated insulin secretion account for glucose intolerance of mice (47). The expression of SLC1A5 is involved in the pathological changes caused by high glucose (48). In our study, cg05778424 at AKAP1 and cg14597545 at ADPGK were detected to have bidirectional effects with T2DM, suggesting that the interrelationship between T2DM and DNAm of several CpGs is complicated and possibly bidirectional. AKAP1 is a critical regulatory molecule in several mitochondrial events, and in a previous study AKAP1 proved to be a pathogenic factor for hyperglycemic injury through the modulation of mitochondrial fission (49). ADP-dependent glucokinase (ADPGK) is involved in converting glucose into glucose-6-phosphate and plays a vital role in glycolysis and the pentose phosphate pathway (50). Recent advances have shed light on the possibility of bidirectional effects between DNAm and BMI (51), while further studies will be needed to definitively establish whether these bidirectional effects exist for T2DM.
Additionally, we observed that baseline T2DM was associated with DNAm at follow-up in several CpG sites, which had not been reported before. Of these CpG sites that may be affected by T2DM, cg19750657 and cg26804423 were annotated to T2DM-related genes. According to Lemaire et al. (52), Ufm1 (annotated by cg19750657) and its target, Ufbp1, are highly expressed within insulin-secreting β-cells in the pancreatic islets of Langerhans. In addition, UFM1 expression could be suppressed by metformin, resulting in a reduction in UFMylation throughout the cell, and researchers have reported that the process may be regulated by DNAm (53). Ica1 (annotated by cg26804423) is localized to the trans and cis Golgi. It may regulate the transport of insulin from the endoplasmic reticulum to the Golgi apparatus, and the diminishing expression of Ica1 could result in decreased transport of insulin from the endoplasmic reticulum to the Golgi apparatus and decreased vesicular transport of insulin from the Golgi apparatus (54).
In the current study, baseline HbA1c and FBG preceded subsequent DNAm and not the other way around, which supports that HbA1c and FBG are the cause, not a consequence, of related DNAm changes. According to Vigorelli et al. (55), diabetic vascular inflammation and endothelial dysfunction due to high glucose levels will result in endothelial dysfunction linked to DNAm. This probably explains why some CpG sites are affected by glycemic traits. These affected CpG sites may consequently influence the expression of relevant genes and ultimately contribute to the pathogenesis of diabetes. Eight and nine CpG sites that may have been affected by HbA1c and FBG were annotated to seven and nine genes, respectively. For HbA1c, CPT1A (annotated from cg00574958) regulates the composition of circulating polyunsaturated n-3 fatty acids and docosahexaenoic acid, maintaining pancreatic islet secretion of glucagon, suggesting an essential impact on T2DM (56,57). cg02197919 was annotated in CDK5RAP1. Several studies conducted in vitro and involving cell lines have shown CDK5RAP1 to be essential for insulin secretion and insulin-induced β-cell survival and proliferation, which may have a critical role in glucose homeostasis (58,59). Moreover, for CpGs that may be affected by FBG, except for the previously discussed cg19693031 and cg00574958, the methylation of cg06500161 is reported to be expression associated, and ABCG1 is involved in insulin secretion and sensitivity (60–62). Of note, cg19693031, cg00574958, and cg06500161 were identified in many studies to have associations with T2DM and glycemic traits (37,63,64); our findings suggested that the methylation of these sites may vary with HbA1c and FBG concentration.
The potential involvement of other CpG sites in the changes in glycemic traits and pathogenesis of T2DM has not been directly illustrated in the literature. However, several CpG sites were annotated to BMI- and lipid metabolism–related genes (C7orf50, POR, and C1QTNF7) (65–67), which synergistically affect the pathogenesis of T2DM.
Mediation Effects Linking Lifestyle Factors, DNAm, T2DM, and Glycemic Traits
Previous EWAS on smoking revealed the influence of smoking on DNAm (68). In a meta-analysis investigators performed an EWAS based on 15,907 participants from 16 cohorts and found 2,623 smoke-associated CpG sites, suggesting that smoking has a broad impact on genome-wide DNAm (21). The EWAS for alcohol consumption also illustrated that the methylation of multiple signaling pathways involving multiple genes is related to drinking, suggesting that drinking (including alcohol addiction) affects genome-wide DNAm modification (22,69). Furthermore, it is well-known that T2DM can be caused by alcohol consumption and smoking (70). Based on the above facts, one unanswered question raised is whether DNAm variances mechanistically mediate the associations between smoking and alcohol consumption and T2DM.
The direct effects for the mediator effects were not significant between lifestyle factors and T2DM in our study, probably due to the variation of these variables being too slight (categorical variables). Nevertheless, we cannot exclude the possibility of these effects; the detrimental effects of smoking and drinking on diabetes are well documented (71,72). Although the results of our study showed incomplete mediation relationships, they suggested a potential role of DNAm in mediating the effect of lifestyle factors on T2DM.
On the other hand, hyperglycemia is a central risk factor for T2DM, while the pathological mechanism of development from dysglycemia to diabetes is very complicated (73). Few studies have included evaluation of the role of DNAm underlying this pathomechanism, which remains unclear.
It should be noted that the three CpG sites we identified that might mediate the development of diabetes (cg19693031, cg00574958, and cg04816311) have all been demonstrated in multiple studies to be associated with T2DM (37,38,63,64). Research has proven that TXNIP expression is regulated by methylation of cg19693031. A significant dose-dependent relationship between methylation of cg19693031 and HbA1c has been reported, and the methylation may be hyperglycemia induced (74). Moreover, Chambers et al. (75) performed a nested case-control analysis of DNAm and identified the association between differential methylation at cg19693031 and risk of future T2DM incidence. This evidence indicated that cg19693031 might play an essential role in the progression from dysglycemia to T2DM. The methylation level of cg00574958 is strongly associated with CPT1A expression (76). Given published observations, animal experiments have illustrated that high fructose intake induces CPT1A methylation in rodents and results in increased risk of T2DM (77).
Our study provided novel evidence for the regulatory role of cg19693031 and cg00574958 in the pathogenesis of T2DM in a longitudinal design, aligning with current evidence. Whether these CpG sites mediate the effects of smoking status and changes in glycemic traits and ultimately affect the development of T2DM still needs to be examined in future studies.
Strengths and Limitations
The strengths of the current study are reflected in several aspects. First, we analyzed candidate CpG sites from previous studies among the twin population to maximize the breadth of our search for possible associated CpG sites while increasing the authenticity of the results. Twins are naturally matched for many genetic and environmental factors, which has a unique value for the DNAm studies (78). Second, to our knowledge, this is the first study to reveal the temporal sequence relationship of DNAm with T2DM and glycemic traits in demonstrating their cross-lagged effect patterns. Finally, as far as we know, this is the first study with analysis of the mediating effect of the methylation levels of CpG sites on the association between lifestyle factors and T2DM or glycemic traits and the associations between glycemic traits and T2DM with use of a longitudinal design.
Our study has several limitations. First, the relationship between DNAm and T2DM or glycemic traits might differ between studies, because the candidate CpG sites for the association analysis were from various ancestries, and needs to be further validated in other ethnicities and ancestries. Despite the differences that existed between studies in ethnicity and study design, several CpG sites have been identified in multiple studies, indicating that some CpG sites associated with T2DM and glycemic traits may operate homogeneously across cohorts, ancestral groups, and different age-groups, which was previously illustrated (79). Second, the sample size for cross-lagged and longitudinal mediation analyses was relatively small; therefore, a different approach to adjusting for batch effect was used in the longitudinal analysis (ComBat approach), which also posed a potential limitation. Also, the discordance in MZ twin analysis was not significant in the current study and may reflect a limitation on the sample size of T2DM discordant MZ twins. A larger sample size is needed for investigation of the association and temporal relationship between DNAm and T2DM. Third, we did not perform in vitro or in vivo functional studies to validate the effects of these identified CpG sites on the expression of genes and subsequently on T2DM and glycemic traits, which still need to be addressed in future research.
Conclusions
In summary, our study replicated 67, 17, and 16 CpG sites from previous studies in the Chinese twin population for their associations with T2DM, HbA1c, and FBG, respectively. In discordant MZ twin studies, 14 and 12 CpG sites were validated for HbA1c and FBG. Results of the longitudinal study indicated that DNAm might predict subsequent T2DM at 15 identified CpG sites. The prediction in the other direction was detected in 11 CpG sites. For glycemic traits, 17 of the identified cross-lagged associations were all from glycemic traits to DNAm of CpG sites in the follow-up. Longitudinal mediation analysis from our study suggested a potential role of DNAm of cg19693031 and cg04816311 in mediating the effect of HbA1c on T2DM development and that DNAm of cg00574958 might be a mediator linking FBG with T2DM. This study has clear implications for future T2DM research on epigenetic and genetic testing, prevention, and treatment.
This article contains supplementary material online at https://doi.org/10.2337/figshare.21183427.
Article Information
Funding. The authors are grateful for the enthusiastic and outstanding support of the National Natural Science Foundation of China (82073633, 81973126, 81573223) and the Special Fund for Health Scientific Research in Public Welfare (201502006, 201002007).
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
Author Contributions. X.H. analyzed data and wrote, reviewed, and edited the manuscript. Z.W. checked the data analysis process and contributed to the study design. W.C., J.L., C.Y., T.H., D.S., C.L., Y.P., and L.L. reviewed and edited the manuscript. Z.P., L.C., H.W., X.W., and Y.L. contributed to data collection. W.G. researched data and reviewed and edited the manuscript. W.G. and L.L. are the guarantors of this work and, as such, had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.