A growing number of genetic loci have been shown to influence individual predisposition to type 2 diabetes (T2D). Despite longstanding interest in understanding whether nonlinear interactions between these risk variants additionally influence T2D risk, the ability to detect significant gene-gene interaction (GGI) effects has been limited to date. To increase power to detect GGI effects, we combined recent advances in the fine-mapping of causal T2D risk variants with the increased sample size available within UK Biobank (375,736 unrelated European participants, including 16,430 with T2D). In addition to conventional single variant–based analysis, we used a complementary polygenic score–based approach, which included partitioned T2D risk scores that capture biological processes relevant to T2D pathophysiology. Nevertheless, we found no evidence in support of GGI effects influencing T2D risk. The current study was powered to detect interactions between common variants with odds ratios >1.2, so these findings place limits on the contribution of GGIs to the overall heritability of T2D.
Genome-wide association studies (GWAS) have provided a detailed inventory of genetic loci conferring susceptibility to type 2 diabetes (T2D). A study of ∼900,000 individuals of European descent identified >400 association signals (1). To date, most studies of T2D predisposition have focused on the detection of main effects attributable to individual genetic variants. However, there has been longstanding interest in understanding the contributions of gene-gene interactions (GGIs) to individual predisposition to T2D. This interest was initially driven by the possibility that non–log-additive interactions might explain the apparent failure of main effects to account for the observed heritability of T2D (2). More recently, the search for GGIs has been motivated by the desire to establish whether second-order genetic effects could improve disease prediction models on the basis of genotype data and by the potential for statistical interactions to provide clues to underlying disease mechanisms (3). To date, there has been little evidence to indicate that GGIs have any appreciable impact on T2D risk (4,5).
However, it is clear that detection of all but the largest GGI effects require sample sizes substantially larger than those used to identify the main effects (1,6,7). Furthermore, the sample sizes necessary to detect GGI scale exponentially (precisely, to the fourth power of the correlation coefficient) when the variants being tested are only partially correlated with the (often unknown) causal variant (3).
The UK Biobank study, which provides genetic data for ∼500,000 individuals, offers a singular opportunity to advance the exploration of GGIs in T2D. We sought to capitalize on recent advances in the characterization of T2D risk loci resulting from fine-mapping efforts that have improved localization of the causal variant at many risk loci (1).
Research Design and Methods
UK Biobank Study Population
The UK Biobank (8) is a prospective cohort study of ∼500,000 individuals from across the U.K. We used imputed genetic data from the March 2018 release (version 3); details regarding quality control and imputation are provided elsewhere (9). We generated discrete genotypes using a genotype probability threshold of 0.8 and excluded variants with info score <0.5 and Hardy-Weinberg disequilibrium (P < 10−6). Individuals with discordant sex, putative sex chromosome aneuploidy, withdrawal of consent, and diagnosis of another form of diabetes, such as type 1 or gestational diabetes mellitus, were excluded. We further selected a subset of 375,736 individuals (16,430 T2D cases, 359,306 controls) who were 1) unrelated (up to second-degree relatives determined using the KING toolset [https://people.virginia.edu/∼wc9c/KING/manual.html] ) and 2) of European ancestry (determined using a combination of 15 principal components [PCs] provided by UK Biobank and self-reported white British ancestry). Variance-weighted PC scores were used to calculate the “genetic distance” with a hypothetical median white British participant to identify European individuals (genetic distance <60 units) (11). Prevalent T2D status was defined using self-reported medical history and medication information (12).
Prioritization of Variants for GGI Analysis
We used the following two approaches. The first approach was to select variants associated with T2D risk (the T2D risk set). We selected the index variant at each of the 403 conditionally independent association signals reported in the largest T2D GWAS in Europeans (1).
The second approach was to select variants associated with heterogeneity in T2D variance (the T2D variance set): variants that demonstrate marked heterogeneity in phenotype variance across genotypes represent potential candidates for GGI effects (13–15). To detect such variants, we selected a random subset of 100,000 unrelated European subjects (4,284 T2D cases, 95,716 controls), adjusted their T2D status for age, sex, genotyping batch, and 10 PCs; standardized the residuals; and used Levene test to assess equality of variance across genotypes (including only variants with minor allele frequency [MAF] >5%). For each 1-Mb block of the genome, we identified the variant most significantly associated with T2D variance (index variant). Finally, we selected the 100 most significant index variants that did not overlap with variants in the T2D risk set (Supplementary Table 1). The union of T2D risk and T2D variance sets provided a T2D joint set of 503 variants.
Analysis of GGIs for T2D
GGI effects were sought using two complementary strategies (Supplementary Fig. 1). The first strategy was single variant-based GGI analysis. We tested for interactions between individual variants using the –epistasis function in PLINK (https://www.cog-genomics.org/plink2) (16), which fits a logistic regression model (assuming a log-additive model of disease risk), as follows: ln(P [T2D] / P [control]) = β0 +β1SNP1 +β2SNP2 +β3SNP1SNP2, where SNP1 and SNP2 refer to the variants being tested, β1 and β2 refer to their main effects, and β3 refers to their interaction effect on T2D.
We deployed two analytical approaches: in the T2D joint set pairwise analysis, interaction was tested between each pair of variants in the T2D joint set; and, in the T2D joint set-versus-genome set analysis, interaction testing for the T2D joint set variants was extended to variants in the remainder of the genome (the genome set). Power calculations were performed (17) to estimate the MAF threshold (10%) for a variant in the genome set above which there was adequate power (>75%) to detect a substantial interaction effect (which we defined as an odds ratio [OR] >1.5) with a variant of MAF = 5% from the T2D joint set on the basis of the following parameters: 1) main effect OR on T2D for T2D joint set variant = 1.1; 2) main effect OR on T2D for genome set variant = 1.0; 3) interaction effect OR between the variants = 1.5; 4) sample size = 375,736, with a T2D case:control ratio (per UK Biobank) of 0.044; and 5) single test α = 0.05, with Bonferroni adjustment for 503 genome-wide analyses. Since the presence of linkage disequilibrium (LD) can confound interaction tests, we removed variants in the genome set in LD (r2 > 0.1 within 1 Mb) with a given T2D joint set index variant.
The second strategy was polygenic score (PS)–based GGI analysis. Compared with the single variant–based approach, the PS-based approach for testing GGIs offers greater statistical power if the underlying interaction effects for multiple T2D risk alleles are shared across alleles. We aggregated effects of the 403 variants in the T2D risk set to construct an overall T2D PS using the –score function in PLINK (16). The risk allele dosage for each variant was weighed by the effect size obtained from a T2D meta-analysis of 455,302 European individuals that included all studies from Mahajan et al. (1) except UK Biobank.
Similarly, a set of 93 T2D variants, which were either members of the T2D risk set or proxies (median r2 = 0.91) thereof, was used to construct six (weighted) partitioned PS (pPS) that captured biological processes relevant to T2D pathophysiology (18,19) (Supplementary Table 2). These pPS are referred to as pPSIS1, pPSIS2 (both reflecting processes involved in insulin secretion), pPSIA (insulin action), pPSadiposity (overall adiposity), pPSdyslipidemia (predominantly affecting liver metabolism), and pPSmix (a mixture of the above). The overall PS and the pPS were standardized to represent SD units. For the PS-based interaction analysis, we tested interactions between each PS (overall PS and six pPS) and genome-wide variants (MAF ≥1%; ∼6 million) variants.
Data Resource and Availability
The summary statistics of genome-wide analyses performed in this study are available at https://zenodo.org/record/3978776#.XzIAzC3MzRY.
The approaches deployed to characterize GGIs in T2D are summarized in Supplementary Fig. 1. These analyses were performed in 375,736 unrelated European individuals from UK Biobank (16,430 T2D cases, 359,306 controls) (research design and methods).
Analysis of GGIs for T2D
Since testing all genome-wide variants for interaction presents computational and statistical challenges, we prioritized two sets of variants for single variant analysis: the T2D risk set of 403 index variants, representing conditionally independent association signals for T2D risk in Europeans (1), and the T2D variance set, a nonoverlapping set of 100 index variants selected for the most extreme effects on T2D variance within UK Biobank (indicative of possible interaction effects) (13–15). Our primary analyses considered the combined T2D joint set of 503 variants (research design and methods).
We first sought pairwise interactions between variants in the T2D joint set. None of the pairwise interactions crossed the Bonferroni-corrected significance threshold (α = 4 × 10−7, correcting for 126,253 pairwise tests) (Supplementary Table 3). We estimate >70% power to detect an interaction effect of OR ≥1.5 between two variants in the T2D joint set, when both have an MAF at the lower end of the common variant range (5%) and main effect T2D OR = 1.1 (Supplementary Fig. 2). For more common variants (MAF = 50%), we were powered for interaction effects OR >1.2. The quantile-quantile plot provided no evidence that the distribution of GGI effects departed from the null (Fig. 1A). The strongest signal in this pairwise GGI analysis, involving rs629137 (near UVRAG, T2D variance set) and rs76011118 (near CDKN2A/B, T2D risk set), had an interaction OR of 1.24 (95% CI 1.21–1.27; P = 1 × 10−5). No significant pairwise interactions were observed when analyses were restricted to just the T2D risk set (81,003 tests, α = 6 × 10−7) or the T2D variance set (4,950 tests, α = 1 × 10−5) (Fig. 1A).
Next, we tested interactions between T2D joint set variants and the remainder of the genome. For the latter, we focused on ∼3.2 million variants with MAF >10%, since power calculations indicated >75% power to detect an interaction effect of OR ≥1.5 for variants above this MAF threshold, given a joint set MAF at the lower end of the common variant range (5%) (Supplementary Fig. 3). Again, we found no evidence for significant interactions at α = 1 × 10−10 (accounting for 503 genome-wide analyses) (Figs. 1B and 2 and Supplementary Table 4). The strongest signal involved two variants near TCF7L2: rs184509201, a T2D risk set variant corresponding to one of seven secondary signals at this locus, and rs10885397, a variant from the genome set (interaction OR 1.55 [1.51–1.59]; P = 2 × 10−9). These two variants, located ∼28 kb apart, are not in LD (r2 = 0.0). Genome-wide interaction analyses with variants exclusive to the T2D risk set (α = 1 × 10−10) or the T2D variance set (α = 5 × 10−10) were similarly negative.
Since the power to detect GGI effects for individual variants can be limited even with large sample sizes, we sought to bolster GGI detection by aggregating T2D risk variants into PS. In addition to an overall PS generated from the 403 variants in the T2D risk set, we constructed six additional pPS using a subset of 93 T2D risk variants that were stratified into physiological clusters capturing biological processes relevant to T2D pathophysiology (research design and methods).
Analyses evaluating GGIs using a PS-based approach failed to detect significant interactions between these PS and variants in the genome set (α = 7 × 10−9, accounting for seven GWAS) (Supplementary Table 5 and Supplementary Figs. 4 and 5). The strongest signal from this analysis was between variants at the CHN2 locus and pPSdyslipidemia (interaction OR 1.22; P = 1 × 10−7); this locus has previously been associated with diabetic retinopathy and development of severe insulin resistance (20).
By addressing several limitations of previous analyses, the current study offers substantially improved power for detecting GGI effects influencing T2D risk. First, we conducted GGI analyses in a much larger sample size than earlier efforts (4,5). Second, we prioritized sets of genetic variants likely to be enriched for interaction effects. Third, we leveraged improved fine-mapping of association signals, avoiding the attrition of power for signals only partially correlated with the causal variant. Fourth, we implemented a PS-based approach to complement the single variant–based interaction analysis. Despite using all these measures, we found no credible evidence either in the overall distribution of interaction effects or in individual signals meeting study-wide significance for GGI effects influencing T2D risk.
There are some clear limitations to this analysis. For computational and statistical reasons, we chose not to test all genome-wide variants for interaction (requiring ∼1013 tests). Nevertheless, the sets of variants prioritized were those most likely to demonstrate interaction effects, and the lack of significant associations within this subset should be indicative of patterns seen more broadly. Additionally, our study did not rule out the possibility of higher-order interactions and interactions detectable on a scale of disease-risk other than log-additive.
We are unable to determine the extent to which more subtle or infrequent GGI effects, singly or in combination, contribute to the residual heritability not attributable to known T2D risk variants. Nevertheless, the absence of detectable major GGI effects on T2D risk implies that for many practical clinical and epidemiological purposes, the joint effects of multiple genetic risk factors for T2D can be derived purely from the combination of main effects. Although it has been suggested that statistical GGIs might be indicative of functional interactions (3) and, thereby, provide mechanistic insights into disease biology, the extent to which this holds for common variants influencing complex human diseases remains to be established.
M.I.M. and A.M. are currently affiliated with Genentech, South San Francisco, CA.
This article contains supplementary material online at https://doi.org/10.2337/figshare.12827219.
Acknowledgments. This research has been conducted using the UK Biobank Resource (https://www.ukbiobank.ac.uk) under application number 9161. We acknowledge Andrew P. Morris (University of Manchester) for useful input on some of the analyses.
Funding. This study received funding from the National Institute of Diabetes and Digestive and Kidney Diseases (U01-DK-105535) and the Wellcome Trust (090532, 098381, 106130, 203141, 212259). M.I.M. was a Wellcome investigator and an NIHR senior investigator.
The views expressed in this article are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health.
Duality of Interest. As of January 2020, A.M. is an employee of Genentech and a holder of Roche stock. M.I.M. has served on advisory panels for Pfizer, Novo Nordisk, and Zoe Global; has received honoraria from Merck, Pfizer, Novo Nordisk, and Eli Lilly; and has received research funding from AbbVie, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, Novo Nordisk, Pfizer, Roche, Sanofi, SERVIER, and Takeda. As of June 2019, M.I.M. is an employee of Genentech and a holder of Roche stock. No other potential conflicts of interest relevant to this article were reported.
Author Contributions. A.N., M.I.M., and A.M. wrote the manuscript. A.N. and A.M. analyzed the data. M.I.M. and A.M. conceived and designed the study. All authors revised the manuscript. A.M. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Prior Presentation. Parts of this study were presented at the 79th Scientific Sessions of the American Diabetes Association, San Francisco, CA, 7–11 June 2019.