The prevalence of type 2 diabetes is increasing in both adults and children. Identification of biomarkers for both youth- and adult-onset type 2 diabetes is crucial for the development of screening tools and drug targets. In this study, using two-sample Mendelian randomization (MR), we identified 22 circulating proteins causally linked to adult type 2 diabetes and 11 proteins with suggestive evidence for association with youth-onset type 2 diabetes. Among these, colocalization analysis further supported a role in type 2 diabetes for C-type mannose receptor 2 (MR odds ratio [OR] 0.85 [95% CI 0.79–0.92] per genetically predicted SD increase in protein level), MANSC domain containing 4 (MR OR 0.90 [95% CI 0.88–0.92]), sodium/potassium-transporting ATPase subunit β2 (MR OR 1.10 [95% CI 1.06–1.15]), endoplasmic reticulum oxidoreductase 1β (MR OR 1.09 [95% CI 1.05–1.14]), spermatogenesis-associated protein 20 (MR OR 1.12 [95% CI 1.06–1.18]), haptoglobin (MR OR 0.96 [95% CI 0.94–0.98]), and α1–3-N-acetylgalactosaminyltransferase and α1–3-galactosyltransferase (MR OR 1.04 [95% CI 1.03–1.05]). Our findings support a causal role in type 2 diabetes for a set of circulating proteins, which represent promising type 2 diabetes drug targets.
Introduction
Type 2 diabetes is considered the fifth leading cause of death worldwide among adults, leading to early morbidity and mortality (1). Over the past two decades, the prevalence of type 2 diabetes has increased substantially in children of all ancestries. Notably, youth-onset type 2 diabetes is associated with greater mortality and earlier complications, and its treatment options are limited (2). Primary prevention of both adult- and youth-onset type 2 diabetes consists of lifestyle interventions in individuals at high risk of developing the disease. Therefore, identifying early disease biomarkers is important both for screening for type 2 diabetes as early as childhood and for characterizing novel drug targets.
It has been reported that ∼150 Food and Drug Administration–approved biomarkers target plasma circulating proteins (3). Identification of circulating protein biomarkers using population-based proteomics can improve our understanding of the etiology of type 2 diabetes and enhance strategies for screening, diagnosis, and treatment of this disease. In several observational studies, mostly cross-sectional, ∼142 plasma proteins have been associated with risk of type 2 diabetes (3–5). However, these studies are prone to bias from unmeasured confounders, such as adiposity and inflammation. Reverse causation may also occur when type 2 diabetes itself leads to changes in circulating protein levels, for instance, due to profound metabolic derangement in overt diabetes. Therefore, the available observational evidence has not been able to establish a causal association between these proteins and type 2 diabetes (4).
Mendelian randomization (MR) is an instrumental variable approach that minimizes confounding and reverse causation in order to identify causal effects of an exposure (such as circulating protein levels) on a disease outcome (6). Adding to existing evidence from observational studies (3–5), a recent MR study identified candidate protein biomarkers for adult type 2 diabetes (7) using a set of genetic instruments associated with the tested proteins in a large proteomic cohort, but not at a genome-wide level. Expanding this approach, in the current study, we used genome-wide significant single nucleotide polymorphisms (SNPs) from the five largest protein genome-wide association study (GWAS) consortia available to date (8–12) as genetic instruments for our protein exposures and queried their effects on type 2 diabetes in the largest adult type 2 diabetes GWAS available to date (13). Furthermore, we applied a similar approach to identify causal circulating proteins for youth-onset type 2 diabetes using data from the only available, recently published pediatric type 2 diabetes GWAS (2).
Research Design and Methods
Study Exposures
We used the five largest proteomic GWAS to date (8–12) to identify SNPs to serve as MR instruments for circulating proteins. These instruments, termed cis-protein quantitative trait loci (cis-pQTLs), were defined as the genome-wide significant SNPs within 1 Mb of the gene encoding the measured protein. The circulating proteins in the Sun et al. (8), Emilsson et al. (9), Suhre et al. (11), and Yao et al. (12) GWAS were quantified using aptamer-based (SOMAmer) technology, whereas in the GWAS by Folkersen et al. (10), the circulating proteins were measured using the Olink platform (Supplementary Table 1).
Study Outcomes
Adult Type 2 Diabetes GWAS: Adjusted and Unadjusted for BMI
To test whether there is an association between the proteins associated with the aforementioned cis-pQTLs and adult type 2 diabetes risk, we retrieved the effects of these cis-pQTLs on type 2 diabetes from the DIAMANTE consortium GWAS for type 2 diabetes, which is a meta-analysis of 32 European type 2 diabetes cohorts with available GWAS data (13) (n = 71,124 case and 824,006 control subjects) (Fig. 1). As a sensitivity analysis, and since there is a known overlap in the genetic architecture of obesity and type 2 diabetes (14), we repeated our MR analysis using effects from the BMI-adjusted DIAMANTE GWAS to assess if BMI could have affected the association of cis-pQTLs with adult type 2 diabetes risk. This GWAS included up to 50,409 case subjects with type 2 diabetes and 523,897 control subjects of European ancestry (13). Details on these cohorts can be found in the prior publication (13).
Flowchart of our MR studies assessing the causal role of circulating proteins on adult and youth-onset type 2 diabetes, and of sensitivity analyses testing the MR assumptions.
Youth-Onset Type 2 Diabetes GWAS
To test whether any circulating proteins from the aforementioned MR analysis had evidence for a causal role in youth-onset type 2 diabetes, we used proteins nominally associated (MR P value <0.05) with BMI-unadjusted adult type 2 diabetes risk as exposures and queried the effects of their respective cis-pQTLs in a recently published GWAS of youth-onset type 2 diabetes (2) (Fig. 1). This GWAS included 3,006 youth case subjects and 6,000 adult control subjects from the multiethnic Progress in Diabetes Genetics in Youth (ProDiGY) consortium. For the purpose of our MR study, we used GWAS data from the European ancestry subset of ProDiGY, including 664 case subjects with type 2 diabetes and 1,976 control subjects (2).
Statistical Analyses
Two-Sample MR
To test for causal evidence between circulating protein levels and type 2 diabetes risk in adults or children, we performed two-sample MR analyses implemented in the “TwoSampleMR” R package (15) using the Wald ratio method. First, we identified the lead SNPs (cis-pQTLs) with the lowest P value for association with protein levels in the five proteomic GWAS (8–12). We then performed linkage disequilibrium (LD) clumping (R² < 0.3) using the 1000 Genomes “EUR” reference panel to avoid including more than a single cis-pQTL instrument per protein exposure. LD clumping was performed within each proteomic GWAS, but not across studies. We next combined the candidate proteins prioritized by our MR analyses across the five proteomic studies (8–12) and, to further ensure that the first MR assumption was satisfied, assessed two metrics of instrument strength for each cis-pQTL: the proportion of variance in protein level explained by the SNP (R²) and the F statistic of the cis-pQTL association (an F statistic >10 implying a strong instrument). We calculated R² using the formula $R^2 \approx 2\beta^2 f(1 - f)$, where β and f denote the effect estimate and the frequency of the effect allele on a standardized phenotype, respectively (16). We computed the F statistic of each cis-pQTL as $F = \dfrac{R^2/k}{(1 - R^2)/(n - k - 1)}$, where k is the number of instruments used in the model (here, k = 1, since there was a single cis-pQTL per protein) and n is the GWAS sample size (17).
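As an illustration of the two instrument-strength formulas above, the following minimal Python sketch computes R² and the F statistic for a single cis-pQTL. The function names and the input values (effect size, allele frequency, and sample size) are hypothetical and are not taken from the proteomic GWAS.

```python
def variance_explained(beta: float, eaf: float) -> float:
    """R^2 ~= 2 * beta^2 * f * (1 - f), with beta on a standardized phenotype."""
    return 2.0 * beta**2 * eaf * (1.0 - eaf)


def f_statistic(r2: float, n: int, k: int = 1) -> float:
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1)); k = 1 for a single cis-pQTL."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Hypothetical inputs, for illustration only.
beta = 0.30   # SD change in protein level per effect allele
eaf = 0.25    # effect allele frequency
n = 3000      # proteomic GWAS sample size

r2 = variance_explained(beta, eaf)
f_stat = f_statistic(r2, n)
is_strong_instrument = f_stat > 10

print(f"R2 = {r2:.4f}, F = {f_stat:.1f}, strong instrument: {is_strong_instrument}")
```

With a single instrument, the F statistic grows roughly in proportion to both R² and the sample size, which is why even a SNP explaining ∼1% of the variance can exceed the F > 10 rule of thumb in a cohort of a few thousand participants.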
Then, we tested the effects of the lead cis-pQTL in the adult type 2 diabetes DIAMANTE GWAS, unadjusted or adjusted for BMI (13), and in the youth-onset type 2 diabetes ProDiGY GWAS (2). In this study, we used single-variant MR: for each protein, the Wald ratio was calculated by dividing the SNP-outcome effect by the SNP-exposure effect, yielding a single MR estimate of the effect of that protein on type 2 diabetes risk. Next, we applied Bonferroni correction to control for the total number of proteins tested in our MR experiments. After prioritizing proteins based on Bonferroni correction, the findings for proteins measured in more than one proteomic GWAS were cross-validated. Although we allowed for overlaps of tested proteins across proteomic GWAS, LD clumping was performed to ensure that there was a single cis-pQTL per protein per proteomic GWAS. As a final stage, the results of the five independent MR analyses, one per proteomic GWAS, were combined and compared.
The findings of our single-variant MR studies are presented as MR odds ratios (ORs) and 95% CIs for risk of type 2 diabetes per genetically predicted 1 SD increase in circulating protein level.
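A minimal sketch of how a single-variant Wald ratio can be converted into an MR OR with a 95% CI is given below. It assumes that the SNP-outcome effect is on the log-odds scale and the SNP-exposure effect is in SD units, and it uses a first-order approximation for the standard error that ignores uncertainty in the SNP-exposure estimate. The function names and input values are hypothetical.

```python
import math


def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float):
    """Return the Wald ratio estimate and its first-order standard error."""
    beta_mr = beta_outcome / beta_exposure
    # Assumption: uncertainty in beta_exposure is ignored (first-order approximation).
    se_mr = se_outcome / abs(beta_exposure)
    return beta_mr, se_mr


def odds_ratio_with_ci(beta_mr: float, se_mr: float, z: float = 1.96):
    """Exponentiate a log-odds estimate into an OR and its 95% CI."""
    return (
        math.exp(beta_mr),
        math.exp(beta_mr - z * se_mr),
        math.exp(beta_mr + z * se_mr),
    )


def two_sided_p(beta_mr: float, se_mr: float) -> float:
    """Two-sided P value from a normal approximation."""
    return math.erfc(abs(beta_mr / se_mr) / math.sqrt(2.0))


# Hypothetical summary statistics for one cis-pQTL.
beta_mr, se_mr = wald_ratio(beta_exposure=0.5, beta_outcome=-0.05, se_outcome=0.01)
mr_or, ci_low, ci_high = odds_ratio_with_ci(beta_mr, se_mr)
p_value = two_sided_p(beta_mr, se_mr)

print(f"MR OR {mr_or:.2f} [95% CI {ci_low:.2f}-{ci_high:.2f}], P = {p_value:.2e}")
```

Because the exposure effect is expressed in SD units of protein level, the resulting OR is interpreted per genetically predicted 1 SD increase in the protein.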
MR Assumptions
In each MR analysis, three assumptions need to be satisfied. The first MR assumption requires that the genetic instrument be strongly associated with the exposure; we thus used cis-pQTLs, which have been associated with their respective protein’s level at a genome-wide significant level (P ≤ 5 × 10⁻⁸). The cis-pQTLs were defined as the genome-wide significant SNPs with the lowest P value within 1 Mb of the transcription start site of the gene encoding the measured protein. For cis-pQTLs that were not present in the type 2 diabetes GWAS, SNPs in high LD (defined by an LD R² ≥ 0.8 in the 1000 Genomes phase 3 European panel) were selected as proxies using the LDlink website (https://ldlink.nci.nih.gov/?tab=ldproxy).
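The cis-pQTL definition above can be sketched as a simple filter followed by selection of the lowest P value. The data structure, function name, and all variant records below are hypothetical and serve only to illustrate the selection rule (same chromosome, within 1 Mb of the transcription start site, and P ≤ 5 × 10⁻⁸).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Association:
    rsid: str
    chrom: str
    pos: int
    p_value: float


def lead_cis_pqtl(
    associations: Iterable[Association],
    gene_chrom: str,
    tss_pos: int,
    window: int = 1_000_000,
    p_threshold: float = 5e-8,
) -> Optional[Association]:
    """Return the genome-wide significant SNP with the lowest P value
    within `window` bp of the transcription start site, or None."""
    cis = [
        a for a in associations
        if a.chrom == gene_chrom
        and abs(a.pos - tss_pos) <= window
        and a.p_value <= p_threshold
    ]
    return min(cis, key=lambda a: a.p_value, default=None)


# Hypothetical protein-level associations for one gene on chromosome 12.
records = [
    Association("rsA", "12", 27_927_000, 1e-12),
    Association("rsB", "12", 27_923_000, 1e-20),  # lowest P value in cis
    Association("rsC", "12", 30_500_000, 1e-30),  # outside the 1-Mb window
    Association("rsD", "5", 27_900_000, 1e-40),   # different chromosome
    Association("rsE", "12", 27_950_000, 1e-5),   # not genome-wide significant
]

lead = lead_cis_pqtl(records, gene_chrom="12", tss_pos=27_900_000)
print(lead)
```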
According to the second MR assumption, the genetic instrument should not be associated with confounders that link the exposure to the outcome. We therefore used the PhenoScanner v2 (18) database to identify any reported associations of the cis-pQTLs of the MR-prioritized proteins with potential confounders at a genome-wide significant level (P ≤ 5 × 10⁻⁸).
The third MR assumption, known as the exclusion restriction assumption, requires that the genetic instruments be associated with the outcome only via the exposure. To satisfy this assumption, we elected to use only cis-acting SNPs (located within 1 Mb of the genes that encode the proteins) (19) as instruments in our MR studies. Since cis-pQTLs are considered to have a more direct influence on the protein than trans-pQTLs, they are less likely to affect the outcome through pathways independent of the level of the protein encoded by their respective gene.
Sensitivity Analyses
Assessment for Confounding
In order to explore the second MR assumption, we queried reported genome-wide significant associations of the cis-pQTLs of the MR-prioritized proteins with potential confounders, such as body fat mass and waist-to-hip ratio, using PhenoScanner v2 (18).
Colocalization Analysis
We further assessed for potential confounding by LD by checking whether the cis-pQTL of each MR-prioritized protein is itself associated with adult- or youth-onset type 2 diabetes or is rather in LD with a separate causal variant for type 2 diabetes. To do so, we used colocalization, as implemented in the coloc R package (20). The colocalization analysis provides posterior probabilities for H0 (no association of the genomic locus with either trait), H1 (association with type 2 diabetes but not with the protein level), H2 (association with the protein level but not with type 2 diabetes), H3 (association with type 2 diabetes and the protein level through two different SNPs), and H4 (association with type 2 diabetes and the protein level through one shared SNP). To determine the posterior probability of each genomic locus containing a single variant affecting both the protein level and type 2 diabetes, we analyzed all SNPs within 1 Mb of the cis-pQTL. Colocalization analyses were performed only for proteins with evidence of significant association with type 2 diabetes in our MR analysis and with available summary-level results from the GWAS by Sun et al. (8). Visualization of colocalization results was performed using the LocusCompareR R package (21).
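To make the five hypotheses concrete, the sketch below computes their posterior probabilities from per-SNP approximate Bayes factors under the assumption of at most one causal variant per trait in the region. It is an illustrative reimplementation of the general approach, not the code of the coloc package. The prior probabilities (p1, p2, p12), the prior SDs of the effect sizes, and all summary statistics are assumptions chosen for illustration.

```python
import math


def log_abf(beta: float, se: float, prior_sd: float) -> float:
    """Log approximate Bayes factor (Wakefield) for one SNP."""
    z = beta / se
    r = prior_sd**2 / (prior_sd**2 + se**2)
    return 0.5 * (math.log(1.0 - r) + r * z**2)


def logsumexp(values) -> float:
    values = list(values)
    if not values:
        return -math.inf
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(labf_t2d, labf_protein, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one locus.

    H1: type 2 diabetes only; H2: protein level only;
    H3: both traits, different SNPs; H4: both traits, one shared SNP.
    """
    n = len(labf_t2d)
    shared = [a + b for a, b in zip(labf_t2d, labf_protein)]
    distinct = [labf_t2d[i] + labf_protein[j]
                for i in range(n) for j in range(n) if i != j]
    log_h = [
        0.0,
        math.log(p1) + logsumexp(labf_t2d),
        math.log(p2) + logsumexp(labf_protein),
        math.log(p1) + math.log(p2) + logsumexp(distinct),
        math.log(p12) + logsumexp(shared),
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


# Hypothetical summary statistics for five SNPs; the third SNP is strongly
# associated with both traits.
t2d_beta = [0.02, 0.01, 0.14, 0.004, -0.006]       # log-odds scale
protein_beta = [0.016, -0.004, 0.16, 0.002, 0.008]  # SD units
se = 0.02

labf_t2d = [log_abf(b, se, prior_sd=0.20) for b in t2d_beta]
labf_protein = [log_abf(b, se, prior_sd=0.15) for b in protein_beta]
posteriors = coloc_posteriors(labf_t2d, labf_protein)

for name, pp in zip(["H0", "H1", "H2", "H3", "H4"], posteriors):
    print(f"PP.{name} = {pp:.4g}")
```

A high posterior probability for H4 indicates that the protein and type 2 diabetes signals are likely driven by the same variant, whereas a high probability for H3 points to confounding by LD.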
Multiple Locus Analysis
As a further sensitivity analysis, we assessed whether including multiple genetic instruments per protein exposure, explaining a larger portion of its variance, would affect the results of our main MR analysis. To do this, we performed a multiple locus analysis, including both cis- and trans-pQTLs (i.e., pQTLs that did not satisfy the aforementioned criteria for being cis-pQTLs) whenever the latter were available for our protein exposures. We thus used the inverse variance–weighted MR approach to meta-analyze the MR effects of the SNPs used as instruments for a subset of the candidate proteins of our main MR analysis, which had available trans-pQTLs. To do so, we used the “TwoSampleMR” R package (15).
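The inverse variance–weighted approach can be sketched as a fixed-effect meta-analysis of per-SNP Wald ratios, as below. This is a simplified illustration; the software used in the study may apply a different variant (for example, a random-effects model), and the function name and input values are hypothetical.

```python
import math


def ivw_fixed_effect(wald_ratios, standard_errors):
    """Fixed-effect inverse variance-weighted combination of Wald ratios."""
    weights = [1.0 / se**2 for se in standard_errors]
    beta = sum(w * b for w, b in zip(weights, wald_ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se


# Hypothetical Wald ratios (log-odds per SD) for one cis- and one trans-pQTL.
ratios = [0.10, 0.20]
ses = [0.05, 0.10]

beta_ivw, se_ivw = ivw_fixed_effect(ratios, ses)
or_ivw = math.exp(beta_ivw)

print(f"IVW beta = {beta_ivw:.3f} (SE {se_ivw:.3f}), OR = {or_ivw:.2f}")
```

More precise instruments receive larger weights, so the combined estimate is pulled toward the SNP with the smaller standard error.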
Assessment for Protein-Altering Variants
For cis-pQTLs of our MR-prioritized proteins quantified on the SomaLogic platform, we assessed the possibility of aptamer-binding effects, in which the presence of protein-altering variants (PAVs) may affect protein measurements. We verified whether the MR-prioritized cis-pQTLs are PAVs or are in LD (R² > 0.8) with PAVs. Since such variants may impact direct measurements of the respective proteins using antibody-based or aptamer-based methods, this assessment was important for the interpretation of the results of our MR study and for future validation studies.
Expression QTL Assessment
To assess whether the cis-pQTLs of the MR-prioritized proteins affect gene expression and show evidence of being expression quantitative trait loci (eQTLs), we used the Genotype-Tissue Expression (GTEx) database (22) (https://www.gtexportal.org).
Data and Resource Availability
Data from proteomics studies are available from the referenced peer-reviewed studies or their corresponding authors, as applicable. Summary statistics for the type 2 diabetes GWAS are publicly available for download from the GWAS catalog. The statistical code needed to reproduce the results in the article is available upon request.
Results
Combining the five proteomic GWAS (8–12), we obtained 1,690 cis-pQTLs, including cis-pQTLs in overlapping loci. These 1,690 cis-pQTLs correspond to 1,089 unique circulating proteins and have available GWAS effects on BMI-unadjusted adult type 2 diabetes for our MR studies. After Bonferroni correction for multiple testing (P value threshold for significance = 0.05/1,089 or 4.6 × 10⁻⁵), our MR analyses revealed associations for 20 circulating proteins, all of which had cis-pQTLs with an F statistic >10, indicating that these SNPs were strong instruments (Table 1 and Supplementary Table 2). As shown in Table 1, these 20 proteins included cyclin-dependent kinase 2–associated protein 1 (CDK2AP1), cyclin H (CCNH), tyrosine-protein kinase receptor (TYRO3), mitogen-activated protein kinase 3 (MAPK3), tubulin folding cofactor E (TBCE), TNF receptor superfamily member 6B (TNFRSF6B), arginase 1 (ARG1), drebrin-like (DBNL), C-type mannose receptor 2 (MRC2), sex hormone–binding globulin (SHBG), activating transcription factor 6β (ATF6B), spermatogenesis-associated protein 20 (SPATA20), sodium/potassium-transporting ATPase subunit β2 (ATP1B2), MANSC domain containing 4 (MANSC4), haptoglobin (HP), β-mannosidase (MANBA), α1–3-N-acetylgalactosaminyltransferase and α1–3-galactosyltransferase (ABO), ACE I (peptidyl-dipeptidase A) 1 (ACE), peptidyl-glycine α-amidating monooxygenase (PAM), and neural EGFL-like 1 (NELL1). As demonstrated in Fig. 2A, we observed MR ORs ranging from 0.69 (for CDK2AP1) to 1.30 (for TYRO3) per SD increase in protein levels. That is, the risk of type 2 diabetes increased ∼1.3-fold per genetically predicted SD increase in TYRO3 level (MR OR 1.29 [95% CI 1.19–1.42]; P = 1.9 × 10⁻⁹), while a genetically predicted SD increase in CDK2AP1 level was associated with a ∼30% decrease in the risk of type 2 diabetes (MR OR 0.69 [95% CI 0.61–0.78]; P = 2.3 × 10⁻⁹).
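The Bonferroni threshold used here follows directly from the number of proteins tested, as the short sketch below illustrates. The function name is hypothetical; the CDK2AP1 P value is taken from Table 1, and the second entry is a made-up value included only to show a protein that would not pass the threshold.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(1089)  # 1,089 unique proteins tested

mr_p_values = {
    "CDK2AP1": 2.31e-9,              # from Table 1
    "hypothetical_protein": 1.0e-3,  # illustrative value, not from the study
}
significant = {name: p for name, p in mr_p_values.items() if p < threshold}

print(f"threshold = {threshold:.2e}")
print(sorted(significant))
```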
Forest plots displaying the results of the MR analyses. A: Forest plot displaying the MR OR and 95% CIs of BMI-unadjusted adult type 2 diabetes per genetically predicted 1 SD increase of each candidate protein level. B: Forest plot displaying the MR OR and 95% CIs of youth-onset type 2 diabetes per genetically predicted 1 SD increase of each candidate protein level. C: Forest plot displaying the MR OR and 95% CIs of BMI-adjusted adult type 2 diabetes per genetically predicted 1 SD increase of each candidate protein level.
MR results for circulating proteins associated with BMI-unadjusted adult type 2 diabetes, after Bonferroni correction
Protein | Chr. | Position | rs number cis-pQTL | EAF | EA | MR OR | 95% CI | MR P value | R² | F statistic | Source (first author, reference) |
---|---|---|---|---|---|---|---|---|---|---|---|
MANSC4 | 12 | 27927881 | rs36138811 | 0.23 | C | 0.90 | 0.88–0.92 | 3.81 × 10⁻¹⁸ | 0.14 | 557.01 | Sun et al. (8) |
MANSC4 | 12 | 27923241 | rs11049131 | 0.77 | G | 0.92 | 0.91–0.94 | 3.81 × 10⁻¹⁸ | 0.24 | 1018.26 | Emilsson et al. (9) |
ABO | 9 | 136149229 | rs505922 | 0.31 | C | 1.04 | 1.03–1.05 | 6.62 × 10⁻¹² | 0.72 | 8668.98 | Sun et al. (8) |
ABO | 9 | 136144960 | rs492488 | 0.74 | G | 1.00 | 1.00–1.00 | 1.39 × 10⁻⁹ | 0.82 | 14796.78 | Emilsson et al. (9) |
TYRO3 | 15 | 41860698 | rs2289743 | 0.69 | C | 1.30 | 1.19–1.41 | 1.97 × 10⁻⁹ | 0.01 | 35.80 | Emilsson et al. (9) |
CDK2AP1 | 12 | 123614813 | rs2510885 | 0.75 | C | 0.69 | 0.61–0.78 | 2.31 × 10⁻⁹ | 0.01 | 18.45 | Emilsson et al. (9) |
PAM | 5 | 102418604 | rs257309 | 0.35 | G | 0.92 | 0.89–0.94 | 2.37 × 10⁻⁹ | 0.10 | 363.29 | Sun et al. (8) |
CCNH | 5 | 86577352 | rs7719891 | 0.76 | A | 0.72 | 0.64–0.81 | 2.77 × 10⁻⁸ | 0.01 | 17.59 | Emilsson et al. (9) |
TBCE | 1 | 235594951 | rs10802708 | 0.65 | C | 0.79 | 0.73–0.86 | 4.91 × 10⁻⁸ | 0.09 | 314.04 | Emilsson et al. (9) |
MANBA | 4 | 103680984 | rs223489 | 0.66 | A | 0.92 | 0.90–0.95 | 4.91 × 10⁻⁸ | 0.01 | 34.98 | Emilsson et al. (9) |
MANBA | 4 | 103612043 | rs227370 | 0.67 | C | 0.94 | 0.92–0.96 | 3.88 × 10⁻⁷ | 0.14 | 525.84 | Sun et al. (8) |
ACE | 17 | 61566724 | rs4344 | 0.50 | A | 0.95 | 0.93–0.97 | 5.73 × 10⁻⁷ | 0.17 | 654.75 | Emilsson et al. (9) |
ATF6B | 6 | 32113980 | rs114887538 | 0.76 | G | 1.15 | 1.09–1.23 | 1.68 × 10⁻⁶ | 0.02 | 65.55 | Emilsson et al. (9) |
DBNL | 7 | 44156146 | rs3087367 | 0.56 | G | 0.87 | 0.82–0.93 | 5.86 × 10⁻⁶ | 0.02 | 73.28 | Emilsson et al. (9) |
HP | 16 | 72105965 | rs217184 | 0.20 | C | 0.96 | 0.94–0.98 | 1.13 × 10⁻⁵ | 0.24 | 1027.97 | Sun et al. (8) |
SHBG | 17 | 7531965 | rs858519 | 0.47 | T | 0.88 | 0.83–0.93 | 1.21 × 10⁻⁵ | 0.04 | 138.35 | Emilsson et al. (9) |
ATP1B2 | 17 | 7554772 | rs1642762 | 0.59 | T | 1.10 | 1.06–1.15 | 1.21 × 10⁻⁵ | 0.02 | 74.74 | Sun et al. (8) |
ARG1 | 6 | 131897278 | rs2781668 | 0.80 | C | 0.80 | 0.72–0.88 | 1.34 × 10⁻⁵ | 0.01 | 27.59 | Emilsson et al. (9) |
HP | 16 | 72114002 | rs217181 | 0.20 | T | 1.04 | 1.02–1.05 | 1.55 × 10⁻⁵ | 0.29 | 403.95 | Suhre et al. (11) |
TNFRSF6B | 20 | 62370349 | rs1056441 | 0.61 | C | 1.23 | 1.12–1.35 | 2.00 × 10⁻⁵ | 0.01 | 30.57 | Emilsson et al. (9) |
SPATA20 | 17 | 48624523 | rs9890200 | 0.38 | C | 1.12 | 1.06–1.18 | 2.21 × 10⁻⁵ | 0.03 | 102.32 | Sun et al. (8) |
MRC2 | 17 | 60637258 | rs146385050 | 0.20 | A | 0.85 | 0.79–0.92 | 2.84 × 10⁻⁵ | 0.02 | 50.35 | Sun et al. (8) |
MAPK3 | 16 | 30134656 | rs28529403 | 0.40 | C | 1.27 | 1.13–1.42 | 3.27 × 10⁻⁵ | 0.01 | 19.75 | Emilsson et al. (9) |
HP | 16 | 72079657 | rs77303550 | 0.82 | C | 0.96 | 0.94–0.98 | 3.38 × 10⁻⁵ | 0.23 | 952.73 | Emilsson et al. (9) |
NELL1 | 11 | 20952237 | rs16907058 | 0.95 | G | 1.06 | 1.03–1.09 | 3.43 × 10⁻⁵ | 0.09 | 334.45 | Emilsson et al. (9) |
MR OR represents the OR for type 2 diabetes per 1 SD increase in the protein level.
Chr., chromosome; EA, effect allele; EAF, effect allele frequency.
Next, we undertook an MR study using cis-pQTLs linked to 278 proteins that were nominally associated with adult type 2 diabetes in the previous MR study to evaluate the causal role of these prioritized proteins in youth-onset type 2 diabetes (2) (Fig. 1). By querying the 278 cis-pQTLs in the youth-onset type 2 diabetes GWAS, effects of cis-pQTLs from 174 unique circulating proteins were retrieved (Table 2 and Supplementary Table 3), which were used as genetic instruments for their respective proteins in our MR studies. As shown in Table 2, our MR analyses demonstrated 11 proteins nominally associated with risk of youth-onset type 2 diabetes, but after Bonferroni correction (P value threshold for significance = 0.05/174 or 2.8 × 10⁻⁴), no protein was significantly associated with risk of youth-onset type 2 diabetes. The 11 circulating proteins, all with F statistics >10, are growth differentiation factor 15 (GDF15), antiselectin-like osteoblast-derived protein 1 (SVEP1), surface glycoprotein, Ig superfamily member (CDON), kinase insert domain receptor (KDR), cytochrome B5 type A (CYB5A), complement component 4A/B (Rodgers blood group) (C4A/C4B) complex, fibroblast growth factor 2 (FGF2), and CCNH, TNFRSF6B, ABO, and ACE; the latter four proteins were also associated with adult type 2 diabetes. As demonstrated in Fig. 2B, we observed MR ORs ranging from 0.77 (for CYB5A and CCNH) to 1.25 (for TNFRSF6B) per SD increase in protein levels. Specifically, a genetically predicted SD increase in CYB5A and CCNH levels was associated with a ∼20% decreased risk of youth-onset type 2 diabetes (MR OR 0.77 [95% CI 0.61–0.97], P = 0.03; and MR OR 0.77 [95% CI 0.61–0.98], P = 0.03, respectively), while a 25% increase in the risk of youth-onset type 2 diabetes was detected per genetically predicted SD increase in TNFRSF6B protein levels (MR OR 1.25 [95% CI 1.01–1.54]; P = 0.037).
MR results for circulating proteins associated with youth-onset type 2 diabetes
Protein | Chr. | Position | rs number cis-pQTL | EAF | EA | MR OR | 95% CI | MR P value | R² | F statistic | Source (first author, reference) |
---|---|---|---|---|---|---|---|---|---|---|---|
GDF15 | 19 | 18503194 | rs45543339 | 0.26 | T | 1.07 | 1.02–1.13 | 0.009 | 0.13 | 479.97 | Sun et al. (8) |
SVEP1 | 9 | 113312231 | rs61751937 | 0.03 | C | 0.89 | 0.81–0.98 | 0.015 | 0.08 | 294.12 | Sun et al. (8) |
SVEP1 | 9 | 113260708 | rs78742138 | 0.96 | T | 0.82 | 0.69–0.97 | 0.020 | 0.03 | 101.27 | Emilsson et al. (9) |
CDON | 11 | 125897840 | rs60929339 | 0.93 | G | 0.91 | 0.84–0.99 | 0.021 | 0.06 | 214.61 | Emilsson et al. (9) |
CDON | 11 | 125889526 | rs3740909 | 0.07 | T | 1.08 | 1.01–1.15 | 0.022 | 0.09 | 102.35 | Suhre et al. (11) |
KDR | 4 | 55979558 | rs2305948 | 0.93 | C | 0.89 | 0.81–0.98 | 0.023 | 0.03 | 86.60 | Emilsson et al. (9) |
CYB5A | 18 | 71945370 | rs7239618 | 0.68 | T | 0.77 | 0.61–0.97 | 0.030 | 0.01 | 22.21 | Emilsson et al. (9) |
CCNH | 5 | 86577352 | rs7719891 | 0.76 | A | 0.77 | 0.61–0.98 | 0.032 | 0.01 | 17.59 | Emilsson et al. (9) |
ABO | 9 | 136149229 | rs505922 | 0.31 | C | 1.02 | 1.00–1.05 | 0.036 | 0.72 | 8668.98 | Sun et al. (8) |
TNFRSF6B | 20 | 62370349 | rs1056441 | 0.61 | C | 1.25 | 1.01–1.54 | 0.037 | 0.01 | 30.57 | Emilsson et al. (9) |
ACE | 17 | 61566724 | rs4344 | 0.50 | A | 1.05 | 1.00–1.10 | 0.042 | 0.17 | 654.75 | Emilsson et al. (9) |
C4A C4B | 6 | 31928691 | rs2280774 | 0.37 | A | 1.06 | 1.00–1.13 | 0.046 | 0.12 | 134.84 | Suhre et al. (11) |
FGF2 | 4 | 123757748 | rs308403 | 0.32 | T | 0.95 | 0.91–1.00 | 0.046 | 0.15 | 177.84 | Suhre et al. (11) |
MR OR represents the OR for youth-onset type 2 diabetes per 1 SD increase in the protein level.
Chr., chromosome; EA, effect allele; EAF, effect allele frequency.
Obesity, expressed as an increased BMI, increases the risk of type 2 diabetes (14,23). Therefore, we assessed whether BMI could have affected the findings of our main MR analysis on adult type 2 diabetes risk. Using cis-pQTLs for candidate proteins from the same five proteomic GWAS (8–12), we queried their effects on BMI-adjusted adult type 2 diabetes risk in the DIAMANTE consortium (13). We identified effects on BMI-adjusted type 2 diabetes for 915 unique circulating proteins with distinct cis-pQTLs, which were used as instruments in our MR studies (Table 3 and Supplementary Table 4). After Bonferroni correction for multiple testing (P value threshold for significance = 0.05/915 or 5.45 × 10⁻⁵), MR effects for nine of the candidate proteins from the main MR analysis were attenuated after adjusting the outcome (type 2 diabetes) for BMI (Table 3). However, 13 circulating proteins, whose cis-pQTLs all had F statistics >10, remained causally associated with type 2 diabetes (Table 3). These 13 proteins included TYRO3, TNFRSF6B, TBCE, SHBG, MRC2, DBNL, ATP1B2, ATF6B, PAM, ABO, and MANSC4, which were also associated with the risk of BMI-unadjusted type 2 diabetes, and two novel proteins, endoplasmic reticulum oxidoreductase 1β (ERO1LB) and MHC class I polypeptide-related sequence B (MICB) (Table 3). As demonstrated in Fig. 2C, after adjusting for BMI, we obtained comparable ORs ranging from 0.79 (0.72–0.86) (for MRC2) to 1.34 (1.21–1.48) (for TYRO3) per SD increase in protein levels.
MR results for circulating proteins associated with BMI-adjusted adult type 2 diabetes, after Bonferroni correction
Protein | Chr. | Position | rs number cis-pQTL | EAF | EA | MR OR | 95% CI | MR P value | R² | F statistic | Source (first author, reference) |
---|---|---|---|---|---|---|---|---|---|---|---|
MANSC4 | 12 | 27927881 | rs36138811 | 0.23 | C | 0.91 | 0.88–0.93 | 3.25 × 10⁻¹² | 0.144 | 557.006 | Sun et al. (8) |
MANSC4 | 12 | 27923241 | rs11049131 | 0.77 | G | 0.93 | 0.91–0.95 | 3.25 × 10⁻¹² | 0.242 | 1018.259 | Emilsson et al. (9) |
ATP1B2 | 17 | 7554772 | rs1642762 | 0.59 | T | 1.17 | 1.11–1.23 | 7.06 × 10⁻⁹ | 0.040 | 138.346 | Sun et al. (8) |
TYRO3 | 15 | 41860698 | rs2289743 | 0.69 | C | 1.34 | 1.21–1.48 | 9.94 × 10⁻⁹ | 0.011 | 35.796 | Emilsson et al. (9) |
PAM | 5 | 102418604 | rs257309 | 0.35 | G | 0.91 | 0.88–0.94 | 1.69 × 10⁻⁸ | 0.099 | 363.292 | Sun et al. (8) |
SHBG | 17 | 7531965 | rs858519 | 0.47 | T | 0.83 | 0.77–0.88 | 4.59 × 10⁻⁸ | 0.023 | 74.736 | Emilsson et al. (9) |
MRC2 | 17 | 60637258 | rs146385050 | 0.20 | A | 0.79 | 0.72–0.86 | 1.99 × 10⁻⁷ | 0.015 | 50.350 | Sun et al. (8) |
ABO | 9 | 136149229 | rs505922 | 0.31 | C | 1.03 | 1.02–1.04 | 4.12 × 10⁻⁷ | 0.724 | 8668.978 | Sun et al. (8) |
ABO | 9 | 136144960 | rs492488 | 0.74 | G | 1.03 | 1.02–1.04 | 7.95 × 10⁻⁷ | 0.811 | 13727.667 | Emilsson et al. (9) |
ATF6B | 6 | 32113980 | rs114887538 | 0.76 | G | 1.19 | 1.11–1.28 | 1.06 × 10⁻⁶ | 0.020 | 65.546 | Emilsson et al. (9) |
DBNL | 7 | 44156146 | rs3087367 | 0.56 | G | 0.85 | 0.79–0.91 | 3.06 × 10⁻⁶ | 0.022 | 73.276 | Emilsson et al. (9) |
MICB | 6 | 31472720 | rs2855812 | 0.80 | G | 0.95 | 0.92–0.97 | 5.76 × 10⁻⁶ | 0.156 | 589.275 | Emilsson et al. (9) |
ERO1LB | 1 | 236399442 | rs1254194 | 0.60 | T | 1.09 | 1.05–1.14 | 7.69 × 10⁻⁶ | 0.070 | 249.683 | Sun et al. (8) |
TBCE | 1 | 235594951 | rs10802708 | 0.65 | C | 0.81 | 0.73–0.89 | 1.82 × 10⁻⁵ | 0.011 | 34.978 | Emilsson et al. (9) |
TNFRSF6B | 20 | 62370349 | rs1056441 | 0.61 | C | 1.27 | 1.14–1.42 | 2.70 × 10⁻⁵ | 0.009 | 30.568 | Emilsson et al. (9) |
MR OR represents the OR for type 2 diabetes per 1 SD increase in the protein level.
Chr., chromosome; EA, effect allele; EAF, effect allele frequency.
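To clarify how the MR OR, 95% CI, and F statistic columns of Table 3 relate to SNP-level summary statistics, the Python sketch below assumes a single-instrument Wald ratio estimator with a first-order (delta method) standard error, and the standard approximation F = [R2/(1 − R2)] × (N − k − 1)/k for instrument strength. This section does not spell out either formula, so both are assumptions made for illustration, and all numeric inputs in the example are hypothetical rather than taken from the study.

```python
import math


def wald_ratio_or(beta_exposure, beta_outcome, se_outcome, z=1.96):
    """Single-SNP Wald ratio expressed as an OR per 1-SD increase in protein level.

    beta_exposure: SNP effect on protein level (SD units per effect allele)
    beta_outcome:  SNP effect on type 2 diabetes (log-odds per effect allele)
    se_outcome:    standard error of beta_outcome
    Returns (OR, lower 95% limit, upper 95% limit).
    """
    if beta_exposure == 0:
        raise ValueError("beta_exposure must be non-zero")
    log_or = beta_outcome / beta_exposure
    # First-order approximation: ignores uncertainty in beta_exposure.
    se = se_outcome / abs(beta_exposure)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


def f_statistic(r2, n_samples, n_instruments=1):
    """Approximate F statistic from the variance explained by the instrument(s)."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must lie in [0, 1)")
    return (r2 / (1 - r2)) * (n_samples - n_instruments - 1) / n_instruments


# Hypothetical inputs, chosen only to show the arithmetic.
odds_ratio, ci_low, ci_high = wald_ratio_or(
    beta_exposure=0.5, beta_outcome=-0.05, se_outcome=0.01
)
strength = f_statistic(r2=0.1, n_samples=1002)

print(f"OR = {odds_ratio:.2f} ({ci_low:.2f}-{ci_high:.2f}), F = {strength:.1f}")
```

An instrument is conventionally regarded as sufficiently strong when F exceeds 10, which is the criterion referred to in the text.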
Sensitivity Analyses
Colocalization Analyses
Our colocalization analyses demonstrated that the posterior probability that MRC2 levels and type 2 diabetes shared a single causal signal was H4 = 0.92, suggesting that the two traits shared a single causal variant within the 1-Mb locus around the rs146385050 cis-pQTL (Supplementary Fig. 1A). Similar colocalization results were observed for circulating ATP1B2 levels with H4 = 0.96 (Supplementary Fig. 1B), SPATA20 levels with H4 = 0.84 (Supplementary Fig. 1C), HP levels with H4 = 0.95 (Supplementary Fig. 1D), ABO levels with H4 = 0.54 (Supplementary Fig. 1E), MANSC4 levels with H4 = 0.52 (Supplementary Fig. 1F), and ERO1LB levels with H4 = 0.88 (Supplementary Fig. 1G), implying single shared causal signals between the above protein levels and risk of adult type 2 diabetes. However, for all of the above proteins except ABO, the lead SNP shared by type 2 diabetes and the respective protein differed from the cis-pQTL used as an instrument in our MR studies, and only for ERO1LB was the lead SNP (rs2463185) in LD (R2 = 0.95) with its cis-pQTL (rs1254194). Interestingly, for circulating PAM levels, our colocalization result showed posterior probability H3 = 1.0 (Supplementary Fig. 1H), similar to MANBA levels with H3 = 0.83 (Supplementary Fig. 1I), suggesting that, for each of these proteins, the protein level and type 2 diabetes are driven by two independent SNPs in the same locus, which implies a possible bias due to LD.
For youth-onset type 2 diabetes, we found that the posterior probability that GDF15 levels and type 2 diabetes shared a single causal signal was low (H4 = 0.19); GDF15 therefore did not colocalize with youth-onset type 2 diabetes (Supplementary Fig. 1J), and a similar result was found for SVEP1 (Supplementary Fig. 1K). Conversely, our analysis showed that, as in adult type 2 diabetes, ABO protein level colocalized with youth-onset type 2 diabetes (H4 = 0.55) (Supplementary Fig. 1L). The results of our colocalization analyses for both adult and youth-onset type 2 diabetes are illustrated in Fig. 3.
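The posterior probabilities reported above can be summarized programmatically. The Python sketch below tabulates the values quoted in this section and labels each result, where H4 denotes a single shared causal variant and H3 denotes two distinct causal variants in the same locus. The cutoff of 0.5 is an assumption made for illustration (the lowest value described here as colocalizing is 0.52); the section does not state the threshold that was applied, and SVEP1 is omitted because no numeric value is given for it.

```python
# Assumed cutoff for calling a hypothesis "supported"; not stated in the text.
POSTERIOR_THRESHOLD = 0.5

# (protein, outcome, hypothesis, posterior probability) as quoted in the text.
reported = [
    ("MRC2", "adult", "H4", 0.92),
    ("ATP1B2", "adult", "H4", 0.96),
    ("SPATA20", "adult", "H4", 0.84),
    ("HP", "adult", "H4", 0.95),
    ("ABO", "adult", "H4", 0.54),
    ("MANSC4", "adult", "H4", 0.52),
    ("ERO1LB", "adult", "H4", 0.88),
    ("PAM", "adult", "H3", 1.0),
    ("MANBA", "adult", "H3", 0.83),
    ("GDF15", "youth", "H4", 0.19),
    ("ABO", "youth", "H4", 0.55),
]


def interpret(hypothesis, posterior, threshold=POSTERIOR_THRESHOLD):
    """Translate a reported posterior probability into a plain-language label."""
    if posterior < threshold:
        return "inconclusive"
    if hypothesis == "H4":
        return "shared causal variant"
    if hypothesis == "H3":
        return "distinct causal variants"
    raise ValueError(f"unexpected hypothesis: {hypothesis}")


labels = {
    (protein, outcome): interpret(hypothesis, posterior)
    for protein, outcome, hypothesis, posterior in reported
}

for (protein, outcome), label in labels.items():
    print(f"{protein:8s} {outcome:6s} {label}")
```

Under this assumed cutoff, seven proteins are labeled as sharing a causal variant with adult type 2 diabetes, PAM and MANBA are labeled as having distinct causal variants, and GDF15 is inconclusive for youth-onset disease.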
Venn diagram summarizing the candidate proteins prioritized by our MR analyses.
Assessment for PAV
We then assessed whether the cis-pQTLs of all of our MR-prioritized proteins were PAVs or in LD (R2 > 0.8) with PAVs. Our results demonstrated that, except for the cis-pQTLs of SVEP1 (rs61751937), CDON (rs3740909), and KDR (rs2305948), which are missense variants or PAVs, the remaining MR-prioritized cis-pQTLs are not PAVs (Supplementary Table 5). We further showed that the cis-pQTL rs60929339 (CDON) is in LD (R2 = 0.91) with the missense variant rs3740909, and the cis-pQTL rs2855812 of MICB is in LD with two missense variants (rs1065075, R2 = 0.805; rs1051788, R2 = 0.805). Also, rs8176786 for NELL1 is in perfect LD with rs16907058 (missense variant, R2 = 1), and the rs1056441 cis-pQTL for TNFRSF6B is in LD with rs8957 (missense variant, R2 = 0.807). For SPATA20, its cis-pQTL rs9890200 is in LD with rs8076632 (missense variant, splice region variant, R2 = 1), while for ACE, its cis-pQTL (rs4344) is in LD with rs4316 (missense variant, R2 = 0.959) and with rs4362 (missense variant, R2 = 0.920) (Supplementary Tables 6 and 7). Taken together, these findings suggest that, except for SVEP1, CDON, KDR, MICB, NELL1, TNFRSF6B, ACE, and SPATA20, the affinity and binding sites of the rest of the MR-prioritized proteins are not affected by PAVs, and therefore, these proteins can be reliably quantified for further validation studies.
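The PAV screen described above amounts to flagging any protein whose cis-pQTL is itself a missense variant or is in LD (R2 > 0.8) with one. The Python sketch below applies that rule to the variant pairs and R2 values quoted in this section; the data structures and variable names are hypothetical and chosen only for the example.

```python
R2_THRESHOLD = 0.8  # LD cutoff stated in the text

# cis-pQTLs reported to be missense variants (PAVs) themselves.
direct_pavs = {
    "SVEP1": "rs61751937",
    "CDON": "rs3740909",
    "KDR": "rs2305948",
}

# (protein, cis-pQTL, missense variant, LD R2) as quoted in the text.
ld_with_missense = [
    ("CDON", "rs60929339", "rs3740909", 0.91),
    ("MICB", "rs2855812", "rs1065075", 0.805),
    ("MICB", "rs2855812", "rs1051788", 0.805),
    ("NELL1", "rs8176786", "rs16907058", 1.0),
    ("TNFRSF6B", "rs1056441", "rs8957", 0.807),
    ("SPATA20", "rs9890200", "rs8076632", 1.0),
    ("ACE", "rs4344", "rs4316", 0.959),
    ("ACE", "rs4344", "rs4362", 0.920),
]

# Proteins whose instrument is in LD with a missense variant above the cutoff.
flagged_by_ld = {
    protein for protein, _, _, r2 in ld_with_missense if r2 > R2_THRESHOLD
}

# Union of both routes to being flagged.
flagged = set(direct_pavs) | flagged_by_ld

print("Proteins potentially affected by PAVs:", sorted(flagged))
```

The resulting set contains the same eight proteins listed as exceptions in the text (SVEP1, CDON, KDR, MICB, NELL1, TNFRSF6B, ACE, and SPATA20).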
Confounder Assessment
Our confounder assessment using PhenoScanner (18) demonstrated that the cis-pQTLs for TNFRSF6B, ATF6B, CDK2AP1, MAPK3, ARG1, GDF15, C4A/C4B complex, and FGF2 were genome-wide significantly associated with whole-body fat mass, body fat percentage, and whole-body fat-free mass (Supplementary Tables 8 and 9 and Fig. 3). This indicates that our MR estimates of the effects of these proteins on type 2 diabetes risk might have been driven by these confounders. However, for the remaining 21 candidate proteins for adult and youth-onset type 2 diabetes, we observed either no association or only nominal association with the confounding traits related to type 2 diabetes, such as whole-body fat-free mass, whole-body fat mass, obesity/overweight, waist-to-hip ratio, and body fat percentage (Supplementary Tables 8 and 10). This reinforces the hypothesis that these proteins may have a direct causal effect on type 2 diabetes, which is not mediated by adiposity.
eQTL Assessment
Our eQTL assessment using GTEx (22) demonstrated that cis-pQTLs for CDK2AP1, CCNH, TYRO3, MAPK3, TBCE, TNFRSF6B, ARG1, MRC2, SPATA20, PAM, ERO1LB, ACE, HP, and CYB5A were eQTLs for their respective genes in tissues such as whole blood, skin, fibroblasts, pancreas, adipose-visceral or subcutaneous tissue, and skeletal muscle (Supplementary Table 11). Interestingly, SHBG’s and CDON’s cis-pQTLs are eQTLs in pituitary and thyroid; cis-pQTLs for MANSC4, C4A/C4B complex, and FGF2 are eQTLs for the respective genes in adrenal gland, testis, coronary artery, and esophagus; and for cis-pQTLs of ATP1B2, MICB, and MANBA, the same applied in tissue from the tibial artery (Supplementary Table 12). The cis-pQTLs associated with DBNL, ATF6B, ABO, NELL1, GDF15, SVEP1, and KDR proteins were not identified as eQTLs in the GTEx database.
Multi-Instrument MR Analyses
Our multiple-SNP MR analysis including trans-pQTLs demonstrated a very slight tightening of the 95% CI of our MR OR for type 2 diabetes only for PAM: specifically, the 95% CI changed from 0.89–0.94 to 0.90–0.94. This result suggests that adding trans-pQTLs to our MR instruments did not significantly increase the power of our MR analyses to detect associations between circulating proteins and adult type 2 diabetes risk (Supplementary Table 13 and Supplementary Fig. 2).
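To illustrate why adding a weak trans-pQTL changes the confidence interval only marginally, the Python sketch below combines per-SNP Wald ratios with a fixed-effect inverse-variance weighted (IVW) estimator. The use of IVW is an assumption made for illustration, since this section does not name the multi-instrument method, and the per-SNP estimates are hypothetical values rather than results from the study.

```python
import math


def ivw_fixed_effect(ratios, standard_errors):
    """Fixed-effect inverse-variance weighted combination of per-SNP Wald ratios.

    ratios:          per-SNP causal estimates on the log-odds scale
    standard_errors: their standard errors
    Returns (pooled estimate, pooled standard error).
    """
    if len(ratios) != len(standard_errors) or not ratios:
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, ratios)) / total
    return pooled, math.sqrt(1.0 / total)


def to_or_ci(log_or, se, z=1.96):
    """Convert a log-odds estimate and SE to an OR with a 95% CI."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical estimates: one precise cis-pQTL and one imprecise trans-pQTL.
cis_only = ivw_fixed_effect([-0.09], [0.015])
cis_plus_trans = ivw_fixed_effect([-0.09, -0.08], [0.015, 0.06])

print("cis only:        OR %.3f (%.3f-%.3f)" % to_or_ci(*cis_only))
print("cis plus trans:  OR %.3f (%.3f-%.3f)" % to_or_ci(*cis_plus_trans))
```

Because each instrument is weighted by the inverse of its variance, an imprecise trans-pQTL contributes little weight, and the pooled standard error shrinks only slightly.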
Discussion
Using MR, we provided evidence that genetically altered levels of 22 circulating proteins (CDK2AP1, CCNH, TYRO3, MAPK3, TBCE, TNFRSF6B, ARG1, DBNL, MRC2, SHBG, ATF6B, SPATA20, ATP1B2, MANSC4, HP, MANBA, ABO, ACE, PAM, NELL1, ERO1LB, and MICB) are likely to be causally linked to adult type 2 diabetes risk, while 11 proteins (GDF15, SVEP1, CDON, KDR, CYB5A, C4A/C4B complex, FGF2, CCNH, TNFRSF6B, ABO, and ACE) presented suggestive evidence of association with risk of youth-onset type 2 diabetes. Our sensitivity analysis indicated that after adjusting our adult type 2 diabetes outcome for BMI, 13 proteins showed significant MR associations. These findings are supported by evidence from colocalization, showing that 7 of the above-mentioned 22 proteins (MRC2, ATP1B2, ERO1LB, HP, ABO, SPATA20, and MANSC4) share a single causal SNP with adult type 2 diabetes, while 2 others (PAM and MANBA) are linked to type 2 diabetes via 2 independent SNPs. In addition, our follow-up analyses showed that the majority of our candidate proteins are not significantly associated with confounding traits linked to type 2 diabetes risk, which reinforces their causal role in type 2 diabetes (Fig. 3). We demonstrated that the cis-pQTLs associated with the candidate proteins affect expression of their genes in skeletal muscle, adipose tissue, and pancreas, which are all relevant tissues in the pathophysiology of type 2 diabetes (24). Finally, we demonstrated that the majority of the cis-pQTLs used as instruments for the above proteins are not sequence variants or in LD with such variants, indicating that these proteins can be reliably quantified in future validation studies. While the individual effects of genetically altered levels of these proteins on risk of type 2 diabetes are small, these molecules represent potential druggable targets and point to causal pathways that can be targeted for intervention.
Among the 20 proteins prioritized by our main MR analysis on BMI-unadjusted type 2 diabetes, 9 proteins, namely CDK2AP1, CCNH, ACE, MAPK3, SPATA20, MANBA, HP, ARG1, and NELL1, were attenuated after adjusting for BMI, which indicates that their effects on type 2 diabetes risk were probably mediated by BMI. Interestingly, these proteins are involved in insulin secretion (25,26), diabetic cardiomyopathy (27), arm adiposity (28), vascular dysfunction (29,30), and diabetic nephropathy (31), respectively.
After adjusting for BMI in adult type 2 diabetes, our MR study replicated three proteins with evidence for association with type 2 diabetes from a previous MR study (7), including SHBG, DBNL, and ATP1B2, showing effects of these proteins on type 2 diabetes in the same direction as the previous report. We also identified PAM and ABO, with prior evidence of involvement in β-cell function (32) and insulin secretion (33,34), respectively, and TYRO3 (35), which was shown to be associated with type 2 diabetes in individuals with cardiovascular diseases. In addition, our MR study prioritized seven novel candidate proteins, among which MANSC4, TNFRSF6B, and MRC2 have known roles in diabetic nephropathy (36–38). Notably, it has been shown that MRC2 promotes proliferation and inhibits apoptosis in the diabetic kidney, while it also contributes to type 2 diabetes pathogenesis (38); however, this is in the opposite direction from our MR result, demonstrating that increased levels of MRC2 are associated with decreased type 2 diabetes risk. Thus, further investigation with direct measurement of this protein in case-control cohorts with type 2 diabetes is required to clarify the role of MRC2 in type 2 diabetes. Among the remaining four novel candidate proteins, ATF6B and ERO1LB are known to be involved in pancreatic β-cell function (39,40), and TBCE has been associated with glycemic traits and obesity in humans (41). While there is contradictory evidence regarding the role of ERO1LB as protective or pathogenic in type 2 diabetes (40,42), our study demonstrated that increased circulating levels of ERO1LB increase type 2 diabetes risk. Finally, MICB, encoded by the human MHC class I chain–related gene, has a known association with type 1 diabetes risk (43).
Our youth-onset type 2 diabetes MR analysis identified 11 proteins with suggestive evidence of association with type 2 diabetes risk in childhood, of which 4 proteins (CCNH, ABO, ACE, and TNFRSF6B) are in common with adult type 2 diabetes, and GDF15 was replicated from a previous MR study for adult type 2 diabetes (7) (Fig. 3). SVEP1, KDR, and FGF2 proteins have been shown to be involved in cardiovascular disease (44) and diabetic retinopathy (45,46), while CDON (47) and CYB5A (48) have been implicated in adiposity associated with type 2 diabetes. In a whole-exome sequencing study in an American Indian population, CYB5A was positively associated with obesity and nominally associated with increased risk of type 2 diabetes (49). However, the direction of the effect of CYB5A on type 2 diabetes in that study (49) is not in agreement with our MR result, as we show that genetically increased CYB5A levels decrease the risk of youth-onset type 2 diabetes. Thus, direct measurement of the protein in an independent case-control cohort is required to validate this MR result.
Although observational studies (3–5,50) have identified candidate plasma proteins as biomarkers of adult type 2 diabetes, the major strength of our study is that we used MR, an established approach in genetic epidemiology known to limit bias from confounding and reverse causation. Although a recent study using a similar MR design sought to identify causal proteins for type 2 diabetes in a large European cohort, the SNPs used as genetic instruments were not associated with the protein exposures at a genome-wide level (7). In our MR study, we leveraged data from the largest protein GWAS consortia available to date to maximize our yield in tested proteins with available genome-wide significant cis-pQTLs (8–12) and from the largest available type 2 diabetes GWAS (13) to ensure adequate statistical power for our MR analyses. Moreover, we sought to identify child-specific protein biomarkers using data from the only available youth-onset type 2 diabetes GWAS (2). By using solely genome-wide significant cis-acting pQTLs as instruments for our protein exposures, we reduced the risk of horizontal pleiotropy in our MR (51). We undertook multiple sensitivity analyses to account for possible mediating or confounding effects related to obesity and adiposity affecting the findings of our main MR analysis on both adult- and youth-onset type 2 diabetes. Finally, we performed colocalization analyses as an additional strategy to explore association between the candidate protein biomarkers and type 2 diabetes risk for a subset of candidate proteins with available summary-level GWAS results.
We are aware of several limitations of our study. First, the small sample size of the European subset in the only available youth-onset type 2 diabetes GWAS could not ensure adequate statistical power for discovery of candidate proteins with small individual effects on disease risk. Thus, we elected to report proteins with suggestive MR evidence for association with risk of youth-onset type 2 diabetes, while no proteins survived after multiple-testing correction. These findings should be validated in future MR studies using data from emerging larger pediatric type 2 diabetes GWAS consortia. Second, we did not perform validation studies to confirm our MR findings by directly measuring the candidate protein levels in independent case-control studies for type 2 diabetes, as these validation studies were out of the scope of this work. Also, these experiments are often strongly influenced by confounding and reverse causation. However, our PAV analysis supported the feasibility of such studies, showing that most of these proteins can be measured in a clinical setting using aptamer- or antibody-based bioassays. Third, although our colocalization analysis showed that type 2 diabetes and proteins such as MRC2, ATP1B2, SPATA20, ERO1LB, HP, ABO, and MANSC4 are linked via a single causal variant in the same locus, the lead SNPs were not the same as the corresponding cis-pQTLs, which implies possible bias due to LD. A further limitation of colocalization is its assumption of a single shared causal SNP, whereas in reality genetic loci may contain several causal SNPs. Moreover, we observed that, among our MR-prioritized proteins, 10 proteins (CDK2AP1, CCNH, TNFRSF6B, DBNL, MRC2, ATF6B, PAM, HP, GDF15, and C4A/C4B) have instruments (SNPs) that do not map directly within the gene encoding the protein, but rather next to it. Nevertheless, these SNPs still satisfied the definition of a cis-pQTL for their respective protein in the proteomic GWAS (8–12), as they were located within a maximum of 1 Mb of the transcription start site of the gene encoding the measured protein. Finally, our study was performed using only GWAS data from cohorts of European ancestry, and, as such, our results cannot be generalized to other ancestries. Upon availability of large proteomic and type 2 diabetes GWAS in diverse ancestries, future ancestry-specific MR studies are needed to cross-validate our findings in non-Europeans.
In conclusion, our two-sample MR approach provides evidence for a causal role in adult type 2 diabetes and suggestive evidence for a role in youth-onset type 2 diabetes for the above-mentioned circulating proteins. While for a set of these circulating proteins, previous evidence for association with type 2 diabetes exists from observational and MR studies, we also identified novel candidate proteins with previously known involvement in type 2 diabetes complications, pancreatic β-cell functions, and adiposity. Our findings highlight a possible role of these circulating proteins in type 2 diabetes pathophysiology and support a potential utility of these molecules in drug development for type 2 diabetes in adults and children.
This article contains supplementary material online at https://doi.org/10.2337/figshare.19233129.
Article Information
Funding. D.M. received a Pediatric Endocrine Society Clinical Scholar Award and is a Fonds de recherche Québec–Santé and Canadian Child Health Clinician Scientist Program scholar.
The funding body had no involvement in the study design, data collection, analysis and interpretation of results, or writing of this manuscript.
Duality of Interest. J.B.R. has served as an advisor to GlaxoSmithKline and Deerfield Capital; his institution has received investigator-initiated grant funding from Eli Lilly and Company, GlaxoSmithKline, and Biogen for projects unrelated to this research; and he is the founder of 5 Prime Sciences. No other potential conflicts of interest relevant to this article were reported.
Author Contributions. F.G., N.Y., M.Y., and D.M. conducted the analyses and interpretation of data. F.G. and D.M. produced the first draft of the manuscript. D.M. designed the study. All authors reviewed and approved the final version. D.M. is the guarantor of this work and, as such, had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.