OBJECTIVE

Integrated analyses of plasma proteomics and genetic data in prospective studies can help assess the causal relevance of proteins, improve risk prediction, and discover novel protein drug targets for type 2 diabetes (T2D).

RESEARCH DESIGN AND METHODS

We measured plasma levels of 2,923 proteins using Olink Explore among ∼2,000 randomly selected participants from China Kadoorie Biobank (CKB) without prior diabetes at baseline. Cox regression assessed associations of individual proteins with incident T2D (n = 92 cases). Proteomic-based risk models were developed, with discrimination, calibration, and reclassification assessed using area under the curve (AUC), calibration plots, and net reclassification index (NRI), respectively. Two-sample Mendelian randomization (MR) analyses using cis-protein quantitative trait loci identified in a genome-wide association study of CKB and UK Biobank for specific proteins were conducted to assess their causal relevance for T2D, along with colocalization analyses to examine shared causal variants between proteins and T2D.

RESULTS

Overall, 33 proteins were significantly associated (false discovery rate <0.05) with risk of incident T2D, including IGFBP1, GHR, and amylase. The addition of these 33 proteins to a conventional risk prediction model improved AUC from 0.77 (0.73–0.82) to 0.88 (0.85–0.91) and NRI by 38%, with predicted risks well calibrated with observed risks. MR analyses provided support for the causal relevance for T2D of ENTR1, LPL, and PON3, with replication of ENTR1 and LPL in Europeans using different genetic instruments. Moreover, colocalization analyses showed strong evidence (pH4 > 0.6) of shared genetic variants of LPL and PON3 with T2D.

CONCLUSIONS

Proteomic analyses in Chinese adults identified novel associations of multiple proteins with T2D with strong genetic evidence supporting their causal relevance and potential as novel drug targets for prevention and treatment of T2D.

Globally, type 2 diabetes (T2D) affects >530 million adults (1), causing substantial risks of premature death and macrovascular and microvascular complications. China has the largest number of people with diabetes (>140 million) in the world, and the prevalence is still rising (1). Several important modifiable risk factors for T2D are established (e.g., adiposity, lack of physical activity, and suboptimal diet), and they account for about 70% of new cases globally (2). These risk factors have been widely used, typically in combination with blood glucose and/or HbA1c, to predict risk of T2D and inform prevention and treatment decisions in diverse populations (3,4). Recently, a genome-wide association study (GWAS) of T2D in diverse populations identified >240 common genetic variants, including >180 in East Asian populations (5,6). However, the mechanisms underlying many of these associations remain to be elucidated. Plasma proteins play a central role in human biology and represent a primary source of therapeutic targets (7). Analyses of circulating protein biomarkers in population and clinical studies, particularly when integrated with genetic data, can help clarify disease etiology, improve risk prediction and early diagnosis, and discover novel or repurposable therapeutic targets for treatment of T2D and other major diseases (8–12).

Previous studies of plasma proteins and T2D have highlighted the roles of several specific proteins (e.g., IGFBP1, IGFBP2, GHR, and SHBG) in the etiology of T2D (12–16). Advances in high-throughput proteomic assays now enable measurement of several thousand proteins (17–19), and their application in population and clinical studies of primarily European ancestry populations has identified several novel protein biomarkers for T2D (12–16). However, little is known about the relevance of protein biomarkers for T2D in non-European ancestry populations, including Chinese, where the disease rates, distribution of risk factors, and genetic architecture differ greatly from European ancestry populations. Moreover, few previous studies undertook detailed genetic analyses (e.g., Mendelian randomization [MR] and colocalization analyses) to assess the causal relevance of specific proteins for T2D (16,20).

We undertook an integrated analysis of observational and genetic data of ∼3,000 proteins with incident T2D in ∼2,000 adults selected from the China Kadoorie Biobank (CKB) (CKB Research Track no. 2022-0036). The present report aims to 1) identify plasma proteins significantly associated with incident T2D, 2) assess the utility of selected proteins for prediction of T2D risk, 3) use cis-protein quantitative trait loci (cis-pQTLs) identified in the GWAS for proteins to assess their causal relevance for T2D via two-sample MR and separate colocalization analyses, and 4) clarify the mechanisms of action for specific proteins using a phenome-wide association study (PheWAS), tissue expression, and other experimental evidence.

Study Population and Data Collection

Details of the CKB study design, methods, and participants have been previously reported (21). Briefly, a total of 512,715 participants aged 30–79 years were enrolled from 10 (5 urban, 5 rural) geographically diverse regions in China. At baseline (June 2004 to July 2008) and at three subsequent resurveys in a ∼5% subset (in 2008, 2014, and 2021, respectively), detailed data were collected on sociodemographic characteristics, smoking, alcohol consumption, diet, physical activity, and personal (e.g., ischemic heart disease, stroke, and T2D) and family medical history, along with physical and blood measurements (e.g., blood pressure, BMI, and random blood glucose but not HbA1c).

Follow-up of CKB participants was through linkage to established mortality (cause-specific) and morbidity (including T2D) registries, and to the nationwide health insurance system, which records all hospitalized episodes (21). All disease events and causes of death were ICD-10 coded by trained health workers, blinded to baseline information, and checked and integrated centrally. Prior to starting the project, central ethics approvals were obtained from Oxford University and the China National Center for Disease Control and Prevention (CDC). In addition, approvals were also obtained from institutional research boards at the local CDCs in the 10 regions. All participants provided written informed consent.

The current study involved 2,026 participants selected as a subcohort for a nested case-subcohort study of ischemic heart disease in CKB (22). They were randomly selected from a population subset of 69,353 genotyped participants who had no prior history of cardiovascular disease (CVD) or statin use at baseline and were genetically unrelated to each other. Among these 2,026 participants, the 130 with prevalent diabetes at baseline were analyzed separately for internal replication but excluded from the main analyses. During 11 years of follow-up (up to 1 January 2018), 92 individuals developed incident T2D (ICD-10: E10–E14) among the 1,896 participants included in the main analyses.

Proteomics Assay

Stored baseline plasma samples from participants were retrieved, thawed, and subaliquoted to multiple aliquots, with one aliquot (100 µL; for batch 1 assay) shipped on dry ice to the Olink Biosciences Laboratory at Uppsala, Sweden, and one aliquot (for batch 2 assay) shipped subsequently to Olink Proteomics Laboratory in Boston, MA, for multiplex proximity extension assay of proteins. Batch 1 covered 1,463 unique proteins first released by Olink, while batch 2 covered a further 1,460 unique proteins released subsequently by Olink. To minimize interrun and intrarun variation, the samples were randomized across plates and normalized using both an internal control (extension control) and an interplate control and then transformed using a predetermined correction factor.

Details of the Olink assay performance and validation have been previously reported (18). The limits of detection were determined using negative control samples (buffer without antigen). A sample is flagged as having a quality control warning if the incubation control deviates more than a predetermined value (±0.3) from the median value of all samples on the plate. The preprocessed data were provided in the arbitrary unit normalized protein expression (NPX) on a log2 scale. The present analysis has a total of 2,941 proteins (2,923 unique proteins), including 1,472 proteins (1,463 unique proteins) in batch 1 and 1,469 proteins (1,460 unique proteins) in batch 2.
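The plate-level quality control rule described above can be sketched in a few lines. This is a minimal illustration only, not the assay vendor's implementation: the function name, the sample identifiers, and the incubation control values are hypothetical, and the sketch assumes that the deviation is measured on the NPX scale against the plate median.

```python
from statistics import median


def flag_qc_warnings(incubation_control, threshold=0.3):
    """Return IDs of samples whose incubation control deviates from the
    plate median by more than `threshold` (assumed to be in NPX units)."""
    plate_median = median(incubation_control.values())
    return [
        sample_id
        for sample_id, value in incubation_control.items()
        if abs(value - plate_median) > threshold
    ]


# Hypothetical incubation control values for five samples on one plate.
plate = {"s1": 0.02, "s2": -0.05, "s3": 0.41, "s4": 0.00, "s5": -0.38}
flagged = flag_qc_warnings(plate)  # samples s3 and s5 exceed the ±0.3 limit
```

In this toy plate the median is 0.00, so only the two samples lying more than 0.3 away from it are flagged.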

Statistical Analysis

Plasma protein levels were standardized (i.e., values of each protein were divided by their SD) and analyzed as continuous variables. In observational analysis, Cox and logistic regression models were used to estimate adjusted hazard ratios (HRs) and odds ratios (ORs) (and 95% CI) for incident and prevalent diabetes, respectively. All analyses were adjusted for age, age², sex, study area, fasting time, ambient temperature, plate ID, education (four categories: no formal school, primary school, secondary school, and high school and above), smoking (three categories: never, occasional or ex-regular, and regular smoker), alcohol drinking (three categories: never, occasional or ex-regular, and weekly), physical activity (MET-hours), family history of diabetes (binary), and BMI. Sensitivity analyses were conducted by further adjusting for dietary variables, and by adjusting for the effect of the plate using the residual from regression of protein values on the plate as a covariate in the models. The cross-sectional analysis for random plasma glucose (per SD) was restricted to participants without prior diabetes at baseline. Analyses were also conducted in UK Biobank (UKB) for diabetes incidence, blood glucose, and HbA1c (see the Supplementary Material).
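To make the "per 1-SD higher" scale concrete, the following minimal sketch shows how protein values might be divided by their SD and how a fitted log hazard ratio and its SE translate into an HR with a 95% CI. It is an illustration under stated assumptions rather than the study code: the function names are hypothetical, the sample SD is assumed, and the SE in the usage example is back-calculated from a published CI rather than taken from a fitted model.

```python
import math
from statistics import stdev


def scale_by_sd(values):
    """Divide each value by the sample SD so that one unit equals 1 SD."""
    sd = stdev(values)
    return [v / sd for v in values]


def hazard_ratio_with_ci(log_hr, se, z=1.96):
    """Convert a log hazard ratio and its SE into (HR, lower, upper)."""
    return (
        math.exp(log_hr),
        math.exp(log_hr - z * se),
        math.exp(log_hr + z * se),
    )


def se_from_ci(lower, upper, z=1.96):
    """Back-calculate the SE of a log HR from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Hypothetical NPX values for one protein, rescaled to SD units.
scaled = scale_by_sd([2.0, 4.0, 6.0, 8.0])

# Usage with a reported estimate (HR 1.98, 95% CI 1.49-2.64):
hr, lower, upper = hazard_ratio_with_ci(math.log(1.98), se_from_ci(1.49, 2.64))
```

Because the CI is symmetric on the log scale, the back-calculated SE reproduces the reported interval to within rounding.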

For proteins significantly associated with incident diabetes, we further 1) examined the shape of the associations with T2D by quartiles of individual proteins and 2) assessed the performance of proteomic-based risk prediction of T2D in CKB, with discrimination and calibration assessed using area under the curve (AUC) and calibration plots with the Hosmer-Lemeshow test, respectively. Reclassification was measured using both the percentile-based net reclassification index (NRI), with deciles of relative risk as reference categories, and the continuous NRI. The proteomic risk model was internally validated using 1,000 bootstrap resamples and compared and combined with a conventional risk prediction model in Chinese adults (23), with further external validation in the UKB. Finally, we 3) conducted gene ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses using clusterProfiler (v.4.2.2) (24) to determine which biological functions or processes were significantly enriched based on hypergeometric tests.
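The two headline prediction metrics can be illustrated with a short sketch. It assumes binary outcomes (event or no event) and ignores censoring and follow-up time, so it is a simplification of what a time-to-event analysis would require; the function names and the toy risks are hypothetical.

```python
def auc(risks, events):
    """Probability that a randomly chosen case has a higher predicted risk
    than a randomly chosen non-case (ties count as 0.5)."""
    cases = [r for r, e in zip(risks, events) if e]
    noncases = [r for r, e in zip(risks, events) if not e]
    score = 0.0
    for c in cases:
        for n in noncases:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cases) * len(noncases))


def continuous_nri(old_risks, new_risks, events):
    """Continuous NRI: net proportion of cases moved up plus net
    proportion of non-cases moved down by the new model."""
    up_case = down_case = up_non = down_non = 0
    for old, new, e in zip(old_risks, new_risks, events):
        if new > old:
            if e:
                up_case += 1
            else:
                up_non += 1
        elif new < old:
            if e:
                down_case += 1
            else:
                down_non += 1
    n_case = sum(1 for e in events if e)
    n_non = len(events) - n_case
    return (up_case - down_case) / n_case + (down_non - up_non) / n_non


# Toy example: two cases followed by two non-cases.
events = [1, 1, 0, 0]
old = [0.4, 0.2, 0.3, 0.1]   # risks from a hypothetical conventional model
new = [0.6, 0.5, 0.2, 0.1]   # risks after adding hypothetical protein terms
```

The continuous NRI ranges from −2 to +2, which is why values near or above 100% can arise when almost all cases move up and most non-cases move down.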

For proteins showing significant associations with T2D in observational analyses, two-sample MR using the Wald ratio method (25,26) was conducted using 1) cis-pQTLs obtained from a GWAS of CKB, with lookups in the Asian Genetic Epidemiology Network (AGEN) consortium of East Asian adults including 66,677 diabetes cases (6), and 2) cis-pQTLs obtained from a GWAS of UKB, with lookups in the Diabetes Meta-Analysis of Trans-Ethnic association studies (DIAMANTE) Consortium of European descent including 80,154 diabetes cases and 853,816 control participants (5). Colocalization was performed only for those proteins that had 95% credible sets identified by fine mapping in both AGEN T2D and CKB cis-pQTL data sets. Fine mapping was performed using susieR (v0.12.16), and colocalization was performed using coloc (v5.2.1) packages in R.
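For a single genetic instrument, the Wald ratio is simply the variant-outcome association divided by the variant-exposure association. The sketch below illustrates this under stated assumptions: the SE uses the common first-order approximation that ignores uncertainty in the variant-protein estimate, the P value assumes a normal test statistic, and the function names and input values are hypothetical rather than taken from the study.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio estimate for one instrument.

    beta_exposure: variant effect on the protein (per allele, SD units)
    beta_outcome:  variant effect on T2D (per allele, log odds)
    se_outcome:    SE of beta_outcome
    Returns (estimate, se, two_sided_p) on the log-odds scale per 1-SD
    higher genetically predicted protein level.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)  # first-order approximation
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return estimate, se, p_value


def to_odds_ratio(estimate, se, z=1.96):
    """Convert a log-odds estimate into (OR, lower, upper)."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * se),
        math.exp(estimate + z * se),
    )


# Hypothetical summary statistics for one cis-pQTL.
est, se, p = wald_ratio(beta_exposure=0.5, beta_outcome=-0.05, se_outcome=0.01)
odds_ratio, lower, upper = to_odds_ratio(est, se)
```

A weak instrument (small `beta_exposure`) inflates both the estimate and its SE, which is one reason strong cis-pQTLs are preferred as instruments.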

For proteins showing significant genetic associations with T2D, we screened the protein expression database of Genotype-Tissue Expression (GTEx) to study the tissue-specific role of the causal proteins in diabetes. We further searched Type 2 Diabetes Knowledge Portal (T2DKP) for associations of 1) cis-pQTLs from both CKB and UKB with a range of phenotypes using a P value threshold of 5 × 10⁻⁸ and 2) genes with available diseases and traits using a Human Genetic Evidence Scores threshold of 10, indicating strong evidence for the causal relevance of such proteins for diseases or traits (27).

Figure 1 provides an overview of the main analytic approaches. All statistical analyses were performed using R version 4.1.2. The Benjamini-Hochberg false discovery rate (FDR) was used to correct for multiple testing.
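The Benjamini-Hochberg adjustment used for multiple testing can be sketched as follows. This is a minimal, self-contained illustration (the function name and P values are hypothetical); it is intended to mirror the standard step-up procedure, in which each sorted P value is multiplied by m/rank and monotonicity is enforced from the largest P value downward.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each adjusted
    # value is never larger than the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for four proteins.
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]  # 5% FDR threshold
```

With several thousand proteins tested, a protein is declared significant at 5% FDR when its adjusted value falls below 0.05, which is less stringent than a Bonferroni threshold of 0.05 divided by the number of tests.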

Figure 1

Overview of study design, analytic approaches, and key findings. EAS, East Asian; EUR, European.


Data Availability

The CKB is a global resource for the investigation of lifestyle, environmental, blood biochemical, and genetic factors as determinants of common diseases. The CKB study group is committed to making the cohort data available to the scientific community in China, the U.K., and worldwide to advance knowledge about the causes, prevention, and treatment of disease. For detailed information on what data are currently available to open access users and how to apply for it, please visit https://www.ckbiobank.org/data-access/data-access-procedures. A research proposal will be requested to ensure that any analysis is performed by bona fide researchers. Researchers who are interested in obtaining additional information or data that underlies this paper should contact [email protected]. For any data that are not currently available for open access, researchers may need to develop formal collaboration with a study group. Custom code was used for all statistical analyses in this report.

Among the 1,896 participants included in the main analyses, the mean (SD) age was 51.3 (10.4) years, 62.1% were women, and 50.6% were urban residents, which, along with many other baseline characteristics, were similar to those in the overall genotyped CKB cohort (Supplementary Table 1).

Observational Associations of Proteins With Diabetes

After adjusting for conventional risk factors, 33 proteins were significantly associated at 5% FDR with risk of T2D (batches 1/2: 24/9) (Fig. 2). The associations were typically log-linear throughout the full ranges of levels of specific proteins examined (Supplementary Fig. 1), although the adjusted HRs (per 1-SD higher protein level) varied, from 1.38 to 1.98 for those showing positive associations (23 proteins) and from 0.48 to 0.70 for those showing inverse associations (10 proteins) with T2D (Supplementary Fig. 2). Proteins showing the strongest positive associations were VNN1 (1.98, 95% CI 1.49–2.64), GHR (1.80, 1.35–2.39), PRCP (1.78, 1.32–2.40), CPM (1.68, 1.28–2.22), and IGSF9 (1.67, 1.32–2.11). The proteins showing strongest inverse associations included IGFBP2 (0.48, 0.36–0.65), CKB (0.59, 0.44–0.78), IGFBP1 (0.61, 0.47–0.80), LPL (0.61, 0.48–0.78), and ESM1 (0.62, 0.48–0.69). Six proteins (IGFBP2, VNN1, IGSF9, GLB1, PON3, and RIDA) were significantly associated with incident T2D after applying Bonferroni multiple test correction. Further adjustments for fresh fruit, red meat consumption, and a healthy diet score did not alter the results, and, similarly, use of the residual from regression of protein values on the plate as a covariate in the models did not alter the results.

Figure 2

Associations of 1-SD higher levels of 2,941 proteins with incident diabetes in observational analyses. Models were adjusted for age, age², sex, study area, fasting time, ambient temperature, plate ID, education, smoking, alcohol consumption, physical activity, family history of diabetes, and BMI. Red, blue, and gray dots denote significant positive, significant inverse, and nonsignificant associations, respectively.


In internal replication analyses, most of these 33 proteins were significantly associated with blood glucose (29/33; 88%) levels or prevalent diabetes (30/33; 91%) (Supplementary Fig. 3). Moreover, all 33 proteins were externally replicated in UKB (at 5% FDR) for blood glucose, HbA1c, and incident T2D (except one protein), although the effect sizes varied (Supplementary Fig. 4). Of these 33 proteins, most proteins had similar HRs in observational analyses of both studies, while the HRs for three proteins (VNN1, GHR, and IGFBP2) differed somewhat but were all directionally concordant with each other.

These 33 proteins were only moderately correlated with each other, with 99.5% of protein pairs having correlation coefficients ranging from −0.7 to +0.7 (Supplementary Fig. 5).

Risk Prediction of Incident T2D

In CKB, the conventional risk prediction model without blood glucose had an AUC of 0.754 (0.710–0.798), increasing to 0.774 (0.730–0.818) with the addition of blood glucose (Table 1). The proteomic-based model (33 proteins) alone had an AUC of 0.824 (0.786–0.862), which significantly outperformed the conventional models. The addition of the top 10 or all 33 proteins to conventional risk factors plus a blood glucose model yielded AUCs of 0.844 (0.803–0.885) and 0.876 (0.846–0.906), respectively. For NRI, the corresponding values were 28% (15–41%) and 38% (24–52%), respectively, using a categorical approach, rising to 84% (65–103%) and 97% (79–115%) when using a continuous approach (Supplementary Table 2). The observed and predicted risks of T2D showed excellent calibration (the Hosmer-Lemeshow test: χ² = 3.4, P = 0.90) (Supplementary Fig. 6). The application of these same proteins identified in CKB to UKB yielded comparable results for prediction of T2D (Supplementary Tables 3 and 4).

Table 1

Predictive values of conventional risk factors, random plasma glucose (RPG), and 33 proteins for incident T2D, separately and combined

| Model | AUC | NRI (95% CI), % |
|---|---|---|
| Base modelᵃ | 0.754 (0.710–0.798) | |
| RPG | 0.646 (0.591–0.700) | |
| 33 proteins | 0.824 (0.786–0.862) | |
| Base model + RPG | 0.774 (0.730–0.818) | 14 (3–25)ᵇ |
| Base model + 33 proteins | 0.874 (0.844–0.904) | 36 (22–50)ᵇ |
| RPG + 33 proteins | 0.829 (0.791–0.868) | 43 (32–54)ᶜ |
| Base model + RPG + 33 proteins | 0.876 (0.846–0.906) | 38 (24–52)ᵈ |
| Base model + RPG + top 10 proteinsᵉ | 0.844 (0.803–0.885) | 28 (19–37)ᵈ |
ᵃ Predictors in the base model included age, sex, study area, fasting time, education, smoking, alcohol consumption, physical activity, family history of diabetes, and BMI.

ᵇ Reference: base model.

ᶜ Reference: RPG.

ᵈ Reference: base model + RPG.

ᵉ Ordered by P value.

Enrichment Analysis

In enrichment analyses of 33 proteins, hydrolase activity was identified as the top biological pathway (Supplementary Fig. 7). Other pathways, such as growth factor binding and insulin-like growth factor binding, were also among the top overrepresented biological pathways. None of the pathways were annotated after correction for multiple testing in similar analyses using the Kyoto Encyclopedia of Genes and Genomes method.

Genetic Associations

In the CKB GWAS, cis-pQTL variants were identified for 22 (67%) of the 33 proteins. In two-sample MR analyses involving CKB and AGEN that excluded CKB, three proteins (ENTR1, LPL, and PON3) were significantly associated at FDR <0.05 with T2D (Table 2). For these three proteins, the effect estimates were less extreme in MR than in observational analyses but were directionally concordant. Moreover, colocalization analyses provided strong support (pH4 >0.6) for shared genetic variants of two proteins (LPL and PON3) with T2D. Independent two-sample MR analyses involving 22 cis-pQTLs identified in the UKB GWAS for these 33 T2D-associated proteins replicated associations for ENTR1 (P = 0.004) and LPL (P = 0.01). For PON3, although a cis-pQTL was identified in the UKB GWAS, the association (in the same direction) was not significant (P = 0.49).

Table 2

Genetic effect estimates, colocalization, PheWAS results, and relevant drug targets of three proteins showing genetic effects on T2D

| Protein | Full name | MR cis-pQTL | MR OR (95% CI) per SD higher | MR P value | pH4 | PheWAS associations | Drug (indication) |
|---|---|---|---|---|---|---|---|
| ENTR1 | Endosome-associated-trafficking regulator 1 | rs1051957 | 1.26 (1.18–1.34) | 1.3E-11 | 0.01 | T2D, HbA1c, glucose, insulin | None |
| LPL | Lipoprotein lipase | rs17411113 | 0.91 (0.85–0.97) | 0.0098 | 0.87 | T2D, MI, CAD, TG, VLDL, ApoA | Ibrolipim (lipid lowering) |
| PON3 | Serum paraoxonase/lactonase 3 | rs1053275 | 0.94 (0.89–0.98) | 0.0047 | 0.65 | ApoA, LDL | None |

PheWAS and Drug Target Lookup

In PheWAS analyses of these three proteins, cis-pQTLs for ENTR1 were associated with T2D and several T2D-related traits, including HbA1c, insulin, and glucose (Table 2). Likewise, cis-pQTLs for LPL and PON3 were related to T2D, CVD outcomes (MI and CAD), and CVD risk factors (LDL, TG, and ApoA). Within CKB, ENTR1, LPL, and PON3 were each significantly associated with glucose in cross-sectional analyses. All three proteins were highly expressed in liver, pancreas, and adipose tissues, and additional analyses of single-gene knockout mouse models identified associations with several lipidemia-related phenotypes (LPL, ENTR1, and PON3), abnormal liver morphology (ENTR1), and oxidative stress (PON3). Analysis of Open Targets and other databases indicated evidence of drug development for one protein (LPL), including the commercially available drug Ibrolipim, a lipoprotein lipase activator that degrades circulating triglycerides in blood (Table 2). However, there were no reports of drug targets or development for ENTR1 and PON3.

In this study of Chinese adults, we found that 33 proteins were significantly associated with risk of incident T2D. The addition of these proteins to the conventional prediction models substantially improved risk prediction of T2D, with comparable performance in Chinese and European populations. Moreover, MR analyses based on cis-pQTLs identified in the CKB GWAS provided strong support for the causal relevance of three proteins (ENTR1, LPL, and PON3) for T2D, with replication of ENTR1 and LPL in Europeans using different cis-pQTLs. In colocalization analyses, there was strong evidence of shared causal genetic variants of T2D with two proteins (LPL and PON3). Furthermore, the PheWAS results confirmed the importance of these proteins for T2D or T2D-related traits. Among these three proteins, there was, however, no evidence of any drug development for ENTR1 and PON3.

Previous observational studies of proteomics and T2D have involved primarily European ancestry populations, used different study designs, and included varying numbers of proteins measured by different assay platforms (13,15,16,20). Although findings were not always consistent, several proteins have been consistently associated with T2D, including IGFBP1, IGFBP2, GHR, and SHBG (12,13). In the current study, observational analyses found that IGFBP1 and IGFBP2 were among the proteins most strongly associated with T2D. The IGFBPs, which comprise some 15 proteins so far identified, have an important impact on systemic IGF signaling by modulating the activity and decay of their binding partners. IGFBP-2, which is mainly released by the liver, directly supports glucose homeostasis by stimulating glucose uptake into adipocytes, and also inhibits adipogenesis and enhances long-term insulin sensitivity (28). Consistent with previous findings, GHR was positively associated with T2D risk in the current study. There is evidence that high levels of GHR can accelerate systemic insulin resistance (29), which may partly explain the observed associations. Several previous studies reported possible protective associations between T2D risk and SHBG, which is a hepatokine that binds to circulating steroid hormones (testosterone and estradiol) and acts on macrophages and adipocytes to suppress inflammation and lipid accumulation (30). Although the association of SHBG with T2D became borderline significant after multiple testing correction (HR per SD higher: 0.69 [0.56–0.85]; FDR-adjusted P = 0.057), the present results were consistent with previous study findings. Similarly, in CKB observational analyses, levels of adiponectin were inversely associated with risk of diabetes before multiple testing correction, consistent with previous literature (31). In genetic analyses, however, we found no evidence of causal roles for these known proteins in the etiology of T2D, possibly because of limited study power.

The current study also found significant protective associations of pancreatic α-amylase (AMY2A and AMY2B) with T2D. Amylase is a digestive enzyme predominantly secreted by the pancreas and salivary glands and acts as a catalyst for carbohydrate hydrolysis, making it one of the viable targets for controlling T2D. Despite consistent observational findings, few previous studies were able to assess the causal relevance of these proteins in T2D. In our study, we confirmed the causal relevance of AMY2B in UKB using different cis-pQTLs (OR per SD higher: 0.94 [0.91–0.97]; P = 0.0003). On the other hand, we found that three proteins (PON3, LPL, and ENTR1) were significantly associated with T2D in both observational and genetic analyses.

PON3 is expressed primarily in the liver, and there is good evidence from animal experiments and epidemiologic studies that PON3 can inhibit oxidative stress, suppress inflammation, improve insulin resistance and abnormal glucolipid metabolism, and protect against atherosclerosis (32). As in the current study, inverse associations between PON3 and incident T2D were also reported in two previous prospective studies in Sweden (1,026 participants with 146 incident T2D cases) and Germany (1,143 participants with 178 incident T2D cases) (16,33), which persisted after adjusting for plasma glucose (marginally significant in the current study), implying a glucose-independent association with T2D incidence (16). However, no previous genetic studies supported PON3’s causal relevance for T2D incidence; therefore, findings from the current study provide strong and novel support for PON3 as a potential target for improved prevention and treatment of T2D.

LPL (lipoprotein lipase) is a rate-limiting enzyme that hydrolyzes circulating triglyceride-rich lipoproteins, including very low-density lipoproteins and chylomicrons (34). This enzyme is predominantly located in adipose tissue, muscle, and cardiac tissue, and a reduction in LPL activity is associated with an increase in plasma levels of triglycerides, prompting evaluation of target druggability for treatment of dyslipidemia (34), but its relevance to insulin resistance and glucose metabolism is less clear. Previous MR analyses suggested potential causal effects of LPL on insulin levels and the development of T2D (33). Moreover, a pharmacological study involving 392,220 Europeans showed that triglyceride-lowering alleles in LPL were associated with lower risk of T2D, independent of LDL-C–lowering genetic mechanisms (35). These findings provide genetic support for the development of agents that enhance LPL-mediated lipolysis for T2D prevention, which suggests, if further confirmed in other studies, potential opportunities for drug repurposing for the treatment of T2D.

None of the previous observational studies have examined the associations of plasma levels of ENTR1 with risk of T2D. The ENTR1 gene encodes ENTR, which has a potential role in the transcriptional regulation of the solute carrier family 2 member 1 glucose transporter protein (SLC2A1) (36). Importantly, SLC2A1 is responsible for approximately 30–40% of the glucose uptake in skeletal muscle, with the remainder transported through GLUT4 (36). This may partially explain the strong and apparently causal associations with T2D observed in the current study. However, we could not exclude the possibility that these associations were caused by other pathways, and further investigations of ENTR1 as a potential novel target for T2D are warranted.

In recent years, various proteomic-based prediction models for prevalent or incident diabetes have been developed, with varying numbers of proteins included (3 to 1,468) and largely different degrees of predictive performance (9,12,14). More recently, UKB developed a ProteinScore for T2D based on 1,468 Olink proteins, which outperformed a polygenic risk score and HbA1c (9). In the absence of HbA1c, we found that the addition of 33 or even the 10 top proteins to conventional risk factors (including blood glucose) significantly improved the risk prediction of incident T2D in Chinese adults. Moreover, the same proteins identified in Chinese adults also yielded comparable results in European populations and so could be considered for future clinical application in diverse populations.

The chief strengths of the current study include the large number of proteins assayed, independent replication of the main results internally and externally, use of ancestry-specific genetic instruments to assess causality, exclusion of CKB data from AGEN T2D GWAS summary statistics to minimize potential collider bias resulting from sample overlap, and multiple downstream analyses to assess possible mechanisms underlying these associations. Moreover, we also assessed the utility of proteomic-based risk prediction for T2D in diverse populations, independently and in combination with conventional risk factors. However, the current study also had several limitations. First, the study sample size was modest in CKB, limiting its power to detect more significant associations. Second, we were unable to independently replicate the observational findings in other East Asian populations, because of the lack of available data. However, internal replication with prevalence of T2D and plasma glucose levels and external replication with incident T2D, glucose, and HbA1c in UKB confirmed the validity of our observational findings when applying similar multiple test correction. Third, the two-sample MR analyses only involved two-thirds of proteins, because of a lack of overlapping cis-pQTLs in publicly available GWAS summary statistics. Fourth, there were no kidney function data collected in CKB among the study participants. In UKB, however, further adjustment for kidney function (blood creatinine) had a minor effect on the total number of proteins associated with incident diabetes (1,514 with adjustment vs. 1,541 without adjustment, at FDR <0.05).

Future studies with a larger sample size and better genetic instruments, involving perhaps both cis- and trans-pQTLs, a more advanced method of machine learning in risk prediction, and functional analyses, are needed to further identify, replicate, and clarify the associations of different proteins with T2D in different ancestry populations.

In summary, the current study identified 33 proteins that were significantly associated with T2D, with strong genetic support for the causal relevance of three proteins. Drug development was evident for only one of these three proteins (LPL); there was no evidence of drug development for the other two, notably PON3, which is highly expressed in liver cells and is a promising drug target for improved prevention and treatment of T2D. Further biological validation using in vitro and in vivo experiments along with human studies is required to elucidate the underlying mechanisms. The current study highlighted the importance of proteomics in prospective studies of diverse populations to improve risk prediction, enhance understanding of disease etiology, and discover potential novel drug targets for treatment and prevention of T2D as well as other diseases.

Acknowledgments. The chief acknowledgment is of the participants, the project staff, and the China CDC (Changping District, Beijing, China) and its regional offices for assisting with the fieldwork. The authors thank Judith Mackay at the Asian Consultancy on Tobacco Control in Hong Kong; Yu Wang, Gonghuan Yang, Zhengfu Qiang, Lin Feng, Maigeng Zhou, Wenhua Zhao, and Yan Zhang at China CDC; Lingzhi Kong, Xiucheng Yu, and Kun Li at the Chinese Ministry of Health (Xicheng District, Beijing, China); and Sarah Clark, Martin Radley, and Mike Hill at CTSU (the Clinical Trial Service Unit and Epidemiological Studies Unit, Oxford, U.K.) for assisting with the planning, conduct, and organization of the study.

Funding. The CKB baseline survey and the first resurvey were supported by the Kadoorie Charitable Foundation in Hong Kong. The long-term follow-up and subsequent resurveys have been supported by Wellcome grants to Oxford University (212946/Z/18/Z, 202922/Z/16/Z, 104085/Z/14/Z, and 088158/Z/09/Z) and grants from the National Natural Science Foundation of China (82192901, 82192904, and 82192900) and from the National Key Research and Development Program of China (2016YFC0900500). The UK Medical Research Council (MC_UU_00017/1, MC_UU_12026/2, and MC_U137686851), Cancer Research UK (C16077/A29186 and C500/A16896), and the British Heart Foundation (CH/1996001/9454) provide core funding to the Clinical Trial Service Unit and Epidemiological Studies Unit at Oxford University for the project. The proteomic assays were supported by the British Heart Foundation (18/23/33512), Novo Nordisk, and Olink. DNA extraction and genotyping were supported by GlaxoSmithKline and the UK Medical Research Council (MC-PC-13049 and MC-PC-14135). The trans-ethnic BMI-genetic score also used data from UKB (application no. 50474).

This research was funded in whole, or in part, by the Wellcome Trust (212946/Z/18/Z, 202922/Z/16/Z, 104085/Z/14/Z, and 088158/Z/09/Z). For the purpose of open access, the author has applied a CC-BY public copyright license to any author accepted manuscript version arising from this submission.

Duality of Interest. No potential conflicts of interest relevant to this article were reported.

Author Contributions. P.Y., H.D., and Z.C. contributed to the concept and design of the study. P.Y. conducted statistical analyses and drafted the manuscript. P.Y., A.I., A.P., S.S., N.W., K.L., I.M., H.F., C.K., M.M., Y.C., F.B., B.L., L.Y., J.Li., D.A., D.Sc., D.Su., P.P., J.Lv., C.Y., M.H., D.B., R.W., L.L., R.C., H.D., and Z.C. were involved in the planning, acquisition, and interpretation of data. I.M., H.F., Y.C., D.A., D.Sc., P.P., and M.H. provided administrative, technical, or material support. All authors provided critical revision of the manuscript for important intellectual content. Z.C. supervised the work. P.Y. and Z.C. are the guarantors of this work and, as such, had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Handling Editors. The journal editors responsible for overseeing the review of the manuscript were Elizabeth Selvin and Casey M. Rebholz.

This article contains supplementary material online at https://doi.org/10.2337/figshare.25393255.

*Members of the China Kadoorie Biobank Collaborative Group are provided in the supplementary material online.

1. International Diabetes Federation. IDF Diabetes Atlas. 10th ed. Brussels, Belgium, International Diabetes Federation, 2021
2. O’Hearn M, Lara-Castor L, Cudhea F, et al. Incident type 2 diabetes attributable to suboptimal diet in 184 countries. Nat Med 2023;29:982–995
3. Abbasi A, Peelen LM, Corpeleijn E, et al. Prediction models for risk of developing type 2 diabetes: systematic literature search and independent external validation study. BMJ 2012;345:e5900
4. Edlitz Y, Segal E. Prediction of type 2 diabetes mellitus onset using logistic regression-based scorecards. eLife 2022;11:e71862
5. Mahajan A, Spracklen CN, Zhang W, et al.; FinnGen; eMERGE Consortium. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat Genet 2022;54:560–572
6. Spracklen CN, Horikoshi M, Kim YJ, et al. Identification of type 2 diabetes loci in 433,540 East Asian individuals. Nature 2020;582:240–245
7. Santos R, Ursu O, Gaulton A, et al. A comprehensive map of molecular drug targets. Nat Rev Drug Discov 2017;16:19–34
8. Suhre K, McCarthy MI, Schwenk JM. Genetics meets proteomics: perspectives for large population-based studies. Nat Rev Genet 2021;22:19–37
9. Gadd DA, Hillary RF, Kuncheva Z, et al. Blood protein levels predict leading incident diseases and mortality in UK Biobank. 3 May 2023 [preprint]. medRxiv:2023.05.01.23288879
10. Sun BB, Chiou J, Traylor M, et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 2023;622:329–338
11. Dhindsa RS, Burren OS, Sun BB, et al. Rare variant associations with plasma protein levels in the UK Biobank. Nature 2023;622:339–347
12. Rooney MR, Chen J, Echouffo-Tcheugui JB, et al. Proteomic predictors of incident diabetes: results from the Atherosclerosis Risk in Communities (ARIC) study. Diabetes Care 2023;46:733–741
13. Elhadad MA, Jonasson C, Huth C, et al. Deciphering the plasma proteome of type 2 diabetes. Diabetes 2020;69:2766–2778
14. Huth C, von Toerne C, Schederecker F, et al. Protein markers and risk of type 2 diabetes and prediabetes: a targeted proteomics approach in the KORA F4/FF4 study. Eur J Epidemiol 2019;34:409–422
15. Yuan S, Xu F, Li X, et al. Plasma proteins and onset of type 2 diabetes and diabetic complications: proteome-wide Mendelian randomization and colocalization analyses. Cell Rep Med 2023;4:101174
16. Molvin J, Pareek M, Jujic A, et al. Using a targeted proteomics chip to explore pathophysiological pathways for incident diabetes–the Malmö Preventive Project. Sci Rep 2019;9:272
17. Gold L, Walker JJ, Wilcox SK, Williams S. Advances in human proteomics at high scale with the SOMAscan proteomics platform. N Biotechnol 2012;29:543–549
18. Assarsson E, Lundberg M, Holmquist G, et al. Homogenous 96-plex PEA immunoassay exhibiting high sensitivity, specificity, and excellent scalability. PLoS One 2014;9:e95192
19. Ferkingstad E, Sulem P, Atlason BA, et al. Large-scale integration of the plasma proteome with genetics and disease. Nat Genet 2021;53:1712–1721
20. Chen ZZ, Gao Y, Keyes MJ, et al. Protein markers of diabetes discovered in an African American cohort. Diabetes 2023;72:532–543
21. Chen Z, Chen J, Collins R, et al.; China Kadoorie Biobank (CKB) collaborative group. China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. Int J Epidemiol 2011;40:1652–1666
22. Yao P, Iona A, Kartsonaki C, et al. Conventional and genetic associations of adiposity with 1463 proteins in relatively lean Chinese adults. Eur J Epidemiol 2023;38:1089–1103
23. Xu S, Coleman RL, Wan Q, et al. Risk prediction models for incident type 2 diabetes in Chinese people with intermediate hyperglycemia: a systematic literature review and external validation study. Cardiovasc Diabetol 2022;21:182
24. Wu T, Hu E, Xu S, et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation 2021;2:100141
25. Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet 2014;23(R1):R89–R98
26. Lawlor DA. Commentary: Two-sample Mendelian randomization: opportunities and challenges. Int J Epidemiol 2016;45:908–915
27. Costanzo MC, von Grotthuss M, Massung J, et al. The Type 2 Diabetes Knowledge Portal: an open access genetic resource dedicated to type 2 diabetes and related traits. Cell Metab 2023;35:695–710.e6
28. Wittenbecher C, Ouni M, Kuxhaus O, et al. Insulin-like Growth Factor Binding Protein 2 (IGFBP-2) and the risk of developing type 2 diabetes. Diabetes 2019;68:188–197
29. Liu J, Nie C, Xue L, et al. Growth hormone receptor disrupts glucose homeostasis via promoting and stabilizing retinol binding protein 4. Theranostics 2021;11:8283–8300
30. Bourebaba N, Ngo T, Śmieszek A, Bourebaba L, Marycz K. Sex hormone binding globulin as a potential drug candidate for liver-related metabolic disorders treatment. Biomed Pharmacother 2022;153:113261
31. Li S, Shin HJ, Ding EL, van Dam RM. Adiponectin levels and risk of type 2 diabetes: a systematic review and meta-analysis. JAMA 2009;302:179–188
32. Liu Y, Zhu D, Dong G, Zeng Y, Jiang P, Xiao Y. Liver paraoxonase 3 expression and the effect of liraglutide treatment in a rat model of diabetes. Adv Clin Exp Med 2021;30:157–163
33. Luo H, Bauer A, Nano J, et al. Associations of plasma proteomics with type 2 diabetes and related traits: results from the longitudinal KORA S4/F4/FF4 study. Diabetologia 2023;66:1655–1668
34. Liu Y, Li H, Wang S, Yin W, Wang Z. Ibrolipim attenuates early-stage nephropathy in diet-induced diabetic minipigs: focus on oxidative stress and fibrogenesis. Biomed Pharmacother 2020;129:110321
35. Lotta LA, Stewart ID, Sharp SJ, et al. Association of genetically enhanced lipoprotein lipase-mediated lipolysis and low-density lipoprotein cholesterol-lowering alleles with risk of coronary disease and type 2 diabetes. JAMA Cardiol 2018;3:957–966
36. Farries G, Bryan K, McGivney CL, et al. Identification of expression quantitative trait loci in the skeletal muscle of thoroughbreds reveals heritable variation in expression of genes relevant to cofactor metabolism. 24 July 2019 [preprint]. bioRxiv:713669

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at https://www.diabetesjournals.org/journals/pages/license.