The purpose of this study was to develop a model for assessing the 5-year risk of developing type 2 diabetes from a panel of 64 circulating candidate biomarkers.
Subjects were selected from the Inter99 cohort, a longitudinal population-based study of ∼6,600 Danes, in a nested case-control design with the primary outcome of 5-year conversion to type 2 diabetes. Nondiabetic subjects, aged ≥39 years, with BMI ≥25 kg/m2 at baseline were selected. Baseline fasting serum samples from 160 individuals who developed type 2 diabetes and from 472 who did not were tested. An ultrasensitive immunoassay was used to measure 58 candidate biomarkers in multiple diabetes-associated pathways, along with six routine clinical variables. Statistical learning methods and permutation testing were used to select the most informative biomarkers. Risk model performance was estimated using a validated bootstrap bias-correction procedure.
A model using six biomarkers (adiponectin, C-reactive protein, ferritin, interleukin-2 receptor A, glucose, and insulin) was developed for assessing an individual's 5-year risk of developing type 2 diabetes. This model has a bootstrap-estimated area under the curve of 0.76, which is greater than that for A1C, fasting plasma glucose, fasting serum insulin, BMI, sex-adjusted waist circumference, a model using fasting glucose and insulin, and a noninvasive clinical model.
A model incorporating six circulating biomarkers provides an objective and quantitative estimate of the 5-year risk of developing type 2 diabetes, performs better than single risk indicators and a noninvasive clinical model, and provides better stratification than fasting plasma glucose alone.
The prevalence of type 2 diabetes has reached epidemic levels, affecting ∼7% of the U.S. population, and current epidemiological trends indicate that the prevalence will continue to increase dramatically (1). Several long-term prospective clinical trials have shown that interventions can delay or possibly prevent the onset of type 2 diabetes in high-risk individuals (2,3), underscoring the importance of identifying individuals at risk to begin interventions as early as possible and focus resources on those with the highest risk.
The most commonly used method of assessing risk of type 2 diabetes is measuring fasting plasma glucose (FPG); however, the specificity of this test is poor (4,5). Although many individuals are identified as having impaired fasting glucose (IFG), their absolute risk of conversion to diabetes is only 5–10% per year (6). The oral glucose tolerance test (OGTT) is more accurate for risk assessment. However, it is rarely used in practice because it is unpleasant for the patient and requires 2 h to perform. Another challenge is that by the time glucose regulation is abnormal, the underlying disease has been progressing for many years, and complications have already occurred in a significant number of individuals (7). Thus, the rationale of using one variable to assess risk is questionable, when the risk of harm actually varies based on a range of variables and would be better assessed using a multivariable individualized risk score (8).
Several indexes using clinical information and routine laboratory measurements have been developed for assessing type 2 diabetes risk (9–11). These have never been widely adopted by physicians. Given the limitations of the OGTT, FPG, and indexes that the clinician must calculate, an improved method for assessing type 2 diabetes risk, with a convenient format for routine clinical use, would enable physicians to accurately evaluate more individuals.
The dysregulation of many biological pathways precedes the development of overt type 2 diabetes (12). Although many studies have assessed whether levels of a few molecules might predict future diabetes (13–15), none have quantitatively measured a large number of molecules simultaneously in a sufficient number of samples to robustly evaluate their utility for risk assessment. We undertook a systematic analysis of many candidates in pathways dysregulated in diabetes to search for patterns of biomarkers with more predictive power than individual biomarkers or previously examined biomarker combinations. For this analysis, we selected 632 baseline samples from the Inter99 Study and used an ultrasensitive immunoassay to measure many proteins in small amounts of serum.
RESEARCH DESIGN AND METHODS
The Inter99 cohort consists of 61,301 subjects aged 30–60 years from the Danish Civil Registration System. Although this was a lifestyle intervention trial for cardiovascular disease (14), the 5-year rate of progression to type 2 diabetes observed in this study (3.4%) was similar to other estimates of progression for this age-group (16). A sample of 13,016 was randomly selected; of these, 12,934 were eligible and invited for an examination, and 6,784 (52.5%) attended (17). Eligible individuals (n = 6,536) were reinvited after 5 years and 4,511 (69%) attended. Fasting blood samples, lifestyle data, blood pressure, waist circumference, plasma lipids, and OGTT results were collected at baseline and at 5-year time points. We defined an “at-risk” subpopulation as those aged ≥39 years with BMI ≥25 kg/m2 and free of diabetes at baseline. Among these individuals, 174 progressed to type 2 diabetes during the 5-year follow-up (converters), and baseline samples were available for 160, whereas 2,872 did not progress (nonconverters). Diagnosis of type 2 diabetes was defined by a 2-h plasma glucose of ≥11.1 mmol/l in an OGTT or FPG of ≥7.0 mmol/l. Nonconverters (n = 472) were randomly selected in an ∼3:1 ratio to converters.
Clinical and standard laboratory measurements
Blood pressure and anthropometric measurements, routine laboratory measures (FPG, insulin, and lipids), and the OGTT were obtained as described previously (17). Serum was stored at −19°C.
Candidate biomarker selection
Potential biomarkers were identified by searching the PubMed database using search terms relevant to the development of diabetes. Of 260 candidate biomarkers identified as being involved in pathways associated with metabolic or cardiovascular disorders, obesity, cell death, or inflammatory response, we successfully obtained assay reagents for 89. Data from 58 candidate biomarkers met our quality control criteria, which required that results from ≥66% of the samples had to fall within the assay's linear dynamic range.
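The quality control rule above can be expressed as a simple filter. The sketch below is illustrative only: the function name `passes_qc`, the inclusive range bounds, and the treatment of an empty result list are assumptions, not details given in the text.

```python
def passes_qc(values, lower, upper, min_fraction=0.66):
    """Return True if at least min_fraction of the sample results fall
    within the assay's linear dynamic range [lower, upper].

    Assumptions: bounds are inclusive and an empty list fails QC.
    """
    if not values:
        return False
    in_range = sum(1 for v in values if lower <= v <= upper)
    return in_range / len(values) >= min_fraction


# Hypothetical readings for one candidate biomarker (arbitrary units).
readings = [12.0, 45.0, 300.0, 0.5, 88.0, 1500.0]
print(passes_qc(readings, lower=1.0, upper=1000.0))
```

A candidate would be retained only when this check passes across the full sample set for that assay.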
Molecular assays
Sandwich immunoassays developed for the 58 proteins typically used a monoclonal capture antibody and a fluorescently labeled detection antibody. Biomarker candidates were measured using an ultrasensitive molecular counting technology platform (Singulex, St. Louis, MO). Details regarding assay reagents have been described previously (18). In brief, labeled antibodies were detected with the ZeptX system, in which liquid from each well is pumped through an interrogation space within a capillary flow cell. Laser light (wavelength ∼650 nm) is directed into the interrogation space, and the resulting emission from each labeled antibody (wavelength 668 nm) is measured via a confocal microscope with a photon detector.
For biomarkers in the model, reagents were obtained from R&D Systems (Minneapolis, MN) individually (monomeric adiponectin [ADIPOQ]) or as DuoSet kits (interleukin-2 receptor A [IL-2RA]) and from U.S. Biological (Swampscott, MA) (C-reactive protein [CRP] and ferritin heavy chain 1 [FTH1]). Detection antibodies for ADIPOQ, CRP, and FTH1 were conjugated with Alexa Fluor 647 (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions and purified by ultrafiltration with Microcon YM-30 from Millipore (Billerica, MA). Analytes detected using DuoSet kits used biotinylated detection antibodies and Alexa Fluor 647–conjugated streptavidin (Invitrogen).
One biomarker was measured per 384-well microplate, using an average of 1.3 μl serum in a total assay volume of 10 μl/well. Biomarker concentrations were calculated as the mean of three replicates. Assays had dynamic ranges of 10²–10³, intraplate coefficients of variation of ≤5%, and an average lower limit of detection of 10 pg/ml.
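As a small illustration of the replicate handling described above, the following sketch averages replicate readings and computes their coefficient of variation. The use of the sample standard deviation and the name `summarize_replicates` are assumptions; the text does not state how the coefficient of variation was calculated.

```python
from statistics import mean, stdev


def summarize_replicates(replicates):
    """Return (mean concentration, coefficient of variation in percent).

    Assumption: CV = sample standard deviation / mean * 100.
    """
    if len(replicates) < 2:
        raise ValueError("at least two replicates are needed for a CV")
    m = mean(replicates)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return m, stdev(replicates) / m * 100.0


# Hypothetical triplicate readings for one sample (pg/ml).
concentration, cv_percent = summarize_replicates([98.0, 101.0, 104.0])
print(concentration, round(cv_percent, 2))
```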
Model development process
We devised a model development process applying multiple statistical approaches in which a limited number of the most informative markers would be selected for inclusion. Sixty-four candidate biomarkers were evaluated for inclusion in multimarker models: six routine laboratory measures (FPG, fasting serum insulin, triglycerides, total cholesterol, HDL cholesterol, and LDL cholesterol) and 58 serum proteins. Biomarker candidates were selected for inclusion in the model based on frequency of selection in four statistical learning approaches (for details see online Appendix B, available at http://care.diabetesjournals.org/cgi/content/full/dc08-1935/DC1). We refer to the four approaches as U (univariate logistic regression analyses), E (exhaustive enumeration of small [≤6] multivariate logistic models), H (six different heuristic model-building methods, including forward, backward, and stepwise selection, Kruskal-Wallis, random forest, and Eigengene-based linear discriminant analysis with three different statistical learning algorithms, including logistic regression, linear discriminant analysis, and support vector machines), and B (frequency of selection within 100 bootstrap replicates using the same basic heuristic model-building methods).
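To give a sense of the scale of approach E, the sketch below counts the subsets that an exhaustive search over models of up to six markers would visit. It assumes every subset of one to six of the 64 candidates is a distinct model; the actual enumeration scope is described only in Appendix B, and `count_models` is a hypothetical helper.

```python
from math import comb


def count_models(n_candidates, max_size):
    """Number of non-empty marker subsets with at most max_size members."""
    return sum(comb(n_candidates, k) for k in range(1, max_size + 1))


# 64 candidates, models of one to six markers (assumed scope).
print(count_models(64, 6))
```

Under that assumption the search space is on the order of tens of millions of models, which is why the enumeration is restricted to small model sizes.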
Permutation testing was used to establish a threshold of selection frequency for inclusion of a biomarker in the model. For the permutation testing, the entire selection procedure was repeated using a dataset with randomly assigned outcomes. To be included in the model, the selection frequency of a biomarker in the dataset with nonpermuted (true) outcomes had to fall outside the 95% CI of its selection frequency using the dataset with randomly assigned outcomes. To make the model more parsimonious, the biomarkers selected were subjected to backwards selection, sequentially removing biomarkers until all remaining biomarkers were significant at the 90% confidence level.
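The permutation threshold described above can be sketched as follows. This is a toy under stated assumptions, not the procedure of Appendix B: a single univariate screen (`z_score` with a hypothetical cutoff of 3.0) stands in for the four selection approaches, selection frequency is taken over bootstrap replicates, and "outside the 95% CI" is interpreted as exceeding the 97.5th percentile of the frequencies obtained with shuffled outcomes. All function names and the simulated data are illustrative.

```python
import math
import random


def z_score(values, labels):
    """Toy univariate screen: standardized difference in class means."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    if not cases or not controls:
        return 0.0
    grand = sum(values) / len(values)
    sd = math.sqrt(sum((v - grand) ** 2 for v in values) / len(values))
    if sd == 0:
        return 0.0
    diff = sum(cases) / len(cases) - sum(controls) / len(controls)
    return abs(diff) / (sd * math.sqrt(1 / len(cases) + 1 / len(controls)))


def selection_frequency(X, y, rng, n_boot=50, cutoff=3.0):
    """Fraction of bootstrap replicates in which each marker passes the screen."""
    n, p = len(y), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        for j in range(p):
            if z_score([X[i][j] for i in idx], yb) > cutoff:
                counts[j] += 1
    return [c / n_boot for c in counts]


def permutation_filter(X, y, n_perm=40, seed=0):
    """Keep markers whose true selection frequency exceeds the upper bound
    of the null frequencies obtained with randomly reassigned outcomes."""
    rng = random.Random(seed)
    true_freq = selection_frequency(X, y, rng)
    null = [[] for _ in X[0]]
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # randomly assigned outcomes
        for j, f in enumerate(selection_frequency(X, y_perm, rng)):
            null[j].append(f)
    keep = []
    for j, f in enumerate(true_freq):
        upper = sorted(null[j])[int(0.975 * (n_perm - 1))]
        if f > upper:
            keep.append(j)
    return keep, true_freq


def make_toy_data(seed=1, n_per_class=30, n_noise=4):
    """Simulated data: marker 0 is informative, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, 0):
        for _ in range(n_per_class):
            signal = 10.0 * label + rng.gauss(0, 1)
            X.append([signal] + [rng.gauss(0, 1) for _ in range(n_noise)])
            y.append(label)
    return X, y


X, y = make_toy_data()
kept, freq = permutation_filter(X, y)
print(kept, freq)
```

Repeating the whole selection procedure on shuffled outcomes, rather than testing each marker once, is what lets the threshold account for how often a selection method picks a marker by chance alone.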
RESULTS
Baseline characteristics of converter and nonconverter groups are summarized in Table 1. The univariate results for 58 candidate serum biomarkers for 5-year risk of type 2 diabetes are presented in online Appendix A.
Characteristic | Converters | Nonconverters | P |
---|---|---|---|
Participants | 160 | 472 | |
Male sex | 110 (68.8) | 279 (59.1) | 0.031 |
NFG and NGT | 12 (7.6) | 226 (49.7) | <0.0001 |
IFG only | 46 (29.1) | 174 (38.2) | 0.0433 |
IGT only | 25 (15.8) | 19 (4.2) | <0.0001 |
Both IFG and IGT | 75 (47.5) | 36 (7.9) | <0.0001 |
Family history | 48 (30.0) | 98 (20.8) | 0.0223 |
Age (years) | 50.2 (45.2–55.0) | 49.8 (44.8–54.8) | <0.0001 |
Height (cm) | 172 (166–179) | 172 (166–179) | 0.9277 |
Weight (kg) | 89 (80–100) | 84 (77–93) | 0.0001 |
BMI (kg/m2) | 29.7 (27.5–32.9) | 27.6 (26.1–30.1) | <0.0001 |
Waist circumference (cm) | 97 (91–109) | 93 (86–99) | <0.0001 |
Hip circumference (cm) | 106 (102–113) | 104 (100–109) | 0.004 |
Systolic blood pressure (mmHg) | 140 (130–150) | 130 (120–144) | <0.0001 |
Diastolic blood pressure (mmHg) | 90 (80–96) | 85 (80–90) | 0.0008 |
Fasting serum total cholesterol (mmol/l) | 5.8 (5.1–6.5) | 5.7 (5.0–6.4) | 0.2513 |
Fasting serum HDL cholesterol (mmol/l) | 1.2 (1.0–1.4) | 1.3 (1.1–1.6) | 0.0013 |
Fasting serum LDL cholesterol (mmol/l) | 3.6 (3.1–4.4) | 3.6 (3.1–4.3) | 0.6898 |
Fasting serum triglycerides (mmol/l) | 1.6 (1.3–2.2) | 1.3 (0.9–1.8) | <0.0001 |
Fasting serum insulin (pmol/l) | 58 (37–81) | 40 (27–59) | <0.0001 |
2-h serum insulin (pmol/l) | 325 (210–486) | 186 (100–298) | <0.0001 |
FPG (mmol/l) | 6.1 (5.7–6.5) | 5.6 (5.3–6.0) | <0.0001 |
2-h plasma glucose (mmol/l) | 8.4 (7.1–9.5) | 6.1 (5.1–7.0) | <0.0001 |
A1C (%) | 6.1 (5.8–6.4) | 5.9 (5.6–6.1) | <0.0001 |
Adiponectin (μg/ml) | 19.5 (9.3–39.6) | 22.2 (12.9–42.6) | 0.0345 |
CRP (μg/ml) | 3.2 (1.5–7.9) | 2.0 (0.8–5.3) | <0.0001 |
Ferritin (ng/ml) | 867 (290–1749) | 483 (168–1045) | <0.0001 |
IL-2RA (pg/ml) | 290 (230–400) | 270 (200–350) | 0.0049 |
Data are n (%) or median (interquartile range) for continuous variables. Data are from 632 subjects in the subsample of 3,032 at-risk individuals with BMI ≥25 kg/m2 and age ≥39 years from the Inter99 cohort. Converters are individuals who developed epidemiologically defined diabetes within 5 years. Nonconverters were randomly selected from the Inter99 cohort in an approximately 3:1 ratio to converters. IFG was defined as FPG of 5.6–6.9 mmol/l. Impaired glucose tolerance (IGT) was defined as 2-h postload glucose of 7.8–11.1 mmol/l. At baseline, 92% of the converters had IFG, IGT, or both, whereas 50% of nonconverters had IFG, IGT, or both. For categorical descriptors, values are counts (percentage of total for that cohort). Differences in frequency between converters and nonconverters were evaluated with a Monte Carlo estimation of the χ2 statistic (2,000 replicates). Differences in medians were evaluated with a Wilcoxon test. NFG, normal fasting glucose; NGT, normal glucose tolerance.
Applying our model development process to all 64 candidate biomarkers (58 serum proteins and 6 routine laboratory measures), we found that CRP, FTH1, glucose, alanine aminotransferase, and insulin were selected by all four approaches (U, E, H, and B); IGF binding protein 2, IL-2RA, and heat shock 70-kDa protein 1B were selected by three approaches (E, H, and B); leptin and interleukin 18 (IL-18) were selected by two approaches (U and E); and ADIPOQ was selected by one approach (E). After backwards selection, the resulting Diabetes Risk Score (DRS) model included six biomarkers (ADIPOQ, CRP, FTH1, glucose, IL-2RA, and insulin). The performance of this model was estimated using the bootstrap resampling approach. Figure 1 compares the area under the receiver operating characteristic (ROC) curves for the fitted performance of this DRS model to assess 5-year type 2 diabetes risk in the dataset (area under the curve [AUC] = 0.78) with that of this DRS model using bootstrap resampling of the dataset (AUC = 0.76). The similarity of the AUCs suggests that this model is not overfit and is likely to be robust when used to assess risk in a different population. A separate analysis is presented in online Appendix B, in which the bootstrap resampling approach to model validation was compared with an approach that used training and validation data subsets. The similarity in performance between the bootstrap estimate of performance on the training set and performance on a sequestered validation dataset validates use of the bootstrap approach to estimate model performance.
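The gap between the fitted AUC and the bootstrap-estimated AUC reported above reflects optimism from fitting and evaluating on the same data. The sketch below illustrates one common form of bootstrap optimism correction; whether it matches the procedure in Appendix B is an assumption. The "model" here is deliberately trivial (pick the single marker with the highest AUC), and all names and data are hypothetical.

```python
import random


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control
    (ties count one half)."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def marker_auc(X, y, j):
    """AUC of marker j, assuming higher values indicate higher risk."""
    cases = [row[j] for row, label in zip(X, y) if label == 1]
    controls = [row[j] for row, label in zip(X, y) if label == 0]
    return auc(cases, controls)


def fit_best_marker(X, y):
    """Toy stand-in for model fitting: choose the marker with the highest AUC."""
    return max(range(len(X[0])), key=lambda j: marker_auc(X, y, j))


def optimism_corrected_auc(X, y, n_boot=100, seed=0):
    """Return (apparent AUC, optimism-corrected AUC)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = marker_auc(X, y, fit_best_marker(X, y))
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # both classes are needed to compute an AUC
        j = fit_best_marker(Xb, yb)  # refit on the bootstrap sample
        # Optimism: performance on the bootstrap sample minus performance
        # of the same fitted choice on the original data.
        optimism.append(marker_auc(Xb, yb, j) - marker_auc(X, y, j))
    return apparent, apparent - sum(optimism) / n_boot


# Hypothetical data: marker 0 separates the classes, marker 1 is noise.
y_demo = [1] * 5 + [0] * 5
X_demo = [[10.0 + i, v] for i, v in enumerate([0.2, 0.9, 0.4, 0.7, 0.1])]
X_demo += [[float(i), v] for i, v in enumerate([0.8, 0.3, 0.6, 0.5, 0.0])]
print(optimism_corrected_auc(X_demo, y_demo))
```

When the corrected value is close to the apparent value, as with the 0.76 and 0.78 reported here, the fitting procedure has added little optimism.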
Figure 2 compares the AUC of this DRS model with that of several routine laboratory measures (A1C, FPG, fasting serum insulin, 2-h serum insulin, and 2-h plasma glucose from the OGTT), two clinical variables (BMI and sex-adjusted waist circumference), a model using fasting glucose and insulin, and a noninvasive clinical model (age, BMI, waist circumference, and family history of type 2 diabetes in first-degree relatives). The AUC of this DRS model is statistically significantly different from that of single marker measures from fasting blood samples, a model using fasting glucose and insulin, anthropometric measures, and a clinical index, whereas it is not significantly different from that of 2-h glucose (from OGTT) or 2-h insulin (P = 0.18 and P = 0.70, respectively). Adding the family history, age, BMI, and waist circumference components of the noninvasive model to this DRS model improved the fit slightly (P = 0.0067, likelihood ratio test) but produced only a marginal performance gain (AUC 0.792 vs. 0.780, P = 0.059). It should also be noted that the DRS average for women is 1.35 lower than that for men (P < 0.0001). However, this sex difference accurately reflects the difference in risk of developing diabetes, and the performance of the DRS does not differ significantly between the sexes (AUC = 0.770 and 0.783 for women and men, respectively; P = 0.7908).
To extrapolate results from this nested case-control study to the entire at-risk population within the Inter99 cohort and to provide a way to convert a DRS to the absolute risk of developing diabetes for an individual, Bayes' law was applied to adjust for the observed 5.7% 5-year rate of conversion to diabetes for the population with BMI ≥25 kg/m2 and age ≥39 years (see online Appendix C).
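The adjustment described above can be illustrated with the odds form of Bayes' rule: a probability estimated in the enriched case-control sample is converted to a likelihood ratio and then combined with the population prior. This is a minimal sketch; Appendix C is not reproduced here, so the exact form used by the authors is an assumption, and `adjust_to_population` is a hypothetical name. The sample prevalence of 160/632 and the population rate of 5.7% are taken from the text.

```python
def adjust_to_population(p_sample, sample_prev, population_prev):
    """Rescale a case-control probability to a population with a
    different prevalence, using posterior odds = LR * prior odds."""
    for name, value in (("p_sample", p_sample),
                        ("sample_prev", sample_prev),
                        ("population_prev", population_prev)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")
    sample_odds = p_sample / (1.0 - p_sample)
    # Likelihood ratio implied by the model output in the sample.
    likelihood_ratio = sample_odds / (sample_prev / (1.0 - sample_prev))
    population_odds = likelihood_ratio * population_prev / (1.0 - population_prev)
    return population_odds / (1.0 + population_odds)


sample_prev = 160 / 632   # converters among tested subjects
population_prev = 0.057   # observed 5-year conversion rate in the at-risk group

# A hypothetical model output of 0.60 in the case-control sample.
print(adjust_to_population(0.60, sample_prev, population_prev))
```

A model output equal to the sample prevalence carries a likelihood ratio of 1 and therefore maps back to the population rate itself.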
Figure 3 compares the stratification of risk achieved by measuring FPG and 2-h glucose to that achieved using this DRS model. Figure 3A shows that the DRS provides a continuous measure of risk of progression to type 2 diabetes in the at-risk population. Figure 3B illustrates the risk level by FPG class, using the threshold of 100 mg/dl for IFG. The IFG group comprises 56% of the at-risk population and has a 5-year conversion risk that is 1.4-fold higher than the pretest probability. Figure 3C illustrates the level of risk in each stratum when this DRS is used to stratify the individuals into low-, medium-, and high-risk groups. Individuals in the high-risk group have a 3.5-fold increased risk over the pretest probability and comprise 10% of the population. Individuals in the low-risk group have a 3.5-fold lower risk and comprise 54% of the population, and the remaining medium-risk group has a 1.3-fold increased risk and comprises 36% of the population. As might be expected from the AUC comparison, the risk of development of diabetes in subjects with impaired glucose tolerance (14.6% of the population) is 24.5%, which is similar to the risk in the high-risk DRS group. Yet, the low-risk group identified by DRS has a 1.6% risk of developing diabetes, which is lower than that of subjects with either normal fasting glucose (2.4%) or normal glucose tolerance (2.5%) in this study.
CONCLUSIONS
Previous efforts to identify biomarkers that might be useful in assessing risk of type 2 diabetes have evaluated a limited number of candidates. We sought to explore the predictive power of many molecules in a variety of biological pathways that are known to be altered in diabetes, in addition to glucose homeostasis pathways, hypothesizing that any molecule involved in the pathophysiology of diabetes might provide additional predictive power. The molecular counting technology assay platform, with its small sample volume requirements, permitted the quantitative analysis of many more protein markers than have previously been analyzed in a single study.
The current methods of assessing type 2 diabetes risk are inconvenient, have logistical challenges to implementation, and have poor specificity. A multibiomarker model was developed to assess risk of type 2 diabetes by selecting biomarkers using multiple statistical approaches. The performance of this DRS model is better than that of any of the fasting baseline measures of risk examined and is similar to that of 2-h glucose levels in an OGTT, a test that is rarely used because of its inconvenience. This DRS identifies high-risk individuals with a four times increased risk of developing diabetes, who comprise ∼10% of the population, and low-risk individuals, who comprise >50% of the population (Fig. 3C). This DRS model provides a more convenient alternative for obtaining a quantitative risk estimate: a laboratory would measure the biomarker concentrations in a fasting blood sample and return the computed risk score. This DRS model does not depend on anthropometrics or self-reported risk factors (such as family history or tobacco use).
The six biomarkers selected for this DRS model are involved in various biological pathways. Ferritin serves as an antioxidant by binding excess iron, and elevated serum ferritin is a well-established risk factor for future type 2 diabetes (15). Glucose and insulin are critical indicators of metabolic disorders including diabetes and obesity. Adiponectin is involved in the metabolic syndrome and inflammation, and decreased serum adiponectin is a known risk factor for type 2 diabetes (19). CRP and IL-2RA are also involved in inflammatory pathways. Although the association of CRP levels with type 2 diabetes risk has been reported previously (13,20), this is the first study to our knowledge that implicates serum IL-2RA levels in type 2 diabetes risk. In diabetes, effective serum insulin levels are low, whereas levels of circulating glucose and free fatty acids are high, creating an environment of oxidative stress. Such oxidative stress activates inflammatory pathways and ultimately activates T lymphocytes (21). At least one study reported an increase in activated T lymphocytes in patients with type 2 diabetes who were hospitalized for diabetic ketoacidosis (22). Because IL-2RA is upregulated upon T lymphocyte activation, it is possible that increased serum IL-2RA is an indicator of increased levels of activated T-cells.
Because Inter99 was an intervention study, it is possible that the performance of this DRS model observed is lower than might be expected in an observational study. In the Inter99 study, interventions showed small but distinct effects on smoking (23), physical activity (24), and diet (25), although the 5-year conversion rate in this population was similar to that in other populations that did not participate in lifestyle interventions. Any impact of the lifestyle changes on the outcomes in the Inter99 study would not be reflected in the baseline biomarker measurements, which should have made it more difficult to discriminate between those who progressed to type 2 diabetes versus those who did not. In an observational study, it is possible that this DRS model might provide even greater discrimination between those at high versus low risk. The robust performance of the model in an interventional study further strengthens our findings.
In summary, by applying a variety of statistical methods for biomarker selection we developed a DRS model that incorporates six circulating biomarkers. A development process was designed to generate a model that is likely to be generalizable to other populations. This DRS provides superior assessment of diabetes risk compared with fasting plasma glucose alone. In this study, >50% of the subjects had IFG with a risk of developing diabetes only 1.4 times greater than the general population rate. Because this study was limited to overweight middle-aged white individuals, it will be important to replicate these findings in other populations. However, the current results suggest this DRS could be an important tool for identifying the individuals at highest risk of developing type 2 diabetes, a population for whom the most comprehensive prevention strategies should be considered. The improved performance of this model compared with that of single markers demonstrates the value of risk assessment models that incorporate multiple biomarkers from diverse pathophysiological pathways associated with type 2 diabetes.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
APPENDIX
The Inter99 study was initiated by T. Jørgensen (principal investigator [PI]), K. Borch-Johnsen (PI for the diabetes part), T. Thomsen, and H. Ibsen. The present steering group comprises T. Jørgensen (PI), K. Borch-Johnsen (co-PI), and C. Pisinger.
Acknowledgments
K.B.-J. has received honoraria for invited lectures from Novo Nordisk, Bristol-Myers Squibb, Novartis, Pfizer, Hermedico, and AstraZeneca, and K.B.-J.'s pension also has invested in health care companies. No other potential conflicts of interest relevant to this article were reported.
We acknowledge the skillful technical assistance of P. Scott Eastman, Emmie Fernandez, Glenn Hein, Timothy Hamilton, and Jillian Meri, Tethys Bioscience, and thank Laura A. Penny for assisting in the drafting and editing of the manuscript and Anthony Sponzilli for graphic illustrations.