Chronic kidney disease (CKD) in diabetes has a complex molecular and likely multifaceted pathophysiology. We aimed to validate a panel of biomarkers identified using a systems biology approach to predict the individual decline of estimated glomerular filtration rate (eGFR) in a large group of patients with type 2 diabetes and CKD at various stages.
We used publicly available “omics” data to develop a molecular process model of CKD in diabetes and identified a representative parsimonious set of nine molecular biomarkers: chitinase 3-like protein 1, growth hormone 1, hepatocyte growth factor, matrix metalloproteinase (MMP) 2, MMP7, MMP8, MMP13, tyrosine kinase, and tumor necrosis factor receptor-1. These biomarkers were measured in baseline serum samples from 1,765 patients recruited into two large clinical trials. eGFR decline was predicted based on molecular markers, clinical risk factors (including baseline eGFR and albuminuria), and both combined, and these predictions were evaluated using mixed linear regression models for longitudinal data.
The variability of annual eGFR loss explained by the biomarkers, indicated by the adjusted R2 value, was 15% and 34% for patients with eGFR ≥60 and <60 mL/min/1.73 m2, respectively; variability explained by clinical predictors was 20% and 31%, respectively. A combination of molecular and clinical predictors increased the adjusted R2 to 35% and 64%, respectively. Calibration analysis of marker models showed significant (all P < 0.0001) but largely irrelevant deviations from optimal calibration (calibration-in-the-large: −1.125 and 0.95; calibration slopes: 1.07 and 1.13 in the two groups, respectively).
A small set of serum protein biomarkers identified using a systems biology approach, combined with clinical variables, enhances the prediction of renal function loss over a wide range of baseline eGFR values in patients with type 2 diabetes and CKD.
Introduction
Chronic kidney disease (CKD) in patients with diabetes is a global public health burden (1). The two best-validated biomarkers, estimated glomerular filtration rate (eGFR) and albuminuria, are currently used to predict renal prognosis in and enrich clinical trials with high-risk patients. However, the accuracy and precision of predicting renal prognosis is far from optimal, likely because of the complex pathophysiology of CKD in diabetes (2,3). As a consequence, it is difficult to select high-risk populations for clinical trials, which reduces their statistical power and limits the ability to power them to show efficacy of treatment. Another point of concern is that renal function progressively declines in many patients, even when using state-of-the-art therapies. Current treatments address “uniformly active” pathologies such as hyperglycemia or elevated blood pressure, but trials with new drugs interfering with processes active only in subgroups often fail because the “unselected” population recruited is not ideally suited for the targeted therapy. Thus there is a great clinical need to improve the prediction of prognosis for individuals or subgroups, which may be accomplished using novel biomarkers. Novel biomarkers should ideally reflect pathophysiology and thus guide targeted therapy in well-selected cohorts. It is obvious that these goals can only be achieved if biomarker panels capture multiple pathways. Given the complex molecular pathophysiology of CKD in patients with diabetes, it is not surprising that the addition of single biomarkers to risk prediction models does not improve their performance in the general population of patients with CKD (4,5).
Recent advancements in “omics” technologies, followed by innovations in computational sciences, have improved the ability to integrate high-dimensional data using a systems biology approach. This approach offers new ways to describe the pathophysiology of complex diseases. Techniques have been developed that allow us to refine the accuracy of individual predictions based on molecular pathophysiology, account for different stage-specific dominant processes, and uncover novel therapeutic targets/leads. Some recent studies used such a workflow in small cohorts of patients with later stages of CKD of various etiologies. Ju et al. (6) identified epidermal growth factor as a prognostic biomarker in 164 patients, and the test characteristics were validated in two, albeit smaller, cohorts; another group used higher-order descriptive urinary peptide classifiers to predict CKD progression in patients with diabetes (7). A review of these and other studies of prognostic molecular biomarkers in patients with CKD and diabetes was published recently (8).
In the EU Seventh Framework Programme (FP7)–funded project SYSKID (Systems Biology Toward Novel Chronic Kidney Disease Diagnosis and Treatment), our group used publicly available and proprietary experimental data to construct a molecular process model of CKD in diabetes using advanced systems biology techniques (9,10). For in silico–derived molecular processes, we proposed protein biomarkers representing different processes and tested the prognostic power of some of these markers to predict the progression of CKD in an initial small discovery study (11). In the study presented here, we validated these markers in 1,765 patients with type 2 diabetes with incident, early, and later stages of CKD using multiplexed Luminex analysis of serum samples from two prospective interventional studies.
Research Design and Methods
Biomarker Identification and Selection
Biomarker candidates were selected on the basis of a molecular process model representation of CKD in diabetes (3). In brief, large-scale omics profiling was combined with scientific literature mining to derive a set of 2,466 protein-coding genes (PCGs) deregulated in CKD in diabetes. This feature set was mapped onto a hybrid protein interaction network holding more than 16,000 PCGs as nodes and more than 600,000 connecting edges reflecting their interactions, as described by Fechete et al. (12). The resulting subgraph was further segmented into 34 internally highly connected clusters of PCGs using MCODE (13). These processes covered 688 PCGs; the size of each ranged from 3 to 128 nodes (3) (Fig. 1). Sixteen markers that best represented the four largest processes, holding a total of 398 PCGs, were selected and measured in the discovery study (11). Gene ontologies (GOs) of biological processes enriched in PCGs of these four clusters included terms such as regulation of cell proliferation, epithelial-to-mesenchymal transition, regulation of apoptosis, regulation of inflammatory response, collagen metabolic process, and response to hypoxia, among others. Vascular endothelial growth factor-A needed to be removed from further analysis because of minimal value spread, and epidermal growth factor and interleukin-1β were removed because of a lack of data points below the lower assay detection limit. With the remaining 13 marker candidates and available clinical parameters, a multivariable model was designed, resulting from least absolute shrinkage and selection operator selection and bootstrap resampling for loss of eGFR. Finally, a subset of nine molecular markers significantly improved the prediction of accelerated eGFR decline.
The average patient age in the discovery study was 63.5 years and was comparable between the discovery and the validation cohorts. Renal function, however, was more preserved in the discovery study, with an average baseline eGFR of 77.9 mL/min/1.73 m2, compared with average baseline eGFR values of 33.5 and 69.4 mL/min/1.73 m2 in the SMACRO and DIRECT studies, respectively. The average HbA1c value (7.7%) in the discovery cohort was slightly lower than those in the SMACRO (7.9%) and the DIRECT (8.2%) studies, and the average BMI of 32.4 kg/m2 in the discovery cohort was between the average BMI values in the DIRECT (29.5 kg/m2) and the SMACRO (33.1 kg/m2) studies. The average rate of eGFR decline in the discovery study was −2.0 ± 4.7 mL/min/1.73 m2/year over a median of 4 years.
Biomarker Protein Measurement
The marker proteins (chitinase 3-like protein 1 [CHI3L1], growth hormone 1 [GH1], hepatocyte growth factor [HGF], matrix metalloproteinase [MMP] 2, MMP7, MMP8, MMP13, tyrosine kinase [TIE2], and tumor necrosis factor receptor-1 [TNFR1]) in baseline serum samples from two longitudinal studies including participants with diabetes and different stages of CKD were measured using standard multiplexed Luminex technology.
A Human Premixed Multi-Analyte Kit (catalog no. LXSAH-10; R&D Systems, Minneapolis, MN) with a custom assay panel of the markers based on Luminex xMAP technology was used to prepare and analyze samples. We performed all measurements using a Luminex 200 machine (Bio-Rad, Hercules, CA) with xPONENT software, according to the manufacturer’s instructions. Samples were diluted 1:2 with Calibrator Diluent RD6–52. Samples were washed with a vacuum manifold (Millipore, Darmstadt, Germany) and an ATMOS C 261 Aspirator (ATMOS MedizinTechnik, Lenzkirch, Germany). A Luminex 200 Performance Verification Kit containing an xMAP control (catalog no. LX200CONK25; Bio-Rad) was used as the control and a Luminex 200 Calibration Kit with an xMAP calibrator (catalog no. LX200CALK25; Bio-Rad) as the calibrator. The standard curves for each cytokine were generated using the premixed lyophilized standards provided in the kits.
In a 50-μL sample, 50 events per bead were analyzed at a flow rate of 60 μL/min. Minimum events were set to zero, and the median fluorescence was measured. Doublet discriminator gates for two respective measurements were set at 7,500 and 15,500.
Study Design, Data Sources, and Selection of Study Participants
To capture a wide range of CKD stages, we analyzed samples from two prospective interventional studies of patients with type 2 diabetes. The DIRECT study (770 available samples included, 364 from patients in the active treatment group receiving candesartan) included normoalbuminuric, normotensive, or treated hypertensive patients with mild to moderately severe retinopathy (14). In addition, we analyzed 995 available samples (489 from patients receiving active therapy) from the sulodexide macroalbuminuria (SMACRO) trial, which evaluated the renoprotective effects of sulodexide (>900 mg/day) in patients with renal impairment and significant proteinuria (15). The participants from these two studies were grouped according to their baseline eGFR (≥60 and <60 mL/min/1.73 m2); the characteristics of the two populations are shown in Table 1.
Statistical Analysis
Characteristics of patients by eGFR study group were described using mean and SD for normally distributed variables, and frequency and percentage for categorical variables, respectively. We used the Student t test to compare continuous variables and the χ2 test or Fisher exact test to compare categorical variables between groups.
To establish an eGFR slope prediction formula, we estimated two separate random-coefficient linear mixed models—one for patients with eGFR ≥60 mL/min/1.73 m2 at baseline, and one for those with eGFR <60 mL/min/1.73 m2 at baseline. We included only patients with at least 3 months of follow-up, and in each of the mixed models we used the repeated measurements of eGFR, including the baseline value, as the dependent variable, and clinical predictors and protein markers as fixed effects. The rationale for including the baseline eGFR in the dependent variable is that it is assumed to be subject to the same random variation as later values, and therefore should not be treated as a fixed value. It also provides useful information about random, intraindividual variation. Interactions of all variables with time of measurement were also included. The random part of the model consisted of a patient-specific random intercept and a random slope, imposing no restrictions on the structure of their covariance. We eliminated protein marker effects if this increased the Akaike information criterion, yielding a model with optimal predictive performance. This corresponds to selection at an α level of 0.157. In this elimination, we kept the hierarchy of the main effect and the interaction of the markers with time by first eliminating insignificant interactions and then evaluating main effects only if corresponding interactions had already been dropped from the model. Clinical variables were not eliminated because they were selected based on background knowledge. The clinical covariables were sex, age, smoking status, log urinary albumin-to-creatinine ratio, BMI, total cholesterol, mean arterial pressure, and HbA1c.
The models were then described by reporting regression coefficients and associated 95% CIs for estimating the eGFR “intercept” and “slope,” that is, the change per year (Tables 2 and 3). To improve predictions, available baseline values of eGFR were incorporated into predictions. We computed adjusted R2 values to estimate the explained variation of the model (i.e., unadjusted R2 values were multiplied by [N − K − 1]/[N − 1], where N and K denote the number of patients and the number of fixed [main and interaction] effects in the model, respectively). To assess the importance of each predictor, we also decomposed the adjusted R2 values into contributions from the group of clinical variables and the group of protein markers by averaging the adjusted R2 values resulting from models with one group only and the reduction in R2 when a group was completely omitted from the full model. The same was computed separately for each protein marker, scaling the resulting values to add up to the total adjusted R2 contribution of the protein markers.
All analyses were done on a complete-case basis. A P value less than 0.05 was considered statistically significant, and all reported P values are two-sided. We used SAS 9.4 TS 1M2 for Windows (SAS Institute, Inc., Cary, NC) for all analyses.
Results
Biomarker Selection
Nine markers remained significantly associated with the outcome of interest and were used in the analysis in this study: CHI3L1, GH1, HGF, MMP2, MMP7, MMP8, MMP13, TIE2, and TNFR1.
We observed a wide spread of the nine biomarker values in the baseline serum samples of the two studies; a box-and-whisker plot of the concentrations is provided in Supplementary Fig. 1A and B. The calculated decrease of eGFR per year per included patient is depicted in Supplementary Fig. 2A and B.
Performance of the Marker Panel in Predicting eGFR Slopes
Tables 2 and 3 show selected regression coefficients for estimating eGFR slope by demographic and clinical variables and protein markers. To compare the importance of the markers in predicting eGFR, the contribution of each marker to the adjusted R2 value, as well as that of all markers and of all clinical variables together to the adjusted R2 value, were computed and are provided in Table 4. The regression coefficients of some markers were different in the two study groups, with different baseline eGFRs and occasionally even with opposite signs, and some of the markers were selected in one group only (TIE2, TNFR1), supporting the necessity of stage-specific dynamic marker panel modeling. Nonetheless, the biomarkers contributed 15.2% and 33.4% to the explained variances in patients with baseline eGFR levels ≥60 and <60 mL/min/1.73 m2, respectively. Further, 19.6% and 30.6% of explained variance were contributed by clinical and demographic parameters, respectively.
The mean (2.5th, 97.5th percentiles) predicted trajectory of eGFR over time is illustrated in Fig. 2, and the distributions of observed values are superimposed in a box-and-whisker plot. The calibration of the model—that is, the assessment of the goodness of fit of the predicted to the observed eGFR values for the two groups—is shown in Fig. 3A and B. Given the high intraindividual variability in the progression of eGFR over time, our models were well calibrated and not overfitted (calibration slopes of observed vs. predicted values with baseline eGFR levels ≥60 and <60 mL/min/1.73 m2: 1.069 [P = 0.012 for test against a perfect calibration of 1] and 1.137 [P < 0.0001], respectively), suggesting that the in silico biomarker selection was successful and, in particular, models were not overfitted (calibration slopes <1). Calibration-in-the-large, that is, the overall unbiasedness of the eGFR predictions, was adequate as well, and deviations from optimal calibration, although statistically significant, were clinically irrelevant (−1.125 and 0.956, respectively; both P < 0.0001 against perfect unbiasedness).
Conclusions
To our knowledge, this study validates for the first time that a systems biology approach to omics data can identify a combination of serum biomarkers (in this case, nine) that are able to predict the decline of eGFR in a large population of patients with type 2 diabetes across a wide range of glomerular filtration rates. The biomarker models for eGFR ≥60 and <60 mL/min/1.73 m2 performed similarly to those including established clinical risk factors, and a combination of both improved the accuracy, indicating that additional information is provided by the markers.
We used the molecular interaction network model of CKD in diabetes that was developed during the FP7–funded project SYSKID (10). The network consists of various processes characterized by PCGs with a high-grade interaction and goes beyond conventional molecular analysis tools such as Gene Ontology or PANTHER, as the complexity of interactions between pathways is also taken into account. Our systems biology analysis provided the rationale to focus on only a subset of PCGs reported to be deregulated in CKD (688 of 2,466). The in silico processes generated directly reflect the pathophysiology of CKD and guided the selection of the biomarker panel, which was tested for its prognostic potential.
We hypothesized that the progression of CKD in diabetes is complex, with high interindividual variability, and over time, even the high intraindividual variability of the molecular processes is involved. First, we concluded that the biomarkers selected were able to predict the prognosis, which demonstrated that the representative processes are of pathophysiological importance. Next, we were able to show that these processes are stage specific (some biomarkers predict prognosis in early disease, others in late disease) by testing the biomarker in a group of patients with a wide range of renal dysfunction at baseline. Finally, the wide spread of baseline eGFR values is an indication of considerable interindividual variability of CKD pathophysiology in diabetes. Given this complexity, our model approach seems superior to many conventional risk factor analyses. Even though eGFR and age are valuable biomarkers for risk assessment, our panel is directly linked to the pathophysiology of the disease and thus not only predicts prognosis but also could allow the identification of new treatment targets and, importantly, guide targeted therapy.
Some biomarkers included in the panel have already been investigated by others. A group from the Joslin Diabetes Center found that TNFR1 serum concentration predicted the incidence of end-stage renal disease and eGFR loss to values <60 mL/min/1.73 m2 in patients with type 2 and type 1 diabetes and preserved renal function (eGFR >60 mL/min/1.73 m2 and no or mild proteinuria) (16,17). Those authors did not find a correlation with other glomerular or tubular biomarkers or clinical variables. TNFR1 data related to eGFR loss in advanced CKD stages and patients with proteinuria are lacking. In our study, with no further adjustment, the effect of TNFR1 goes in the same direction as in the study by Niewczas et al. (16), whereas adjusting for clinical variables and for biomarkers turns this effect in the opposite direction. This is a result of confounding with those other variables. Niewczas et al., for example, adjusted for clinical predictors and did not adjust for other biomarkers. Thus there is no contradiction between the studies since the sign of the slope depends on the covariables in the model.
An additional likely explanation for this intriguing finding is the competing risk of death in patients with elevated TNFR1, as was recently shown in patients with type 2 diabetes and advanced proteinuric CKD (18). Thrailkill et al. (19) found MMP2 to be dysregulated in plasma and in particular in urine of patients with type 1 diabetes. Based on the latter result, those authors suggested that MMP2 excretion might directly mirror intrarenal pathology. Among the other MMPs, only urinary MMP9 (20) and serum MMP7 (21) were investigated previously (not MMP7, MMP8, or MMP13).
Several experimental and clinical studies showed that HGF, which is also part of our panel, is associated with renoprotection and may even serve as a therapeutic target in patients with diabetes and CKD (22,23). In this context, even though HGF was not included in the prediction models, a strong pathophysiological relevance should not be excluded. GH1, which was negatively associated with eGFR slope (and thus faster progression) in our analysis, induced transforming growth factor-β and accelerated podocyte depletion and proteinuria in an experimental model of diabetic nephropathy (24). CHI3L1 is a member of the glycosyl hydrolase 18 family and is involved in stress response. We found CHI3L1 to be a strong contributor to eGFR loss in patients with higher baseline GFR (i.e., ≥60 mL/min/1.73 m2), which is in line with a study by Żurawska-Płaksej et al. (25).
In contrast to the targeted approaches mentioned above, our parsimonious biomarker panel representing molecular features deregulated in CKD covers a wide range of the critically involved molecular pathways and might therefore better allow patient subclassification (3,5,12). In addition, the processes can also be directly linked to clinical phenotypes. For example, TIE2, which is associated with disease progression, represents process unit 1. Using GO term analysis, PCGs in process unit 1 preferentially are associated with regulation of cell proliferation, cell signaling, and cell surface receptor–linked signal transduction, among other processes, and one could speculate that medications interfering with these pathways might be tested in CKD. HGF as a proxy for process unit 6 is assigned to the enriched GO terms epithelial-to-mesenchymal transition, antiapoptosis, and cell proliferation. TNFR1 as a candidate from process unit 7 is assigned to the enriched GO biological process regulation of inflammatory response. The set of matrix metalloproteinases (MMP2, MMP7, MMP8, and MMP13) from process unit 8 are involved in the collagen metabolic process and thus are linked to fibrotic processes, one of the hallmarks of CKD in patients with diabetes.
Certainly, our approach has some limitations; these were, however, carefully addressed by several sensitivity analyses. Omission of some of the markers in the panel always led to inferior model performance, and the addition of other random markers, which by chance were available for some samples, did reduce the adjusted R2. Furthermore, all patients in this study had type 2 diabetes, and thus results may not be generalizable to other populations.
We were able to derive a representative set of parsimonious biomarkers from an in silico model of CKD in diabetes elaborated from omics data via sophisticated bioinformatics. The biomarker panel was able to predict renal function loss in a large group of patients with type 2 diabetes and a wide range of baseline eGFR values.
Summary
We conclude that a small set of biomarkers identified by a systems biology approach enhances the prediction of renal function loss using standard clinical variables in patients with type 2 diabetes with a wide range of baseline eGFR.
Clinical trial reg. nos. NCT00252694 and NCT00130312, clinicaltrials.gov.
G.M. and H.J.L.H. contributed equally to this study.
Article Information
Funding. Funding for DIRECT and SMACRO was provided by AstraZeneca and Takeda Pharmaceuticals, and Keryx Biopharmaceuticals, respectively. Funding for this study was provided by SYSKID, an EU FP7 research project (SysKid HEALTH–F2–2009–241544).
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
The funding sources had no role in the design, conduct, or analysis of the study.
Author Contributions. G.M., H.J.L.H., D.d.Z., P.R., and R.O. designed the study. C.A., A.H., and P.P. performed the systems biology analysis. G.H., A.K., and M.P. conducted the statistical analysis. A.K. and J.S. designed the Luminex assays and measured the samples. All authors contributed substantially to the analysis of data and preparation of the manuscript. R.O. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.