We tested the Complement proteome’s prognostic accuracy for kidney outcomes in T1D using a machine learning (ML) approach.

Methods: This prospective cohort study comprised 193 Joslin Kidney Study subjects with T1D and overt diabetic kidney disease at baseline, followed for years. We measured urinary Complement proteins (n=82) using aptamer proteomics. The outcomes of interest were development of end-stage kidney disease (ESKD) in or 3 years and a kidney slope. We tested biostatistical logistic regression (LR) and 7 ML models: principal component analysis (PCA); decision tree-based random forest (RF) and generalized boosting (GB); penalized regression with elastic net (EN), lasso (LS), and ridge (RD); and a neural network (NN).

Results: The LR model with the top protein had good accuracy (c=0.80-0.86) across the kidney outcomes, which was further improved in the ML model with proteins (PCA). The performance of the 7 ML models was comparable or higher, with EN being the best ML model (AUC=0.80-0.90). Accuracy was better for the shorter follow-up period or slope-based outcomes (Figure 1). Feeding all 82 proteins into the models did not substantially improve performance.
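For readers less familiar with the accuracy metric quoted above, the c-statistic for a binary outcome equals the area under the ROC curve (AUC): the probability that a randomly chosen subject who reached the outcome received a higher predicted risk than a randomly chosen subject who did not. The following is a minimal sketch only; the function name `c_statistic` and the toy labels and scores are hypothetical illustrations, not study data or the authors' code.

```python
def c_statistic(labels, scores):
    """Concordance (c) statistic for a binary outcome; equals the ROC AUC.

    labels: iterable of 0/1 outcomes (1 = event, e.g. reached the outcome)
    scores: iterable of predicted risks, higher meaning higher risk
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # event subject ranked above non-event
            elif p == n:
                concordant += 0.5   # ties count as half-concordant
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 3 events, 4 non-events.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
toy_c = c_statistic(toy_labels, toy_scores)  # 10 of 12 pairs are concordant
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of events from non-events.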

Conclusions: Multiple Complement proteins are strongly associated with short- to long-term kidney outcomes in T1D and offer attractive prognostic accuracy in ML models to complement biostatistical tools.


Z.Md Dom: None. S.Moon: None. S.Pickett: None. S.Dillon: None. M.Niewczas: None.


National Institutes of Health (RO1 DK123459); Beatson Foundation; Hearst Foundation

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at http://www.diabetesjournals.org/content/license.