Metabolomics, with its wealth of data, offers a valuable avenue for improving prediction and decision-making in diabetes. This observational study aimed to use machine learning (ML) algorithms to predict the 4-year risk of developing type 2 diabetes mellitus (T2DM) from targeted quantitative metabolomic data. A cohort of 279 patients at cardiovascular risk, who underwent coronary angiography and were free of T2DM at baseline according to ADA criteria, was followed for up to 4 years; during this time, 11.5% developed new-onset T2DM. Targeted metabolomics (Biocrates, Austria) was performed at baseline using liquid chromatography-mass spectrometry (LC-MS) and flow injection analysis-mass spectrometry (FIA-MS). After preprocessing of the metabolomic dataset, 362 variables were used for ML with the caret package for R (CRAN). The dataset was split into training and test sets at a 75:25 ratio, and oversampling was used to address the class imbalance of the outcome (incident T2DM). After tuning of the network size, the multilayer perceptron (MLP) showed the most promising predictive performance, with a sensitivity of 63%, a specificity of 79%, and an accuracy of 77%. The most important variables (top 20; see figure) were ceramides, bile acids, and hexoses.
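The evaluation workflow described above (75:25 split, oversampling of the minority class, and sensitivity/specificity/accuracy on the test set) can be sketched as follows. This is a minimal, stdlib-only Python illustration, not the authors' R/caret code. Several details are assumptions: the split is stratified by outcome, oversampling is plain random duplication of minority-class samples applied to the training set only, and the label vector (279 patients, 32 incident cases, about 11.5%) is hypothetical. All function names are illustrative, and the MLP itself is not reproduced; any classifier could be trained on the balanced training indices.

```python
import random
from collections import Counter


def stratified_split(labels, test_frac=0.25, seed=42):
    """Split sample indices into train/test sets, keeping each class's share (assumed stratified)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def random_oversample(train_idx, labels, seed=42):
    """Duplicate randomly drawn training samples of smaller classes until all classes match the largest.

    Only training indices are resampled, so the test set keeps its natural class ratio.
    """
    rng = random.Random(seed)
    counts = Counter(labels[i] for i in train_idx)
    n_max = max(counts.values())
    balanced = list(train_idx)
    for cls, n in counts.items():
        pool = [i for i in train_idx if labels[i] == cls]
        balanced.extend(rng.choice(pool) for _ in range(n_max - n))
    return balanced


def classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and accuracy from paired true/predicted labels (binary outcome)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity and specificity at a given outcome prevalence."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


# Hypothetical cohort: 279 patients, 32 incident T2DM cases (label 1).
labels = [1] * 32 + [0] * 247
train_idx, test_idx = stratified_split(labels)
balanced_train = random_oversample(train_idx, labels)
print(len(train_idx), len(test_idx))                  # 209 70
print(Counter(labels[i] for i in balanced_train))    # 185 samples of each class

# Consistency check: at the cohort incidence of 11.5%, a sensitivity of 63% and a
# specificity of 79% imply an accuracy of about 0.77.
print(round(accuracy_from_rates(0.63, 0.79, 0.115), 4))  # 0.7716
```

The last line shows why accuracy alone is a weak summary here: with only about 11.5% positive cases, accuracy is dominated by specificity. This is the reason sensitivity is reported separately and the training data are rebalanced. The check assumes the test set has roughly the cohort's incidence, which a stratified split would approximately preserve.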
In conclusion, ML analysis of large metabolomic datasets is a promising tool for identifying individuals at risk of developing T2DM and opens avenues for personalized, early intervention strategies.
A. Leiherer: None. A. Muendlein: None. C.H. Saely: None. T. Plattner: None. B. Larcher: None. A. Mader: None. A. Vonbank: None. R. Laaksonen: None. P. Fraunberger: None. H. Drexel: None.