Gestational diabetes mellitus (GDM) predisposes pregnant individuals to perinatal complications and long-term diabetes and cardiovascular diseases. We developed and validated metabolomic markers for GDM in a prospective test-validation study. In a case-control sample within the PETALS cohort (GDM n = 91 and non-GDM n = 180; discovery set), a random PETALS subsample (GDM n = 42 and non-GDM n = 372; validation set 1), and a case-control sample within the GLOW trial (GDM n = 35 and non-GDM n = 70; validation set 2), fasting serum untargeted metabolomics were measured by gas chromatography/time-of-flight mass spectrometry. Multivariate enrichment analysis examined associations between metabolites and GDM. Ten-fold cross-validated LASSO regression identified predictive metabolomic markers at gestational weeks (GW) 10–13 and 16–19 for GDM. Purinone metabolites at GW 10–13 and 16–19 and amino acids, amino alcohols, hexoses, indoles, and pyrimidine metabolites at GW 16–19 were positively associated with GDM risk (false discovery rate <0.05). A 17-metabolite panel at GW 10–13 outperformed the model using conventional risk factors, including fasting glycemia (area under the curve: discovery 0.871 vs. 0.742, validation 1 0.869 vs. 0.731, and validation 2 0.972 vs. 0.742; P < 0.01). Similar results were observed with a 13-metabolite panel at GW 16–19. Dysmetabolism is present early in pregnancy among individuals progressing to GDM. Multimetabolite panels in early pregnancy can predict GDM risk beyond conventional risk factors.
Introduction
The global prevalence of gestational diabetes mellitus (GDM) has increased by >35–90% over the past decades, to 6–12% among pregnant individuals (1,2). Given the plethora of wide-ranging adverse health sequelae, including perinatal complications and long-term diabetes and cardiovascular diseases, among women and their offspring, GDM represents a growing urgent worldwide public health concern (3). In routine clinical practice, GDM is screened and diagnosed in late pregnancy at 24–28 weeks of gestation; however, metabolic perturbations may have begun in early pregnancy (4,5). Identification of early predictive biomarkers for GDM is warranted to inform early risk stratification, prevention, and treatment strategies.
Disturbances in metabolic programming during pregnancy have been implicated in the development of GDM (6). While glucose and carbohydrate metabolism have been extensively studied for GDM pathophysiology, other key pathways, including amino acid and lipid metabolism, suggested to play key roles in this process, remain understudied (7). Notably, the conventional single-factor epidemiologic approach to identifying predictors has been impeded by the difficulty in measuring the holistic metabolic profile and inability to address interplays among fuel substrates (8). In contrast, untargeted metabolomic approaches provide a comprehensive and systematic snapshot of multiple metabolic pathways involving numerous small molecules, revealing predictive biomarkers and disease mechanisms (9). However, metabolomic studies for GDM remain sparse (10), with small sample sizes (mostly <50 GDM cases); skewed racial/ethnic distribution (primarily Caucasian based); lack of standardized diagnosis of GDM; variations in tissues (blood, urine, or amniotic fluid), fasting status (largely nonfasting), and metabolomic techniques (targeted vs. untargeted); and, most importantly, lack of external or even internal validation, which have collectively limited the generalizability of previous findings (11–17).
To address these critical data gaps, we conducted a prospective discovery and validation study to examine the associations of fasting serum untargeted metabolomics from early to mid-pregnancy with risk of GDM among pregnant individuals with multiracial/ethnic backgrounds in a large integrated clinical setting, where universal screening for and standardized diagnosis of GDM were implemented. We developed and validated machine-learning models using multimetabolite panels in early and mid-pregnancy for GDM prediction.
Research Design and Methods
Study Design and Population
PETALS (Pregnancy Environment and Lifestyle Study) is a population-based longitudinal multiracial/ethnic cohort study. The study design has been described in detail elsewhere (18). This population was drawn from the membership of Kaiser Permanente Northern California (KPNC), an integrated health care delivery system serving 4.5 million members, who are highly representative of the entire population living in the served geographic area (19). After weekly searches of the electronic health records, pregnant individuals aged 18–45 years carrying singletons were invited to participate in the study before 11 weeks of gestation. Fasting blood samples were collected after an 8- to 12-h overnight fast at study clinic visits 1 (gestational weeks [GW] 10–13; baseline) and 2 (GW 16–19). Anthropometric measurements and questionnaires on health history and lifestyle were completed at visit 1. The study was approved by the Kaiser Foundation Research Institute Human Subjects Committee. Written informed consent was obtained from all participants.
In this clinical setting, pregnant women were universally screened for GDM with the 50-g 1-h glucose challenge test at GW ∼24–28. If the screening test was abnormal (>7.8 mmol/L), a diagnostic 100-g 3-h oral glucose tolerance test (OGTT) was performed in the morning after a 12-h fast. GDM was ascertained by meeting any of the following criteria used at KPNC: 1) two or more plasma glucose values after the OGTT meeting or exceeding the Carpenter-Coustan thresholds (1-h 10.0 mmol/L, 2-h 8.6 mmol/L, and 3-h 7.8 mmol/L), as recommended by the American College of Obstetricians and Gynecologists (20), or 2) fasting glucose ≥5.1 mmol/L measured alone or during the OGTT, as recommended by the International Association of Diabetes and Pregnancy Study Groups and American Diabetes Association (21,22). Plasma glucose measurements were performed using the hexokinase method at the KPNC regional laboratory, which participated in the College of American Pathologists’ accreditation and monitoring program (23).
Of the 3,346 pregnant individuals who completed visit 1 (baseline) in the PETALS cohort, 162 (4.8%) did not have data on screening for GDM: 32 (1.0%) had a pregnancy loss, 46 (1.4%) were no longer KPNC members after visit 1, and 84 (2.5%) were not screened. Among participants screened for GDM, 194 met the Carpenter-Coustan criteria and 116 met the isolated fasting glucose threshold for the diagnosis of GDM. To improve the homogeneity of GDM diagnosis and generalizability of our findings, we only included women with GDM based on the Carpenter-Coustan criteria in the PETALS discovery set and validation set 1 as described below (20). We first designed a nested case-control study within the prospective PETALS cohort as the discovery set (Fig. 1A), including 91 GDM cases and 180 non-GDM controls who delivered between April 2015 and January 2018. Each GDM case was matched to two non-GDM controls (two controls were missing blood samples, leaving 180 rather than 182) according to age (±5 years), race/ethnicity, calendar time for enrollment (±3 months), and GW at baseline clinic visit (±3 weeks). To derive validation set 1, we randomly selected ∼15% of women in the PETALS cohort who delivered between April 2014 and May 2019, were not selected in the discovery set, and had fasting serum collected at study clinic visits 1 and 2 (i.e., GDM n = 42 and non-GDM n = 372) (Fig. 1A).
We further derived validation set 2 from the GLOW (Gestational Weight Gain and Optimal Wellness) randomized controlled trial, which aimed to reduce excess gestational weight gain through a behavioral lifestyle intervention (24). Women with a prepregnancy BMI between 25.0 and 40.0 kg/m2 who were aged ≥18 years and carrying singleton pregnancies were recruited before GW 13 and completed a baseline preintervention study visit at GW 8–15, during which fasting (≥8 h) blood samples were collected. We identified 35 women with GDM based on the Carpenter-Coustan criteria and matched them at a one-to-two ratio to 70 non-GDM women according to age (±5 years), race/ethnicity, calendar time for enrollment (±3 months), and GW at baseline visit (±3 weeks) (Fig. 1B). Of note, the intervention did not affect the incidence of GDM compared with the control group, and blood samples for metabolomic measurement were collected at the baseline visit before the intervention was conducted (24).
Untargeted Metabolomic Data Acquisition and Processing
Fasting serum samples were stored at −80°C before analysis for the discovery set and both validation sets in the same biorepository facility. Untargeted metabolomic data were generated by established assays at the West Coast Metabolomics Center at University of California Davis (25). Primary metabolites, such as sugars, hydroxyl acids, and amino acids, were analyzed by a Leco Pegasus IV time-of-flight mass spectrometer using splitless injection into an Agilent gas chromatograph. For quality control (QC) and normalization, one blank negative control extraction was prepared per 10 samples from empty Eppendorf tubes as starting material to monitor carryover and chemical artifacts, in addition to one BioreclamationIVT plasma (cat. no. HMPLEDTA) QC sample per 10 study samples. Raw spectra for serum and tissue metabolites were processed by BinBase software (26), which matches each sample mass spectrum datum and retention index against the MassBank.us public libraries and the licensed NIST20 library. For retention time correction, C8-C30 fatty acid methyl esters were added as internal standards. Data from the LECO instrument software ChromaTOF were processed by the BinBase database (27). Missing peaks were not imputed but were automatically replaced by local noise values from raw data. Metabolites were retained if the median peak intensities were at least three times higher than local noise. Reported metabolites were quantified using ion peak heights of deconvoluted unique ions and were normalized by the sum of all annotated metabolite intensities. Additional data QC strategies included 1) systematic error removal using random forest normalization to account for batch effect and improve normality (28), 2) removal of compounds with >50% missing values in the preprocessing stage for the mass spectrum data and retention index, and 3) removal of compounds with high technical variance (coefficient of variation >50%). 
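The signal-to-noise filter and sum normalization described above can be sketched as follows. This is a minimal illustration, not the BinBase pipeline: the function names, the dictionary layout, and the choice to express normalized values as fractions of the sample total (without rescaling to a study-wide average) are assumptions made for the example.

```python
from statistics import median

def keep_metabolite(peak_heights, local_noise, fold=3.0):
    """Retain a metabolite if its median peak height is at least `fold` times local noise."""
    return median(peak_heights) >= fold * local_noise

def sum_normalize(sample):
    """Divide each annotated metabolite's peak height by the sample's total annotated intensity.

    `sample` maps metabolite name -> peak height (hypothetical layout).
    """
    total = sum(sample.values())
    if total <= 0:
        raise ValueError("total annotated intensity must be positive")
    return {name: height / total for name, height in sample.items()}
```

For example, `sum_normalize({"alanine": 1.0, "urea": 3.0})` yields fractions of 0.25 and 0.75, so samples with different overall signal become comparable.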
A total of 157 known metabolites were annotated meeting QC criteria, and 602 unknown metabolites were detected. We included 144 known metabolites with a coefficient of variation <20.0% (average 5.8%; range 1.2–17.8%) in our analysis. Of the 144 known metabolites, there were missing data on peak intensities for three metabolites (missing rate range 0.4–1.5%), which were imputed using the minimum peak intensity of each metabolite divided by 2.
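The coefficient-of-variation screen and the half-minimum imputation described above are simple enough to express directly. The sketch below is illustrative only; the function names and the use of `None` to mark a missing peak intensity are assumptions, and the sample standard deviation is assumed for the coefficient of variation.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Coefficient of variation (%) = sample SD / mean * 100."""
    return stdev(values) / mean(values) * 100.0

def impute_half_minimum(intensities):
    """Replace missing values (None) with the minimum observed intensity divided by 2."""
    observed = [x for x in intensities if x is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = min(observed) / 2.0
    return [fill if x is None else x for x in intensities]
```

Applied per metabolite, `impute_half_minimum([4.0, None, 10.0])` fills the gap with 2.0, half of the smallest observed value.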
Covariates
Potential covariates included age at childbirth (continuous), race/ethnicity (non-Hispanic White, non-Hispanic Black, Hispanic, Asian/Pacific Islander, or other/unknown), education (high school or less, some college/associate degree, or college degree or higher), nulliparity (yes or no), prepregnancy BMI (non-Asians: underweight [BMI <18.5 kg/m2], normal weight [18.5–24.9 kg/m2], overweight [25.0–29.9 kg/m2], or obese [≥30.0 kg/m2] and Asians: underweight [<18.5 kg/m2], normal weight [18.5–22.9 kg/m2], overweight [23.0–27.4 kg/m2], or obese [≥27.5 kg/m2]) (29), chronic hypertension (yes or no), family history of diabetes (yes or no), history of GDM (yes or no), and GW at blood collection (continuous). Information on race/ethnicity, age, education, and family history of diabetes was collected by structured questionnaire administered at the baseline visit. Medical history was extracted from electronic health records.
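The race/ethnicity-specific BMI categories above can be written as a small classifier. This is a sketch under one assumption: the published ranges are given to one decimal place (e.g., 18.5–24.9 and 25.0–29.9 kg/m2), and the code treats them as contiguous intervals with the lower bound inclusive. The function name and the boolean `asian` argument are illustrative.

```python
def bmi_category(bmi, asian):
    """Classify prepregnancy BMI (kg/m2) using the cutoffs listed in the text."""
    # Asian-specific cutoffs are lower for overweight and obesity.
    overweight_cut, obese_cut = (23.0, 27.5) if asian else (25.0, 30.0)
    if bmi < 18.5:
        return "underweight"
    if bmi < overweight_cut:
        return "normal weight"
    if bmi < obese_cut:
        return "overweight"
    return "obese"
```

The same BMI can therefore fall into different categories: a value of 24.0 kg/m2 is normal weight under the non-Asian cutoffs but overweight under the Asian cutoffs.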
Statistical Methods
Differences in participant characteristics between cases and controls in the discovery set were assessed by mixed-effect linear regression for continuous variables and binomial/multinomial logistic regression with generalized estimating equations for binary/multilevel categorical variables, accounting for the case-control matching.
In the univariate analysis, we conducted conditional logistic regression to examine the associations of individual metabolites at GW 10–13 and 16–19 with risk of GDM, respectively, adjusting for aforementioned covariates. We also examined the associations between changes in metabolites across the two time points, as assessed by the ratio of individual metabolite peak intensity at GW 16–19 divided by that at GW 10–13, and risk of GDM. The Benjamini-Hochberg false discovery rate (FDR) controlling method was used to adjust for multiple comparisons. We present adjusted odds ratios of GDM in relation to individual metabolites using volcano plots according to the effect size of odds ratios and significance level of P values. We also present radar plots to visualize these findings according to super pathways as determined by the automated chemical classification with a comprehensive computable taxonomy (i.e., ClassyFire) (30).
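For readers less familiar with the Benjamini-Hochberg procedure, the adjustment applied to the per-metabolite P values can be sketched in a few lines. This is a generic implementation of the standard step-up method, not the software used in the study; the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value so that adjusted
    # values are monotone non-decreasing in the raw P value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A metabolite is then declared significant when its adjusted value falls below the chosen FDR threshold (0.05 in this study).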
We further conducted multivariate analysis using the chemical similarity enrichment analysis (ChemRICH) to map biochemical clusters and facilitate biologic interpretation (31). ChemRICH is a statistical enrichment approach based on chemical similarity rather than sparse biochemical knowledge annotations; it yielded greater statistical power compared with the univariate analysis focusing on associations of individual metabolites with GDM risk. ChemRICH identified study-specific nonoverlapping sets of metabolites by combining chemical similarity and classification ontologies. The P values of metabolite clusters were obtained using the Kolmogorov-Smirnov test. An FDR-adjusted P value <0.05 indicates a statistically significantly enriched compound cluster.
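To give a sense of the cluster-level test, the sketch below computes a one-sample Kolmogorov-Smirnov statistic for the P values of the metabolites in one cluster. It assumes the comparison is against a uniform distribution on (0, 1), which is our reading of the enrichment approach and is not spelled out in the text; it also returns only the statistic D and leaves the conversion to a P value to a statistics library. The function name is illustrative.

```python
def ks_statistic_uniform(p_values):
    """One-sample Kolmogorov-Smirnov statistic D against Uniform(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one P value")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical CDF jumps from (i - 1)/n to i/n at x;
        # the Uniform(0, 1) CDF evaluated at x is x itself.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d
```

A cluster whose members mostly have small P values produces a large D, because its empirical distribution rises much faster than the uniform reference.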
For metabolomic marker discovery, we developed sequential predictive models for GDM risk using model 1 (conventional risk factors including aforementioned covariates and fasting serum glucose concentrations), model 2 (a multimetabolite panel at GW 10–13 or 16–19, respectively), and model 3 (conventional risk factors from model 1 and the multimetabolite panel from model 2). To achieve maximum predictability, the multimetabolite panels were selected among all metabolites using least absolute shrinkage and selection operator (LASSO) regression to develop more interpretable and parsimonious models (32). We plotted receiver operating characteristic curves and evaluated the incremental prediction capacity of models 2 and 3 beyond model 1 by comparing area under the curve (AUC) statistics using the DeLong test (33). To derive results generalizable to the entire cohort, samples from the case-control discovery set were reweighted using sampling weights developed via a weighted likelihood approach based on the inverse probability of selected GDM cases or non-GDM controls versus their counterparts in the entire PETALS cohort, respectively. Specifically, GDM cases had a sampling probability of 91 over the total number of women with GDM in the PETALS cohort (n = 310). Sampling probability of non-GDM controls was calculated using a logistic regression model in the entire cohort excluding GDM cases, with matching factors as predictors. Each 95% CI around an estimate of distribution of the reweighted sample contained the original estimate in the PETALS cohort, confirming effective reweighting (Supplementary Table 1). To avoid overfitting, 10-fold cross-validation was performed to derive conservative estimates within the discovery set. We further evaluated the predictive performance of multimetabolite panels identified in the discovery set in validation sets 1 and 2.
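The AUC comparison and the reweighting can be illustrated together. The sketch below computes the AUC as the weighted proportion of case-control pairs in which the case has the higher predicted risk, with ties counted as one half. Whether the study applied sampling weights at this exact step is an assumption, as are the function and argument names; the DeLong test itself is not implemented here.

```python
def weighted_auc(case_scores, control_scores, case_weights=None, control_weights=None):
    """AUC as the weighted fraction of case-control pairs ranked correctly."""
    if case_weights is None:
        case_weights = [1.0] * len(case_scores)
    if control_weights is None:
        control_weights = [1.0] * len(control_scores)
    numerator = 0.0
    denominator = 0.0
    for s_case, w_case in zip(case_scores, case_weights):
        for s_ctrl, w_ctrl in zip(control_scores, control_weights):
            pair_weight = w_case * w_ctrl
            denominator += pair_weight
            if s_case > s_ctrl:
                numerator += pair_weight
            elif s_case == s_ctrl:
                numerator += 0.5 * pair_weight  # ties count as half
    return numerator / denominator

# Inverse of the case sampling probability stated in the text (91 of 310).
CASE_SAMPLING_WEIGHT = 310 / 91
```

Because every case shares the same weight, the case weights cancel in the ratio; only the control weights, which vary with the matching factors, change the estimate.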
Data and Resource Availability
Extracted data are available within the publication and its Supplementary Material. A deidentified analytic data set used in this study can be shared with qualified researchers, subject to approval by the Kaiser Foundation Research Institute Human Subjects Committee and by the human subjects committees at the institutions requesting the data and a signed data sharing agreement. Please send all requests to the corresponding author of this article.
Results
Participant Characteristics
In the PETALS discovery set, women with GDM were more likely to be overweight or obese before pregnancy and have chronic hypertension, family history of diabetes, and history of GDM compared with non-GDM controls (all P < 0.05; Table 1). Similar patterns between women with and without GDM were observed in validation sets 1 and 2 (Supplementary Table 2). Participant characteristics in validation set 1 (a random PETALS sample) were similar to those in the entire PETALS cohort. All women in validation set 2 were overweight or obese before pregnancy per the study design and on average older than women in the PETALS discovery set and validation set 1 (≥35 years: 33.3 vs. 28.4 and 22.7%, respectively), representing a higher-risk group for GDM.
Characteristic | All (n = 271) | GDM (n = 91) | Non-GDM* (n = 180) | P† |
---|---|---|---|---|
Age at delivery, years | | | | 0.09 |
<25 | 21 (7.7) | 7 (7.7) | 14 (7.8) | |
25–29 | 51 (18.8) | 13 (14.3) | 38 (21.1) | |
30–34 | 122 (45.0) | 42 (46.2) | 80 (44.4) | |
≥35 | 77 (28.4) | 29 (31.9) | 48 (26.7) | |
Race/ethnicity | | | | 0.48 |
White | 59 (21.8) | 19 (20.9) | 40 (22.2) | |
Hispanic | 89 (32.8) | 30 (33.0) | 59 (32.8) | |
Black | 25 (9.2) | 5 (5.5) | 20 (11.1) | |
Asian/Pacific Islander | 82 (30.3) | 34 (37.4) | 48 (26.7) | |
Other/unknown | 16 (5.9) | 3 (3.3) | 13 (7.2) | |
Education | | | | 0.89 |
High school or less | 31 (11.4) | 10 (11.0) | 21 (11.7) | |
Some college | 109 (40.2) | 37 (40.7) | 72 (40.0) | |
College graduate or above | 131 (48.3) | 44 (48.4) | 87 (48.3) | |
Nulliparity | 121 (44.6) | 38 (41.8) | 83 (46.1) | 0.50 |
Prepregnancy BMI, kg/m2‡ | | | | 0.01 |
Underweight/normal weight | 82 (30.3) | 18 (19.8) | 64 (35.6) | |
Overweight | 89 (32.8) | 35 (38.5) | 54 (30.0) | |
Obese | 100 (36.9) | 38 (41.8) | 62 (34.4) | |
Chronic hypertension | 16 (5.9) | 9 (9.9) | 7 (3.9) | 0.04 |
Family history of diabetes | 66 (24.4) | 34 (37.4) | 32 (17.8) | 0.001 |
History of GDM | 18 (6.6) | 17 (18.7) | 1 (0.6) | <0.0001 |
Data are given as n (%).
*Case-control ratio of one to two, except for two GDM cases that each had only one matched control with biospecimens available.
†P values for differences between case and control participants were obtained by mixed-effect linear regression models for continuous variables and binomial/multinomial logistic regression with generalized estimating equations for binary/multilevel categorical variables, accounting for matched case-control pairs.
‡Non-Asians were categorized as underweight (BMI <18.5 kg/m2), normal weight (18.5–24.9 kg/m2), overweight (25.0–29.9 kg/m2), or obese (≥30.0 kg/m2). Asians were categorized as underweight (<18.5 kg/m2), normal weight (18.5–22.9 kg/m2), overweight (23.0–27.4 kg/m2), or obese (≥27.5 kg/m2).
Univariate Prospective Associations of Individual Metabolites With Risk of GDM
Among all 144 known metabolites, the distribution according to metabolic super pathway was as follows: amino acids, peptides, and analogs (26.4%); lipids and lipid-like molecules (19.4%); organic oxygen compounds (20.8%); organoheterocyclic compounds (12.5%); other organic acids and derivatives (13.9%); and xenobiotics (6.9%) (Supplementary Fig. 1). Univariate prospective associations of individual metabolites clustered by super pathway with risk of GDM are presented in radar plots (Supplementary Fig. 2). At GW 10–13, 15 metabolites were significantly and positively associated with GDM risk and one metabolite was inversely associated with GDM risk (all FDR <0.05) (Fig. 2; see effect sizes in Supplementary Fig. 3). At GW 16–19, 28 and two metabolites were positively and inversely associated with GDM risk, respectively (all FDR <0.05) (Fig. 2; see effect sizes in Supplementary Fig. 3). Across the two gestational periods, changes in six metabolites were positively associated with risk of GDM and one metabolite was inversely associated with risk of GDM (P < 0.05); however, none persisted after FDR adjustment (Supplementary Table 3).
Multivariate Chemical Enrichment Analysis
To further facilitate biologic interpretation, we used ChemRICH. Among all the metabolite clusters enriched at GW 10–13 (Supplementary Fig. 4A), those significantly and positively associated with GDM risk were predominantly purinones (P = 0.0005), aromatic amino acids (P = 0.011), pyrimidines (P = 0.017), indoles (P = 0.019), acidic amino acids (P = 0.031), and polyamines (P = 0.040), whereas the acyclic acid cluster was significantly and inversely associated with risk of GDM (P = 0.024) (Fig. 3A and Supplementary Table 4). After FDR adjustment, only the purinone set at GW 10–13 remained significantly and positively associated with GDM risk (FDR 0.011).
Among all the metabolite clusters enriched at GW 16–19 (Supplementary Fig. 4B), in addition to the significant clusters at GW 10–13 (all except polyamines), the amino alcohol, hexose, sugar acid, guanidine, carbocyclic acid, unsaturated fatty acid, and glucuronate clusters were positively associated with GDM risk (all P < 0.05), whereas only the amino acid (basic and other), amino alcohol, hexose, indole, purinone, and pyrimidine clusters remained significant after FDR adjustment (all FDR 0.041) (Fig. 3B and Supplementary Table 4). On the other hand, the amide cluster with allantoic acid as the key metabolite was significantly and inversely associated with GDM risk (P = 0.008; FDR 0.041). Across the two gestational periods, increased concentrations of the hexose cluster with fructose as the key metabolite were positively associated with risk of GDM (P = 0.022; data not shown).
Multimetabolite Panels for GDM Prediction Using Machine Learning
To evaluate the incremental predictability of metabolites beyond conventional risk factors, including fasting serum glucose (model 1 as reference), we developed multimetabolite panels at GW 10–13 and 16–19 using 10-fold cross-validation (model 2 specific to each gestational window) and an additive model including predictors in models 1 and 2 as model 3 at each gestational window, respectively. At GW 10–13, LASSO regression identified a 17-metabolite panel (three amino acids, four lipid metabolites, two purine and pyrimidine metabolites, and eight carbohydrate metabolites; model 2), which demonstrated superior predictive performance compared with model 1 (10-fold cross-validation AUC [95% CI] 0.832 [0.777–0.887] vs. 0.742 [0.677–0.807]; P (model 2 vs. 1) = 0.021) (Fig. 4A; see predictive performance statistics in Supplementary Table 5 and model optimization in Supplementary Fig. 5A). The addition of the 17-metabolite panel to conventional risk factors (model 3) demonstrated further incremental predictability beyond model 1 (0.871 [0.824–0.918] vs. 0.742 [0.677–0.807]; P (model 3 vs. 1) <0.001). This multimetabolite panel illustrated robust predictive performance in both validation sets 1 (AUC in models 1–3 0.731, 0.771, and 0.869, respectively; P (model 2 vs. 1) = 0.062; P (model 3 vs. 1) = 0.001) and 2 (0.742, 0.907, and 0.972, respectively; P (model 2 vs. 1) = 0.002; P (model 3 vs. 1) <0.001) (Table 2).
Similarly, at GW 16–19, LASSO regression identified a 13-metabolite panel (three amino acids, three lipid metabolites, two purine and pyrimidine metabolites, and five carbohydrate metabolites; model 2), which demonstrated superior predictive performance compared with model 1 (10-fold cross-validation AUC [95% CI] of models 1–3 0.732 [0.660–0.803], 0.797 [0.728–0.865], and 0.838 [0.775–0.900], respectively; P (model 2 vs. 1) = 0.012; P (model 3 vs. 1) = 0.004) (Fig. 4B; see predictive performance statistics in Supplementary Table 5 and model optimization in Supplementary Fig. 5B). Similar and robust predictive performance of this 13-metabolite panel was observed in validation set 1 (AUC in models 1–3 0.719, 0.774, and 0.830, respectively; P (model 2 vs. 1) = 0.017; P (model 3 vs. 1) = 0.007) (Table 2), with no corresponding data in validation set 2.
Gestational window and model | Validation set 1* | Validation set 2† |
---|---|---|
GW 10–13 | | |
Model 1‡ | 0.731 (0.634–0.827) | 0.742 (0.639–0.846) |
Model 2§ | 0.771 (0.693–0.850) | 0.907 (0.850–0.965) |
Model 3ǁ | 0.869 (0.800–0.937) | 0.972 (0.940–0.999) |
P (model 2 vs. 1)¶ | 0.062 | 0.002 |
P (model 3 vs. 1)¶ | 0.001 | <0.001 |
GW 16–19 | | |
Model 1‡ | 0.719 (0.649–0.789) | NA |
Model 2# | 0.774 (0.682–0.867) | NA |
Model 3ǁ | 0.830 (0.743–0.917) | NA |
P (model 2 vs. 1)¶ | 0.017 | NA |
P (model 3 vs. 1)¶ | 0.007 | NA |
Data are given as AUC (95% CI) unless otherwise indicated.
NA, not applicable.
*Validation set 1 was a random sample of 42 GDM and 372 non-GDM women in the PETALS cohort.
†Validation set 2 was a case-control study of 35 GDM cases and 70 non-GDM controls in the GLOW randomized controlled trial.
‡Model 1 included conventional risk factors: age, race/ethnicity, family history of diabetes, chronic hypertension, history of gestational diabetes, prepregnancy BMI, gestational age at blood collection, and serum glucose levels.
§Model 2 included a 17-metabolite panel selected by LASSO regression at GW 10–13 (1,5-anhydroglucitol, 1-monoolein, 2,3-dihydroxybutanoic acid, 2-hydroxyglutaric acid, 5,6-dihydrouracil, alanine, α-aminoadipic acid, β-alanine, β-sitosterol, cellobiose, citramalic acid, citric acid, lactic acid, N-acetylputrescine, β-tocopherol, uric acid, and urea).
ǁModel 3 included conventional risk factors in model 1 and metabolites in model 2.
¶P value was obtained by DeLong test.
#Model 2 included a 13-metabolite panel selected by LASSO regression at GW 16–19 (1,5-anhydroglucitol, 2,3-dihydroxybutanoic acid, 2-aminobutyric acid, α-aminoadipic acid, arachidic acid, aspartic acid, citric acid, hydrocinnamic acid, lauric acid, oleic acid, quinic acid, uracil, and uridine).
Discussion
In this well-characterized prospective test and validation study including women with diverse racial/ethnic backgrounds, comparing women with GDM diagnosed by objective glucose thresholds in late pregnancy with their counterparts with euglycemia, we observed distinct metabolic profiles as early as GW 10–13. We report novel findings on the prospective positive associations of fasting serum indoles, pyrimidines, and amino alcohols and inverse associations of allantoic acid in early and mid-pregnancy with risk of GDM, together with other known pathways, including amino acids, purinones, and hexoses. By using machine-learning algorithms, we developed and validated predictive multimetabolite panels at GW 10–13 and 16–19 for GDM risk, with incremental predictability beyond conventional risk factors, including fasting serum glucose levels. Our findings suggest the potential value of metabolomic profiling for early prediction of GDM risk.
Our findings illustrate that previously unreported metabolic pathways may be implicated in the development of GDM. Our novel finding of a positive association of indole (the main metabolite produced from dietary tryptophan by gut microbiota) metabolism with risk of GDM was consistent with a previous cross-sectional study that reported higher fecal indole concentrations among women with GDM versus non-GDM women in late pregnancy (34). Data in animal models also suggest that prolonged exposure to indoles inhibits glucagon-like peptide-1 secretion, which further impairs insulin secretion (35). Interestingly, we also observed for the first time a suggestive trend of elevated levels of the polyamine cluster, especially the microbiota-derived N-acetylputrescine, at GW 10–13 among women with GDM (P = 0.04; FDR 0.14), which is consistent with previous observations of its positive associations with obesity and diabetes among nonpregnant individuals (36). Collectively, our data suggest that microbiota-derived indole and polyamine metabolites could affect host glucose homeostasis, and the role of interactions among the microbiome, metabolome, and host in the development of GDM warrants further investigation.
Pyrimidines, as building blocks of DNA and RNA, are vital elements involved in a wide range of biologic functions, including carbohydrate metabolism, and have been linked to risk of diabetes and diabetic complications among nonpregnant individuals (37). We detected elevated levels of the pyrimidine nucleotide pathway (5,6-dihydrouracil, thymine, uracil, and uridine) with uracil as the key metabolite at both GW 10–13 and 16–19, although the significant chemical enrichment persisted only at GW 16–19 after FDR adjustment. Uracil is phosphorylated from uridine, with thymidine and thymine as downstream products, the dysregulation of which has been implicated in hyperglycemia and diabetic endothelial dysfunction (38).
Levels of D-erythro-sphingosine, a derivative of ceramides, in the amino alcohol pathway were also higher at GW 16–19 among women who later developed GDM. Despite the lack of previous data among pregnant women, animal data have demonstrated that high-fat diet–induced ceramides and sphingosine are implicated in insulin resistance via stimulation of plasminogen activator inhibitor-1 (39), providing insights into the biologic plausibility of D-erythro-sphingosine upregulation in GDM development.
We also observed an inverse association of the amide pathway, with allantoic acid as the key metabolite, at GW 16–19 with GDM. Allantoic acid is hydrolyzed from allantoin, which is the more soluble final product of purine catabolism in nonprimates (vs. uric acid in humans). Notably, allantoic acid has been suggested as a potential biomarker for dietary intake of soybean plants, which in turn has been linked to a lower risk of GDM among Japanese women (40). Further investigation of metabolomic markers of dietary intake and their roles in the pathophysiology of GDM is warranted.
Consistent with previous findings among both pregnant and nonpregnant individuals, our study confirmed several previously identified metabolic pathways and specific metabolites associated with GDM risk. Purinone metabolism (hypoxanthine, uric acid, and xanthine) was the only pathway positively associated with GDM at both GW 10–13 and 16–19 after FDR adjustment. As the end product of purinone metabolism, uric acid is synthesized by oxidation of hypoxanthine and xanthine via xanthine oxidoreductase, which can be a source of reactive oxygen species. Our observation is consistent with previous data linking hyperuricemia in early pregnancy to increased risk of GDM (41). Hexoses, a group of 6-carbon monosaccharides, including glucose, fructose, fucose, and levoglucosan, were elevated at GW 16–19 prior to GDM diagnosis. Because glucose is the preferred metabolic substrate during embryonic development, carbohydrate metabolism precedes amino acid and lipid metabolism in most circumstances. The elevated levels of hexoses in mid-pregnancy may indicate perturbations in carbohydrate metabolism but may also suggest dysregulation in amino acid or lipid metabolism. Indeed, we observed upregulation of a broad spectrum of amino acids, especially in mid-pregnancy, among women with GDM, suggesting that glucagon-regulated amino acid catabolism may be attenuated. In particular, we observed positive associations of glutamine (the preeminent gluconeogenic amino acid) and citrulline (80% derived in the intestine from glutamine) at GW 16–19 with GDM risk. Consistently, data among patients with type 2 diabetes have illustrated reduction in glutamine oxidation, likely resulting from competition with glucose and fatty acids as fuels, increased gluconeogenesis, and release of amino acids from tissues other than skeletal muscle (42).
While the univariate and multivariate analyses focused on associations of individual metabolites or clusters of metabolites with GDM risk to elucidate the underlying pathophysiology, machine learning–based prediction aimed to identify significant markers from among all known metabolites to achieve maximum predictability. The multimetabolite panels at GW 10–13 and 16–19 had similar predictability for GDM risk, offering flexibility in the examination window to reduce potential clinic and participant burden. A few metabolites were commonly selected at the two gestational periods, including 1,5-anhydroglucitol, 2,3-dihydroxybutanoic acid, α-aminoadipic acid, and nucleotides and catabolism derivatives (i.e., uric acid, uracil, uridine, and urea). Consistently, previous studies among nonpregnant populations have shown that 1,5-anhydroglucitol is a validated marker of short-term glycemic control (43) and α-aminoadipic acid is a marker for diabetes up to 12 years before disease onset (44). Whether these biomarkers, even in the preconception period, could predict risk of GDM warrants further investigation. Notably, across the various methods, nucleotide metabolites were not only significantly associated with GDM risk but also selected as predictive markers, highlighting their essential pathophysiologic role in glucose metabolism and predictive value in GDM risk (45).
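For readers less familiar with penalized regression, the following minimal sketch illustrates how an L1 (LASSO) penalty can shrink uninformative predictors to exactly zero, yielding a sparse panel, and how discrimination on held-out data can be summarized by the area under the curve. It is an illustration only and not the analysis code used in this study: the data are simulated, the function names (`soft_threshold`, `fit_l1_logistic`, `auc`) are hypothetical, and the penalty value is fixed arbitrarily, whereas the panels reported here were tuned by ten-fold cross-validation.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero; this is the proximal step for an L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, row):
    """Predicted probability of being a case for one sample."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit_l1_logistic(rows, labels, penalty, step=0.5, n_iter=300):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(rows), len(rows[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(rows, labels):
            err = predict(weights, bias, row) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        bias -= step * grad_b  # the intercept is not penalized
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, grad_w)]
    return weights, bias


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Simulated data (assumption): 10 standardized "metabolites", only the first two informative.
rng = random.Random(0)
rows = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(400)]
labels = [1 if rng.random() < sigmoid(2.0 * r[0] - 2.0 * r[1]) else 0 for r in rows]

# First 300 samples stand in for a discovery set, the last 100 for a validation set.
train_rows, train_labels = rows[:300], labels[:300]
test_rows, test_labels = rows[300:], labels[300:]

weights, bias = fit_l1_logistic(train_rows, train_labels, penalty=0.05)
selected = [j for j, w in enumerate(weights) if w != 0.0]
test_auc = auc([predict(weights, bias, r) for r in test_rows], test_labels)

print("selected features:", selected)
print("held-out AUC:", round(test_auc, 3))
```

In practice, the penalty would be chosen by cross-validation rather than fixed, and an established statistical package would be preferable to a hand-written optimizer; the sketch is meant only to convey why a LASSO-type model returns a small subset of markers rather than weights for every metabolite.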
Strengths of our study include its methodologic rigor. In a clinical setting with universal GDM screening, we used a standardized clinical diagnosis of GDM based on the Carpenter-Coustan criteria in both the discovery and validation sets to minimize case-control misclassification and clinical heterogeneity and improve generalizability of our findings. We profiled untargeted metabolomics in early to mid-pregnancy, ensuring that metabolomic profiling temporally preceded GDM diagnosis in late pregnancy. Furthermore, the use of fasting serum samples minimized variability in metabolomic measurements attributable to fasting status. Importantly, we performed rigorous internal cross-validation and external validation in two different sets to minimize the impact of data overfitting and selection bias on predictive multimetabolite panels, which was lacking in most previous studies. Some potential limitations of our study merit discussion. Despite the significant prospective associations observed between metabolic pathways and specific metabolites and risk of GDM, we cannot determine whether these metabolites were causal factors for GDM or markers of a prediagnostic pathophysiologic state. Functional studies focusing on targeted pathways and metabolites are warranted to shed further light on the mechanisms underlying the development of GDM. Our sample size was relatively modest compared with metabolomic studies among nonpregnant populations. However, given the unique challenges in recruitment, retention, and participant burden (especially for fasting blood collection) within a relatively short, intense, and stressful time period of pregnancy, our study is among the largest prospective and longitudinal studies of fasting serum untargeted metabolomics from early to mid-pregnancy in relation to risk of GDM. Confirmation of our findings in other pregnant populations is warranted.
In conclusion, in this prospective test and validation study, we observed in early pregnancy a subclinical dysmetabolism among women who subsequently progressed to GDM in late pregnancy, compared with their euglycemic counterparts. At GW 10–13 and 16–19, we detected elevated levels of microbiota-derived indole metabolites, purinone and pyrimidine nucleotides, sphingosines, hexoses, and a broad range of amino acid pathways, in addition to lower levels of allantoic acid with the primarily exogenous source from soybean products, in association with risk of GDM. The developed and validated multimetabolite panels may inform early risk assessment of GDM and facilitate early prevention and treatment of GDM and its complications. Additional studies are warranted to investigate whether these metabolites may serve as early preventive or intervention targets. Future work focusing on identifying potentially modifiable upstream risk factors (e.g., dietary factors related to interactions among microbiome, metabolome, and host; physical activity; and other behavioral factors) for these key metabolomic markers for GDM is warranted.
This article contains supplementary material online at https://doi.org/10.2337/figshare.19609845.
O.F. and A.F. are co-senior authors.
See accompanying article, p. 1620.
Article Information
Acknowledgments. The authors thank all staff and participants in PETALS and the GLOW randomized controlled trial for their valuable contributions.
Funding. This research was supported by the National Institutes of Health (NIH) Building Interdisciplinary Research Careers in Women’s Health Program (grant K12HD052163) and National Institute of Diabetes and Digestive and Kidney Diseases (grant K01DK120807) (Y.Z.) and the NIH National Institute of Environmental Health Sciences (grant R01ES019196), National Institute of Child Health and Human Development (grant R01HD073572), and Office of Directors (grants UG3OD023289 and UH3OD023289) (A.F.). D.K.B. was supported by the NIH National Institute of Environmental Health Sciences (grants U2CES026561, U2CES026555, P30ES023515, and U2CES030859) and National Center for Advancing Translational Sciences (grant UL1TR001433).
The funders had no role in study design, data collection or analysis, decision to publish, or preparation of the manuscript.
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
Author Contributions. Y.Z., O.F., and A.F. contributed to the study concept and design. Y.Z., D.K.B., J.F., O.F., and A.F. contributed to data acquisition. All authors contributed to analysis and interpretation of data. Y.Z. drafted the manuscript. All authors contributed to the critical revision of the manuscript for important intellectual content. Y.Z. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.