Coffee may protect against multiple chronic diseases, particularly type 2 diabetes, but the mechanisms remain unclear.
Leveraging dietary and metabolomic data in two large cohorts of women (the Nurses’ Health Study [NHS] and NHSII), we identified and validated plasma metabolites associated with coffee intake in 1,595 women. We then evaluated the prospective association of coffee-related metabolites with diabetes risk and the added predictivity of these metabolites for diabetes in two nested case-control studies (n = 457 case and 1,371 control subjects).
Of 461 metabolites, 34 were identified and validated to be associated with total coffee intake, including 13 positive associations (primarily trigonelline, polyphenol metabolites, and caffeine metabolites) and 21 inverse associations (primarily triacylglycerols [TAGs] and diacylglycerols [DAGs]). These associations were generally consistent for caffeinated and decaffeinated coffee, except for caffeine and its metabolites that were only associated with caffeinated coffee intake. The three cholesteryl esters positively associated with coffee intake showed inverse associations with diabetes risk, whereas the 12 metabolites negatively associated with coffee (5 DAGs and 7 TAGs) showed positive associations with diabetes. Adding the 15 diabetes-associated metabolites to a classical risk factor–based prediction model increased the C-statistic from 0.79 (95% CI 0.76, 0.83) to 0.83 (95% CI 0.80, 0.86) (P < 0.001). Similar improvement was observed in the validation set.
Coffee consumption is associated with widespread metabolic changes, among which lipid metabolites may be critical for the antidiabetes benefit of coffee. Coffee-related metabolites might help improve prediction of diabetes, but further validation studies are needed.
Coffee is one of the most popular beverages worldwide. Accumulating evidence indicates that long-term coffee intake is associated with lower risks of various chronic diseases, including type 2 diabetes, cardiovascular disease, and some types of cancer (1). Therefore, the 2015–2020 U.S. Dietary Guidelines recommend moderate coffee consumption as part of a healthy dietary pattern (2).
Despite these data, the biological mechanisms underlying the benefit of coffee remain unclear. Coffee contains >1,000 bioactive compounds. We and others have shown that long-term coffee intake may reduce insulin resistance and inflammation and modulate hormonal levels that are key to cardiometabolic diseases and cancer (3,4). Recently, high-throughput metabolomic profiling has shown great promise in improving our understanding about the biological effects of nutritional factors and may help identify novel biomarkers for risk prediction and targets for intervention (5).
Several observational studies have examined the metabolomic profiles associated with coffee consumption (6–9). Although some metabolites have been commonly identified, such as trigonelline, caffeine metabolites, and quinate, findings for other metabolites have been inconsistent. This inconsistency may be partly due to limited coverage of the metabolomic platforms, small sample size, and insufficient control for confounding by lifestyle and other dietary factors. Moreover, few studies have investigated whether the identified metabolites influence chronic disease risk.
Therefore, we performed a systematic analysis of plasma metabolomics to 1) identify and validate metabolites associated with intake of total, caffeinated, and decaffeinated coffee and 2) prospectively evaluate the association of coffee-related metabolites with diabetes risk and the added predictivity of these metabolites for diabetes. We drew data from two large U.S. cohorts, the Nurses’ Health Study (NHS) and NHSII.
Research Design and Methods
Study Design and Population
The NHS enrolled 121,700 female nurses aged 30–55 years in 1976, and the NHSII enrolled 116,429 female nurses aged 25–42 years in 1989. Details about the two cohorts have previously been published (10). In brief, mailed questionnaires were administered biennially to assess lifestyle and medical history, with the follow-up rates exceeding 90% for each cycle in both cohorts. Blood samples were collected from 32,826 women in NHS in 1989–1990 and 29,611 women in NHSII in 1996–1999. Women who provided blood samples had dietary and lifestyle profiles similar to those of women who did not (11). Samples were returned by overnight mail with an ice pack, and >95% of them arrived within 24 h of blood draw. Upon arrival, samples were immediately centrifuged and aliquoted into cryotubes as plasma, buffy coat, and red blood cells and stored in liquid nitrogen freezers.
The current study was designed in two phases. First, we performed a cross-sectional analysis to identify and validate coffee-related metabolites. We drew participants with available plasma metabolomic data from previous substudies in the cohorts. After exclusion of participants who had a history of diabetes, cardiovascular disease (CVD), or cancer prior to blood draw or had missing data on coffee consumption or metabolites, a total of 949 women with 427 named metabolites and 646 women with 413 metabolites were included in the discovery and validation sets, respectively. Details about participant selection are provided in the Supplementary Materials and Supplementary Fig. 1.
Then we conducted a nested, 1:3 matched case-control study (207 case and 621 control subjects) to prospectively evaluate the metabolites associated with coffee intake in relation to risk of type 2 diabetes. We developed a diabetes prediction model using the identified metabolites and validated its performance in another case-control study (250 case and 750 control subjects). Details about the case and control subject selection are provided in the Supplementary Materials. Briefly, among participants with available metabolomic data, we identified incident cases of diabetes that occurred after blood draw until the end of follow-up (NHS, 30 June 2012, and NHSII, 30 June 2013). Using risk set sampling, we randomly selected three control subjects for each diabetes case subject among participants who were alive and free of diabetes at the time of diagnosis of the case subjects, matched on age at blood draw ±1 year, study cohort (NHS, NHSII), and fasting status (Supplementary Fig. 1).
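The risk set sampling described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the record fields (`id`, `age`, `cohort`, `fasting`, `dx_time`) and the function name `sample_controls` are hypothetical, `dx_time` is `None` for women never diagnosed during follow-up, and death or loss to follow-up is ignored for brevity.

```python
import random

def sample_controls(case, cohort_members, n_controls=3, rng=None):
    """Pick matched controls for one case from its risk set.

    A participant is in the risk set if she is still free of diabetes at the
    case's diagnosis time and matches the case on cohort, fasting status,
    and age at blood draw within 1 year.
    """
    rng = rng or random.Random(0)
    risk_set = [
        p for p in cohort_members
        if p["id"] != case["id"]
        and (p["dx_time"] is None or p["dx_time"] > case["dx_time"])
        and p["cohort"] == case["cohort"]
        and p["fasting"] == case["fasting"]
        and abs(p["age"] - case["age"]) <= 1
    ]
    return rng.sample(risk_set, min(n_controls, len(risk_set)))

# Hypothetical participants; dx_time is years from blood draw to diagnosis.
members = [
    {"id": 1, "age": 50, "cohort": "NHS", "fasting": True, "dx_time": 5.0},
    {"id": 2, "age": 51, "cohort": "NHS", "fasting": True, "dx_time": None},
    {"id": 3, "age": 49, "cohort": "NHS", "fasting": True, "dx_time": 9.0},    # later case, still eligible
    {"id": 4, "age": 50, "cohort": "NHS", "fasting": True, "dx_time": 2.0},    # already diagnosed
    {"id": 5, "age": 50, "cohort": "NHSII", "fasting": True, "dx_time": None}, # different cohort
    {"id": 6, "age": 50, "cohort": "NHS", "fasting": True, "dx_time": None},
    {"id": 7, "age": 55, "cohort": "NHS", "fasting": True, "dx_time": None},   # outside age window
]
controls = sample_controls(members[0], members)
```

Note that a woman who develops diabetes later (id 3 here) can still serve as a control for an earlier case, which is a defining property of risk set sampling.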
The study protocol was approved by the Institutional Review Board of the Brigham and Women’s Hospital and the Human Subjects Committee Review Board of the Harvard T.H. Chan School of Public Health.
A validated 131-item food-frequency questionnaire (FFQ) was administered every 4 years for collection of updated dietary data. Participants were asked how often (ranging from “never or less than once per month” to “six or more times per day”), on average, they consumed a standard portion size of each food item during the previous year. For coffee, the questionnaire inquired about the frequency of consumption for caffeinated and decaffeinated coffee, separately, in an 8-oz (237-mL) cup. We calculated total coffee consumption as the sum of these two types of coffee. The validity and reproducibility of FFQs have previously been reported (12), with a high correlation between coffee consumption assessed by FFQs and four 1-week diet records (r = 0.78). In the current study, to capture regular dietary intake that is most relevant to metabolite levels and reduce within-person variability, we calculated the average of intake from the two FFQs administered most proximately to the time of blood draw (i.e., 1986 and 1990 for the NHS and 1995 and 1999 for NHSII). The distribution of coffee intake in each FFQ cycle has been shown to be highly stable in our cohorts (13).
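The exposure definition above (total coffee as the sum of caffeinated and decaffeinated coffee, averaged over the two FFQs closest to blood draw) can be illustrated with a short sketch. The function names and the intake values are hypothetical, and intakes are assumed to be already expressed in cups per day.

```python
def total_coffee(caffeinated, decaffeinated):
    """Total coffee intake (cups/day) as the sum of the two coffee types."""
    return caffeinated + decaffeinated

def average_intake(ffq_records):
    """Average total coffee intake across the supplied FFQ cycles."""
    totals = [total_coffee(r["caf"], r["decaf"]) for r in ffq_records]
    return sum(totals) / len(totals)

# Hypothetical NHS participant: the 1986 and 1990 FFQs bracket the blood draw.
ffqs = [
    {"year": 1986, "caf": 2.5, "decaf": 0.0},
    {"year": 1990, "caf": 1.0, "decaf": 0.5},
]
avg = average_intake(ffqs)  # (2.5 + 1.5) / 2 = 2.0 cups/day
```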
All the metabolomics data used in the study were generated at the Broad Institute using liquid chromatography–tandem mass spectrometry (LC-MS) as previously described (14,15). In brief, high-resolution, accurate mass profiling data were acquired using LC-MS systems comprised of Nexera X2 UHPLC systems (Shimadzu Corp., Marlborough, MA) coupled to a Q Exactive or Exactive Plus Orbitrap Mass Spectrometer (Thermo Fisher Scientific, Waltham, MA). Hydrophilic interaction liquid chromatography with positive-ion mode mass spectrometry detection was used to separate polar metabolites, while C18 chromatography with negative-ion mode detection and C8 chromatography with positive-ion mode detection were used to profile metabolites of intermediate polarity and lipids, respectively (see more details in Supplementary Materials). Raw data were processed using TraceFinder 3.3 software (Thermo Fisher Scientific) and Progenesis QI (Nonlinear Dynamics, Newcastle upon Tyne, U.K.). For each of the methods, metabolite identities were confirmed using authentic reference standards or reference samples. For assessment of temporal drift of the metabolomics profiling, pooled reference plasma samples were interspersed and analyzed at intervals of ∼20 participant samples. In each substudy, we removed unnamed metabolites, metabolites with no between-person variations, and metabolites with a mean coefficient of variation of >25% or an intraclass correlation coefficient of <0.40 among drift reference samples. Metabolites with <10% missingness were included, and missing data for each metabolite were imputed using half of the minimum measured value. Finally, a total of 427 metabolites were included in the analysis (Supplementary Table 1).
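The quality control and imputation rules above can be sketched in a few lines. This is an illustrative simplification, not the study pipeline: the function names are hypothetical, the coefficient of variation is computed from a single set of pooled reference measurements, and the intraclass correlation criterion is omitted.

```python
import statistics

def passes_qc(values, reference_values, max_missing=0.10, max_cv=0.25):
    """Apply the missingness, variation, and CV filters to one metabolite.

    values: participant measurements, with None marking a missing value.
    reference_values: measurements of the pooled reference plasma samples.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return False
    missing_fraction = 1 - len(observed) / len(values)
    if missing_fraction >= max_missing:      # keep only <10% missingness
        return False
    if len(set(observed)) == 1:              # no between-person variation
        return False
    cv = statistics.stdev(reference_values) / statistics.mean(reference_values)
    return cv <= max_cv                      # drop metabolites with CV >25%

def impute_half_min(values):
    """Replace missing values with half of the minimum measured value."""
    half_min = min(v for v in values if v is not None) / 2
    return [half_min if v is None else v for v in values]

# Hypothetical metabolite with 1 of 11 values missing (about 9%).
values = [4.0, 6.0, None, 8.0, 5.0, 7.0, 6.5, 5.5, 9.0, 4.5, 6.0]
reference = [10.0, 11.0, 9.0, 10.5]
imputed = impute_half_min(values)
```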
To account for batch effects and improve normality, metabolite measurements were natural log transformed and standardized using z scores (SDs from the mean) within each substudy. In the phase I analysis, we used linear regression to examine the association of coffee intake with each of the metabolites after adjusting for potential confounders. (See details in Table 2 footnote. Details of covariate assessment are described in Supplementary Materials.) The results were presented as percentage difference in metabolite levels per 1-cup increment in coffee intake, using the following exponential function: [exp(β-coefficient) − 1] × 100% (3). We considered total coffee intake as our primary exposure and ran the analysis in the discovery and validation sets separately. In the secondary analysis, we examined caffeinated and decaffeinated coffee separately in the pooled samples. Sensitivity analyses were performed among never smokers to minimize any confounding by smoking and among participants who were selected as control subjects in the source case-control studies.
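As a small worked illustration of the transformation and of the percentage-difference formula stated above, the following sketch log-transforms and standardizes one metabolite and converts a regression coefficient into a percentage difference. The function names and the β values are hypothetical; the regression itself is not shown.

```python
import math
import statistics

def log_zscore(values):
    """Natural-log transform, then standardize to mean 0 and SD 1."""
    logs = [math.log(v) for v in values]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return [(x - mean) / sd for x in logs]

def percent_difference(beta):
    """Percentage difference per 1-cup increment: [exp(beta) - 1] x 100."""
    return (math.exp(beta) - 1) * 100

z = log_zscore([1.0, 2.0, 4.0, 8.0])   # hypothetical raw peak areas
up = percent_difference(0.10)          # about +10.5%
down = percent_difference(-0.05)       # about -4.9%
```

Because the exponential is not linear, a coefficient of 0.10 corresponds to slightly more than a 10% difference, and a coefficient of −0.05 to slightly less than a 5% reduction.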
In the phase II analysis, we calculated odds ratios (ORs) and 95% CIs for the association between coffee-related metabolites (per 1-SD increase) and diabetes risk using conditional logistic regression. We adjusted for family history of diabetes in addition to other covariates included in the phase I analysis. We also performed principal component analysis (PCA) of coffee-related metabolites and examined the top two component scores in relation to diabetes risk.
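For readers less familiar with how an OR and its 95% CI follow from a logistic regression coefficient, a minimal sketch is given here. It assumes a Wald-type interval, which is an assumption on our part since the interval type is not specified above; the coefficient and SE are hypothetical, and fitting the conditional logistic model is not shown.

```python
import math

def odds_ratio_ci(beta, se, z=1.959964):
    """Return (OR, lower, upper) for a log-odds coefficient and its SE."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )

# Hypothetical coefficient per 1-SD increase in a metabolite.
or_, lower, upper = odds_ratio_ci(beta=math.log(0.60), se=0.10)
```

With these inputs the interval lies entirely below 1, which is what an inverse association that excludes the null looks like on the OR scale.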
To assess the predictivity of coffee-related metabolites for diabetes, we calculated the C-statistics using receiver operating characteristic analysis. We considered three models: model 1, based on established risk factors, including age, BMI, physical activity, smoking, and family history of diabetes; model 2, based on the coffee-related metabolites that were associated with diabetes risk in the current study; and model 3, based on a combination of established factors and metabolites. A nonparametric method was used to compare the C-statistics between these models (16). Furthermore, we calculated the integrated discrimination improvement and the net reclassification improvement to evaluate the added predictive ability of metabolites (17).
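The C-statistic used above equals the probability that a randomly chosen case subject has a higher predicted risk than a randomly chosen control subject, with ties counted as one-half. A minimal sketch of that definition follows; the function name and the predicted risks are hypothetical, and the matched design is ignored for simplicity.

```python
def c_statistic(case_scores, control_scores):
    """Proportion of case-control pairs ranked correctly (ties count 0.5)."""
    concordant = 0.0
    for case_score in case_scores:
        for control_score in control_scores:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))

# Hypothetical predicted risks from a fitted model.
cases = [0.8, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.4]
c = c_statistic(cases, controls)  # 9.5 concordant pairs out of 12
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of case and control subjects.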
All analyses were performed using R 3.2.5 (R Foundation for Statistical Computing, Vienna, Austria) and SAS, version 9.4 (SAS Institute, Cary, NC). All statistical tests were two sided. We calculated the false discovery rate (FDR) to correct for multiple testing at α = 0.05 significance level (18).
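One common way to control the FDR is the Benjamini-Hochberg step-up procedure. The sketch below assumes that procedure for illustration; the exact method is given in the cited reference and may differ. The function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P values for five metabolites.
p = [0.001, 0.008, 0.039, 0.041, 0.20]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value < 0.05]
```

In this example only the first two metabolites remain significant at the 0.05 level, although four of the five unadjusted P values are below 0.05.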
Identification of Coffee-Related Metabolites
Table 1 shows basic characteristics of study participants according to frequency of total coffee consumption. Participants in the discovery (n = 949) and validation (n = 646) subsets had similar characteristics. Those who drank more coffee were more likely to be current smokers.
In the discovery phase, 73 of the 427 metabolites were associated with total coffee intake after multiple testing correction (FDR P value <0.05). Of the 73 metabolites, 10 were unavailable in the validation stage. Among the remaining 63 metabolites, 34 were validated to be associated with coffee intake (Table 2), of which 13 were positively associated with total coffee intake, including trigonelline, caffeine and its metabolites (5-acetylamino-6-amino-3-methyluracil [AAMU], 1,7-dimethyluric acid, and 7-methylxanthine), polyphenol metabolites (cinnamoylglycine and 4-hydroxyhippuric acid), phenyllactic acid, cytosine, l-carnitine, and three cholesteryl esters (CEs). The other 21 metabolites were inversely associated with total coffee intake, mostly triacylglycerols (TAGs) and diacylglycerols (DAGs). Correlation analysis of the 34 metabolites showed that most lipid species were positively correlated and metabolites of the same class or sharing fatty acid chains tended to cluster together (Supplementary Fig. 2).
Most of the 34 identified metabolites demonstrated consistent associations with caffeinated and decaffeinated coffee, except for caffeine and its metabolites, which were associated with caffeinated coffee only (Table 2).
Association of Coffee-Related Metabolites With Diabetes Risk
Basic characteristics of 457 diabetes case subjects and 1,371 matched control subjects (in the training and validation sets, respectively) at blood draw are shown in Supplementary Table 4. The multivariate-adjusted OR of diabetes per 1 cup/day increase in total coffee consumption was 0.95 (95% CI 0.82, 1.09) in the training set, 0.93 (95% CI 0.83, 1.03) in the validation set, and 0.95 (95% CI 0.89, 1.02) in the combined analysis.
Among the 34 coffee-related metabolites, 30 were available for the diabetes analysis. Cinnamoylglycine, phenyllactic acid, docosahexaenoic acid, and C52:5 TAG were unavailable due to high missingness (>50%). As shown in Table 3, after adjustment for potential confounders and correction for multiple testing, three metabolites positively associated with coffee intake showed an inverse association with diabetes risk (C18:1 CE, C20:4 CE, and C18:2 CE), with the adjusted ORs ranging from 0.59 to 0.61 per 1-SD increment. In contrast, 12 metabolites negatively associated with coffee showed a positive association with diabetes, including 5 DAGs and 7 TAGs, with the adjusted ORs ranging from 1.35 to 2.06 per 1-SD increment. Of the 15 metabolites associated with diabetes, 14 were available in the validation case-control set and showed consistent associations with diabetes (Supplementary Table 5).
PCA of the 30 coffee-related metabolites showed that the first three components explained 44.9%, 15.1%, and 10.3% of the total variation, respectively (Supplementary Figs. 3–5). The first principal component represented a mix of DAGs and TAGs and was positively associated with diabetes risk (OR 1.15 per 1-unit increase in the component score [95% CI 1.08, 1.22]), the second principal component mainly represented CEs and was inversely associated with diabetes risk (OR 0.80 [95% CI 0.71, 0.89]), and the third principal component represented caffeine and its metabolites and was not associated with diabetes (OR 0.96 [95% CI 0.86, 1.08]) (Table 3).
For the diabetes prediction analysis (Fig. 1), we obtained similar C-statistics for the model using the 15 diabetes-associated metabolites and that using the classical risk factors. Adding metabolites to the classical risk factor model increased the C-statistic from 0.79 (95% CI 0.76, 0.83) to 0.83 (95% CI 0.80, 0.86) (P < 0.001) in the training set and from 0.75 (95% CI 0.72, 0.78) to 0.78 (95% CI 0.75, 0.82) (P < 0.001) in the validation set. The C-statistics were essentially unchanged after addition of coffee consumption to the models (data not shown). Among individual metabolites, C50:2 TAG and C34:1 DAG showed the highest C-statistics in the training set (both 0.74) and validation set (0.68 and 0.69, respectively) (Supplementary Table 6).
Compared with the models with only classical risk factors, the model combining classical factors and metabolites showed a statistically significant net reclassification improvement (P < 0.0001) (training set, 62.2% [95% CI 47.2, 77.1], and validation set, 47.7% [95% CI 33.8, 61.7]) (Supplementary Table 7). Similarly, the integrated discrimination improvement was statistically significant (P < 0.0001) for the model combining classical factors and metabolites.
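The two reclassification measures reported above can be illustrated with a short sketch. It assumes the category-free (continuous) form of the net reclassification improvement, which is our assumption because the form used is not stated here; the function names and the predicted risks are hypothetical.

```python
def continuous_nri(old, new, is_case):
    """Category-free NRI: net upward movement in cases plus net downward in controls."""
    def net_up(flag):
        pairs = [(o, n) for o, n, c in zip(old, new, is_case) if c == flag]
        up = sum(1 for o, n in pairs if n > o)
        down = sum(1 for o, n in pairs if n < o)
        return (up - down) / len(pairs)
    return net_up(1) - net_up(0)

def idi(old, new, is_case):
    """Integrated discrimination improvement: change in mean risk gap between cases and controls."""
    def mean_change(flag):
        diffs = [n - o for o, n, c in zip(old, new, is_case) if c == flag]
        return sum(diffs) / len(diffs)
    return mean_change(1) - mean_change(0)

# Hypothetical predicted risks without (old) and with (new) metabolites.
old_risk = [0.30, 0.40, 0.50, 0.20, 0.30, 0.40]
new_risk = [0.50, 0.60, 0.45, 0.10, 0.20, 0.45]
case_flag = [1, 1, 1, 0, 0, 0]
nri_value = continuous_nri(old_risk, new_risk, case_flag)  # 1/3 + 1/3
idi_value = idi(old_risk, new_risk, case_flag)
```

Both measures are positive when the new model moves predicted risks up for case subjects and down for control subjects on balance.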
Leveraging the integrated dietary and metabolomic data in two large cohorts of women, we identified and validated 34 plasma metabolites associated with coffee intake. These metabolites can be divided into two groups: internal exposure markers of coffee intake (trigonelline, polyphenol metabolites, and caffeine and its metabolites) and metabolomic response to long-term coffee consumption such as lipid metabolites. Linking these metabolites to diabetes risk, we found that three CEs positively associated with coffee intake were associated with lower risk of diabetes, whereas 12 lipid metabolites (DAGs and TAGs) negatively associated with coffee were associated with higher risk of diabetes. Moreover, the receiver operating characteristic analysis indicated that coffee-related metabolites might be useful to improve the prediction of diabetes beyond established risk factors. These findings provide new insights into the health benefit of coffee and suggest the potential of coffee-related metabolites for improved diabetes prediction.
A meta-analysis of 18 prospective cohort studies showed 7% lower risk of diabetes per 1 cup of coffee consumed per day (19). In addition, our previous findings in the NHS and NHSII support a beneficial role of both caffeinated and decaffeinated coffee in diabetes (20,21). In the current study, we found a similar inverse association between coffee intake and diabetes risk, although the 95% CIs included 1, likely due to the modest sample size. Also, we found that higher concentrations of trigonelline, an established marker of coffee exposure, were associated with lower risk of diabetes (OR per 1 SD 0.88 [95% CI 0.77, 0.99]) in the combined training and validation sets.
Coffee is known to impact lipid metabolism, but the biomarkers and specific pathways remain poorly understood. In our study, most of the measured metabolites were lipids, including CEs, DAGs, TAGs, phosphocholines, phosphatidylethanolamines, lysophosphatidylcholines, and lysophosphatidylethanolamines (LPEs), thus allowing us to identify specific lipid metabolites associated with long-term coffee intake. We observed that coffee intake was positively associated with three CEs (C20:4 CE, C18:1 CE, and C18:2 CE) and inversely associated with 21 lipid species, primarily DAGs and TAGs. In support of our findings, several observational studies have reported an inverse association between coffee intake and TAG concentrations (22,23). Experimental evidence indicated that coffee polyphenols (e.g., chlorogenic acid) significantly lowered plasma free fatty acid, triglyceride, and total cholesterol concentrations, partly by inhibition of cholesterol synthesis and stimulation of fatty acid β-oxidation activity in the liver (24,25). This may explain our findings of the inverse associations of coffee intake with DAGs and TAGs.
High plasma TAG concentrations and low HDL cholesterol have been linked to increased diabetes risk, and dyslipidemia is likely caused by increased free fatty acid flux secondary to insulin resistance (26). We showed that DAGs and TAGs negatively associated with coffee intake were associated with higher diabetes risk. Consistent with our findings, two prospective cohort studies found that adjustment for TAGs attenuated the association between coffee intake and diabetes risk (22,27). On the other hand, CEs positively associated with coffee showed an inverse association with diabetes risk. Similar findings have been reported in other studies (28). CEs are mainly located in plasma lipoproteins, particularly HDL, which delivers excess cholesterol from peripheral tissues to the liver for excretion (29). It is possible that coffee intake may reduce diabetes risk by elevating CE and HDL levels. Indeed, a clinical trial showed a significant increase in serum HDL cholesterol after consumption of 4–8 cups of coffee/day for 3 months (30). Alternatively, our observed associations between coffee and lipid metabolites may represent noncausal effects of insulin resistance, which is a direct cause of diabetes.
Consistent with previous studies (7,31), we found a strong association of total and caffeinated coffee intake with caffeine and three caffeine metabolites, including AAMU, 1,7-dimethyluric acid, and 7-methylxanthine. In a recent study in the Women’s Health Initiative, these four metabolites were all associated with lower inflammatory potential of the diet, in line with the anti-inflammatory effect of coffee (18). The main route of caffeine metabolism is carried out by CYP1A2 and CYP2A6 in the liver, through N-3-demethylation to paraxanthine before further conversion to AAMU, 1,7-dimethyluric acid, and 7-methylxanthine (32). Regarding the role of caffeine in diabetes development, while short-term human studies showed that caffeinated coffee intake might induce acute reduction in insulin sensitivity (33,34), a meta-analysis of 28 prospective studies found that both caffeinated and decaffeinated coffee consumption was associated with a lower risk of diabetes (35). Our previous study also found that both caffeinated and decaffeinated coffee intake were associated with favorable profiles of various biomarkers in key metabolic and inflammatory pathways (3). These results are consistent with our current observation of a null association of caffeine and caffeine metabolites with diabetes risk, suggesting that components in coffee other than caffeine may drive the beneficial effect on diabetes.
Coffee is a rich source of polyphenols. We identified two coffee-related metabolites, cinnamoylglycine and 4-hydroxyhippuric acid, that are produced in microbial metabolism of polyphenols. The human gut contains diverse microbial populations. Specific bacterial enzymes can transform polyphenols into smaller metabolites through deglycosylation, dehydroxylation, and demethylation before these metabolites undergo further metabolism in systemic circulation (36). Cinnamoylglycine is the glycine conjugate of cinnamic acid, which is a gut microbial metabolite of polyphenols (37); 4-hydroxyhippuric acid, another microbial product derived from polyphenol metabolism, is produced by hepatic conjugation of 4-hydroxybenzoic acid with glycine (38). In line with our study, these metabolites have previously been associated with higher consumption of polyphenol-rich foods including coffee (39). To our knowledge, no prior studies have assessed these polyphenol metabolites in relation to diabetes risk. In our nested case-control studies, only 4-hydroxyhippuric acid was available, and its association with diabetes was null. Further studies are needed to examine the relationship between polyphenol metabolites and risk of diabetes.
Our study has several strengths, including the large sample size, repeated assessment of caffeinated and decaffeinated coffee intake prior to blood draw, collection of detailed covariate data for confounding control, and a multimetabolomics approach for analysis of a wide range of biochemical compounds. Moreover, we performed internal validation for the coffee metabolites and their associations with diabetes risk, using samples from the same source cohorts and the same metabolomic platforms in a single laboratory. Our study also has several limitations. First, there may be measurement errors associated with dietary assessment by FFQs. However, the FFQ used in our cohorts has been validated for assessment of long-term coffee intake. Second, we did not have information on the coffee bean type, roast and preparation methods, or the amount of sugar/cream added to coffee, which may influence metabolic response to coffee consumption. For example, compared with unfiltered coffee, filtered coffee contains much less lipid. During the follow-up period of our cohorts, most of the coffee consumed in the U.S. was filtered coffee. Thus, whether our findings are applicable to unfiltered coffee needs to be examined in future studies. Third, our study is observational and unable to establish causality. However, we performed rigorous multivariable adjustment to minimize the influence of residual confounding. Also, our observations are supported by biological evidence. Fourth, metabolomic profiling was conducted only once. However, our pilot study of repeated assessments over 1–2 years showed that levels of most metabolites were highly stable over time (intraclass correlation coefficient or Spearman correlation >0.65) (40). Fifth, the identified lipid species are interrelated with each other, making it difficult to disentangle their independent associations with coffee intake and diabetes risk.
Finally, participants in our study were all women and predominantly White, limiting the generalizability of our findings. Further confirmation in men and other racial/ethnic groups is needed.
In summary, we identified a panel of 34 plasma metabolites associated with coffee intake. We also provide evidence that coffee may reduce diabetes risk by modulating lipid metabolism and that coffee-related metabolites may improve prediction of diabetes risk beyond classical risk factors. Future prospective and interventional studies are needed to confirm our findings.
This article contains supplementary material online at https://doi.org/10.2337/figshare.12675611.
Acknowledgments. The authors thank the participants and staff of the NHS and NHSII for their valuable contributions.
Funding. This work was supported by the American Cancer Society Mentored Research Scholar Grant (MRSG-17-220-01-NEC [to M.S.]), by U.S. National Institutes of Health grants (UM1 CA186107 to M.J. Stampfer; R01 CA49449 to S.E. Hankinson; U01 CA176726 and R01 CA050385 to W.C.W. and A.H.E.; P01 CA087969 and R01 CA163451 to S.S. Tworoger; R01 AR057327 to E.W. Karlson; R01 NS045893 and R01 NS089619 to A. Ascherio; P01 CA087969 to R.M. Tamimi; K24 DK098311, R01 CA137178, R01 CA202704, and R01 CA176726 to A.T.C.; K99 CA215314 and R00 CA215314 to M.S.; and R01 DK112940 to F.B.H.), by the Department of Defense (W81XWH-12-1-0561), and by grants from National Natural Science Foundation of China (81973127 to D.H.) and Natural Science Foundation of Jiangsu Province (BK20190083 to D.H.). A.T.C. is a Stuart and Suzanne Steele MGH Research Scholar.
Grants to individuals who are not authors of this work contributed to the establishment of cohorts but not directly to the current analysis. The authors assume full responsibility for analyses and interpretation of these data.
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
Author Contributions. D.H. performed statistical analysis and drafted the manuscript. O.A.Z., X.H., M.G.-F., X.J., J.L., L.L., A.H.E., C.B.C., A.T.C., Z.H., H.S., K.M.W., L.A.M., Q.S., F.B.H., W.C.W., and E.L.G. contributed to the acquisition of data, interpretation of the results, and revision of the manuscript. M.S. was responsible for study design. D.H. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.