Electronic health records (EHRs) are an increasingly common data source for studying disease risk, offering large sample sizes and frequently sampled clinical predictors and outcomes. Unlike population-based cohorts, however, EHR samples are not population-representative random samples: patients tend to be sicker and to use more healthcare. We compared risk predictions for the same outcome (time to incident cardiovascular disease, CVD) in the same population of interest (patients older than 50 years with type 2 diabetes, DM2) using two data sources: the 1993-2014 waves of the nationally representative Health and Retirement Study (HRS) and 2009-2017 NYU Langone Health Epic data (NYU-Epic). In each dataset, we estimated the incidence rate (IR) of subsequent CVD in DM2 patients and the adjusted hazard ratios (HRs) of CVD risk factors using Cox regression models. The HRS sample included 2,739 patients, of whom 423 developed CVD. The NYU-Epic sample included 9,179 patients, of whom 1,264 developed CVD. The estimated IRs of subsequent CVD were 28.5 (95% CI: 25.9-31.3) in HRS and 67.1 (95% CI: 63.5-70.9) in NYU-Epic per 1,000 person-years since DM2 onset. HR estimates were comparable between the datasets for most demographic covariates and biomarkers (Table). Our findings show that the EHR sample was enriched with sicker DM2 patients. Our study supports the use of EHRs in DM2 research: EHR-based HR estimates of CVD risk factors show promise of being population-generalizable despite the higher CVD IR in the EHR sample.
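As an illustration of the incidence-rate calculation described above, the following is a minimal sketch of how an IR per 1,000 person-years and its 95% CI could be computed from an event count and total follow-up time. The abstract does not state which interval method was used; the log-scale Wald interval here is an assumption, and the function name `incidence_rate_ci` is hypothetical. Total person-years are not reported in the abstract, so the example back-calculates them from the reported HRS rate purely for illustration.

```python
import math


def incidence_rate_ci(events, person_years, per=1000.0, z=1.96):
    """Return (rate, lower, upper) per `per` person-years.

    Assumes a Poisson event count and a Wald interval on the log-rate
    scale, where SE(log rate) = 1 / sqrt(events). This method is an
    assumption, not necessarily the one used in the study.
    """
    rate = events / person_years * per
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


# HRS event count is taken from the abstract. Person-years are NOT reported;
# this value is back-calculated from the reported rate (28.5 per 1,000
# person-years) and is illustrative only.
hrs_events = 423
hrs_person_years = hrs_events / 28.5 * 1000.0

rate, lower, upper = incidence_rate_ci(hrs_events, hrs_person_years)
print(f"HRS IR: {rate:.1f} (95% CI: {lower:.1f}-{upper:.1f}) per 1,000 person-years")
```

Under these assumptions, the interval width depends only on the number of events, so a sample with more events yields a proportionally narrower interval around its rate.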
J. Zhong: None. C. Blaum: None. J. Yu: None. R. Ferris: None. J. Ha: None. Y. Xia: None. M. Kabeto: None. C. Cigolle: None.
National Institute on Aging