Background: The Assessing the Burden of Diabetes by Type in Children, Adolescents and Young Adults (DiCAYA) Network aims to provide timely estimates of diabetes prevalence and incidence using large-scale electronic health record (EHR) data. However, patients who self-select into health systems are likely to differ from the overall source population of interest in demographics and health status, which could limit the generalizability of surveillance estimates.

Methods: Using simulations, we evaluated the potential of established bias correction methods (inverse probability weighting, poststratification, raking, and multilevel regression with poststratification [MRP]) to mitigate selection bias within EHR data. The EHR sample was assumed to overrepresent younger, higher socioeconomic status (SES), and White individuals relative to the source population. Simulations varied the selection process (e.g., whether selection depended on diabetes status) and the degree of misspecification in the adjustment methods. The performance of each method was assessed by its bias, defined as the difference in percentage points between the estimated prevalence in the sample and the true prevalence in the simulated source population.
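As an illustration only, the Python sketch below runs a single replicate of a simulation in this spirit. It assumes a hypothetical source population with three binary demographics (younger age, higher SES, White race), an additive diabetes risk, and a selection probability that depends on demographics alone. All coefficients, the population size, and the function names `simulate_source`, `poststratify` and `rake` are illustrative assumptions, not the DiCAYA simulation settings. Poststratification reweights each demographic cell to its share of the source. With selection probabilities estimated as sample-to-source ratios within each cell, inverse probability weighting yields the same estimate. Raking matches only the three marginal distributions. MRP is omitted because it requires fitting a multilevel model.

```python
import random
from collections import Counter


def simulate_source(n, rng):
    """Hypothetical source population: (cell, diabetes, selected) per person.

    cell = (young, high_ses, white) as 0/1 flags; all coefficients are assumptions.
    """
    records = []
    for _ in range(n):
        cell = tuple(int(rng.random() < 0.5) for _ in range(3))
        young, high_ses, white = cell
        # Assumed risk: higher in older, lower-SES and non-White groups.
        risk = 0.02 + 0.03 * (1 - young) + 0.02 * (1 - high_ses) + 0.01 * (1 - white)
        diabetes = int(rng.random() < risk)
        # Assumed selection: depends on demographics only and favours
        # younger, higher-SES and White individuals.
        p_select = 0.1 + 0.2 * young + 0.2 * high_ses + 0.1 * white
        records.append((cell, diabetes, rng.random() < p_select))
    return records


def poststratify(sample, source_counts):
    """Weight each cell's sample prevalence by that cell's share of the source."""
    n_cell = Counter(cell for cell, _ in sample)
    cases = Counter(cell for cell, d in sample if d)
    total = sum(source_counts.values())
    return sum(source_counts[c] / total * cases[c] / n_cell[c] for c in source_counts)


def rake(sample, source_counts, n_iter=50):
    """Iterative proportional fitting: rescale weights until the weighted
    sample matches each variable's source margin, then estimate prevalence."""
    n_cell = Counter(cell for cell, _ in sample)
    cases = Counter(cell for cell, d in sample if d)
    # Only the marginal totals of each variable are used, not the joint cells.
    margins = {
        (j, level): sum(n for c, n in source_counts.items() if c[j] == level)
        for j in range(3)
        for level in (0, 1)
    }
    weight = {c: 1.0 for c in n_cell}  # weight per sampled person in cell c
    for _ in range(n_iter):
        for (j, level), target in margins.items():
            current = sum(weight[c] * n_cell[c] for c in n_cell if c[j] == level)
            for c in n_cell:
                if c[j] == level:
                    weight[c] *= target / current
    total_w = sum(weight[c] * n_cell[c] for c in n_cell)
    return sum(weight[c] * cases[c] for c in n_cell) / total_w


rng = random.Random(2024)
source = simulate_source(200_000, rng)
source_counts = Counter(cell for cell, _, _ in source)
true_prev = sum(d for _, d, _ in source) / len(source)

sample = [(cell, d) for cell, d, selected in source if selected]
crude = sum(d for _, d in sample) / len(sample)
ps = poststratify(sample, source_counts)
rk = rake(sample, source_counts)

for name, est in [("crude", crude), ("poststratified", ps), ("raked", rk)]:
    print(f"{name:>14}: bias = {100 * (est - true_prev):+.2f} pp")
```

In this toy setup raking also recovers the truth because the assumed diabetes risk is additive in the three raking variables. If risk and selection both involved interactions, matching the margins alone could leave some bias. A full study would average the bias over many replicates rather than report a single run.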

Results: When selection depended on demographics alone, the mean bias in the crude prevalence estimate was -0.7 percentage points (pp). Each method performed well in removing this bias when the demographics that affected selection were adequately accounted for. Biases remained when SES was misclassified in the adjustment process, with the best performance seen for MRP (-0.5 pp). When selection depended on diabetes status, the mean bias in the crude prevalence estimate was 1.3 pp, and each method worsened this bias to a similar degree (2.3 pp).
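The sketch below illustrates one possible mechanism by which demographic adjustment can make the bias worse. This mechanism is an assumption for illustration, not an explanation taken from the study. Suppose people with diabetes are more likely to enter the EHR while high-risk demographic groups are underrepresented. The two distortions then partly cancel in the crude estimate, and removing only the demographic part leaves the full upward bias. The toy population, the hypothetical `diabetes_boost` multiplier of 1.5 on the selection probability, and all other coefficients are illustrative assumptions.

```python
import random
from collections import Counter


def simulate_source(n, rng, diabetes_boost):
    """Hypothetical source population; selection depends on demographics AND diabetes."""
    records = []
    for _ in range(n):
        cell = tuple(int(rng.random() < 0.5) for _ in range(3))  # (young, high_ses, white)
        young, high_ses, white = cell
        risk = 0.02 + 0.03 * (1 - young) + 0.02 * (1 - high_ses) + 0.01 * (1 - white)
        diabetes = int(rng.random() < risk)
        p_select = 0.1 + 0.2 * young + 0.2 * high_ses + 0.1 * white
        if diabetes:
            # Assumed: people with diabetes are more likely to appear in the EHR.
            # With a boost of 1.5 the largest selection probability is 0.9.
            p_select *= diabetes_boost
        records.append((cell, diabetes, rng.random() < p_select))
    return records


def poststratify(sample, source_counts):
    """Adjust for demographics only: weight cell prevalences by source shares."""
    n_cell = Counter(cell for cell, _ in sample)
    cases = Counter(cell for cell, d in sample if d)
    total = sum(source_counts.values())
    return sum(source_counts[c] / total * cases[c] / n_cell[c] for c in source_counts)


rng = random.Random(7)
source = simulate_source(200_000, rng, diabetes_boost=1.5)
source_counts = Counter(cell for cell, _, _ in source)
true_prev = sum(d for _, d, _ in source) / len(source)

sample = [(cell, d) for cell, d, selected in source if selected]
crude = sum(d for _, d in sample) / len(sample)
adjusted = poststratify(sample, source_counts)

print(f"crude bias:    {100 * (crude - true_prev):+.2f} pp")
print(f"adjusted bias: {100 * (adjusted - true_prev):+.2f} pp")
```

Because the selection boost operates within every demographic cell, no reweighting on demographics can undo it, which is consistent with the need for methods that address selection on diabetes status directly.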

Conclusions: These methods are easy to implement and can provide reliable surveillance estimates if the factors associated with selection into the EHR can be accounted for. However, novel methods may be needed when selection depends on diabetes status or other unmeasured factors.


S.Conderino: None. M.Rosenman: None. V.W.Zhong: None. K.Reynolds: Research Support; Amgen Inc., Merck & Co., Inc., Novartis Pharmaceuticals Corporation. S.Park: None. L.H.Utidjian: None. J.Divers: None. R.Anthopolos: None. L.Thorpe: None. B.Cai: None. H.Shao: Board Member; BRAVO4HEALTH, LLC. T.C.Ong: None. T.L.Crume: None. B.S.Schwartz: None. H.Kirchner: None.


Centers for Disease Control and Prevention (1 U18DP006633-01-00)

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at