Readmissions are an undesirable outcome for which hospitals are being held accountable. Despite a growing literature, there is little consensus on the optimal method of sampling hospitalizations for calculating readmission rates or on the modeling methods for analyzing risk factors.
We compared 2 sampling and 3 multivariable predictive modeling strategies in a retrospective cohort of adult patients with diabetes discharged from an academic medical center between 2004 and 2012. Diabetes was defined by ICD-9-CM code 250 or documentation of taking a diabetes-specific medication. The outcome for all models was all-cause readmission within 30 days of discharge (30d readmission). Models were built on 46 sociodemographic, clinical and administrative variables.
When only the first discharge per patient was sampled (N=17,284), the 30d readmission rate was 10.2%, vs. 20.3% when all discharges per patient were sampled (N=44,203). Model performance varied by sampling and modeling strategy. Logistic regression (LR) without generalized estimating equations (GEE) using all discharges had the highest overall performance.
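The two sampling strategies can be illustrated with a minimal sketch. The records, patient IDs, and function names below are hypothetical and are not the study data or the authors' code; the toy data were constructed so that one patient contributes several closely spaced stays, which is one way the all-discharges rate can exceed the first-discharge rate.

```python
from datetime import date

# Hypothetical toy records: (patient_id, admit_date, discharge_date).
stays = [
    ("A", date(2010, 1, 1), date(2010, 1, 5)),
    ("A", date(2010, 3, 20), date(2010, 3, 25)),
    ("A", date(2010, 4, 5), date(2010, 4, 10)),
    ("A", date(2010, 4, 25), date(2010, 4, 30)),
    ("B", date(2011, 3, 1), date(2011, 3, 4)),
    ("C", date(2011, 7, 1), date(2011, 7, 3)),
    ("D", date(2012, 2, 1), date(2012, 2, 2)),
    ("D", date(2012, 2, 20), date(2012, 2, 24)),
]


def flag_readmissions(stays, window_days=30):
    """Label each discharge with whether the same patient was admitted
    again (for any cause) within `window_days` of that discharge."""
    by_patient = {}
    for pid, admit, discharge in stays:
        by_patient.setdefault(pid, []).append((admit, discharge))

    flagged = []
    for pid in sorted(by_patient):
        visits = sorted(by_patient[pid])  # chronological order per patient
        for i, (_, discharge) in enumerate(visits):
            readmitted = False
            if i + 1 < len(visits):
                gap = (visits[i + 1][0] - discharge).days
                readmitted = 0 <= gap <= window_days
            flagged.append((pid, discharge, readmitted))
    return flagged


def first_discharges(flagged):
    """Keep only the earliest discharge for each patient."""
    seen, kept = set(), []
    for row in flagged:
        if row[0] not in seen:
            seen.add(row[0])
            kept.append(row)
    return kept


def readmission_rate(rows):
    """Fraction of discharges followed by a readmission."""
    return sum(1 for row in rows if row[2]) / len(rows)


flagged = flag_readmissions(stays)
rate_all = readmission_rate(flagged)
rate_first = readmission_rate(first_discharges(flagged))
print(f"all discharges: {rate_all:.3f}, first discharges: {rate_first:.3f}")
```

In this toy example, patient A's repeat stays count toward the all-discharges rate but are dropped when only first discharges are kept, so the two denominators and numerators differ.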
Sampling only the first discharge per patient underestimates the readmission rate and is associated with misleading measures of model performance. LR without GEE may yield overly optimistic measures of model performance. Because commonly used approaches may produce misleading results, consensus is needed.
Performance of logistic regression models to predict all-cause 30d readmission.
Performance Measure (95% CI), by Model and Sampling Strategy | First Discharges without GEE | All Discharges without GEE | All Discharges with GEE | All Discharges with weighted GEE |
Area Under ROC Curve (C-statistic) | 0.821 (0.806, 0.835) | 0.811 (0.803, 0.818) | 0.803 (0.796, 0.811) | 0.791 (0.783, 0.798) |
Correlation Coefficient | 0.366 (0.345, 0.386) | 0.450 (0.438, 0.462) | 0.437 (0.425, 0.449) | 0.415 (0.403, 0.427) |
Coefficient of Determination, % | 14.96 (14.06, 15.86) | 21.00 (20.39, 21.61) | 18.40 (17.85, 18.96) | 15.83 (15.32, 16.34) |
Brier Score | 0.080 (0.075, 0.084) | 0.129 (0.126, 0.132) | 0.132 (0.128, 0.135) | 0.135 (0.132, 0.139) |
GEE = generalized estimating equations with an exchangeable correlation structure.
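For readers less familiar with two of the measures in the table, the following is a minimal sketch of how a C-statistic and a Brier score can be computed from observed outcomes and predicted probabilities. The function names and the toy inputs are illustrative assumptions, not the authors' code or data, and the confidence intervals reported in the table are not reproduced here.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome
    (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def c_statistic(y_true, y_prob):
    """Area under the ROC curve, computed as the share of
    (readmitted, not readmitted) pairs in which the readmitted case has
    the higher predicted risk; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                concordant += 1.0
            elif p_pos == p_neg:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def baseline_brier(event_rate):
    """Brier score of a model that predicts the overall event rate for
    every discharge, which equals rate * (1 - rate)."""
    return event_rate * (1.0 - event_rate)


# Hypothetical outcomes (1 = readmitted within 30 days) and predicted risks.
y_true = [0, 0, 0, 1, 0, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.15, 0.80]

auc = c_statistic(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(f"C-statistic: {auc:.3f}, Brier score: {brier:.4f}")
```

One point this makes visible: the Brier score of a constant prediction depends on the event rate alone. Calling `baseline_brier(0.102)` and `baseline_brier(0.203)` gives roughly 0.092 and 0.162, so Brier scores from samples with different readmission rates may not be directly comparable.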
H. Zhao: None. S. Tanner: None. S. Golden: None. S.G. Fisher: None. D.J. Rubin: Research Support; Self; AstraZeneca, Boehringer Ingelheim Pharmaceuticals, Inc.