As described, several variables had a considerable proportion of missing values. We chose to impute missing values before building our prediction model. When applied to future patients with complete data, a model based on an imputed dataset should predict more accurately than one based on complete cases only. Relying on complete cases generally introduces more bias than imputing missing values.
Our goal was a prediction tool for counseling a patient at baseline. We avoided explicit assumptions regarding future patient behavior (e.g., adherence) or response (e.g., A1C changes), which are unknown at baseline. By incorporating such dynamics in the prediction model, we would be assuming something about a patient without confidence. Rather, we considered all patient data available at baseline to make a prediction. A later prediction that incorporates more information, such as the model developed by Clarke et al. (3), would have superior accuracy, but we wanted a model for use at baseline, when this information is not yet known.
We avoided using stepwise variable selection. Although this method produces a more parsimonious prediction model, the resulting model is expected to be less accurate. Final models following stepwise variable selection have predictors with coefficients that are exaggerated in absolute value. Parsimony was also of limited value to us because we assumed our predictors were routinely available, noninvasive, and obtainable at minimal cost.
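The exaggeration of coefficients after selection can be illustrated with a small simulation. The sketch below is not our modeling procedure: it uses a single predictor and a simple significance filter (|t| > 1.96) as a simplified stand-in for stepwise selection, and the true slope, sample size, and number of replicates are hypothetical values chosen only for illustration.

```python
import math
import random
import statistics

# Hypothetical settings chosen for illustration only.
TRUE_SLOPE = 0.2      # weak true effect of the predictor
N_PATIENTS = 50       # patients per simulated dataset
N_REPLICATES = 2000   # number of simulated datasets


def fit_slope(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (slope, standard error)."""
    n = len(xs)
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, std_err


def simulate(seed=0):
    """Return (mean slope over all fits, mean |slope| over 'selected' fits)."""
    rng = random.Random(seed)
    all_slopes, kept_slopes = [], []
    for _ in range(N_REPLICATES):
        xs = [rng.gauss(0, 1) for _ in range(N_PATIENTS)]
        ys = [TRUE_SLOPE * x + rng.gauss(0, 1) for x in xs]
        slope, std_err = fit_slope(xs, ys)
        all_slopes.append(slope)
        # Keep the predictor only when it looks "significant",
        # mimicking the retention rule of a selection procedure.
        if abs(slope / std_err) > 1.96:
            kept_slopes.append(slope)
    mean_all = statistics.fmean(all_slopes)
    mean_kept_abs = statistics.fmean([abs(s) for s in kept_slopes])
    return mean_all, mean_kept_abs


if __name__ == "__main__":
    mean_all, mean_kept_abs = simulate()
    print(f"true slope:                     {TRUE_SLOPE}")
    print(f"mean estimate, all fits:        {mean_all:.3f}")
    print(f"mean |estimate|, selected fits: {mean_kept_abs:.3f}")
```

Averaged over all simulated datasets, the estimate should sit near the true slope, whereas the average among the fits that pass the filter should be larger in absolute value, because a weak predictor is retained mainly when chance has inflated its estimate.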
We regret not locating the article by Yang and colleagues (4) during our literature review. However, we would still have proceeded with building our model, and we could not have compared it with theirs because we lack retrospective data for some of their predictors, such as cancer and peripheral arterial disease. We encountered a similar issue with the model developed by Clarke et al., which requires amputation and blindness as baseline predictors.
It is frustrating when a predictor variable has a counterintuitive association with the outcome. Nomograms make this relationship very explicit by presenting the effect graphically. One should keep in mind, however, the presence of collinearity. Moving a patient along a single axis while holding him fixed on all others is an artificial situation. Realistically, when the value of one of a patient's predictor variables increases (e.g., BMI), other predictors likely change as well (e.g., blood pressure). A patient may be gaining nomogram points from movement on one axis but losing points on other axes, potentially resulting in a net decrease, rather than increase, in total points. Although such collinearity can cloud interpretation of variables in a nomogram, it does not have a deleterious effect on the prediction. Removing variables with counterintuitive directions of effect might make the tool more attractive, but it might also reduce predictive accuracy.
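The arithmetic behind this point can be sketched with a toy two-axis nomogram. The axis ranges and point values below are hypothetical and are not taken from our published nomogram; the blood pressure axis is deliberately given a counterintuitive (decreasing) direction to show how a correlated change on a second axis can reverse the net change in total points.

```python
# Hypothetical axes: (lowest value, highest value, points at lowest, points at highest).
# These numbers are invented for illustration and do not come from the published nomogram.
AXES = {
    "bmi": (20.0, 40.0, 0.0, 40.0),           # points rise with BMI
    "systolic_bp": (100.0, 180.0, 40.0, 0.0),  # points fall as pressure rises
}


def axis_points(name, value):
    """Linearly interpolate the points for one predictor along its axis."""
    low, high, points_low, points_high = AXES[name]
    fraction = (value - low) / (high - low)
    return points_low + fraction * (points_high - points_low)


def total_points(patient):
    """Sum the points across all axes, as one would when reading a nomogram."""
    return sum(axis_points(name, value) for name, value in patient.items())


baseline = {"bmi": 28.0, "systolic_bp": 130.0}
# Artificial situation: BMI increases while blood pressure is held fixed.
bmi_only = {"bmi": 30.0, "systolic_bp": 130.0}
# More realistic situation: blood pressure increases along with BMI.
realistic = {"bmi": 30.0, "systolic_bp": 140.0}

if __name__ == "__main__":
    for label, patient in [("baseline", baseline),
                           ("BMI only", bmi_only),
                           ("BMI and BP", realistic)]:
        print(f"{label:>10}: {total_points(patient):.1f} points")
```

In this toy example, moving along the BMI axis alone raises the total, while the joint movement lowers it slightly below baseline, even though the points gained on the BMI axis are the same in both cases.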
Ultimately, a prediction tool should predict more accurately than its alternatives. More tools (and comparisons of rival tools) are needed to understand which approach to patient counseling provides the highest accuracy presently available. We are unaware of another tool that uses our predictors (or a subset) to predict our outcome and that can be used at baseline for an individual patient.
Please see ref. 2 for a list of the potential conflicts of interest relevant to this article.