Decision making in medicine relies heavily on clinical studies, preferably randomized controlled clinical trials. Although the evidence obtained from such research is invaluable in guiding health care decisions, this source of information leaves many gaps. The results of a trial are directly applicable only to the population recruited and the protocol used. Few trials (if any) address all the characteristics of the patient population for whom the intervention is appropriate or address multiple interdependent conditions and treatments. Also, clinical trials seldom observe health outcomes over long periods (>5–10 years) and even less frequently consider the long-term economic impacts of the interventions.
In the face of these problems, clinicians and policy makers have traditionally had to rely on their judgment. Unfortunately, this also has serious shortcomings, as demonstrated by wide variations in practice patterns, conflicts in guidelines, and high rates of inappropriate care. It is unrealistic to expect the human mind to be able to address the complexity, variability, and uncertainties of health and disease.
To address these important issues, decision makers are increasingly turning to computer modeling as a technology that can provide more informed answers to questions that have not been, or will not be, answered by clinical trials. In the context of this report, a “computer model” is a set of mathematical equations, together with algorithms for combining them, implemented in computer software. There are many different types of models that are applicable to diabetes (1–10) and have been used to address a variety of clinical and economic questions (8–23). If properly constructed, validated, and applied, these models can be very powerful decision-making aids.
As medicine turns to modeling to make more informed decisions, it is imperative that those who develop this technology do so in a way that justifies trust and confidence in the results they obtain. For users to gain confidence that a modeling study, and the model itself, accurately represents the “real” world, modelers must convey the details of the model and its application and show that the model can calculate or predict outcomes from real clinical or observational studies.
To standardize the description and validation of diabetes models, the American Diabetes Association convened a workgroup of diabetes modelers to develop guidelines on these subjects. Our intention was to define criteria that, if followed, would build confidence that a model is accurately performing its intended function.
This article does not discuss specific methods for building or using models or for performing economic evaluations. Many other groups have provided detailed recommendations on those topics (7,24–31). Here we focus on the steps that can be taken by model developers to ensure that others can reproduce their results and to build confidence that models are accurate, useful, and reliable tools.
Transparency
Developers of all models (including proprietary models) should provide a complete description of the model’s structure, inputs, equations, algorithms, assumptions, and data sources so that others can understand how the model was built and applied. These should be described in sufficient detail to enable other scientists to potentially reproduce the model and its results. If a previous version of the model has been published, any changes to the model should be described clearly and completely. If a model is too complex to be completely described in a journal publication, the authors should provide a mechanism (e.g., webpage) whereby the model description can be accessed. If a model is proprietary, reviewers should respect appropriate conditions of confidentiality and intellectual property. The wider the access to a model, and the more components of the model that are open to public review, the better.
Validation
The appropriate validation of a model depends on the applications for which it is to be used. No model is valid for all purposes and all populations. Validations provide greater assurance that predictions made by the model are correct. The developers of models should describe the level and extent to which their model has been validated for any particular application.
The first level of validation is internal testing and debugging. The second level, “internal validation,” ensures that the model reproduces the results of the studies or datasets used to build the model. If a model, or parts of it, is built on a dataset, then there may be an opportunity for an additional level of validation called “data splitting”: the model, or components of it, is built using one part of the dataset, with another part of the same dataset reserved for validation. The dataset can be split using random methods or can be split temporally (30).
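As a minimal sketch of what random and temporal data splitting might look like in practice (not taken from the report), assume a patient-level dataset in which each record carries a calendar year of enrolment; the records, field names, and split proportions below are illustrative.

```python
import random

# Illustrative patient-level records; "year" is the calendar year of enrolment.
patients = [{"id": i, "year": 1990 + (i % 10), "hba1c": 7.0 + (i % 5) * 0.3}
            for i in range(100)]

# Random split: reserve ~30% of subjects for validation.
rng = random.Random(42)            # fixed seed so the split is reproducible
shuffled = patients[:]
rng.shuffle(shuffled)
cut = int(0.7 * len(shuffled))
derivation, validation = shuffled[:cut], shuffled[cut:]

# Temporal split: build the model on earlier enrolment years, validate on later ones.
derivation_t = [p for p in patients if p["year"] < 1996]
validation_t = [p for p in patients if p["year"] >= 1996]

print(len(derivation), len(validation), len(derivation_t), len(validation_t))
```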
The highest level of validation is “external validation,” which refers to the ability of the model to accurately predict the results of studies that were not used to build the model (29). Especially impressive is an external validation for which the results of a trial were unpublished and unknown to the modeler at the time the model’s predictions were released, although opportunities to do this are relatively rare.
Depending on the structure of the model and the scope of the studies to which it will be applied, these validation techniques can be applied to the entire model or to parts of the model. In some cases, one study will provide internal validation for one part of a model and external validation for another part of the model.
Internal and external validations should apply, as precisely as possible, to the study population and the protocol of the study that generated the data for validation. Also, validations should report the absolute value of the predicted outcomes and, in the case of a trial, the absolute difference in outcomes between groups. Each validation should involve the complete spectrum of subjects in a study, to the greatest extent that can be known from the available data. Ideally, the modeler should obtain and use patient-specific data from the study. The next best situation is to use data reflecting the distribution and correlation of patient characteristics and other variables. In some cases, the developer of a model will only be able to calculate results for an “average subject” in the study.
Evaluation of the success of a validation must take into account the effects of sampling error, reflecting the sample size of the validation cohort. Confidence intervals (CIs) around the observed results in the validation cohort should be reported. Tabular or graphical reporting of validation analyses is preferred.
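A minimal sketch of how sampling error might be taken into account is given below, assuming the validation outcome is a cumulative event proportion and using a normal approximation for the 95% CI; the event counts and the model-predicted value are illustrative, not from any actual study.

```python
import math

# Observed outcome in the validation cohort (illustrative numbers).
events, n = 112, 900                     # e.g., subjects reaching an end point
observed = events / n

# 95% CI around the observed proportion (normal approximation to the binomial).
se = math.sqrt(observed * (1 - observed) / n)
ci_low, ci_high = observed - 1.96 * se, observed + 1.96 * se

# Model-predicted cumulative incidence for the same population and protocol.
predicted = 0.135                        # illustrative model output

within_ci = ci_low <= predicted <= ci_high
print(f"observed {observed:.3f} (95% CI {ci_low:.3f}-{ci_high:.3f}), "
      f"predicted {predicted:.3f}, within CI: {within_ci}")
```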
Models should be externally validated against as many clinical studies as possible, ideally against every large/major published study applicable to the patient populations, treatments, or outcomes that the model is intended to address. Multiple validations are needed because even very accurate models will fail on some studies, simply because of unanticipated features of the population, treatment, or design of the study. Also, any model can occasionally succeed simply by chance. A series of successes and near misses on a large number of studies is a good indicator of the fundamental accuracy of the model. Validating a model against a large number of studies also makes clear that modelers are not picking studies that they believe their model will be successful against or selectively reporting results. Examples of suitable trials against which to perform external validations are the DCCT (Diabetes Control and Complications Trial), UKPDS (United Kingdom Prospective Diabetes Study), HOPE (Heart Outcomes Prevention Evaluation), DPP (Diabetes Prevention Program), 4S (Scandinavian Simvastatin Survival Study), HPS (Heart Protection Study), and ALLHAT (Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial).
If a model is claimed to be a “diabetes model” in some general sense, rather than a model of a particular clinical study (with no claim to be useful for other populations or treatments), then the model should not be changed to fit each new study when performing multiple validations. Rather, the same model with the same parameters should be used for all validations.
If a validation against a particular study fails, then the modeler should attempt to determine why the failure occurred and consider revising the model to achieve a match. If this is accomplished, then the exercise should not be called a successful external validation but instead a recalibration.
When a model is recalibrated or redesigned to fit a clinical study, steps must be taken to ensure that the revised model is still valid for all the previous validation exercises against other clinical studies.
It is almost impossible for any model to be successful at every attempt at an external validation. Successive attempts at model validation will yield a series of successes and failures, with the failures hopefully leading to improved versions of the model. With new versions of the model, the proportion of successful external validations can be expected to increase.
Uncertainty
Assessment of uncertainty is an essential part of reporting the results from modeling studies. The methods chosen to obtain CIs and/or sensitivity analyses should be reported. Models are affected by the following five major types of uncertainty (26).
Ignorance.
Occasionally there will be little evidence upon which to set the value of a parameter. Ignorance about a parameter should be addressed by a sensitivity analysis.
Known variability.
When a parameter is known to take different values in different settings (for example, the cost of photocoagulation in different hospitals), a sensitivity analysis should also be carried out.
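For both ignorance and known variability, the simplest approach is a one-way sensitivity analysis in which the model is rerun while a single input is varied across a plausible range. The sketch below illustrates the idea only: the one-line cost function standing in for a full model run, and all of the numbers, are hypothetical.

```python
# One-way sensitivity analysis: rerun the "model" while varying one input
# (here, the per-procedure cost of photocoagulation) across a plausible range.

def total_cost(photocoag_cost, n_procedures=40, other_costs=250_000.0):
    """Illustrative stand-in for a full model run returning total cost."""
    return other_costs + n_procedures * photocoag_cost

for cost in (800.0, 1_200.0, 1_600.0):   # low, base-case, and high estimates
    print(f"photocoagulation cost {cost:>7.0f} -> total cost {total_cost(cost):>10.0f}")
```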
Statistical variability.
This arises when a parameter has been derived from the analysis of a cohort, whether the modeler analyzed the cohort data directly or is using a previously published result. Because the cohort is a sample, the parameter is subject to statistical (sampling) variability. In this situation, the following should be reported (a brief illustrative sketch follows the list).
1) CIs for the parameter, if available.
2) If practical, CIs for model results that relied on this parameter.
3) If 2 is not practical, a sensitivity analysis of the effect of varying the parameters of most interest.
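As a minimal sketch of point 2, one way to obtain a CI for a model result is to resample the parameter from its reported sampling distribution and rerun the model for each draw. The relative-risk figures, the assumption of approximate normality of log(RR), and the one-line “model” below are all illustrative.

```python
import math
import random
import statistics

# Reported parameter: relative risk 0.78 (95% CI 0.66-0.92) from a published trial.
# Resample it on the log scale, assuming approximate normality of log(RR).
log_rr = math.log(0.78)
log_se = (math.log(0.92) - math.log(0.66)) / (2 * 1.96)

def model(relative_risk, baseline_risk=0.20):
    """Illustrative stand-in for the full model: 10-year risk under treatment."""
    return baseline_risk * relative_risk

rng = random.Random(1)
draws = sorted(model(math.exp(rng.gauss(log_rr, log_se))) for _ in range(5000))
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"treated 10-year risk: median {statistics.median(draws):.3f}, "
      f"95% CI {lo:.3f}-{hi:.3f}")
```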
Monte-Carlo variability.
This is sometimes called “first-order uncertainty.” If random numbers are used to run the model, the results will vary from one run to the next. In general, this uncertainty should be reduced, as far as possible, by averaging over multiple simulation runs with identical inputs. The number of runs should be stated, with evidence that this is appropriate for the application and evidence that the random number generator is appropriate.
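A minimal sketch of this idea follows: repeated runs with identical inputs but different seeds are averaged, and the standard error of the mean across runs gives evidence of whether the number of runs is adequate. The one-line patient-level simulation and its probabilities are illustrative only.

```python
import random
import statistics

def simulate_once(seed, n_patients=1000, annual_risk=0.03, years=10):
    """Illustrative patient-level simulation: proportion with an event by 10 years."""
    rng = random.Random(seed)
    events = sum(
        1 for _ in range(n_patients)
        if any(rng.random() < annual_risk for _ in range(years))
    )
    return events / n_patients

# Identical inputs, different random seeds; report the mean and its Monte Carlo error.
runs = [simulate_once(seed) for seed in range(50)]
mean = statistics.mean(runs)
mc_se = statistics.stdev(runs) / len(runs) ** 0.5
print(f"mean over {len(runs)} runs: {mean:.4f} (Monte Carlo SE {mc_se:.4f})")
```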
Uncertainty arising from model design.
Since models are a simplification of reality, there is a possibility that the design of a model (e.g., its structure, choice of data sources) is inaccurate for its intended purpose. The impact of these simplifications is potentially important but difficult to quantify (30). One way decision makers can address this is to seek results from more than one model. In addition, successful multiple internal and external validations of a model help build confidence in the result that is generated.
Special requirements of diabetes modeling
Several features of diabetes pose challenges for models. First, the complications of diabetes may take years or even decades to occur, so models must have long-term horizons and include mortality as a competing risk. Second, diabetes affects multiple organ systems, resulting in many types of complications. These complications not only share common risk factors but also are linked in that one complication may affect the likelihood of others. Therefore, models must include interdependence between complications.
Third, patients with diabetes typically receive a large number of different treatments simultaneously, and these affect a diverse range of outcomes (e.g., ACE inhibitors can prevent cardiovascular and renal disease). For this reason, diabetes models must include a wide range of complications and treatment effects.
Fourth, some complications, such as myocardial infarction, may be rapidly fatal, whereas others, such as blindness, greatly reduce a person’s quality of life but not necessarily life expectancy. Therefore, models should include both the quality and length of a person’s life. Fifth, some diabetes complications, such as blindness, impose small costs on payors but large indirect costs on patients and their families. Therefore, authors and users of models should select their perspective carefully and explicitly state it in their analysis.
Sixth, there can be a long delay between the onset of type 2 diabetes and clinical diagnosis. Models should be able to distinguish between onset and diagnosis. Finally, the diagnostic criteria for diabetes and related conditions have changed over time. Developers of models should therefore be specific about the criteria used to diagnose and classify diabetes and other conditions in the model.
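To make several of these requirements concrete, the following is a minimal sketch (not any particular published model) of an annual-cycle, patient-level simulation with a long time horizon, mortality as a competing risk, and accumulation of both life-years and quality-adjusted life-years; the transition probabilities and utility weights are illustrative, not calibrated values.

```python
import random

# Illustrative annual probabilities and utility weights (not calibrated values).
P_DEATH, P_BLIND = 0.02, 0.01
UTILITY = {"no complication": 0.85, "blind": 0.65}

def simulate_patient(rng, horizon_years=40):
    """Follow one patient year by year, tracking survival, quality, and one complication."""
    state, life_years, qalys = "no complication", 0.0, 0.0
    for _ in range(horizon_years):
        if rng.random() < P_DEATH:            # mortality as a competing risk
            break
        if state == "no complication" and rng.random() < P_BLIND:
            state = "blind"                   # reduces quality of life, not survival, here
        life_years += 1.0
        qalys += UTILITY[state]
    return life_years, qalys

rng = random.Random(7)
results = [simulate_patient(rng) for _ in range(2000)]
print(f"mean life-years {sum(ly for ly, _ in results) / len(results):.1f}, "
      f"mean QALYs {sum(q for _, q in results) / len(results):.1f}")
```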
Summary
Computer modeling in medicine can be an important tool to help clinicians and policy makers make better decisions. Whereas this technology is very well established in other fields of science such as physics, chemistry, and the environmental sciences, and is applied throughout our daily lives, medicine has been relatively slow to adopt it.
One major reason for the reluctance to use computer models and believe in the results they produce is the perception that our understanding of physiology, disease processes, and medicine is still crude. However, clinicians and policy makers are likely underestimating the breadth and depth of our knowledge, and today’s models have become very sophisticated. Another often-heard reason for not trusting the results generated by computer models is the view that predicting the future is pure guesswork. But computer models that have undergone extensive internal and external validation should be considered reliable and accurate and thus a very useful aid to decision making.
The guidelines described in this report should be adopted by modelers, and they should also give readers a better understanding of what to expect from a model. Their implementation will undoubtedly advance the discipline and make it more credible by promoting transparency and rigor in validation.
APPENDIX
American Diabetes Association Consensus Panel members
Rito Bergemann, MD, PhD; Jonathan Brown, MPP, PhD; Wiley Chan, MD; Philip Clarke, PhD; David Eddy, MD, PhD; William Herman, MD, MPH; Andrew J. Palmer, BSc, MBBS; Stephan Roze, PhD; Richard Stevens, PhD; and Ping Zhang, MD.
References