Decision making in medicine relies heavily on clinical studies, preferably randomized controlled trials. Although the evidence obtained from such research is invaluable in guiding health care decisions, this source of information leaves many gaps. The results of a trial are directly applicable only to the population recruited and the protocol used. Few trials (if any) address all the characteristics of the patient population for whom the intervention is appropriate or address multiple interdependent conditions and treatments. Also, clinical trials seldom observe health outcomes over long periods (>5–10 years) and even more rarely consider the long-term economic impact of the interventions.

In the face of these problems, clinicians and policy makers have traditionally had to rely on their judgment. Unfortunately, this too has serious shortcomings, as demonstrated by wide variations in practice patterns, conflicts among guidelines, and high rates of inappropriate care. It is unrealistic to expect the human mind, unaided, to address the complexity, variability, and uncertainties of health and disease.

To address these important issues, decision makers are increasingly turning to computer modeling as a technology that can provide more informed answers to questions that have not been, or will not be, answered by clinical trials. In the context of this report, a “computer model” is a set of mathematical equations, combined with algorithms for applying those equations, implemented in computer software. Many different types of models are applicable to diabetes (1–10) and have been used to address a variety of clinical and economic questions (8–23). If properly constructed, validated, and applied, these models can be very powerful decision-making aids.
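
As a concrete, deliberately oversimplified illustration of this definition, the sketch below implements a three-state Markov cohort model of the kind described in reference 1. The state names and every transition probability are hypothetical placeholders, not values from any study.

```python
# A minimal sketch of "equations plus algorithms implemented in software":
# a three-state Markov cohort model (healthy -> complication -> dead).
# All transition probabilities are hypothetical placeholders.

def run_markov_cohort(n_years=30, p_complication=0.04,
                      p_death_healthy=0.01, p_death_complication=0.05):
    """Track the fraction of a cohort in each state over n_years."""
    healthy, complication, dead = 1.0, 0.0, 0.0
    history = []
    for year in range(n_years):
        new_dead = (dead + healthy * p_death_healthy
                    + complication * p_death_complication)
        new_complication = (complication * (1 - p_death_complication)
                            + healthy * p_complication)
        new_healthy = healthy * (1 - p_complication - p_death_healthy)
        healthy, complication, dead = new_healthy, new_complication, new_dead
        history.append((year + 1, healthy, complication, dead))
    return history

if __name__ == "__main__":
    for year, h, c, d in run_markov_cohort():
        print(f"year {year:2d}: healthy={h:.3f} complication={c:.3f} dead={d:.3f}")
```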

As medicine turns to modeling to make more informed decisions, it is imperative that those who develop this technology do so in a way that justifies trust and confidence in the results they obtain. For users to gain confidence that a modeling study, and the model itself, accurately represents the “real” world, modelers must convey the details of the model and its application and show that the model can calculate or predict outcomes from real clinical or observational studies.

To standardize the description and validation of diabetes models, the American Diabetes Association convened a workgroup of diabetes modelers to develop guidelines on these subjects. Our intention was to define criteria that, if followed, would build confidence that a model accurately performs its intended function.

This article does not discuss specific methods for building or using models or for performing economic evaluations. Many other groups have provided detailed recommendations on those topics (7,24–31). Here we focus on the steps that can be taken by model developers to ensure that others can reproduce their results and to build confidence that models are accurate, useful, and reliable tools.

Developers of all models (including proprietary models) should provide a complete description of the model’s structure, inputs, equations, algorithms, assumptions, and data sources so that others can understand how the model was built and applied. These should be described in sufficient detail to enable other scientists to potentially reproduce the model and its results. If a previous version of the model has been published, any changes to the model should be described clearly and completely. If a model is too complex to be completely described in a journal publication, the authors should provide a mechanism (e.g., webpage) whereby the model description can be accessed. If a model is proprietary, reviewers should respect appropriate conditions of confidentiality and intellectual property. The wider the access to a model, and the more components of the model that are open to public review, the better.

The appropriate validation of a model depends on the applications for which it is to be used. No model is valid for all purposes and all populations. Validations provide greater assurance that predictions made by the model are correct. The developers of models should describe the level and extent to which their model has been validated for any particular application.

The first level of validation is internal testing and debugging. The second level, “internal validation,” ensures that the model reproduces the results of the studies or datasets used to build it. If a model, or parts of a model, is built on a dataset, there may be an opportunity for an additional level of validation called “data splitting,” in which the model, or components of it, is built using one part of a dataset, with another part of the same dataset reserved for validation. The dataset can be split using random methods or temporally (30).
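
A minimal sketch of the two splitting approaches follows. It assumes a generic list of patient records carrying an enrollment_year field; both that data layout and the 70/30 split fraction are illustrative assumptions, not prescriptions from these guidelines.

```python
# Sketch of "data splitting": fit on one part of a dataset, reserve the
# other part for validation. Both split strategies are generic examples.
import random

def random_split(records, fit_fraction=0.7, seed=42):
    """Randomly partition records into a fitting set and a validation set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fit_fraction)
    return shuffled[:cut], shuffled[cut:]

def temporal_split(records, cutoff_year):
    """Fit on subjects enrolled before cutoff_year; validate on the rest."""
    fit = [r for r in records if r["enrollment_year"] < cutoff_year]
    validation = [r for r in records if r["enrollment_year"] >= cutoff_year]
    return fit, validation
```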

The highest level of validation is “external validation,” which refers to the ability of the model to accurately predict the results of studies that were not used to build the model (29). Especially impressive is an external validation for which the results of a trial were unpublished and unknown to the modeler at the time the model’s predictions were released, although opportunities to do this are relatively rare.

Depending on the structure of the model and the scope of the studies to which it will be applied, these validation techniques can be applied to the entire model or to parts of the model. In some cases, one study will provide internal validation for one part of a model and external validation for another part of the model.

Internal and external validations should apply, as precisely as possible, to the study population and the protocol of the study that generated the data for validation. Also, validations should report the absolute value of the predicted outcomes and, in the case of a trial, the absolute difference in outcomes between groups. Each validation should involve the complete spectrum of subjects in a study, to the greatest extent that can be known from the available data. Ideally, the modeler should obtain and use patient-specific data from the study. The next best situation is to use data reflecting the distribution and correlation of patient characteristics and other variables. In some cases, the developer of a model will only be able to calculate results for an “average subject” in the study.

Evaluation of the success of a validation must take into account the effects of sampling error, reflecting the sample size of the validation cohort. Confidence intervals (CIs) around the observed results in the validation cohort should be reported. Tabular or graphical reporting of validation analyses is preferred.
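
As one illustration of this comparison, the sketch below computes a normal-approximation 95% CI for the observed event rate in a hypothetical validation cohort and checks whether a model’s predicted rate falls within it. The cohort counts and the predicted rate are invented for the example, and other interval methods (e.g., exact binomial) may be preferable for small samples.

```python
# Sketch: 95% CI around an observed validation-cohort event rate (normal
# approximation to the binomial), compared with a model's prediction.
import math

def proportion_ci(events, n, z=1.96):
    """Observed proportion with an approximate 95% CI."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

observed, lo, hi = proportion_ci(events=120, n=1000)  # hypothetical cohort
predicted = 0.13                                      # hypothetical model output
verdict = "inside" if lo <= predicted <= hi else "outside"
print(f"observed {observed:.3f} (95% CI {lo:.3f}-{hi:.3f}); "
      f"model prediction {predicted:.3f} falls {verdict} the CI")
```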

Models should be externally validated against as many clinical studies as possible, ideally against every major published study applicable to the patient populations, treatments, or outcomes that the model is intended to address. Multiple validations are needed because even very accurate models will fail on some studies, simply because of unanticipated features of a study’s population, treatment, or design. Also, any model can occasionally succeed simply by chance. A series of successes and near misses across a large number of studies is a good indicator of the fundamental accuracy of a model. Validating a model against a large number of studies also makes clear that modelers are neither picking only the studies they believe their model will match nor selectively reporting results. Examples of good trials upon which to perform external validations are the DCCT (Diabetes Control and Complications Trial), UKPDS (United Kingdom Prospective Diabetes Study), HOPE (Heart Outcomes Prevention Evaluation), DPP (Diabetes Prevention Program), 4S (Scandinavian Simvastatin Survival Study), HPS (Heart Protection Study), and ALLHAT (Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial).

If a model is claimed to be a “diabetes model” in some general sense, rather than a model of one particular clinical study (with no claim of usefulness for other populations or treatments), then the model should not be changed to fit each new study when performing multiple validations. Rather, the same model, with the same parameters, should be used for all validations.

If a validation against a particular study fails, then the modeler should attempt to determine why the failure occurred and consider revising the model to achieve a match. If this is accomplished, then the exercise should not be called a successful external validation but instead a recalibration.

When a model is recalibrated or redesigned to fit a clinical study, steps must be taken to ensure that the revised model is still valid for all the previous validation exercises against other clinical studies.

It is almost impossible for any model to be successful at every attempt at an external validation. Successive attempts at model validation will yield a series of successes and failures, with the failures hopefully leading to improved versions of the model. With new versions of the model, the proportion of successful external validations can be expected to increase.

Assessment of uncertainty is an essential part of reporting the results from modeling studies. The methods chosen to obtain CIs and/or sensitivity analyses should be reported. Models are affected by the following five major types of uncertainty (26).

Ignorance.

Occasionally there will be little evidence upon which to set the values of a parameter. Ignorance about a parameter should be addressed by a sensitivity analysis.

Known variability.

When a parameter is known to take different values in different settings (for example, the cost of photocoagulation in different hospitals), a sensitivity analysis should also be carried out.
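
For both ignorance and known variability, a one-way sensitivity analysis simply reruns the model across a plausible range of the uncertain parameter while holding everything else fixed. The sketch below illustrates this; run_model, its internal figures, and the cost range for photocoagulation are all hypothetical stand-ins, not values from any real diabetes model.

```python
# Sketch of a one-way sensitivity analysis: vary one uncertain parameter
# (here, the per-procedure cost of photocoagulation) across a plausible
# range. The model and all numbers are hypothetical placeholders.

def run_model(photocoagulation_cost):
    """Hypothetical model: total discounted cost per patient."""
    other_costs = 25_000.0       # placeholder for all other modeled costs
    expected_procedures = 0.8    # placeholder procedures per patient
    return other_costs + expected_procedures * photocoagulation_cost

for cost in (800, 1200, 1600, 2000):  # hypothetical low-to-high range
    print(f"photocoagulation cost ${cost}: total ${run_model(cost):,.0f}")
```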

Statistical variability.

This arises when a parameter has been derived from the analysis of a cohort, whether the modeler analyzed the cohort data directly or is using a previously published result. Because the cohort is a sample, the parameter is subject to statistical (sampling) variability. In this situation, the following should be reported (see the sketch after this list).

1) CIs for the parameter, if available.

2) If practical, CIs for model results that relied on this parameter.

3) If 2 is not practical, a sensitivity analysis of the effect of varying the parameters of most interest.
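
A minimal sketch of items 2 and 3: repeatedly draw the uncertain parameter from its estimated sampling distribution (assumed normal here), rerun the model, and summarize the spread of results. The relative-risk estimate, its standard error, and run_model are all hypothetical.

```python
# Sketch: propagate a parameter's sampling uncertainty into the model
# result by resampling the parameter and rerunning the model.
import random
import statistics

def run_model(relative_risk):
    """Hypothetical model: 10-year complication risk under treatment."""
    baseline_risk = 0.20         # placeholder untreated risk
    return baseline_risk * relative_risk

rng = random.Random(0)
# Hypothetical published estimate: RR 0.75 with standard error 0.05.
draws = sorted(run_model(rng.gauss(0.75, 0.05)) for _ in range(10_000))
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"model result: {statistics.mean(draws):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```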

Monte-Carlo variability.

This is sometimes called “first-order uncertainty.” If random numbers are used to run the model, the results will vary from one run to the next. In general, this uncertainty should be reduced, as far as possible, by averaging over multiple simulation runs with identical inputs. The number of runs should be stated, with evidence that it is appropriate for the application and that the random number generator is appropriate.
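
The sketch below illustrates this effect: with a seeded random number generator and identical inputs, the uncertainty of the averaged result (its standard error across runs) shrinks as the number of runs grows. The event probability and cohort size are hypothetical.

```python
# Sketch: Monte-Carlo (first-order) variability falls as results are
# averaged over more simulation runs with identical inputs.
import math
import random
import statistics

def one_run(n_patients, event_prob, rng):
    """One simulation run: fraction of simulated patients with an event."""
    return sum(rng.random() < event_prob for _ in range(n_patients)) / n_patients

rng = random.Random(12345)  # seeded generator, so the experiment is reproducible
for n_runs in (10, 100, 1000):
    results = [one_run(500, 0.15, rng) for _ in range(n_runs)]
    mean = statistics.mean(results)
    sem = statistics.stdev(results) / math.sqrt(n_runs)  # uncertainty of the average
    print(f"{n_runs:4d} runs: averaged result={mean:.4f} +/- {sem:.4f}")
```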

Uncertainty arising from model design.

Since models are a simplification of reality, there is a possibility that the design of a model (e.g., its structure, choice of data sources) is inaccurate for its intended purpose. The impact of these simplifications is potentially important but difficult to quantify (30). One way decision makers can address this is to seek results from more than one model. In addition, successful multiple internal and external validations of a model help build confidence in the result that is generated.

Several features of diabetes pose challenges for models. First, the complications of diabetes may take years or even decades to occur, so models must have long time horizons and include mortality as a competing risk. Second, diabetes affects multiple organ systems, resulting in many types of complications. These complications not only share common risk factors but also are linked in that one complication may affect the likelihood of others. Therefore, models must include interdependence between complications.

Third, patients with diabetes typically receive a large number of different treatments simultaneously, and these affect a diverse range of outcomes (e.g., ACE inhibitors can prevent cardiovascular and renal disease). For this reason, diabetes models must include a wide range of complications and treatment effects.
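
A minimal sketch of these structural requirements follows, with mortality as a competing risk over a long horizon and one hypothetical interdependence in which nephropathy raises the annual probability of a cardiovascular event. Every probability is an illustrative placeholder, not an estimate from any study.

```python
# Sketch: yearly patient-level simulation with a competing mortality risk
# and an interdependence between two complications. All probabilities are
# hypothetical placeholders.
import random

def simulate_patient(rng, years=30):
    """Simulate one patient year by year; death truncates follow-up."""
    nephropathy = cvd_event = False
    for year in range(years):
        if rng.random() < 0.02:                      # annual mortality risk
            return year, nephropathy, cvd_event      # died this year
        if not nephropathy and rng.random() < 0.03:  # nephropathy incidence
            nephropathy = True
        p_cvd = 0.04 if nephropathy else 0.02        # interdependence: doubled risk
        if not cvd_event and rng.random() < p_cvd:
            cvd_event = True
    return years, nephropathy, cvd_event

rng = random.Random(7)
results = [simulate_patient(rng) for _ in range(10_000)]
cvd_rate = sum(cvd for _, _, cvd in results) / len(results)
print(f"30-year CVD event rate across simulated cohort: {cvd_rate:.3f}")
```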

Fourth, some complications, such as myocardial infarction, may be rapidly fatal, whereas others, such as blindness, greatly reduce a person’s quality of life but not necessarily life expectancy. Therefore, models should include both the quality and length of a person’s life. Fifth, some diabetes complications, such as blindness, impose small costs on payors but large indirect costs on patients and their families. Therefore, authors and users of models should select their perspective carefully and explicitly state it in their analysis.
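
One common way to combine the quality and length of life in a single measure is the quality-adjusted life-year (QALY); the sketch below sums discounted utility weights over the years lived. The utility values and the 3% discount rate are illustrative assumptions, not recommendations of this report.

```python
# Sketch: quality-adjusted life-years, discounting future years. The
# utility weights and discount rate are hypothetical.

def discounted_qalys(yearly_utilities, discount_rate=0.03):
    """Sum utility weights over years lived, discounting future years."""
    return sum(u / (1 + discount_rate) ** t
               for t, u in enumerate(yearly_utilities))

# Hypothetical patient: 10 years sighted (utility 0.85), then 5 years blind (0.60).
utilities = [0.85] * 10 + [0.60] * 5
print(f"{discounted_qalys(utilities):.2f} discounted QALYs")
```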

Sixth, there can be a long delay between the onset of type 2 diabetes and its clinical diagnosis. Models should be able to make this distinction. Finally, the diagnostic criteria for diabetes and related conditions have changed over time. Developers of models should therefore be specific about the criteria used to diagnose and classify diabetes and other conditions in the model.

Computer modeling in medicine can be an important tool to help clinicians and policy makers make better decisions. Whereas this technology is very well established in other fields of science such as physics, chemistry, and the environmental sciences, and is applied throughout our daily lives, medicine has been relatively slow to adopt it.

One major reason for the reluctance to use computer models and to believe the results they produce is the perception that our understanding of physiology, disease processes, and medicine is still crude. However, clinicians and policy makers likely underestimate the breadth and depth of our knowledge, and today’s models have become very sophisticated. Another frequently heard reason for distrusting the results generated by computer models is that predicting the future is pure guesswork. But computer models that have undergone extensive internal and external validation should be considered reliable and accurate, and thus a very useful aid to decision making.

Modelers should adopt the guidelines described in this report, and the guidelines should give readers a better understanding of what to expect from a model. Their implementation will undoubtedly advance the discipline and make it more credible by promoting transparency and rigor in validation.

American Diabetes Association Consensus Panel members

Rito Bergemann, MD, PhD; Jonathan Brown, MPP, PhD; Wiley Chan, MD; Philip Clarke, PhD; David Eddy, MD, PhD; William Herman, MD, MPH; Andrew J. Palmer, BSc, MBBS; Stephan Roze, PhD; Richard Stevens, PhD; and Ping Zhang, MD.

1. Sonnenberg FA, Beck JR: Markov models in medical decision making: a practical guide. Med Decis Making 13:322–338, 1993
2. Eastman RC, Javitt JC, Herman WH, Dasbach EJ, Zbrozek AS, Dong F, Manninen D, Garfield SA, Copley-Merriman C, Maier W, Eastman JF, Kotsanos J, Cowie CC, Harris M: Model of complications of NIDDM. I. Model construction and assumptions. Diabetes Care 20:725–734, 1997
3. Brown JB, Russell A, Chan W, Pedula K, Aickin M: The global diabetes model: user friendly version 3.0. Diabetes Res Clin Pract 50 (Suppl. 3):S15–S46, 2000
4. Palmer AJ, Brandt A, Gozzoli V, Weiss C, Stock H, Wenzel H: Outline of a diabetes disease management model: principles and applications. Diabetes Res Clin Pract 50 (Suppl. 3):S47–S56, 2000
5. Eddy DM, Schlessinger L: Archimedes: a trial-validated model of diabetes. Diabetes Care 26:3093–3101, 2003
6. Eddy DM, Schlessinger L: Validation of the Archimedes diabetes model. Diabetes Care 26:3102–3110, 2003
7. Weinstein MC, O’Brien B, Hornberger J, Jackson J, Johannesson M, McCabe C, Luce BR, the ISPOR Task Force on Good Research Practices–Modeling Studies: Principles of good practice for decision analytic modeling in health-care evaluation: report of the ISPOR Task Force on Good Research Practices–Modeling Studies. Value Health 6:9–17, 2003
8. Palmer AJ, Roze S, Valentine WJ: The CORE diabetes model: projecting long term clinical and economic outcomes in type 1 and type 2 diabetes. Curr Med Res Opin. In press
9. Clarke PM, Gray AM, Briggs A, Farmer A, Fenn P, Stevens R, Matthews D, Stratton IM, Holman R: A model to estimate the lifetime health outcomes of patients with type 2 diabetes: the United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model (UKPDS 68). Diabetologia. In press
10. Stevens R, Kothari V, Adler AI, Stratton IM, Holman RR: The UKPDS risk engine: a model for the risk of coronary heart disease in type 2 diabetes (UKPDS 56). Clin Sci 101:671–679, 2001
11. The Diabetes Control and Complications Trial Research Group: Lifetime benefits and costs of intensive therapy as practiced in the Diabetes Control and Complications Trial. JAMA 276:1409–1415, 1996
12. Eastman RC, Javitt JC, Herman WH, Dasbach EJ, Copley-Merriman C, Maier W, Dong F, Manninen D, Zbrozek AS, Kotsanos J, Garfield SA, Harris M: Model of complications of NIDDM. II. Analysis of the health benefits and cost-effectiveness of treating NIDDM with the goal of normoglycemia. Diabetes Care 20:735–744, 1997
13. Vijan S, Hofer TP, Hayward RA: Estimated benefits of glycemic control in microvascular complications in type 2 diabetes. Ann Intern Med 127:788–795, 1997
14. Yudkin JS, Chaturvedi N: Developing risk stratification charts for diabetic and nondiabetic subjects. Diabet Med 16:219–227, 1997
15. Wilson PW, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB: Prediction of coronary heart disease using risk factor categories. Circulation 97:1837–1847, 1998
16. CDC Diabetes Cost-Effectiveness Study Group, Centers for Disease Control and Prevention: The cost-effectiveness of screening for type 2 diabetes. JAMA 280:1757–1763, 1998
17. Stevens R, Adler A, Gray A, Briggs A, Holman R: Life-expectancy projection by modeling and computer simulation (UKPDS 46). Diabetes Res Clin Pract 50 (Suppl. 3):S5–S13, 2000
18. Vijan S, Hofer TP, Hayward RA: Cost-utility analysis of screening intervals for diabetic retinopathy in patients with type 2 diabetes mellitus. JAMA 283:889–896, 2000
19. CDC Diabetes Cost-effectiveness Group: Cost-effectiveness of intensive glycemic control, intensified hypertension control, and serum cholesterol level reduction for type 2 diabetes. JAMA 287:2542–2551, 2002
20. Kothari V, Stevens RJ, Adler AI, Stratton IM, Manley SE, Neil HAW, Holman RR: Risk of stroke in type 2 diabetes estimated by the UKPDS risk engine (UKPDS 60). Stroke 33:1776–1781, 2002
21. Bagust A, Hopkinson PK, Maier W, Currie CJ: An economic model of the long-term health care burden of type II diabetes. Diabetologia 44:2140–2155, 2001
22. Caro JJ, Ward AJ, O’Brien JA: Lifetime costs of complications resulting from type 2 diabetes in the U.S. Diabetes Care 25:476–481, 2002
23. Nelson KM, Boyko EJ: Predicting impaired glucose tolerance using common clinical information: data from the Third National Health and Nutrition Examination Survey. Diabetes Care 26:2058–2062, 2003
24. Drummond MF, Jefferson TO: Guidelines for authors and peer reviewers of economic submissions to the BMJ. BMJ 313:275–283, 1996
25. Gold MR, Siegel JE, Russell LB, Weinstein MC: Cost-Effectiveness in Health and Medicine: The Report of the Panel on Cost-Effectiveness in Health and Medicine. New York, Oxford University Press, 1996
26. Siegel JE, Torrance GW, Russell LB, Luce BR, Weinstein MC, Gold MR: Guidelines for pharmacoeconomic studies: recommendations from the Panel on Cost Effectiveness in Health and Medicine (Review). Pharmacoeconomics 11:159–168, 1997
27. Briggs AH: Handling uncertainty in cost-effectiveness models. Pharmacoeconomics 17:479–500, 2000
28. Hougaard P: Statistics for Biology and Health: Analysis of Multivariate Survival Data. New York, Springer-Verlag, 2000
29. Haddix A, Teutsch S, Corso P: Prevention Effectiveness: A Guide to Decision Analysis and Economic Evaluation. New York, Oxford University Press, 2003
30. Altman DG, Royston P: What do we mean by validating a prognostic model? Stat Med 19:453–473, 2000
31. Chatfield C: Model uncertainty, data mining and statistical inference. J R Stat Soc Ser A Stat Soc 158:419–466, 1995
