OBJECTIVE—We sought to systematically ascertain the quality of randomized controlled trials (RCTs) in diabetes.
RESEARCH DESIGN AND METHODS—We identified the 10 most recently published trials as of 31 October 2003 in each of six general medical, five diabetes, and five metabolism and nutrition journals and further enriched our sample with 10 additional RCTs from each of five journals that published the most eligible RCTs in a year. We explored the association between trial characteristics and reporting quality using univariate analyses and a preplanned multivariate regression model.
RESULTS—After excluding redundant reports of included trials and one trial that measured outcomes on the health system rather than on patients, we included 199 RCTs: 119 assessed physiological and other laboratory outcomes, 42 assessed patient-important outcomes (e.g., morbidity and mortality, quality of life), and 38 assessed surrogate outcomes (e.g., disease progression or regression, HbA1c, cholesterol). Fifty-three percent were of low methodological quality: about one-third (36–40%) of trials reporting patient-important or surrogate outcomes and two-thirds (64%) of laboratory investigations. Independent predictors of low quality were nonprofit funding source (odds ratio 3.1 [95% CI 1.5–6.2]), measurement of physiological and laboratory outcomes (2.3 [1.2–4.4]), and cross-over design (2.3 [1.1–4.8]), all characteristics of laboratory clinical investigations.
CONCLUSIONS—There is ample room for improving the quality of diabetes trials. To enhance the practice of evidence-based diabetes care, trialists need to pay closer attention to the rigorous implementation and reporting of important methodological safeguards against bias in randomized trials.
A key principle of evidence-based practice is that one should seek to apply the best available evidence from clinical research (1,2). The expression “best available” suggests a hierarchy of evidence; one ought to draw stronger inferences from evidence that comes from high-quality studies with optimal safeguards to prevent random and systematic error (bias). Most hierarchies of evidence about interventions place high-quality randomized controlled trials (RCTs) at the top of the hierarchy (3). Following this principle, diabetes practitioners should pay particular attention to RCTs to guide their practice.
Not all RCTs share the same quality; that is, not all RCTs yield unbiased results. In the laboratory clinical investigation tradition, “quality” has referred almost exclusively to the rigor and reproducibility of the experimental procedures performed on the volunteers and to the precision and accuracy of the laboratory determinations. In conducting systematic reviews of RCTs in diabetes (4–6), we have noticed that investigators seem to pay little attention (as judged by the extent to which they report these methods) to methodological safeguards that limit the introduction of bias into RCTs. As a result, these potentially biased RCTs could mislead clinicians. Readers have access only to the methods as reported; when reports leave out critical information about methodological safeguards against bias, readers cannot ascertain whether those safeguards were in place during trial conduct.
To our knowledge, there is no contemporary systematic assessment of the quality of RCTs in diabetes. Ten years ago, McIver and Dinneen (7) conducted the first evaluation of the quality of RCTs in diabetes and found it lacking. In updating their work, we sought to systematically evaluate the methodological quality of RCTs in diabetes (as reported in major medical journals), taking into account the advances made in the last 10 years in the conduct and reporting of RCTs.
RESEARCH DESIGN AND METHODS
Eligibility criteria
Eligible articles were reports of studies describing random allocation of human participants to at least two interventions, one of which was a control intervention; participants had to be patients with any form of diabetes or people at risk for developing any form of diabetes. When several reports referred to the same trial, we retained the original report.
Search strategy
To describe the best RCTs in diabetes, we sought to identify RCTs published in the top journals judged by 2003 impact-factor rankings. Furthermore, we purposefully selected both general journals (n = 6, New England Journal of Medicine, JAMA, Lancet, Annals of Internal Medicine, BMJ, and Archives of Internal Medicine) as well as pertinent specialty journals (n = 5 in diabetes: Diabetes, Diabetes Care, Diabetic Medicine, Diabetologia, and Diabetes and Metabolism Research; and n = 5 in metabolism and nutrition: Journal of Clinical Endocrinology and Metabolism, Metabolism: clinical and experimental, American Journal of Clinical Nutrition, Journal of the American College of Nutrition, and European Journal of Clinical Nutrition).
To identify RCTs published in each of these journals, Y.G.W. (under the supervision of V.M.M.) conducted an online search using the PubMed interface (www.pubmed.gov) and the search terms “journal name” [Journal] and (diabet* [title] or niddm [title] or iddm [title]), limited to human studies in English indexed with abstracts. We then retrieved each apparently eligible abstract and read the article in full to determine its eligibility. We identified the 10 most recently published and eligible RCTs from each journal as of 31 October 2003. We further enriched our sample with 10 additional RCTs from each of the five journals that published the most eligible RCTs in a year.
Data extraction
We developed, pilot tested, and used a standardized form to abstract data from each of the eligible RCTs about research methods (allocation concealment, blinding, intention to treat, and loss to follow-up), statistical reporting (between-arm versus within-arm comparisons and use of CIs versus P values), and funding source. Also, we noted the patients enrolled (number and description) and the intervention types (e.g., drugs, procedures, diet).
Outcome and trial classification
To classify the reported primary outcomes and adverse events, we distinguished outcomes expected to directly affect patients’ quality of life (which we refer to as “patient-important outcomes” [8]), outcomes that assess the response to physiological and other laboratory maneuvers (“physiological and laboratory outcomes”), and outcomes that lie between these two classes, such as measures that may indicate an increased risk of patient-important outcomes (“surrogate outcomes”).
We considered trials to have low methodological quality when they met three or more of these criteria (online appendix [available at http://care.diabetesjournals.org]): inadequate (or not reported) allocation concealment, inadequate (or not reported) blinding of patients and caregivers, failure to adhere to the intention-to-treat principle, or loss to follow-up >10% (or insufficient information reported to calculate it). Allocation concealment refers to the extent to which the researchers who assessed eligibility and enrolled patients were kept unaware of the randomization sequence, such that they could not predict the arm of the trial to which the next patient would be allocated. Examples of adequate allocation concealment include central (online or phone-in) randomization and medication dispensing in coded containers. Explicit reporting of blinding of patients or caregivers, or the statement that the trial was “double blind” and tested one intervention against a placebo, qualified as adequate blinding. Given that most modern trials estimate small treatment effects, loss to follow-up >10% was considered inadequate. Adherence to the intention-to-treat principle requires minimal loss to follow-up and minimal crossover between arms. We limited the assessment of adherence to this principle to the explicit reporting of analyses conducted by the intention-to-treat principle (i.e., a statement or evidence that patients were analyzed in the arm to which they were randomized). For trials with several reports (e.g., an index report referencing earlier papers describing the methods in detail), we considered all available reports to ascertain methodological quality.
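To make the classification rule concrete, the following is a minimal sketch, assuming hypothetical field names for the four abstracted safeguards; as described above, items that were not reported count as inadequate.

```python
# Minimal sketch of the low-quality classification rule; field names and the
# example record are hypothetical and not taken from the abstraction form.

CRITERIA = (
    "inadequate_allocation_concealment",   # inadequate or not reported
    "inadequate_blinding",                 # patients/caregivers not blinded or not reported
    "no_intention_to_treat_analysis",      # ITT analysis not explicitly reported
    "loss_to_follow_up_over_10_percent",   # >10% lost, or not calculable from the report
)

def is_low_quality(trial: dict) -> bool:
    """Classify a trial as low quality when it meets >= 3 of the 4 criteria."""
    # Missing items default to True: "not reported" counts against the trial.
    return sum(bool(trial.get(criterion, True)) for criterion in CRITERIA) >= 3

example = {
    "inadequate_allocation_concealment": True,
    "inadequate_blinding": True,
    "no_intention_to_treat_analysis": False,
    "loss_to_follow_up_over_10_percent": True,
}
print(is_low_quality(example))  # True: three of the four criteria are met
```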
Reproducibility
Since these classifications required judgment, we sought to standardize the use of the form through iterative use, item reduction or clarification, and re-review by the four abstractors (Y.G.W., P.A., S.B., and V.M.M.). We extracted data in duplicate until we achieved adequate reproducibility (chance-adjusted interrater agreement, κ > 0.90) and thereafter continued with individual data extraction.
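As an illustration of the agreement statistic used to decide when duplicate extraction could stop, the following sketch computes Cohen’s κ with scikit-learn; the two sets of abstractor ratings are invented for the example.

```python
# Sketch of chance-adjusted interrater agreement (Cohen's kappa) between two
# abstractors; the ratings below are invented for illustration.
from sklearn.metrics import cohen_kappa_score

# 1 = item judged adequately reported, 0 = inadequate or not reported,
# for ten hypothetical trial reports rated by two abstractors.
abstractor_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
abstractor_b = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]

kappa = cohen_kappa_score(abstractor_a, abstractor_b)
print(f"kappa = {kappa:.2f}")
# Duplicate extraction continued until kappa exceeded 0.90 on these judgments.
```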
Statistical analyses
We used descriptive statistics to characterize the quality of RCTs in diabetes. Univariate analyses explored associations between trial characteristics and quality in order to generate hypotheses.
From the literature and our previous work, we proposed the following predictors of RCT quality: journal of publication (general versus specialist [9]), publication of a Consolidated Standards of Reporting Trials (CONSORT) flow chart (as a surrogate marker of adherence to the CONSORT statement [10,11]), parallel versus cross-over design (there are no recently published and widely endorsed standards for the reporting of cross-over trials), and funding source (12). We also hypothesized that trials seeking to influence clinical practice by measuring patient-important or surrogate outcomes would be of better quality than laboratory investigations.
To test these predictors, we constructed a multivariable logistic regression model with all of these predictors entered simultaneously and low quality (yes/no) as the dependent variable. Associations were described using odds ratios (ORs) and their 95% CIs.
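As a sketch of the kind of model described, the following fits a logistic regression of low quality on the prespecified predictors and reports ORs with 95% CIs; the column names and data file are hypothetical, and the paper does not specify the statistical software used.

```python
# Sketch of the prespecified multivariable logistic regression; the column
# names and data file are hypothetical, and no particular software is implied.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# One row per trial: 0/1 indicators for each candidate predictor and for the
# dependent variable low_quality (1 = low methodological quality).
df = pd.read_csv("trial_characteristics.csv")  # hypothetical extraction data set

model = smf.logit(
    "low_quality ~ specialist_journal + consort_flow_chart + crossover_design"
    " + nonprofit_funding + physiological_outcome",
    data=df,
).fit()

# Express associations as odds ratios with 95% CIs, as reported in the paper.
ci = np.exp(model.conf_int())
summary = pd.DataFrame({
    "OR": np.exp(model.params),
    "2.5%": ci[0],
    "97.5%": ci[1],
})
print(summary)
```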
RESULTS
After screening 357 potentially eligible abstracts, we selected 209 eligible RCTs, of which 9 were additional reports of RCTs already represented in the sample (e.g., subanalyses of the Diabetes Control and Complications Trial and U.K. Prospective Diabetes Study). One RCT focused its intervention on the delivery of health care and reported only process outcomes (i.e., degree of adherence to guidelines) and was excluded from further description. Table 1 describes the characteristics of the included 199 RCTs classified by type of outcome.
Types of trials and funding
Typical RCTs measuring patient-important and surrogate outcomes were single-center parallel-design drug trials enrolling patients with type 2 diabetes and funded by for-profit agencies (e.g., pharmaceutical companies). Typical RCTs measuring physiological and other laboratory measures were single-center parallel or cross-over design drug trials enrolling patients with type 2 diabetes and funded through mixed or not-for-profit funding (e.g., National Institutes of Health).
Outcomes
Most trials reported physiological and other laboratory outcomes (n = 119). Trials describing patient-important outcomes (n = 42) reported on the effect of interventions on one or more of the following: mortality (n = 12), major morbidity (such as myocardial infarction or stroke, n = 10), minor morbidity (such as transient ischemic attack and severe hypoglycemia, n = 5), physical or mental disability (n = 4), discomfort that hinders daily living (such as minor hypoglycemia, n = 8), and specific measures of quality of life (n = 7). Trials reporting surrogate outcomes (n = 38) reported on the effect of interventions on disease progression or regression (n = 12) or on laboratory measures such as HbA1c (A1C) or cholesterol (n = 31).
Reported methodological quality of RCTs and its predictors
Table 2 describes the methodological characteristics of the included RCTs; 106 trials (53%) were considered of low methodological quality. While 36% of trials reporting patient-important outcomes and 40% of trials reporting surrogate outcomes were of low quality, 64% of laboratory investigations were of low reported methodological quality.
Apart from lack of publication of a patient flow chart (OR 1.0 [95% CI 0.5–2.5]) and publication in a specialist journal (1.5 [0.7–3.2]), all variables included in the predefined model were significant independent predictors. Independent predictors of low quality were nonprofit funding source (3.1 [1.5–6.2]), report of physiological and laboratory outcomes (2.3 [1.2–4.4]), and cross-over design (2.3 [1.1–4.8]). Much of the variability in methodological quality, however, remains unexplained; the complete model was statistically significant (P < 0.0001) but explained only 14% of the variability in quality among the included RCTs.
Report elements to support evidence-based clinical decision making
Laboratory investigations were about as likely as RCTs reporting patient-important and surrogate outcomes to report adverse events (n = 64 [54%] vs. n = 37 [46%]). Laboratory investigations also followed fewer patients (median 54 vs. 160 or 474, respectively) for a shorter period (median 16 vs. 33 or 136 weeks, respectively). They were less likely than RCTs measuring patient-important or surrogate outcomes to use estimation (CIs) to describe the precision of their results (n = 24 [20%] vs. n = 40 [50%]). Also, 13 of 78 (17%) parallel-design laboratory investigations failed to present comparisons between the intervention and control arms, presenting instead before-after comparisons within each arm.
CONCLUSIONS
RCTs in diabetes published in pertinent top journals, both general and specialized, have important deficiencies in their reporting of key methodological features. These deficiencies are most marked in laboratory investigations, with trials that measured patient-important outcomes showing better reporting. Many RCTs measured patient-important outcomes, but very few of these assessed nonpharmacological interventions. Despite the worldwide explosion of diabetes as a major public health problem, most trials came from researchers working in the northern hemisphere.
Limitations and strengths
By the nature of our selection process (i.e., sampling exclusively from top journals), our work likely overestimates the quality of diabetes RCTs in general, as well as the proportion of these trials that measured patient-important outcomes. The deficiencies in reporting and methodological quality documented here may therefore represent a “best case” scenario, which strengthens any call for improvement in the conduct and reporting of RCTs in diabetes. Further, our reproducible methods using multiple judges and our focused analyses strengthen the inferences drawn from these data.
Comparison with previous research
McIver and Dinneen (7) evaluated 79 RCTs related to type 2 diabetes published nearly 10 years ago (1994–1995) and indexed in Medline. Almost half (42%) of the trials they assessed were published in the journals Diabetes Care or Diabetic Medicine. We sampled 28% of our RCTs from these two journals. Compared with our cohort of RCTs, RCTs in the McIver and Dinneen cohort enrolled fewer patients (median patients randomized 40 [range 5–2,769]), followed them for shorter periods (median duration 22 weeks [0.2–260]), and were less likely to measure patient-important outcomes (9%). However, the proportion of RCTs adequately reporting allocation methods (15%) and blinding (58%, limited to “double blind”) seems similar 10 years later.
Of note, recent empirical evidence supports the explicit reporting of the allocation concealment process (13–15), of which groups (participants, clinicians, data collectors, data analysts, and judicial assessors of outcomes) were blinded (not just the term “double blind” [16,17]), and of the extent to which the trial was conducted under the intention-to-treat principle (18,19). While the revised CONSORT statement offers guidelines for the reporting of parallel RCTs (10), including the reporting of harms (20), we and others have documented gaps in the reporting of RCTs, even in RCTs published in journals that endorse, but fail to enforce, CONSORT (9,21,22).
Also, it is important to distinguish the reported methodological quality of an RCT from the actual rigor of its conduct. Critical readers have generally assumed that if something important (e.g., a methodological safeguard against bias) was not mentioned, it likely did not happen. We evaluated this assumption (by interviewing trial authors and comparing their answers with what they reported) and found it excessively pessimistic (23). For example, while only 58% of trials reported adequate allocation concealment, >90% of the trials had actually implemented an adequate strategy to conceal the allocation sequence from personnel enrolling patients and assessing their eligibility. Extrapolating those results to this cohort of RCTs, it is likely that RCTs in diabetes have better methods than is apparent from our careful review of their published reports. To mitigate this uncertainty, it is fair to ask trialists to improve the reporting of their trials by adhering more closely to reporting standards such as the CONSORT statement. Journals can also do their part by enforcing adherence to the CONSORT standards they endorse (9). Furthermore, key reporting elements that editors and authors consider too costly in printed space (even though CONSORT sets a minimum reporting requirement) could appear in electronic appendixes on journal websites linked to the original publication rather than being edited out.
Despite our sampling from top journals, only 20% of the included diabetes trials reported patient-important outcomes. While there is ample room for improving the reporting of key methodological features (e.g., only one in five reported allocation concealment) and of adverse events in these trials, they followed more patients for a longer time, and half of them lost <4% of patients to follow-up. This is encouraging because these trials play a major role in informing evidence-based clinical decision making and clinical policy.
There are, however, other areas of concern. First, it is worrisome that 17% of parallel-design laboratory investigations report before-after comparisons within the experimental and control groups but do not report the relevant comparison (i.e., between outcomes in the experimental and control groups). The inferences drawn in these trials ignore the inferential strength that randomization offers, forgoing the advantages of a concomitant control group with a similar prognosis. Second, diabetes trialists seem to favor hypothesis testing and the reporting of P values over the more informative CIs. For clinicians, knowing only the answer to a hypothesis test (i.e., whether treatment changed the outcome) is often insufficient and can be misleading (24). It is usually more informative to know the extent to which the treatment works (e.g., the size of the reduction in risk) and the precision of this estimate (i.e., the CI). For example, CIs can help readers determine whether a trial was too small to rule out important treatment effects when the association between intervention and outcomes is “not significant” (25). We believe these practices should be corrected as a service to readers seeking to use RCT evidence in their clinical practice.
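As an illustration of the second point, a minimal sketch follows that computes a risk difference between arms with its 95% CI; the event counts are invented and are not drawn from any included trial.

```python
# Illustration of why a CI is more informative than a P value alone; the
# event counts below are invented and are not drawn from any included trial.
from math import sqrt

events_tx, n_tx = 12, 200      # events in the treatment arm
events_ctl, n_ctl = 20, 200    # events in the control arm

p_tx, p_ctl = events_tx / n_tx, events_ctl / n_ctl
risk_difference = p_tx - p_ctl  # absolute risk reduction if negative

# Wald 95% CI for the risk difference
se = sqrt(p_tx * (1 - p_tx) / n_tx + p_ctl * (1 - p_ctl) / n_ctl)
lower, upper = risk_difference - 1.96 * se, risk_difference + 1.96 * se

print(f"risk difference = {risk_difference:.3f} (95% CI {lower:.3f} to {upper:.3f})")
# A CI that crosses zero but extends to a clinically important benefit suggests
# the trial may simply have been too small to rule out an important effect.
```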
In conclusion, RCTs in diabetes published in pertinent top journals, both general and specialized, have important deficiencies in their reporting of key methodological features. These deficiencies are most marked in laboratory investigations, with trials that measured patient-important outcomes showing better reporting, albeit with some room for improvement. To enhance the practice of evidence-based diabetes care, trialists need to pay closer attention to the rigorous implementation and reporting of important methodological safeguards against bias. Furthermore, clinicians and their patients with diabetes need more RCTs assessing the effect of promising interventions, including nonpharmacological interventions, on patient-important outcomes.
Article Information
While conducting this work, V.M.M. was a Mayo Foundation Scholar and P.A.-C. held a postgraduate research fellowship at the Instituto Carlos III, Spanish Ministry of Health. P.A.-C.’s activities are supported in part by the Red Temática MBE (FIS G03/090).
References
Additional information on this article can be found in an online appendix available at http://care.diabetesjournals.org.