Although A1C is now recommended to diagnose diabetes, its test performance for diagnosis and prognosis is uncertain. Our objective was to assess the test performance of A1C against single and repeat glucose measurements for diagnosis of prevalent diabetes and for prediction of incident diabetes.
We conducted population-based analyses of 12,485 participants in the Atherosclerosis Risk in Communities (ARIC) study and a subpopulation of 691 participants in the Third National Health and Nutrition Examination Survey (NHANES III) with repeat test results.
Against a single fasting glucose ≥126 mg/dl, the sensitivity and specificity of A1C ≥6.5% for detection of prevalent diabetes were 47 and 98%, respectively (area under the curve 0.892). Against repeated fasting glucose (3 years apart) ≥126 mg/dl, sensitivity improved to 67% and specificity remained high (97%) (AUC 0.936). Similar results were obtained in NHANES III against repeated fasting glucose 2 weeks apart. The accuracy of A1C was consistent across age, BMI, and race groups. For individuals with fasting glucose ≥126 mg/dl and A1C ≥6.5% at baseline, the 10-year risk of diagnosed diabetes was 88% compared with 55% among those individuals with fasting glucose ≥126 mg/dl and A1C 5.7–<6.5%.
A1C performs well as a diagnostic tool when diabetes definitions that most closely resemble those used in clinical practice are used as the “gold standard.” The high risk of diabetes among individuals with both elevated fasting glucose and A1C suggests a dual role for fasting glucose and A1C for prediction of diabetes.
Although A1C is now recommended for the diagnosis of diabetes (1,2), its precise test performance is uncertain. The lack of a single, clear “gold standard” poses a challenge for determining the performance of A1C. Previous diagnostic studies of A1C have relied exclusively on a single elevated fasting or 2-h glucose value as the gold standard (3–5). However, because glucose determinations are inherently more variable than A1C (6), these convenient gold standards are likely to reduce the apparent accuracy of A1C as a diagnostic test. A stronger gold standard would rely on repeated glucose determinations on different days (2), i.e., the recommended approach to diagnosis of diabetes in clinical practice. Alternatively, A1C and fasting glucose can be compared head-to-head against the subsequent development of clinically diagnosed diabetes as the gold standard. We hypothesized that 1) A1C would perform well as a diagnostic and prognostic test for diabetes across its full range and at the American Diabetes Association–recommended threshold of 6.5% and 2) its performance would be best when judged against stronger, more clinically relevant gold standards.
RESEARCH DESIGN AND METHODS
Atherosclerosis risk in communities study
The Atherosclerosis Risk in Communities (ARIC) study is a community-based cohort study of 15,792 black or white adults aged 45–64 years at baseline sampled from four U.S. communities. The first clinical examinations (visit 1) took place during 1987–1989, with three follow-up visits approximately every 3 years. Information on diabetes status including self-reported physician diagnosis, diabetes medication use, and fasting glucose measurements was obtained from all participants at each clinical examination (7). Visit 2 (1990–1992), attended by 14,348 participants, was the only visit for which stored whole blood samples were available for measurement of A1C and is the baseline for the present study. We excluded participants who identified race/ethnicity as other than white or black, who had a self-reported physician diagnosis of diabetes or diabetes medication use (visit 1 or visit 2), who were nonfasting, or who were missing variables of interest; thus, the final sample size for our main analyses was 12,485 individuals.
Second examination of the third national health and nutrition examination survey
The Second Examination of the Third National Health and Nutrition Examination Survey (NHANES III Second Exam) was a substudy of NHANES III, conducted in 1988–1994 (8). A nonrandom sample of 2,596 participants in the original NHANES III was selected for participation in this substudy; these data comprise one of the largest available databases of short-term repeat laboratory and physical examination measurements in humans (6,9,10). We excluded NHANES III Second Exam participants who were aged <18 years (n = 387), who were fasting <8 h (n = 1,411), who reported a prior physician diagnosis of diabetes (n = 53), or who were missing glucose or A1C values (n = 54). After these exclusions, 691 participants remained for analysis. Oral glucose tolerance tests (OGTTs) were only performed on individuals who were aged 40–74 years and who had morning examinations; all analyses of 2-h glucose measurements were limited to the smaller sample of individuals with valid OGTT data (n = 317).
Measurement of glucose and A1C
In the ARIC study, serum glucose was measured from blood collected at each visit using the hexokinase method. We thawed and assayed frozen whole blood samples collected at ARIC study visit 2 (1990–1992) for measurement of A1C using high-performance liquid chromatography (Tosoh 2.2 Plus in 2003–2004 and Tosoh G7 in 2007–2008, Tosoh Corporation, Tokyo, Japan) (standardized to the Diabetes Control and Complications Trial [DCCT] assay) (11). Because A1C data are only available at visit 2 in the ARIC study, this was used as the baseline examination in the present study.
In addition to plasma glucose measurements, the NHANES III examination included a 2-h 75-g OGTT in adults aged 40–74 years. A1C measurements were obtained using the Diamat high-performance liquid chromatography assay (Bio-Rad Laboratories) (standardized to the DCCT assay). Because variant hemoglobins can interfere with A1C measurement by the Diamat assay, samples with evidence of interference were reanalyzed by affinity chromatography. The NHANES III repeat examinations were conducted approximately 2 weeks after the first examination by trained personnel following the same standardized protocols (6). Detailed information on data collection and laboratory procedures in NHANES III is described elsewhere (12,13).
Prevalent undiagnosed diabetes
In the ARIC and NHANES III studies, we used the repeat glucose values available to compare different definitions of prevalent undiagnosed diabetes (Table 2). In the ARIC study, we generated two principal definitions of prevalent diabetes (definition 1: a single fasting glucose value ≥126 mg/dl at baseline [visit 2]; and definition 2: fasting glucose values ≥126 mg/dl at two separate examinations). In the ARIC study, the two fasting glucose measurements took place at clinical examinations that were 3 years apart. In the NHANES III subpopulation, the two clinical examinations took place ∼2 weeks apart (mean 17 days). In the NHANES III subpopulation, we also examined individuals with undiagnosed diabetes defined by single and repeat 2-h glucose values ≥200 mg/dl.
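The two fasting glucose definitions can be illustrated with a minimal sketch. The function name, the handling of a missing repeat measurement, and the example values are assumptions made for illustration; they are not taken from the study data or code.

```python
FASTING_THRESHOLD_MG_DL = 126  # fasting glucose threshold stated in the text


def prevalent_diabetes(baseline_glucose, repeat_glucose=None):
    """Return (definition_1, definition_2) flags for one participant.

    definition_1: single fasting glucose >= 126 mg/dl at baseline.
    definition_2: fasting glucose >= 126 mg/dl at both examinations;
                  None when no repeat measurement is available
                  (an assumption of this sketch).
    """
    definition_1 = baseline_glucose >= FASTING_THRESHOLD_MG_DL
    if repeat_glucose is None:
        return definition_1, None
    definition_2 = definition_1 and repeat_glucose >= FASTING_THRESHOLD_MG_DL
    return definition_1, definition_2


# Hypothetical fasting glucose pairs (baseline, repeat) in mg/dl.
examples = {"p1": (131, 128), "p2": (127, 110), "p3": (98, 101)}
flags = {pid: prevalent_diabetes(*pair) for pid, pair in examples.items()}
```

In this sketch, the hypothetical participant p2 meets definition 1 but not definition 2, which is the kind of case that repeat testing reclassifies.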
Incident diabetes in the ARIC study
Ongoing longitudinal follow-up of ARIC study participants also gave us the opportunity to assess the performance of baseline A1C for classification of incident diabetes. We used two definitions of incident diabetes: a visit-based definition (definition A) and an interview-based definition (definition B). For definition A, we used a standard time-to-diabetes definition based on glucose measurements, self-reported diagnosis, or medication use for a maximum of 6 years of follow-up (14). For definition B, we used self-reported information on diabetes diagnosis and medication use during the visits and subsequent annual telephone calls for a maximum of 15 years of follow-up (15).
Statistical analyses
We examined population characteristics by diagnostic categories of A1C (<6.5% and ≥6.5%) among individuals without a history of diagnosed diabetes in the ARIC and NHANES III populations. We calculated the sensitivity, specificity, and positive and negative likelihood ratios at all cutoffs of A1C for each definition of prevalent and incident diabetes. We calculated the area under the receiver operating characteristic (ROC) curve (AUC) for A1C overall and by subgroups of the population in the ARIC study only (the sample size for the NHANES III subpopulation was insufficient for subgroup analyses): age (<60 or ≥60 years), race/ethnicity (white or African American), and BMI categories (<25, 25–<30, or ≥30 kg/m²). We also conducted sensitivity analyses in individuals with anemia. We used data from the ARIC study to estimate the subsequent risk of diabetes during follow-up (15). Using the Kaplan-Meier method, we estimated the 10-year cumulative incidence of diagnosed diabetes according to clinical categories of baseline fasting glucose and A1C.
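As a minimal sketch of how sensitivity and specificity are obtained at several A1C cutoffs, the following uses made-up data. The function name, the toy values, and the choice of cutoffs shown are illustrative assumptions, not the study's data or code.

```python
def sensitivity_specificity(a1c_values, has_diabetes, cutoff):
    """Sensitivity and specificity of 'A1C >= cutoff' against a reference standard."""
    tp = fp = fn = tn = 0
    for a1c, diseased in zip(a1c_values, has_diabetes):
        test_positive = a1c >= cutoff
        if test_positive and diseased:
            tp += 1
        elif test_positive and not diseased:
            fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical A1C values (%) and reference-standard diabetes status.
a1c = [5.2, 5.6, 5.9, 6.1, 6.6, 6.4, 6.8, 7.2, 5.8, 6.7]
diabetes = [False, False, False, False, False, True, True, True, True, True]

# Evaluate a few cutoffs; a full analysis would sweep every observed value.
by_cutoff = {c: sensitivity_specificity(a1c, diabetes, c) for c in (5.7, 6.0, 6.5)}
for cutoff, (sens, spec) in by_cutoff.items():
    print(f"A1C >= {cutoff}%: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy data set, raising the cutoff trades sensitivity for specificity, which is the general pattern a cutoff-by-cutoff table is meant to show.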
RESULTS
The characteristics of the study populations by A1C <6.5% and ≥6.5% are shown in Table 1. Among NHANES III participants with A1C ≥6.5% at the baseline examination, 80% (61–92%) also had A1C ≥6.5% at the second examination ∼17 days later. Among participants with fasting glucose ≥126 mg/dl at baseline, 70% (50–86%) had fasting glucose ≥126 mg/dl at the second examination. In the ARIC study, 60% (57–64%) of participants with a fasting glucose ≥126 mg/dl at the baseline examination also had a fasting glucose ≥126 mg/dl at the 3-year follow-up visit. Repeat A1C measurements were not available in the ARIC study.
The AUCs for A1C for the identification of diabetes using multiple definitions are shown in Table 2. In the ARIC study, the highest AUC was for diabetes defined by two fasting glucose measurements ≥126 mg/dl, 3 years apart (definition 2): AUC 0.936 (95% CI 0.920–0.952). The accuracy of A1C was consistent across age and race groups, but there was a significant difference in the accuracy across BMI categories (supplementary Table, available in an online appendix at http://care.diabetesjournals.org/cgi/content/full/dc10-1235/DC1). For definition 1, the accuracy of A1C was lower among normal-weight individuals compared with overweight and obese individuals. However, for definition 2, A1C demonstrated high and consistent accuracy across all BMI groups. The AUCs were also high in NHANES III for both definitions (Table 2). The AUC for A1C with a single fasting glucose at baseline as the gold standard in NHANES III (definition 1) was 0.938 (95% CI 0.881–0.995), but the sample size was small (n = 29 cases of diabetes), with corresponding imprecision reflected in the wide CI. Two-hour glucose measurements were also available in NHANES III, and analyses using single and repeated 2-h glucose measurements demonstrated similarly high AUCs. For instance, the AUC for A1C was 0.959 (0.899–1.000) for diabetes defined as a fasting glucose ≥126 mg/dl and 2-h glucose ≥200 mg/dl at the baseline examination. The AUC for fasting glucose ≥126 mg/dl and 2-h glucose ≥200 mg/dl at both the baseline examination and the follow-up examination ∼17 days later was identical (0.959 [0.885–1.000]), but, again, the number of cases of diabetes in this study population was small (n = 14).
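For readers less familiar with the AUC, it can be read as the probability that a randomly chosen case has a higher A1C than a randomly chosen noncase, with ties counted as one half. The sketch below computes it that way on made-up values; the function name and data are illustrative assumptions, and the confidence intervals reported above would need a separate variance estimate that is not shown here.

```python
def roc_auc(case_scores, noncase_scores):
    """AUC as the proportion of case/noncase pairs ranked correctly (ties = 0.5)."""
    concordant = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                concordant += 1.0
            elif case == noncase:
                concordant += 0.5
    return concordant / (len(case_scores) * len(noncase_scores))


# Hypothetical A1C values (%) for people with and without diabetes.
cases = [6.4, 6.8, 7.2, 5.8, 6.7]
noncases = [5.2, 5.6, 5.9, 6.1, 6.6]
auc = roc_auc(cases, noncases)
```

This pairwise form is quadratic in sample size, so it suits small illustrations; rank-based formulas give the same value more efficiently on large cohorts.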
The ROC curves for A1C in the ARIC study comparing definitions 1 and 2 are shown in Fig. 1. The sensitivity and specificity of A1C ≥6.5% for identifying cases of diabetes by definition 1 (single fasting glucose ≥126 mg/dl) were 47 and 98%, respectively. For definition 2 (fasting glucose ≥126 mg/dl at two time points, 3 years apart), the sensitivity was 67% and the specificity was 97%. Detailed information on the sensitivity, specificity, and positive and negative likelihood ratios of all A1C cutoffs for each definition of diabetes in ARIC and NHANES III is presented in the supplementary Table. The prevalence of anemia (hemoglobin <13 g/dl in men and <12 g/dl in women) was 9.9% in the ARIC population. In analyses among individuals with anemia, the overall performance (AUC), sensitivity, and specificity of A1C were similar (data not shown).
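As a worked illustration, the standard formulas LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity can be applied to the rounded percentages reported above. The values produced this way are approximations derived from rounded inputs, not the figures in the supplementary Table, which should take precedence; the function name is an assumption of this sketch.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded sensitivity and specificity of A1C >= 6.5% as reported in the text.
reported = {"definition 1": (0.47, 0.98), "definition 2": (0.67, 0.97)}
for name, (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ ~ {lr_pos:.1f}, LR- ~ {lr_neg:.2f}")
```

With these rounded inputs, the positive likelihood ratio is roughly 23.5 for definition 1 and 22.3 for definition 2, and the negative likelihood ratio falls from about 0.54 to about 0.34. Because specificity is close to 100%, the positive likelihood ratio is very sensitive to rounding in the specificity.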
Follow-up data available in the ARIC study allowed us to examine the performance of A1C for prediction of subsequent diabetes (Table 2). The AUC for the visit-detected cases of diabetes during 6 years of follow-up (definition A) was 0.827 (0.813–0.840). The AUC for diagnosed (interview-based) diabetes risk during a median of 14 years of follow-up (definition B) was 0.733 (0.722–0.745). However, the AUC for interview-based diagnosed cases during the first 6 years of follow-up only was similar to that for the visit-detected cases: 0.826 (0.800–0.851).
The 10-year risk of diagnosed diabetes was 12% (11–13%) and the 15-year risk was 25% (24–26%) in the ARIC study. The 10-year risks of diagnosed diabetes stratified by categories of baseline A1C and fasting glucose are shown in Fig. 2.
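The cumulative incidence figures above come from the Kaplan-Meier method. A minimal sketch of that estimator on made-up follow-up data is given below; the function name, the data, and the convention that events precede censoring at tied times are assumptions of this illustration, not details taken from the study.

```python
def km_cumulative_incidence(times, events, horizon):
    """Kaplan-Meier estimate of 1 - S(horizon).

    times:  follow-up time for each person (years)
    events: True if diabetes was diagnosed at that time, False if censored
    """
    records = sorted(zip(times, events))
    at_risk = len(records)
    survival = 1.0
    i = 0
    while i < len(records) and records[i][0] <= horizon:
        current_time = records[i][0]
        n_events = 0
        n_leaving = 0
        # Group everyone whose follow-up ends at this time.
        while i < len(records) and records[i][0] == current_time:
            n_events += int(records[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
        at_risk -= n_leaving
    return 1 - survival


# Hypothetical follow-up times (years) and event indicators for 10 people.
times = [2, 4, 4, 6, 8, 10, 12, 15, 15, 15]
events = [True, True, False, False, True, False, True, False, False, False]
risk_10y = km_cumulative_incidence(times, events, horizon=10)
risk_15y = km_cumulative_incidence(times, events, horizon=15)
```

Applying the same function within each stratum of baseline A1C and fasting glucose would give stratified 10-year risks of the kind summarized in Fig. 2.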
CONCLUSIONS
The accuracy of A1C for diagnosis of prevalent diabetes was high for all reference definitions and robust across subpopulations (all AUCs >0.80). However, the sensitivity and specificity of A1C ≥6.5% to identify cases of undiagnosed diabetes varied, depending on the definition of undiagnosed diabetes used as the gold standard. Use of repeat glucose tests to define diabetes substantially reduced the proportion of individuals in the discordant cells, particularly individuals with elevated fasting glucose (≥126 mg/dl) but A1C below the diagnostic threshold (<6.5%). In this study, a value of A1C ≥6.5% strongly predicted a subsequent diagnosis of diabetes, even among individuals classified as having undiagnosed diabetes by fasting glucose. We observed that those with high A1C but without elevated fasting glucose had a lower incidence of diagnosed diabetes than those meeting both the glucose and A1C criteria. Such results suggest a dual role for A1C and glucose for the prediction of diabetes.
New guidelines for the use of A1C for the diagnosis of diabetes recommend that an elevated A1C or fasting glucose or OGTT results be confirmed on a second occasion or that the diagnosis be confirmed by a different test on the same occasion. Analyses of short-term (6) and long-term reliability (17) have demonstrated that A1C is significantly less variable than fasting glucose or 2-h glucose. Our data suggest an A1C ≥6.5% confirmed by a fasting glucose ≥126 mg/dl on the same occasion has a high positive predictive value (88%) for 10-year diabetes risk. We observed a substantial risk of diagnosed diabetes at high levels of A1C regardless of the baseline fasting glucose level, consistent with previous analyses of these data (15). However, the observed diabetes risk among individuals in discordant A1C and fasting glucose risk categories suggests the utility of using both A1C and fasting glucose to reduce misclassification and accurately identify individuals at risk of future diabetes.
An A1C cut point of 6.5% is highly specific but has low sensitivity for the identification of prevalent undiagnosed diabetes by these definitions. The low sensitivity of A1C does not necessarily mean that A1C has poor test performance; it may be that A1C is more appropriately distinguishing those individuals at risk for subsequent complications. Recommendations for use of A1C for diagnosis, monitoring glycemic control, and guiding therapy in diabetes are largely based on the association of A1C with long-term microvascular outcomes (2). Epidemiological studies have also demonstrated associations between A1C and the risk of diabetes, cardiovascular disease, and all-cause mortality in nondiabetic individuals (15), with substantial increases in risk observed at A1C values ≥6.0%. Prospective epidemiological data demonstrate that A1C is a stronger predictor of vascular complications than fasting glucose (15).
As has been shown in previous studies (18,19), blacks were substantially more likely to have elevated A1C values than whites. Nonetheless, we did not observe substantial racial differences in the overall performance of A1C for the identification or prediction of diabetes.
This study highlights a weakness in the literature related to diabetes diagnosis: the lack of confirmatory glucose testing to replicate the typical clinical scenario in which multiple tests are conducted before a diagnosis is made. In studies with no confirmatory testing, A1C will seem to miss a proportion of fasting glucose-defined cases of diabetes. This is partly a function of the short-term variability in fasting glucose. When repeat testing is used, this discordance is reduced. We have shown previously that A1C has substantially lower within-person variability compared with fasting and 2-h glucose (6). Whereas repeating fasting glucose tests can substantially reduce misclassification, the high reliability of A1C suggests that a single measurement is sufficient for diagnostic classification unless laboratory error or interference is suspected.
For diabetes and other asymptomatic conditions that are clinically defined, diagnostic testing studies should use multiple reference standards, ideally emphasizing those that most closely replicate the clinical setting. Previous studies have highlighted the low sensitivity of A1C for detecting diabetes defined by a single fasting or 2-h glucose measurement (3,5,20,21). Our data demonstrated improved performance of A1C when it was judged against stronger gold standards that relied on repeated glucose determinations on different days.
A number of limitations of this study should be considered. This study was not a traditional diagnostic testing study, and we did not conduct a head-to-head comparison of the accuracy of glucose compared with A1C; for this, a third reference standard would be needed. The small number of cases of diabetes by any definition among participants in the NHANES III subsample resulted in imprecision of our estimates and prevented further subgroup analyses in this population. We also did not have information on 2-h glucose at the time of A1C measurement in the ARIC study. Nonetheless, the ARIC study is one of the largest U.S. cohort studies of A1C with information on the development of diabetes. The additional analyses of NHANES demonstrated consistency across populations and allowed us to examine definitions of diabetes based on short-term repeat glucose measurements (weeks apart).
In summary, we found that A1C performs best when more stringent glucose criteria are used to define diabetes (i.e., fasting glucose ≥126 mg/dl on two separate occasions), similar to clinical practice. Our data support current recommendations for use of A1C in the diagnosis of diabetes and demonstrate that an A1C cutoff of 6.5% is highly specific and may be reasonably sensitive in the context of evidence linking A1C to risk of long-term microvascular and macrovascular outcomes in nondiabetic adults. We also found that A1C and fasting glucose both strongly predict subsequent risk of diagnosed diabetes, but the very high risk observed for individuals with both elevated fasting glucose and A1C suggests a dual role for fasting glucose and A1C for prediction of diabetes.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
This research was supported by National Institutes of Health (NIH) National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) (grants R21-DK-080294 and K01-DK-076595 to E.S.). The ARIC study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute (contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, and N01-HC-55022). F.L.B. was supported by the NIH NIDDK (grant K24-DK-62222) and by the Johns Hopkins Diabetes Research and Training Center (NIDDK grant P60-DK-079637).
No potential conflicts of interest relevant to this article were reported.
E.S. designed the study, collected the data, analyzed the data, and wrote the manuscript. M.W.S. collected the data, contributed to discussion, and reviewed/edited the manuscript. E.G., F.L.B., and J.C. contributed to discussion and reviewed/edited the manuscript.
Parts of this study were presented in abstract form at the 70th annual meeting of the American Diabetes Association, Orlando, Florida, 25–29 June 2010.
We thank the staff and participants of the ARIC study for their important contributions. We also thank Yang Ning, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, for assistance with the graphical display of the data.