In a recent letter in Diabetes Care, Lockwood (1) presented a statistically significant correlation between statewide diabetes prevalence and statewide total air pollution emissions reported in the Environmental Protection Agency’s (EPA) Toxics Release Inventory (TRI) database (r = 0.54, P < 0.0001). Lockwood noted that such a correlation does not necessarily result from a causal relationship, but called for further research into understanding the association between air pollution and diabetes. In response, Nicolich (2) took issue with Lockwood’s use of statewide data. To demonstrate that correlations based on statewide data may not show causal relationships, Nicolich presented four highly statistically significant correlations between statewide diabetes prevalence and factors that would not be expected to be causal factors in diabetes: latitude of the state capital, longitude of the state capital, state population, and numerical position of the state name on an alphabetized list. Nicolich stated that relationships should be based on individual-level data, rather than statewide data, and on the existence of a plausible mechanism. Lockwood’s response (3) pointed out a previous association between Nicolich and ExxonMobil but did not address Nicolich’s claim that statewide data are inherently prone to nonsensical correlations.

The highly significant correlations pointed out by both Lockwood and Nicolich are puzzling. There should be some explanation for these correlations, though as both authors note, this explanation need not be causal in nature. To investigate the issue further, the calculations of Lockwood and Nicolich were repeated using data available on the internet (http://www.epa.gov/tri, http://www.census.gov, and http://apps.nccd.cdc.gov/brfss). The Pearson correlation coefficient calculated between log diabetes prevalence and log TRI air emissions matched the value reported by Lockwood (r = 0.54, P < 0.0001). The correlation between log population and log diabetes prevalence (r = 0.48, P < 0.001) also closely matched the value reported by Nicolich. However, the correlation of diabetes prevalence and state alphabetic rank was nonsignificant (r = −0.017, P = 0.904), in contrast to the results reported by Nicolich (r = 0.49, P < 0.001). Log transformations of either or both variables did not produce a statistically significant result.
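The bivariate calculation described above can be sketched in a few lines. This is a minimal illustration, not the code used for the letter: the function names are hypothetical, the sample values are synthetic placeholders rather than TRI, census, or BRFSS figures, and the test statistic assumes the usual t distribution with n − 2 degrees of freedom for a Pearson correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_pearson_r(xs, ys):
    """Pearson correlation after a natural-log transform of both variables.

    All values must be strictly positive.
    """
    return pearson_r([math.log(x) for x in xs], [math.log(y) for y in ys])


def t_statistic(r, n):
    """t statistic for testing r = 0 with n observations (n - 2 df)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    # Synthetic placeholder values, NOT real state data.
    emissions = [1.2e6, 4.5e6, 9.8e6, 2.3e7, 5.1e7]
    prevalence = [4.1, 5.0, 4.8, 6.2, 6.0]
    r = log_pearson_r(emissions, prevalence)
    print(r, t_statistic(r, len(emissions)))
```

The P value would then be read from the t distribution with n − 2 degrees of freedom, a step left out here because the Python standard library has no t distribution.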

The potential role of confounding in producing these correlations was examined using a multivariate regression approach. Statewide diabetes prevalence was regressed on both state population and TRI emissions because these factors had been shown to be significant in the bivariate analysis. In addition, the proportions of the state population in each of three ethnic groups (African American, Latino, and white) were included in the regression because ethnicity is known to influence diabetes prevalence. All variables were log transformed since this was observed to produce roughly normally distributed residuals.
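One way to obtain standardized coefficients of the kind reported for such a regression is to log transform every variable, convert each to z-scores, and fit ordinary least squares without an intercept. The sketch below assumes that approach; the letter does not state which software or procedure was used, and all function names are hypothetical. Every input value must be strictly positive for the log transform.

```python
import math


def zscores(values):
    """Center and scale a sequence using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def standardized_coefficients(predictors, response):
    """Standardized OLS coefficients for ln(response) on ln(predictors).

    predictors: list of columns, one list of positive values per variable.
    response:   list of positive values (e.g. diabetes prevalence).
    """
    zx = [zscores([math.log(v) for v in col]) for col in predictors]
    zy = zscores([math.log(v) for v in response])
    k = len(zx)
    # Normal equations; z-scored variables have mean zero, so no intercept.
    xtx = [[sum(p * q for p, q in zip(zx[i], zx[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(zx[i], zy)) for i in range(k)]
    return solve(xtx, xty)
```

In use, `predictors` would hold one column each for TRI emissions, population, and the three ethnic-group percentages, with `response` holding statewide diabetes prevalence. The t and P values for each coefficient require the residual variance and are not computed in this sketch.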

The results (Table 1) indicate that only the association between statewide diabetes prevalence and proportion of African-American population is statistically significant. The bivariate correlations noted by Lockwood and Nicolich appear to result from partial confounding with this factor. African Americans have historically migrated to large industrial states, such as New York, Michigan, Louisiana, and Texas, that would be expected to have both high populations and high TRI air emissions. In contrast, rural northern states, such as Vermont, North Dakota, and Idaho, have low populations, low TRI emissions, and low proportions of African Americans. The negative correlations with latitude and longitude reported by Nicolich appear to result from higher African-American populations in the southeastern states.
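The confounding argument can be illustrated with a first-order partial correlation, which removes the linear influence of a third variable from a bivariate correlation. This is a different, simpler technique than the multivariate regression used here, and the numbers in the example are purely illustrative, not estimates from the state data.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Illustrative values only: x and y each correlate 0.5 with a
    # confounder z, and 0.25 with each other.
    print(partial_correlation(0.25, 0.5, 0.5))
```

When the bivariate correlation equals the product of each variable's correlation with the confounder, as in the illustrative values, the partial correlation is zero: the apparent association is entirely accounted for by the third variable.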

This does not rule out air pollution as a causal factor in diabetes. However, the analysis of state-level emissions data is unlikely to yield much insight into this issue given the lack of contaminant-specific exposure information, the small variation in the statewide prevalence that would be expected from environmental factors, and the many potentially confounding factors. Further research into the causes of diabetes is certainly desirable (1), and promising avenues of research (2) include individual-level and mechanistic studies (4–8).

Although the analysis of state-level data would not be expected to be a powerful tool for understanding individual-level risk factors, it may be a worthwhile enterprise for other reasons. Understanding regional variations and their underlying causes may help to focus and prioritize efforts to improve health outcomes. In this particular case, the explanatory power of ethnicity is striking and may motivate efforts to assist African Americans with both the prevention and treatment of diabetes.

Table 1. Linear regression coefficients

Variable                       Standardized coefficient    t         P
ln TRI emissions               0.235                       1.297     0.201
ln population                  0.079                       0.387     0.701
ln percent white               −0.125                      −0.919    0.363
ln percent African American    0.487                       3.417     0.001
ln percent Latino              −0.142                      −0.940    0.352

The natural logarithm of the state prevalence of diabetes is the dependent variable.

1. Lockwood AH: Diabetes and air pollution (Letter). Diabetes Care 25:1487–1488, 2002
2. Nicolich MJ: Diabetes and the state capital (Letter). Diabetes Care 25:2367, 2002
3. Lockwood AH: Response to Nicolich (Letter). Diabetes Care 25:2367–2368, 2002
4. Henriksen GL, Ketchum NS, Michalek JE, Swaby JA: Serum dioxin and diabetes mellitus in veterans of Operation Ranch Hand. Epidemiology 8:252–258, 1997
5. Michalek JE, Akhtar FZ, Kiel JL: Serum dioxin, insulin, fasting glucose, and sex hormone-binding globulin in veterans of Operation Ranch Hand. J Clin Endocrinol Metab 84:1540–1543, 1999
6. Calvert GM, Sweeney MH, Deddens J, Wall DK: Evaluation of diabetes mellitus, serum glucose, and thyroid function among United States workers exposed to 2,3,7,8-tetrachlorodibenzo-p-dioxin. Occup Environ Med 56:270–276, 1999
7. Roegner RH, Grubbs WD, Lustik MB, Brockman AS, Henderson SC: Air Force Health Study: An Epidemiologic Investigation of Health Effects in Air Force Personnel Following Exposure to Herbicides. McLean, VA, Science Applications International Corporation, 1991
8. Veterans and Agent Orange: Herbicide/Dioxin Exposure and Type 2 Diabetes: Committee to Review the Evidence Regarding the Link Between Exposure to Agent Orange and Diabetes. Washington, D.C., Division of Health Promotion and Disease Prevention, Institute of Medicine, National Academy Press, 2000