In a recent letter in Diabetes Care, Lockwood (1) pointed out that there was a strong correlation (r = 0.54, P = 0.000057) between the statewide self-reported diabetes prevalence in 2000 and the total statewide air releases in the Toxics Release Inventory (TRI) in 1999 for the 50 states and Washington, D.C. He noted that “[although] […] the correlation between air emissions and the prevalence of diabetes does not prove a cause-and-effect relationship, the significance of the relationship demands attention.”
I agree that the correlation does not prove a cause-and-effect relationship, but the demand for attention is questionable. That demand rests on the magnitude of the observed correlation, yet attributing a possible cause requires at least a plausible mechanism and individual-level data (not statewide averages). Lockwood leaves the impression that dioxins are the main culprit in the hypothetical exposure-response relationship, but it is difficult to see how the reported correlation helps establish that relationship since dioxins are not, as he noted, among the chemicals inventoried in the TRI.
As an example of how looking at statewide averages (group data) can lead to questionable results, the self-reported diabetes data were downloaded from the CDC Behavioral Risk Factor Surveillance System Web site (2), as were the latitudes and longitudes of each of the state capitals and the state population sizes in 2000. Correlations were calculated among these variables using the same techniques that Lockwood (1) used.
The correlations were instructive. Table 1 shows the Pearson correlation between statewide diabetes prevalence and each selected state-level variable, along with the associated significance level of the correlation (P values available only to three significant figures).
The correlation between statewide diabetes prevalence and the latitude of the state capital has the same magnitude as the correlation Lockwood (1) reported between statewide diabetes prevalence and statewide toxic air emissions. The correlations with the other variables are about the same size and are all statistically significant.
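The kind of calculation behind Table 1 can be sketched in a few lines. This is a minimal illustration, not the code used for this letter or by Lockwood: the function names `pearson_r` and `t_statistic` are hypothetical, and the five-point data set is made up purely to exercise the functions. The t statistic is the standard one for testing a zero correlation and would be compared with a t distribution on n − 2 degrees of freedom.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0; undefined when |r| = 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical data, NOT the statewide values: made-up prevalence (%) for five
# made-up areas paired with their position in an alphabetized list.
rank = [1, 2, 3, 4, 5]
prevalence = [5.1, 5.9, 5.6, 6.4, 6.8]

r = pearson_r(rank, prevalence)
print(f"r = {r:.2f}, t = {t_statistic(r, len(rank)):.2f}")
```

With 51 observations (50 states and Washington, D.C.), `t_statistic(0.54, 51)` evaluates to roughly 4.49, which shows how a moderate correlation becomes highly significant at this sample size regardless of whether the paired variable has any causal meaning.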
The conclusion is that to reduce the risk of diabetes a person should move to a northwestern state with a low population, whose state name is near the beginning of the alphabet—Alaska is a reasonable choice based on an unreasonable application of statistics. However, this application is not very different from the methods used by Lockwood.
I hope that this demonstrates that a highly significant correlation between two variables based on statewide data does not, by itself, show anything.
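Why group averages cannot stand in for individual-level data can be illustrated with a deliberately artificial example. The sketch below uses synthetic numbers only (five hypothetical groups of five individuals, with no connection to the diabetes or TRI data) constructed so that the group means are perfectly positively correlated while the relationship inside every group runs the opposite way. The helper `pearson_r` is a hypothetical name introduced for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic data: group g is centered at (10g, 10g); within a group,
# individuals with a higher x value have a lower y value.
groups = [
    [(10 * g + d, 10 * g - d) for d in (-2, -1, 0, 1, 2)]
    for g in range(5)
]

# Correlation computed from group averages only (the "statewide" view).
mean_xs = [sum(x for x, _ in members) / len(members) for members in groups]
mean_ys = [sum(y for _, y in members) / len(members) for members in groups]
r_between = pearson_r(mean_xs, mean_ys)

# Correlation computed among individuals inside each group.
r_within = [
    pearson_r([x for x, _ in members], [y for _, y in members])
    for members in groups
]

print("correlation of group means:", round(r_between, 2))
print("correlations within groups:", [round(r, 2) for r in r_within])
```

In this constructed case the averages suggest a strong positive association even though, for every individual group, the association is negative. Real data need not behave this way, but group-level correlations alone cannot rule it out.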
Table 1. Pearson correlations between statewide diabetes prevalence and selected state-level variables, with associated significance levels
Variable | r | P |
---|---|---|
Latitude of the state capital | −0.54 | <0.001 |
Longitude of the state capital | −0.31 | <0.02 |
State population | +0.46 | <0.001 |
Numerical position in the alphabetized state list (i.e., Alabama = 1, Wyoming = 51) | +0.49 | <0.001 |
References
Address correspondence to Mark J. Nicolich, Statistician, 24 Lakeview Rd, Lambertville, NJ 08530. E-mail: [email protected].