A recent Perspective in Nature issued a call for more transparency in the reporting of preclinical research (1). Although this article focused primarily on experimental design, it emphasized the need for improved reporting in the scientific literature. Within the context of preclinical studies, there have been discussions regarding the appropriate reporting of standard error (SE) and standard deviation (SD) (2–5); however, despite the recommendations, opportunities remain to improve upon the reporting of these statistics in the literature.
To first set the stage for the distinction between SD and SE, we start with the similarities. Both SD and SE measure variability or, informally, “spread.” As such, both statistics give a numerical summary of variability. Given this, how does one distinguish SE from SD?
The distinction is that one summarizes the variability of data and the other describes the variability of an estimated quantity. Let us consider the latter scenario first. If an experiment were replicated, slightly different values would be observed. On average, one would expect similar numerical summaries (e.g., approximately the same mean across samples), but some sample-to-sample variation would occur. This type of variation is summarized by SE. Specifically, SE quantifies variability in sample estimates, which are often means but can also be estimated regression parameters, correlation coefficients, or other quantities. What is important to consider, however, is that SE can be estimated from a single sample. For example, the SE of the mean is commonly estimated as SD/√n, where n is the sample size. This brings us to the SD itself.
SD is a measure of spread in data about the mean, as opposed to the variability in an estimated summary of data. When one wants to summarize the variability in data, whether it is sample characteristics or response patterns, SD should be used. When one seeks to summarize the variability, or precision, in an estimated quantity, such as the mean response for a particular experimental condition, SE or a function of the SE should be used.
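The distinction can be made concrete with a short calculation. The following Python sketch is illustrative only: the measurements are hypothetical, and the function name `summarize` is introduced here rather than taken from any cited source. It computes the sample SD (with the usual n − 1 denominator) and the SE of the mean as SD/√n.

```python
import math
import statistics


def summarize(values):
    """Return (mean, SD, SE of the mean) for a single sample."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    se = sd / math.sqrt(n)         # SE of the mean = SD / sqrt(n)
    return mean, sd, se


# Hypothetical measurements, chosen only for illustration.
sample = [4.0, 6.0, 8.0, 10.0, 12.0]
mean, sd, se = summarize(sample)
print(f"mean = {mean:.3f}, SD = {sd:.3f}, SE = {se:.3f}")
# prints: mean = 8.000, SD = 3.162, SE = 1.414

# If the same SD were observed with four times as many observations,
# the SE would be halved, while the SD (the spread of the data) would not change.
se_with_4n = sd / math.sqrt(4 * len(sample))
print(f"SE with 4n observations = {se_with_4n:.3f}")
# prints: SE with 4n observations = 0.707
```

The SD describes how widely the individual observations are scattered; the SE describes how precisely the mean is estimated and shrinks as the sample size grows.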
The standard practice of reporting mean ± SE is problematic from several statistical and conceptual perspectives. When these summary measures are used in the standard bar chart with error “whiskers,” the presentation is actually consistent with an approximately 68% CI. Practically, this overstates the precision of the estimated mean because the 68% CI is at most about half the width of the more generally accepted 95% CI. Furthermore, reporting mean ± SE does not allow for ad hoc comparisons between groups because the confidence coefficient (the multiplier of the SE used for the creation of the CI) varies based on sample size and distributional assumptions. Therefore, it is recommended that an estimated summary be accompanied by a 95% CI in text and graphical displays when one wants to describe the precision of the estimate. Of course, when there is a priori (planned) justification for a level of significance other than α = 0.05 (6), the prespecified level of significance can be used for the CI. Rarely would one expect α = 0.32 (i.e., P < 0.32) to be used for such a level of significance.
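The relationship between mean ± SE and a 95% CI can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the coverage of ± 1 SE is computed from the standard normal distribution (a large-sample approximation), the t critical values are copied from standard two-sided 95% tables for a few degrees of freedom, and the names `T95`, `confidence_interval`, and `width_ratio` are hypothetical helpers introduced here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

# Coverage of mean ± 1 SE under a normal approximation (about 0.68).
coverage_1se = STD_NORMAL.cdf(1.0) - STD_NORMAL.cdf(-1.0)

# Large-sample multiplier for a two-sided 95% CI (about 1.96).
z95 = STD_NORMAL.inv_cdf(0.975)

# Two-sided 95% t critical values from standard tables, keyed by
# degrees of freedom (n - 1). Assumed values for illustration.
T95 = {4: 2.776, 9: 2.262, 29: 2.045}


def confidence_interval(mean, se, multiplier):
    """Return (lower, upper) for mean ± multiplier * SE."""
    return mean - multiplier * se, mean + multiplier * se


def width_ratio(multiplier):
    """Width of mean ± SE relative to mean ± multiplier * SE."""
    return 1.0 / multiplier


print(f"coverage of mean ± 1 SE: {coverage_1se:.3f}")
print(f"normal 95% multiplier:   {z95:.3f}")
for df, t_crit in sorted(T95.items()):
    print(f"df = {df:2d}: multiplier = {t_crit:.3f}, "
          f"(mean ± SE) / (95% CI) width = {width_ratio(t_crit):.3f}")
```

Under these assumptions the ± SE whisker is never more than about half the width of the 95% CI, and it is a smaller fraction still for small samples, which is why the same whisker length does not convey the same level of confidence across groups of different sizes.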
In summary, when reporting the characteristics of a sample to express the variability in the observed values, the SD should be used. SE should be reported only when reporting the variation of estimated quantities. Even then, it has been suggested that SE should be multiplied by a confidence coefficient to produce a CI to allow for more robust statistical comparison of the reported data. The combination of SDs and CIs should be the preferred statistics reported in the literature to give the reader a clear impression of the variability of the observed values and precision of estimated summaries of the data, respectively.
This work was supported by grant number UL1 TR000135 from the National Center for Advancing Translational Sciences.
No potential conflicts of interest relevant to this article were reported.
The contents of this letter are solely the responsibility of the author and do not necessarily represent the official views of the National Institutes of Health.