Guidelines for genetic and other ‘omic studies in Diabetes Care
Focus
Diabetes Care focuses on novel findings relevant to the clinical care of people with diabetes. While some of these findings may be translational in nature and not yet ready for incorporation into clinical protocols, research contributions should have clear management implications, whether diagnostic or therapeutic. Studies solely describing cellular or animal models, or fundamental genetic discoveries related to gene regulation or function, may be better suited to more basic journals such as Diabetes. On the other hand, pharmacogenetic studies, studies on the clinical course of monogenic or syndromic forms of diabetes, and studies that use genetics or other ‘omics to predict clinical outcomes or to ascertain the causality of a given exposure are welcome. Epidemiologic studies that include an ‘omic component and generate a testable hypothesis will also be considered. Rigorous and well-conducted studies that lack novelty, however, may receive lower priority.
Sample size, statistical power, and multiple testing
Most complex traits are driven by multiple genetic and environmental factors, acting either as direct determinants or through interactions with each other. With few exceptions, the effects of single genetic loci tend to be modest. Large sample sizes are therefore typically required to detect them, particularly given the multiple hypotheses being tested.
In Diabetes Care, a Bayesian approach to statistical genetics is preferred. That is, the robustness of a finding is predicated on its prior probability of being real, in the context of the universe of possible hypotheses. For a generic common genetic variant (e.g., alternative/minor allele frequency >5%), the prior probability must be estimated against the ~1 million independent tests in the non-African genome, which yields the commonly accepted genome-wide significance threshold of P < 5 × 10⁻⁸ (0.05/1,000,000). This threshold may need to be adjusted for African genomes (where genetic variation is approximately twofold greater) or for the much larger number of variants present at lower allele frequencies, which are increasingly captured by sequencing or well-powered imputation. We note that this prior probability (the likelihood that a specific association is real, among all possible associations in the human genome) is the same whether the investigator examines a single variant or conducts a genome-wide association study (GWAS). A different correction should be employed for an exome-wide exploration (analyzing only protein-coding variants) or for gene-based tests (whether in genomic or expression datasets). Similar considerations apply to metabolomic, proteomic, or transcriptomic data.
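The arithmetic behind the genome-wide threshold, and the reason the prior matters, can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: a Bonferroni division of 0.05 by ~1 million independent tests reproduces 5 × 10⁻⁸, and the false-positive report probability (FPRP) shows how the same significance threshold yields very different credibility depending on the prior. The per-variant prior of 10⁻⁶, the 80% power, and the doubling of tests for African genomes are illustrative assumptions, not values set by these guidelines.

```python
def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def false_positive_report_probability(alpha: float, power: float, prior: float) -> float:
    """Probability that an association declared significant at `alpha` is a false positive.

    prior: assumed probability, before seeing the data, that the association is real.
    power: assumed probability of reaching `alpha` if the association is real.
    """
    false_pos = alpha * (1 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


# ~1 million independent common-variant tests in non-African genomes
genome_wide = bonferroni_threshold(0.05, 1_000_000)
# Illustrative assumption: roughly twice as many independent tests in African genomes
african = bonferroni_threshold(0.05, 2_000_000)
print(f"genome-wide threshold: {genome_wide:.1e}; with ~2x tests: {african:.1e}")

# Illustrative assumptions: prior of 1 in a million, 80% power
prior, power = 1e-6, 0.8
for alpha in (0.05, genome_wide):
    fprp = false_positive_report_probability(alpha, power, prior)
    print(f"alpha={alpha:.1e}  FPRP={fprp:.3f}")
```

Under these assumptions, an association at P < 0.05 has an FPRP of essentially 1, whereas one at P < 5 × 10⁻⁸ has an FPRP of about 0.06, which is why examining a single variant does not by itself justify a nominal threshold.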
The prior probability of a biomarker (or a set of biomarkers) may be higher when it has been shown to be implicated in relevant physiology. For example, genetic variants associated with type 2 diabetes at genome-wide significance might be more likely to affect the risk of diabetic complications or the response to glucose-lowering therapies. Similarly, a polygenic score constructed from variants that have met a stringent statistical threshold (in essence, capturing the overall genetic burden for that trait) could be tested for association with a related outcome as a single hypothesis. Such a hypothesis-testing approach could use a significance threshold of 0.05 (for a single variant or polygenic score), although this threshold may require adjustment for the number of strata or outcomes examined.
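A minimal sketch of the single-hypothesis polygenic score test described above, assuming the common additive construction (a weighted sum of risk-allele dosages, with weights taken from published effect sizes) and a Bonferroni-style division of 0.05 by the number of outcomes and strata examined. The function names, weights, and dosages are hypothetical and for illustration only.

```python
from typing import Sequence


def polygenic_score(weights: Sequence[float], dosages: Sequence[float]) -> float:
    """Additive polygenic score: sum of effect-size weights times allele dosages (0-2)."""
    if len(weights) != len(dosages):
        raise ValueError("weights and dosages must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def single_hypothesis_alpha(n_outcomes: int = 1, n_strata: int = 1, alpha: float = 0.05) -> float:
    """Threshold for testing one score, Bonferroni-adjusted for outcomes x strata examined."""
    return alpha / (n_outcomes * n_strata)


# Hypothetical per-allele log-odds weights for variants that met genome-wide significance
weights = [0.12, 0.08, 0.15]
dosages = [2, 0, 1]  # one participant's risk-allele counts
print(polygenic_score(weights, dosages))
print(single_hypothesis_alpha(n_outcomes=2, n_strata=2))
```

Testing the score once against one related outcome keeps the threshold at 0.05; examining two outcomes in each of two strata would, under this Bonferroni assumption, lower it to 0.0125.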
As a general rule, genetic studies submitted to Diabetes Care (whether genetic association studies, phenome-wide association studies, or studies reporting analyses of transcriptomic, miRNA, metabolomic, or epigenomic datasets) should account for multiple testing, describe the method applied, and de-emphasize findings that are only nominally significant (P < 0.05 without correction), while highlighting findings that exceed experiment-wide statistical significance.
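The guidelines ask authors to name their multiple-testing method without prescribing one. As a hedged illustration, the sketch below applies two widely used options, Bonferroni (family-wise error control) and the Benjamini-Hochberg false discovery rate procedure, to a hypothetical set of ten P values, and counts how many findings survive each compared with the uncorrected P < 0.05 rule.

```python
from typing import List, Sequence


def nominal(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Uncorrected test: flags every P value below alpha."""
    return [p < alpha for p in p_values]


def bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Family-wise error control: each P value is compared with alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """False discovery rate control at level q (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted P value satisfies p_(k) <= k * q / m
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical P values from ten tested metabolites
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
for name, method in [("nominal", nominal), ("Bonferroni", bonferroni), ("BH", benjamini_hochberg)]:
    print(name, sum(method(p)))
```

Here five results are nominally significant, but only one survives Bonferroni correction and two survive Benjamini-Hochberg; the remaining three are the kind of findings these guidelines ask authors to de-emphasize.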
Replication and reproducibility
Diabetes Care aims to publish results of clinical relevance; thus, it is critical that reported findings generate confidence. The robustness of the results can be supported by stringent statistical thresholds, converging orthogonal lines of evidence, or independent replication. Authors should make every effort to identify an independent dataset in which to replicate their findings. In exceptional cases, hypothesis-generating exercises of sufficient clinical impact may be considered, in the hope that other investigators with access to adequate datasets may pursue confirmation of the original observations.
“Big data” explorations (including but not limited to genomic information) are becoming increasingly sophisticated. A basic tenet of the scientific method holds that a hypothesis must be “falsifiable”; that is, other investigators should be able to test its validity by reproducing the experiment. For a clinical or translational journal such as Diabetes Care, analyses of multidimensional big data should aim to be reproducible by others, interpretable in the context of known physiology, and actionable in the clinical space.
Context
Science does not occur in a vacuum. Novel findings should be evaluated within the landscape of existing evidence. Genetic variants tested for association with metabolic phenotypes should be presented in light of what is already known about them. For instance, an association with 2-hour glucose during an oral glucose tolerance test observed in 60 participants may be regarded as a false positive if no association at all is seen in a GWAS dataset of 6,000 individuals. Publicly available data (e.g., from resources such as NIH’s dbGaP, the European Genome-phenome Archive (EGA), or the Accelerating Medicines Partnership Common Metabolic Diseases Knowledge Portal) should be mined for evidence bearing on a given hypothesis. Corroborating analyses in available datasets (such as the UK Biobank or the All of Us Research Program) should be considered.
Negative studies
We aim to counteract publication bias. Rigorous negative studies are welcome, as they may indicate the futility of pursuing a specific hypothesis and/or support alternative research approaches; however, negative studies need to be adequately powered to detect an effect of a prespecified magnitude. Power calculations will typically be required, and authors should refrain from concluding that no such effect exists; rather, they should state that if such an effect is real, its magnitude is likely to be below what can be detected in the current study.
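As one hedged example of such a power calculation, the sketch below uses a standard normal approximation for a two-sided, 1-degree-of-freedom test of a quantitative trait, in which the noncentrality parameter is n·r²/(1 − r²) and r² is the proportion of trait variance explained by the variant or score. It reports the power for an assumed effect and the smallest r² detectable at 80% power, which is the quantity a negative study can cite when stating that any real effect is likely below what it could detect. Binary outcomes, covariate adjustment, and other designs need different formulas, and dedicated power software would normally be used.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_quantitative(n: int, r2: float, alpha: float) -> float:
    """Approximate power of a two-sided 1-df test when the predictor explains r2 of trait variance."""
    ncp = n * r2 / (1 - r2)  # noncentrality of the 1-df chi-square statistic
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(ncp)
    # Probability that the test statistic falls beyond either critical value
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def min_detectable_r2(n: int, alpha: float, power: float = 0.8) -> float:
    """Smallest r2 detectable with the given power (ignoring the negligible opposite tail)."""
    shift = _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)
    ncp = shift ** 2
    # Invert ncp = n * r2 / (1 - r2)
    return ncp / (n + ncp)


for n in (60, 6_000):
    for alpha in (0.05, 5e-8):
        print(f"n={n:>5}  alpha={alpha:.0e}  min detectable r2={min_detectable_r2(n, alpha):.4f}")
```

Comparing 60 with 6,000 participants shows how much larger an effect a small study must have before its null result says anything informative, and why such a study should be interpreted against larger datasets.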
Mendelian randomization
Using genetic variants as instrumental variables to help infer the causal effect of a given exposure (e.g., a biomarker or an endophenotype) on a clinical outcome has gained popularity, as the approach is accessible and can be easily implemented in large datasets. While attractive and useful, the method poses a number of challenges in interpretation, so appropriate caveats should be addressed. These include accounting for potential confounding introduced by population stratification or linkage disequilibrium, as well as controlling for possible pleiotropy via accepted methods (e.g., MR-Egger regression or the weighted median estimator). The novelty and clinical implications of the findings will be taken into account. Use of Mendelian randomization or another genetic approach merely to confirm prior knowledge may not be sufficient for publication in Diabetes Care.
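To make the pleiotropy checks concrete, the sketch below computes, from hypothetical summary statistics (per-variant effects on the exposure `bx`, effects on the outcome `by`, and outcome standard errors `se_by`), the conventional inverse-variance weighted (IVW) estimate together with the two sensitivity estimators named above: MR-Egger regression, whose intercept estimates average directional pleiotropy, and the weighted median. It is a minimal illustration that uses first-order weights and omits inference (no confidence intervals or P values); established Mendelian randomization software should be preferred for real analyses. The synthetic data are an assumption constructed so that, in the second scenario, every variant carries the same pleiotropic shift.

```python
from typing import Sequence, Tuple


def ivw(bx: Sequence[float], by: Sequence[float], se_by: Sequence[float]) -> float:
    """IVW estimate: weighted regression of by on bx through the origin, weights 1/se_by^2."""
    w = [1 / s ** 2 for s in se_by]
    num = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    den = sum(wi * x * x for wi, x in zip(w, bx))
    return num / den


def mr_egger(bx: Sequence[float], by: Sequence[float], se_by: Sequence[float]) -> Tuple[float, float]:
    """MR-Egger: weighted regression of by on bx with an intercept; returns (slope, intercept)."""
    # Orient each variant so that its effect on the exposure is positive
    x = [abs(b) for b in bx]
    y = [yi if b >= 0 else -yi for b, yi in zip(bx, by)]
    w = [1 / s ** 2 for s in se_by]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


def weighted_median(bx: Sequence[float], by: Sequence[float], se_by: Sequence[float]) -> float:
    """Weighted median of ratio estimates by/bx, weighted by first-order inverse variance (bx/se_by)^2."""
    pairs = sorted((y / x, (x / s) ** 2) for x, y, s in zip(bx, by, se_by))
    est = [p[0] for p in pairs]
    total = sum(p[1] for p in pairs)
    # Position of each estimate: cumulative standardized weight minus half its own weight
    pos, cum = [], 0.0
    for _, wj in pairs:
        wj /= total
        cum += wj
        pos.append(cum - wj / 2)
    if 0.5 <= pos[0]:
        return est[0]
    for j in range(len(est) - 1):
        if pos[j] <= 0.5 <= pos[j + 1]:
            # Linear interpolation between the two estimates bracketing the 50th percentile
            frac = (0.5 - pos[j]) / (pos[j + 1] - pos[j])
            return est[j] + frac * (est[j + 1] - est[j])
    return est[-1]


# Hypothetical summary statistics for five instruments; true causal effect 0.5
bx = [0.10, 0.20, 0.15, -0.05, 0.30]
se_by = [0.02, 0.03, 0.025, 0.02, 0.04]
causal, pleio = 0.5, 0.1
by_clean = [causal * x for x in bx]
by_pleio = [causal * x + pleio * (1 if x >= 0 else -1) for x in bx]

for label, by in (("no pleiotropy", by_clean), ("directional pleiotropy", by_pleio)):
    slope, intercept = mr_egger(bx, by, se_by)
    print(label, round(ivw(bx, by, se_by), 3), round(slope, 3),
          round(intercept, 3), round(weighted_median(bx, by, se_by), 3))
```

Without pleiotropy, all three estimators return 0.5. With the uniform pleiotropic shift, only the Egger slope recovers 0.5, and its intercept flags the shift; the IVW estimate is biased, and so is the weighted median, because it assumes that at least half of the weight comes from valid instruments, which is not the case when every variant is pleiotropic.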
Dissemination and data sharing
Authors who submit their work to Diabetes Care are expected to be willing to share their complete datasets with the scientific community. Where individual-level data cannot be shared widely, authors should specify the reasons for this restriction and, at a minimum, provide summary-level data for public posting.
Generalizability
To date, the preponderance of ‘omic data has been collected in populations of European descent, hindering generalizability to the majority of the world’s population and potentially perpetuating health disparities. To the extent possible, manuscripts should report research conducted in diverse populations. When participant diversity is not possible or not advisable, a defensible explanation should be provided.