Understanding the causes of diabetes is at the heart of most diabetes research. The field of causal inference has made dramatic methodological advances that allow causal inference from observational rather than experimental data (1). One such advance is the use of genetic markers as instrumental variables, known as Mendelian randomization (MR) (2). MR relies on the laws governing genetic inheritance to gain insight into a modifiable risk factor’s causal effect on a disease, using genotype to address potential confounding or reverse causation.
Three crucial assumptions are required for traditional MR analyses (3): 1) the genetic markers must be associated with the risk factor, 2) the genetic markers must be independent of factors that may confound the relationship between the risk factor and the disease, and 3) the genetic markers must be independent of the disease conditional upon the risk factor and confounding factors.
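To make the role of these assumptions concrete, the following is a minimal simulation sketch, not an analysis from the commentary. It assumes a single genetic marker, a continuous outcome standing in for disease (real T2D analyses typically work on the log-odds scale), and hypothetical effect sizes. All three assumptions hold by construction, so the instrumental-variable (Wald ratio) estimate should recover the true effect while the naive regression is confounded.

```python
import numpy as np

def wald_ratio(g, x, y):
    """Instrumental-variable (Wald ratio) estimate: cov(G, Y) / cov(G, X)."""
    return np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]

def ols_slope(x, y):
    """Naive regression slope of Y on X, ignoring confounding."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

rng = np.random.default_rng(0)
n = 50_000
true_effect = 0.3  # hypothetical causal effect of risk factor on outcome

g = rng.binomial(2, 0.3, size=n)       # allele count (0, 1, or 2); hypothetical frequency
u = rng.normal(size=n)                 # unmeasured confounder, independent of G (assumption 2)
x = 0.5 * g + u + rng.normal(size=n)   # risk factor depends on G (assumption 1)
# Outcome depends on G only through X (assumption 3).
y = true_effect * x + u + rng.normal(size=n)

naive = ols_slope(x, y)
iv = wald_ratio(g, x, y)
print(f"true effect: {true_effect:.2f}")
print(f"naive OLS:   {naive:.2f}")
print(f"Wald ratio:  {iv:.2f}")
```

In this setup the naive slope is pulled upward by the shared confounder, whereas the Wald ratio uses only the variation in the risk factor that is attributable to genotype.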
In this issue of Diabetes, Corbin et al. (4) use MR to explore the causal relationship between BMI and type 2 diabetes (T2D). A composite instrument was developed from 97 genetic variants identified in previous genome-wide association studies. Gene–BMI and gene–T2D associations taken from two independent samples were combined using three estimating approaches. For traditional MR estimates to be valid, the assumptions depicted in Fig. 1A must hold. If multiple genetic markers are used to create a composite instrument, as was done in this case, Corbin et al. argue that those assumptions are likely violated. Specifically, at least some of the genetic variants may either themselves affect T2D or be in linkage disequilibrium with other variants that affect T2D via pathways that do not involve BMI (Fig. 1B). Violations of this type are referred to as “horizontal pleiotropy” (5). As an interesting side note, Fig. 1B also represents the scenario of gene–environment interaction, a situation that almost certainly exists in relation to the effect of BMI on T2D (6).
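The effect of horizontal pleiotropy on a summary-data MR estimate can be sketched with simulated per-variant associations. The example below is illustrative only: the effect sizes are hypothetical, and the two estimators shown (an inverse-variance weighted estimate and an Egger-style weighted regression with an intercept) are well-known approaches chosen for illustration, not necessarily the ones used by Corbin et al. The simulation assumes the pleiotropic effects are independent of the gene–exposure effects.

```python
import numpy as np

def ivw_estimate(bx, by, se_y):
    """Inverse-variance weighted estimate: weighted regression through the origin."""
    w = 1.0 / se_y**2
    return np.sum(w * bx * by) / np.sum(w * bx**2)

def egger_estimate(bx, by, se_y):
    """Egger-style weighted regression of by on bx with an intercept.

    Returns (intercept, slope). The intercept absorbs average directional
    pleiotropy; the slope is the causal estimate.
    """
    sw = np.sqrt(1.0 / se_y**2)
    design = np.column_stack([np.ones_like(bx), bx])
    coef, *_ = np.linalg.lstsq(design * sw[:, None], by * sw, rcond=None)
    return coef[0], coef[1]

rng = np.random.default_rng(1)
n_variants = 97   # matches the instrument size in the text; all values below are simulated
theta = 0.5       # hypothetical causal effect of exposure on outcome

bx = rng.uniform(0.02, 0.10, size=n_variants)   # gene-exposure associations
alpha = rng.normal(0.02, 0.01, size=n_variants) # direct (pleiotropic) gene-outcome effects
se_y = np.full(n_variants, 0.005)               # standard errors of gene-outcome associations
by = theta * bx + alpha + rng.normal(0, se_y)   # gene-outcome associations

ivw = ivw_estimate(bx, by, se_y)
egger_intercept, egger_slope = egger_estimate(bx, by, se_y)
print(f"true effect:     {theta:.2f}")
print(f"IVW estimate:    {ivw:.2f}")
print(f"Egger slope:     {egger_slope:.2f}")
print(f"Egger intercept: {egger_intercept:.3f}")
```

Because every simulated variant carries a positive direct effect on the outcome, the through-the-origin estimate is biased upward, while the intercept in the second regression captures the average direct effect.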
Figure 1. Three sets of causal assumptions for the estimation of the effect of BMI on T2D utilizing genetic markers (G) as an instrumental variable: so-called Mendelian randomization. A: Traditional assumptions. B: Relaxation of horizontal pleiotropy assumption. C: Time-varying scenario. Common causes of BMI and T2D (dotted lines) have been left off of C for readability but should be assumed present.
Corbin et al. (4) use three MR estimators, two of which are novel applications of methods that relax the horizontal pleiotropy assumption (7–9). The authors note that the three approaches are generally consistent in the sense that they return similar estimates of a positive effect of increased BMI on the odds of T2D. Through the use of sensitivity analysis, Corbin et al. investigate potential bias due to including genetic variants that exhibit pleiotropic effects. They nicely demonstrate the influence of two genes, TCF7L2 and FTO, on the three different estimates of the causal effect. Investigating biases resulting from violations of assumptions is an important step in any analysis (10) and one that warrants further exploration in this case.
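One common way to examine the influence of individual variants on a summary-data estimate is a leave-one-out analysis. The sketch below is a generic illustration with simulated data, not a reproduction of the sensitivity analysis by Corbin et al.; the 20 variants, their effect sizes, and the single pleiotropic "variant 0" are all hypothetical.

```python
import numpy as np

def ivw_estimate(bx, by, se_y):
    """Inverse-variance weighted estimate: weighted regression through the origin."""
    w = 1.0 / se_y**2
    return np.sum(w * bx * by) / np.sum(w * bx**2)

def leave_one_out(bx, by, se_y):
    """Recompute the IVW estimate with each variant removed in turn."""
    idx = np.arange(len(bx))
    return np.array([
        ivw_estimate(bx[idx != j], by[idx != j], se_y[idx != j]) for j in idx
    ])

rng = np.random.default_rng(2)
n_variants = 20
theta = 0.5  # hypothetical causal effect

bx = rng.uniform(0.02, 0.10, size=n_variants)  # gene-exposure associations
bx[0] = 0.10                                   # variant 0: strong instrument
se_y = np.full(n_variants, 0.005)
by = theta * bx + rng.normal(0, se_y)          # gene-outcome associations
by[0] += 0.10                                  # variant 0 also has a direct (pleiotropic) effect

full = ivw_estimate(bx, by, se_y)
loo = leave_one_out(bx, by, se_y)
most_influential = int(np.argmax(np.abs(loo - full)))

print(f"estimate with all variants: {full:.2f}")
print(f"most influential variant:   {most_influential}")
print(f"estimate without it:        {loo[most_influential]:.2f}")
```

A large shift in the estimate when a single variant is dropped flags that variant for closer inspection; it does not by itself establish that the variant is pleiotropic.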
The causal relationship between BMI and T2D is not a simple one. BMI and T2D are both time-varying factors that likely exhibit a complex time-dependent signature (Fig. 1C). As BMI changes over time, it both influences future BMI levels and T2D status and is influenced by past BMI and T2D status. This picture gets more complicated when the diagnosis of T2D and not just the presence of T2D is considered. While methodologists continue to work on the problem of causal inference in the presence of time-varying factors and in particular how MR can be used (11), it may be instructive to take a step back and ask, “What is intended to be estimated?”
The task of causal inference is reliant upon first defining what is meant by “the causal effect” (12). In the instance of BMI and T2D, one could ask many sensible questions. Some examples are: If everyone in the population were to reduce their BMI by 1 kg/m² tomorrow, what would the burden of T2D be in 1 year? Or if everyone in the population were to reduce their BMI by 1 kg/m² tomorrow, what would the burden of T2D be in 5 years? Or if everyone were to reduce their BMI by 1 kg/m² tomorrow and remain at that new BMI for the remainder of their life, what would the burden of T2D be in 5 years? It seems reasonable to assume that the numeric answer to each of those questions would be quite different. Further, each of those questions would require different data collection strategies. Regardless of study design or estimation strategy, the meaning of a single estimate of “the causal effect” of BMI on T2D is semantically vague when so many causal effects are definable.
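The three example questions can be written as three distinct estimands. The potential-outcome notation below is an assumption introduced here for illustration and does not appear in the commentary: $A_t$ denotes BMI at year $t$, $Y_t$ indicates the presence of T2D at year $t$, and $Y_t^{g}$ is the value $Y_t$ would take under intervention strategy $g$. "Burden" is read here as the population proportion with T2D, which is one of several possible interpretations.

```latex
% g_point: reduce baseline BMI by 1 kg/m^2 at t = 0, with no further intervention
% g_sust:  reduce baseline BMI by 1 kg/m^2 at t = 0 and hold it there thereafter
\begin{align*}
\psi_1 &= \mathrm{E}\!\left[ Y_{1}^{\,g_{\text{point}}} \right]
  && \text{(one-time reduction, burden at 1 year)} \\
\psi_2 &= \mathrm{E}\!\left[ Y_{5}^{\,g_{\text{point}}} \right]
  && \text{(one-time reduction, burden at 5 years)} \\
\psi_3 &= \mathrm{E}\!\left[ Y_{5}^{\,g_{\text{sust}}} \right]
  && \text{(sustained reduction, burden at 5 years)}
\end{align*}
```

Written this way, $\psi_1$ and $\psi_2$ differ only in the follow-up time, while $\psi_2$ and $\psi_3$ differ in the intervention strategy, which is why the three quantities need not coincide and call for different data.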
Valid causal inference requires a well-defined causal effect and the identification of data that can be used for the estimation of that effect (13). Judea Pearl (14) stated this idea succinctly when he summarized the practice of causal inference in three steps: “define first, identify second, estimate last.” While great advances have been made in leveraging instrumental variable estimators and in specifying the identification assumptions that underlie them, they provide little value if the causal effect is not first defined. In order to take the next step forward in understanding the risk factors causally responsible for T2D, it is imperative to take a step back. Important causal questions must be asked and translated into statistical targets prior to estimation. In some cases, published data may be sufficient to aid in estimation (15), but many questions will require more granular data, such as longitudinal time series. Novel methods, particularly those leveraging the growing genomic knowledge base in time-varying scenarios, are needed to fulfill the promise of elucidating the causal effect of BMI on T2D. For that promise to be fully realized, the novel methods need to address well-defined causal questions and be paired with appropriately collected data.
See accompanying article, p. 3002.
Article Information
Duality of Interest. No potential conflicts of interest relevant to this article were reported.