A Guide for Selection of Genetic Instruments in Mendelian randomisation (MR) studies of Type-2 diabetes and HbA1c: towards an integrated approach

This study examines the instrument selection strategies currently used in the type-2 diabetes and HbA1c MR literature. We then argue for a more integrated and thorough approach and provide a framework for doing this in the context of HbA1c and diabetes. We conducted a literature search for Mendelian randomisation studies that have instrumented diabetes and/or HbA1c. We also used data from the UK Biobank (N=349,326) to calculate two instrument strength metrics that are key in MR studies: the F-statistic for average strength and R² for total strength. We calculated each with two different methods ('Individual-level data regression' and the Cragg-Donald formula). We used a 157-SNP instrument for diabetes and a 51-SNP instrument for HbA1c, the latter both in full and partitioned into glycaemic and erythrocytic SNPs. Our literature search yielded 48 studies for diabetes and 22 for HbA1c. Our UKB empirical examples showed that the HbA1c genetic instrument is strong in terms of both average and total strength. This held irrespective of the method used to calculate the strength metrics and of whether the instrument was used in full or partitioned by function. For diabetes, the 157-SNP instrument had good average and total strength, but both were substantially smaller than those of the HbA1c instrument. We provide a careful set of five recommendations for researchers who wish to genetically instrument type-2 diabetes and/or HbA1c. MR studies of glycaemia should take a more integrated approach when selecting genetic instruments, and we give specific guidance on how to do this.

Broadly, criteria for instrument selection (which are intrinsically linked to the core assumptions underlying MR; Fig. 1) include:

i) ensuring that there is no overlap between the samples used in the discovery genome-wide association study (GWAS) and the data under analysis, as this helps minimise bias arising from the "winner's curse" and from the use of weak instruments (67);

ii) selecting independent variants from the latest and largest GWAS for the exposure (at a threshold of p < 5 × 10⁻⁸);

iii) choosing variants based on the amount of variance they explain in the exposure (R²);

iv) selecting variants on the basis of biology and function; and

v) deciding whether variants for a continuous or a binary exposure are more appropriate.

Glycaemic MR studies often prioritise criteria i), ii) and perhaps iii), but the remainder are not always taken into consideration. In relation to ii), we argue that bigger is not always better: the greater the number of genetic variants, the greater the chance of including pleiotropic variants. Including such variants violates a core MR assumption, namely no horizontal pleiotropy (60). Under this assumption, variants for the exposure should not be associated with common confounders or directly with the outcome under study, and should only be associated with the outcome via the exposure being instrumented. A balance is therefore needed between including enough genetic variants to enable well-powered analyses and including so many that pleiotropy becomes inevitable.
Currently, few if any journals demand a clear explanation for the choice of genetic instrument. Some determinants of this choice may be gleaned from a manuscript even when not made explicit, such as overlap with the GWAS used to derive the instrument, variant function, and whether the trait is continuous or binary. Key statistical characteristics usually cannot, however, and these are specifically R² and F, which can make a major contribution to the power of an MR analysis. Here R² is the proportion of variance in the exposure accounted for by the selected genetic variants. Generally, the larger the R², the better, as it directly contributes to the power of an MR analysis. The F-statistic provides information about the average strength of the genetic variants for the exposure of interest. An F > 10 indicates that substantial weak instrument bias is unlikely: the bias of the MR estimate is approximately 1/F of the bias of the observational estimate, i.e. less than about 10% (68). Weak instrument bias is of concern in MR studies because weak instruments can bias MR estimates towards the confounded observational estimate in one-sample settings (68); with non-overlapping samples, the bias is instead towards the null. Either way, results are less robust than with a strong instrument. Therefore, our overall objectives were threefold. We aimed to understand the instrument selection approaches currently used in MR studies of diabetes and HbA1c, to present why we need integrated approaches (described below) for this, and to provide a framework for how this can be done in practical terms. Our specific aims were:

1. Conduct a literature search for MR studies that have instrumented type-2 diabetes and/or HbA1c, to understand which exposure is instrumented more frequently and whether studies report metrics of instrument strength.

2. Argue for the use of integrated approaches to the selection of HbA1c and type-2 diabetes genetic instruments, with recent examples from the MR literature.

3. Use empirical examples to compare the total and average strength of an HbA1c genetic instrument (including when partitioned by function) with those of a type-2 diabetes instrument, to show that an HbA1c instrument may be superior.

4. Provide an overall framework for how best to select instruments for HbA1c and type-2 diabetes in an MR setting, taking aims 1 and 2 into account.

Type-2 Diabetes Instruments for Use in MR Studies
Here we highlight recent examples from the MR literature that have used HbA1c and/or diabetes genetic variants in what we term "an integrated approach". An integrated approach to genetic instrument selection is one that considers factors which are sometimes overlooked in MR studies of glycaemic traits. These include:

- the use of novel approaches, such as that of Burgess and colleagues (57) described here;
- more careful consideration of which exposure GWAS is used;
- prioritising, where possible, instrumentation of a continuous rather than a binary exposure; and
- ensuring that both the variance explained (R²) and the measure of instrument strength (F-statistic) are always calculated and presented.
Example A. Published MR study of glycaemia and coronary heart disease using an integrated approach to HbA1c genetic instrument selection
A recent MR study by Burgess and colleagues (57) used HbA1c genetic variants to investigate associations between genetically instrumented glycaemic status and incident coronary heart disease. The authors used a novel approach to genetic instrument selection. They took 40 independent SNPs selected on the basis of their association with diabetes at genome-wide significance in a recent GWAS (64) and their association with HbA1c in the 2017 MAGIC GWAS by Wheeler et al. (61). They then calculated a weighted allele score for each individual in their data (UK Biobank), multiplying each diabetes risk-increasing allele dosage by the SNP's HbA1c beta coefficient from the MAGIC GWAS and summing across SNPs. By doing so, the authors ensured that their allele score reflected average blood glucose levels, as opposed to only HbA1c or risk of diabetes. This also relates to our earlier point about selecting instruments based on biological function. The corresponding metrics for their instrument were F = 144.5 and R² = 0.018. Although the instrument contained relatively few variants, it was therefore strong in terms of both total (R²) and average strength (F-statistic), and thus carried a low risk of weak instrument bias.
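A minimal sketch of the weighted allele score described above follows. It assumes that genotype dosages are already coded as counts (0 to 2) of the diabetes risk-increasing allele, and that each HbA1c beta has been aligned to that same allele. The function name and toy numbers are hypothetical, and this is not the code used by Burgess and colleagues.

```python
import numpy as np


def weighted_allele_score(dosages, weights):
    """Return one weighted allele score per individual.

    dosages: (n_individuals, n_snps) counts (0-2) of the diabetes
             risk-increasing allele at each SNP (assumed coding).
    weights: (n_snps,) HbA1c beta per SNP, aligned to that same allele;
             a weight is negative if the allele lowers HbA1c.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("one weight is needed per SNP column")
    # Score_i = sum over SNPs of dosage_ij * beta_j
    return dosages @ weights


# Toy data: 3 individuals x 4 SNPs (hypothetical values, not from any GWAS)
dosages = np.array([[0, 1, 2, 1],
                    [2, 2, 0, 1],
                    [1, 0, 1, 0]])
hba1c_betas = np.array([0.05, 0.02, 0.04, 0.01])
print(weighted_allele_score(dosages, hba1c_betas))
```

The score is then used as a single composite instrument in place of the individual SNPs.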
Example B. Published MR study of glycaemia and cognitive/brain health
As mentioned earlier, an assumption that is often made when approaching genetic instrument selection in MR studies is that 'bigger is better'. Researchers are therefore likely to take as many SNPs (genome-wide significant and independent) as possible from the largest and latest GWAS. However, our own recently published MR study shows that this is not necessarily the case (69). We instrumented diabetes using both a 157- and a 77-SNP genetic instrument, because we needed to mitigate sample overlap between the GWAS for the exposure and the data under study (both UKB). To do this, we looked up the 157 diabetes SNPs included in our instrument in an older diabetes GWAS from 2014 (70). We found 77 of them; the reduced number could be due, for example, to differences in the coverage of imputation panels. Although this older GWAS was conducted in a different and smaller sample, the effect estimates (log odds ratios) for each SNP were comparable, even though most of the variants did not reach conventional genome-wide significance (p < 5 × 10⁻⁸) in it. When we calculated the average strength (F-statistic) of the 77-SNP instrument and compared it with that of the 157-SNP instrument, the values were 31 and 27, respectively. An instrument with more genetic variants is therefore not necessarily better in terms of average strength, and the greater the number of variants, the greater the likelihood of including pleiotropic variants.
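The look-up step in Example B can be sketched as below. The SNP identifiers and estimates are placeholders. The sketch assumes that alleles have already been harmonised between the two GWAS; in practice, the effect allele must be checked and, where needed, the sign of the estimate flipped.

```python
def lookup_in_older_gwas(snps, older_gwas):
    """Split the newer instrument's SNPs into those found / not found in an older GWAS.

    snps:       SNP identifiers from the newer instrument
    older_gwas: dict of SNP id -> (log odds ratio, standard error) from an
                older GWAS that does not overlap the analysis sample;
                alleles are assumed to be harmonised already
    """
    snps = list(snps)  # allow any iterable, iterated twice below
    found = {s: older_gwas[s] for s in snps if s in older_gwas}
    missing = [s for s in snps if s not in older_gwas]
    return found, missing


# Placeholder identifiers and estimates, for illustration only
new_instrument = ["snpA", "snpB", "snpC", "snpD"]
older = {"snpA": (0.10, 0.02), "snpC": (0.07, 0.03), "snpE": (0.12, 0.02)}
found, missing = lookup_in_older_gwas(new_instrument, older)
print(f"{len(found)} of {len(new_instrument)} SNPs found; missing: {missing}")
```

The retained SNPs, with effect estimates from the older, non-overlapping GWAS, then form the reduced instrument.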
That a greater number of SNPs is not always better is also supported by recent MR studies that have instrumented body mass index (BMI). Larsson and colleagues showed that an 'older' instrument containing 96 BMI SNPs performs well. It is therefore perhaps unnecessary to always use an instrument with hundreds of SNPs (71). This BMI instrument explained 1.6% of the variance in BMI and had an F-statistic of 61 (71). By comparison, another recent MR study that instrumented BMI to understand its association with chronic kidney disease (CKD) used a 773-SNP instrument. That instrument explained ~6% of the variance in BMI but had an F-statistic of only 23.6 (72). When selecting a genetic instrument for an MR study, these metrics need to be balanced against one another. An instrument with more genetic variants has a larger R² (total strength) and more power, but is also more likely to include pleiotropic variants, which could violate a core MR assumption. At the same time, additional variants tend to be individually weaker. An instrument with a larger R² therefore often has a lower F-statistic (average strength), which, if < 10, carries a greater risk of weak instrument bias.
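To illustrate this trade-off numerically, the sketch below applies the Cragg-Donald approximation, F = ((n − k − 1)/k) × R²/(1 − R²) (68), to two hypothetical instruments. The sample size matches the UKB data used in this paper, but the k and R² values are invented for illustration only.

```python
def cragg_donald_f(r2: float, k: int, n: int) -> float:
    """Cragg-Donald F from total R^2, number of SNPs k and sample size n."""
    return ((n - k - 1) / k) * (r2 / (1.0 - r2))


n = 349_326  # UK Biobank sample size used in this paper

# Hypothetical instruments: the larger one adds 180 weak SNPs that
# together contribute only one extra percentage point of R^2.
f_small = cragg_donald_f(0.010, 20, n)
f_large = cragg_donald_f(0.020, 200, n)
print(f"20 SNPs,  R2 = 1%: F = {f_small:.1f}")
print(f"200 SNPs, R2 = 2%: F = {f_large:.1f}")
```

In this made-up example, doubling the total strength (R²) costs a roughly five-fold drop in average strength (F).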

Literature search for Mendelian randomisation studies that instrument type-2 diabetes and/or HbA1c
We were interested in three questions: how many studies have instrumented HbA1c and type-2 diabetes to date, whether there is a preference for one over the other, and whether these studies report metrics of instrument strength. We therefore conducted a literature review in PubMed, up to March 2021, of MR studies that instrumented these exposures (for details of our search terms and strategy, see Supplemental Material S1). We excluded anything that was not a research article (i.e., conference abstracts, letters, editorials, reviews, opinion pieces and commentaries). Studies that clearly did not instrument HbA1c or type-2 diabetes were also excluded. Supplementary Material Tables 1 and 2 list all the included studies for diabetes and HbA1c, respectively.

Empirical examples in UK Biobank (UKB): calculation of total (R²) and average strength (F-statistic) metrics for HbA1c and type-2 diabetes instruments
The aim of these empirical examples was to show the reader two things. First, calculating R² and F-statistic metrics as part of an MR study is important to understand both the total and the average strength of the chosen instrument. Second, options for obtaining these metrics are available whether individual- or summary-level data are used for an MR study. We chose two approaches because there has been no quantitative comparison of how they perform for glycaemic instruments when considering both R² and the F-statistic. These methods are 'Individual-level data regression' and the Cragg-Donald F-statistic.
Statistical analyses: selection of type-2 diabetes and HbA1c genetic instruments
For both phenotypes, we used previously described genetic instruments (69). Briefly, for type-2 diabetes the genetic instrument comprised 157 single nucleotide polymorphisms (SNPs) from a 2018 GWAS of European ancestry (74). The 51-SNP HbA1c instrument came from a 2017 trans-ethnic GWAS (61). We retained SNPs with a minor allele frequency > 0.01 and p < 5 × 10⁻⁸, and used LD clumping in PLINK to select independent variants (69). For HbA1c, we also partitioned the instrument into 16 glycaemic SNPs and 19 erythrocytic SNPs; the remainder are unclassified, as per the 2017 GWAS. The aim was to test whether the HbA1c instrument is strong in terms of both average strength (measured by the F-statistic) and total strength (measured by R²), whether all the SNPs are used or the instrument is partitioned by biological function. As in our previously published MR study of glycaemia and brain health/cognition/dementia outcomes, we suggest that when using an HbA1c genetic instrument it is worth performing MR in three ways:

i) using all of the HbA1c SNPs;

ii) using only the glycaemic SNPs; and

iii) using only the erythrocytic SNPs.
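The MAF and p-value filters and the partitioning by function can be sketched as follows. This is illustrative only. The `Snp` record and its fields are hypothetical, and the category labels stand in for the classification reported in the 2017 GWAS. LD clumping (performed in PLINK) is assumed to have been done separately.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold
MIN_MAF = 0.01        # minor allele frequency filter used in the text


@dataclass
class Snp:
    """Hypothetical record for one SNP from the exposure GWAS."""
    rsid: str
    maf: float      # minor allele frequency
    p: float        # p-value for association with the exposure
    category: str   # e.g. "glycaemic", "erythrocytic" or "unclassified"


def select_and_partition(snps):
    """Keep SNPs with MAF > 0.01 and p < 5e-8, then group them by category.

    LD clumping is assumed to have been applied beforehand (e.g. in PLINK)
    and is not reproduced here.
    """
    kept = [s for s in snps if s.maf > MIN_MAF and s.p < GENOME_WIDE_P]
    groups = {}
    for s in kept:
        groups.setdefault(s.category, []).append(s.rsid)
    return kept, groups


# Placeholder SNPs, not real GWAS results
snps = [
    Snp("snp1", 0.30, 1e-20, "glycaemic"),
    Snp("snp2", 0.005, 1e-12, "glycaemic"),   # fails the MAF filter
    Snp("snp3", 0.20, 3e-9, "erythrocytic"),
    Snp("snp4", 0.15, 1e-6, "erythrocytic"),  # fails the p-value filter
    Snp("snp5", 0.40, 2e-10, "unclassified"),
]
kept, groups = select_and_partition(snps)
print(groups)
```

Each group (together with the full set) can then be used as a separate instrument, matching the three MR analyses suggested above.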
Calculation of the F-statistic as a measure of average instrument strength and R² as a measure of total strength
'Individual-level data regression' approach: this approach fits a multivariable linear regression of the exposure (treated here as the outcome, y) on the SNPs. The association between the j-th SNP and y is thus evaluated while holding all the other SNPs constant. In the regression equation below, β₀ represents the constant, β₁ to βₖ the SNP coefficients, k the number of SNPs and ε the residual or error term:
$$y = \beta_0 + \beta_1 \mathrm{SNP}_1 + \beta_2 \mathrm{SNP}_2 + \dots + \beta_k \mathrm{SNP}_k + \varepsilon$$
As with any multivariable regression, the output includes the F-statistic and R², which conventionally indicate model fit. Here, we are not concerned with interpreting the coefficient of each SNP on the exposure. Linear regression can also be used when the exposure is binary, as in our analysis of genetic liability to diabetes. In that case, the coefficients and statistics represent associations on an absolute (risk difference) scale rather than on a relative risk or odds ratio scale. We therefore calculated R² and the F-statistic for liability to diabetes using linear regression.
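A minimal sketch of the 'Individual-level data regression' approach follows, using NumPy on simulated data; the genotypes, effect sizes and sample size are all invented. The function fits the regression described above by ordinary least squares. It returns R² and the overall regression F-statistic, F = (R²/k)/((1 − R²)/(n − k − 1)). Passing a 0/1 exposure would give the linear probability model used for a binary exposure.

```python
import numpy as np


def instrument_r2_and_f(genotypes, exposure):
    """Regress the exposure on all SNPs jointly (OLS) and return (R^2, F).

    genotypes: (n, k) array of allele dosages; exposure: length-n array.
    F is the overall regression F-statistic, (R^2 / k) / ((1 - R^2) / (n - k - 1)).
    """
    snp_matrix = np.asarray(genotypes, dtype=float)
    outcome = np.asarray(exposure, dtype=float)
    n_obs, n_snps = snp_matrix.shape
    X = np.column_stack([np.ones(n_obs), snp_matrix])   # first column = intercept (beta_0)
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)  # OLS estimates of beta_0..beta_k
    resid = outcome - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((outcome - outcome.mean()) ** 2)
    f = (r2 / n_snps) / ((1.0 - r2) / (n_obs - n_snps - 1))
    return r2, f


# Simulated data only: 5,000 individuals, 10 SNPs with small effects
rng = np.random.default_rng(1)
n, k = 5000, 10
maf = rng.uniform(0.1, 0.5, size=k)
G = rng.binomial(2, maf, size=(n, k))
y = G @ rng.normal(0.0, 0.05, size=k) + rng.normal(0.0, 1.0, size=n)
r2, f = instrument_r2_and_f(G, y)
print(f"R2 = {r2:.4f}, F = {f:.1f}")
```

Rearranged, this overall F-statistic equals ((n − k − 1)/k) × R²/(1 − R²), the same expression as the Cragg-Donald formula.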
Cragg-Donald F-statistic formula: this method uses the Cragg-Donald F-statistic formula provided in the paper by Burgess and colleagues (68). It requires three inputs:

- R²: the previously calculated values were 0.028 and 0.030 for the 51- and 275-SNP HbA1c instruments, and 0.015 for diabetes;
- k, the number of SNPs: 51 and 275 for HbA1c, and 157 for diabetes;
- n: 349,326.

The formula is:
$$F = \frac{n - k - 1}{k} \cdot \frac{R^2}{1 - R^2}$$
For consistency and comparability, we kept R², k and n the same as in the 'Individual-level data regression' approach above.
Here we were able to calculate R² ourselves, but GWAS authors sometimes report R² for the top SNPs, and this value could then be used in the formula.
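The Cragg-Donald formula, F = ((n − k − 1)/k) × R²/(1 − R²), is simple to compute directly. The sketch below plugs in the n, k and R² values quoted above for the two HbA1c instruments. Because those R² values are rounded, the outputs are approximate and are not intended to reproduce Table 1 exactly.

```python
def cragg_donald_f(r2: float, k: int, n: int) -> float:
    """Cragg-Donald F = ((n - k - 1) / k) * (R^2 / (1 - R^2))."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    if k < 1 or n <= k + 1:
        raise ValueError("need k >= 1 and n > k + 1")
    return ((n - k - 1) / k) * (r2 / (1.0 - r2))


N_UKB = 349_326  # UK Biobank sample size used in this paper

# (R^2, k) as quoted in the text; R^2 values are rounded
instruments = {
    "HbA1c, 51 SNPs": (0.028, 51),
    "HbA1c, 275 SNPs": (0.030, 275),
}
for label, (r2, k) in instruments.items():
    print(f"{label}: F = {cragg_donald_f(r2, k, N_UKB):.1f}")
```

Because only R², k and n are needed, the same function can be used with an R² reported in a GWAS paper.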

Literature search results
Our searches yielded a total of 657 studies for diabetes, of which 609 did not instrument this phenotype, leaving 48. For HbA1c, we found a total of 77 articles, of which 55 did not instrument HbA1c and were excluded, leaving 22. This literature search therefore showed that more than twice as many studies currently choose to instrument type-2 diabetes (48) rather than HbA1c (22).

Results of F-statistic (average instrument strength) and R² (total instrument strength)
HbA1c 51- and 275-SNP instruments and partitioned glycaemic/erythrocytic instruments
As shown in Table 1, using 51 and 275 HbA1c SNPs in UKB, the 'Individual-level data regression' and Cragg-Donald formulae gave similar F-statistics (using the same R² values of 2.8% and 3%, respectively). The two methods yielded somewhat different F-statistics for the 16-SNP glycaemic instrument, but both were substantially larger than 10, indicating no cause for concern (Table 1). For the 19-SNP erythrocytic instrument, the F-statistics obtained using both methods were comparable (Table 1). For the 157-SNP type-2 diabetes instrument, Table 1 also presents F-statistics and R² metrics using both methods. Results were comparable irrespective of which formula was used (with the same R² of 1.5%).

Which approach should I use in my study?
The 'Individual-level data regression' approach naturally requires individual-level data for the exposure of interest, which are not always available to researchers. The Cragg-Donald formula instead relies on information about R². This could come from the published GWAS for the exposure, but it is not always reported in GWAS papers. When R² is not known, the 't-statistic' approach can be used to calculate the F-statistic. This requires SNP-exposure betas (or log odds ratios for a binary exposure) and their standard errors to be available in the summary-level GWAS dataset for the exposure. Thus, if individual-level data are available, the 'Individual-level data regression' approach may be recommended. If not, the Cragg-Donald formula can be used when R² is reported, and otherwise the 't-statistic' approach.
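The decision rule above can be written as a small helper. The function and argument names are hypothetical; the helper simply encodes the order of preference described in the text.

```python
def choose_f_method(individual_data: bool, r2_reported: bool, beta_and_se: bool) -> str:
    """Return the F-statistic method preferred in the text, given what data exist.

    individual_data: individual-level exposure and genotype data are available
    r2_reported:     R^2 for the exact instrument is reported (e.g. in the GWAS paper)
    beta_and_se:     SNP-exposure betas and standard errors are available
    """
    if individual_data:
        return "individual-level data regression"
    if r2_reported:
        return "Cragg-Donald formula"
    if beta_and_se:
        return "t-statistic approximation"
    return "insufficient information"


print(choose_f_method(individual_data=False, r2_reported=True, beta_and_se=True))
```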

Consideration of total and average instrument strength for HbA1c and type-2 diabetes
Across our empirical examples in UK Biobank, the HbA1c instrument outperformed that for type-2 diabetes in terms of both total strength (R²) and average strength (F-statistic), even though it contained markedly fewer SNPs. Specifically, the 16-SNP glycaemic instrument had the highest average strength and explained 1% of the variance in HbA1c. This is lower than the 2.8% explained by the 51-SNP instrument, but certainly still appropriate for use in MR. The 157-SNP type-2 diabetes instrument had a much smaller F-statistic (F < 30) in UKB overall and explained around 1.5% of the variance in diabetes in UKB.
Likewise, the erythrocytic HbA1c instrument proved more than adequate for use in MR studies, with an R² similar to that of the glycaemic variants and an F value of just under 200. Therefore, whether or not it is partitioned into glycaemic and erythrocytic SNPs, the 51-SNP HbA1c genetic instrument is overall a strong instrument for use in MR studies. This is indicated by both the R² and F-statistic metrics, even in comparison to the newer 275-SNP HbA1c instrument. By contrast, the type-2 diabetes instrument appears to be somewhat weaker in terms of both total and average strength when compared to the HbA1c genetic instrument(s).

Potential recommendations for MR studies instrumenting diabetes and/or HbA1c
First, as demonstrated in our empirical examples and argued above, 'bigger is not always better' when it comes to selecting instruments for glycaemic MR studies. We show that, in some cases, glycaemic instruments with fewer SNPs may be stronger, and thus more robust for use in MR when trying to minimise the important issue of weak instrument bias. This is the case for both HbA1c and diabetes, with the HbA1c instrument being superior. We therefore recommend that researchers do not assume that the latest and largest GWAS will always yield the best genetic instrument for these exposures, and that careful consideration be given to which GWAS is selected for the exposure. Genetic variants identified in older GWAS may, of course, also be pleiotropic. Researchers might therefore choose to test this empirically in their MR study, for example by performing a phenome-wide association study (PheWAS).

Instrument selection will also likely have to strike a balance between two options. One is an instrument with a larger number of genetic variants (greater R², i.e. total strength); the other is one with potentially greater average strength (higher F-statistic). If the former is prioritised, the instrument is more likely to include pleiotropic variants, which violates a core MR assumption. If the latter is prioritised, total instrument strength may be reduced, as fewer variants often yield a larger F-statistic but explain less of the variance in the exposure (R²). It is also important to note that more variants provide opportunities to run more robust methods, including common sensitivity analyses such as MR-Egger regression. For the HbA1c instrument exemplified above in the UKB cohort, however, partitioning into glycaemic and erythrocytic variants still gave an R² of around 1% from only 16 and 19 SNPs, respectively.
This example therefore demonstrates an integrated approach that considers the total and average strength of the instrument alongside the biological function of its variants. Another way to address pleiotropy is to use an approach such as that of Luo and colleagues (75), who adjusted for erythrocytic properties to control for unknown sources of pleiotropy.
Second, to reiterate the recommendation made by Boef and colleagues in 2015 and the more recent STROBE-MR guidelines (66), authors of MR studies should calculate and report the F-statistic for the association between their genetic instrument and the exposure of interest in their study. As discussed earlier, this can be calculated using one of three approaches, depending on whether researchers have access to individual-level data. If individual-level data are available for the exposure of interest, researchers should likely prioritise the 'Individual-level data regression' approach. If individual-level data are not accessible, but the exposure GWAS paper provides R² for the exact instrument being used, we recommend the Cragg-Donald F-statistic method. The third method, the 't-statistic' method, which we did not implement here, can be used when R² is not known (i.e., not provided in the GWAS paper for the exposure). It computes, for each SNP, F = β²/SE², where β is the coefficient for the SNP's association with the exposure and SE its standard error. The F-statistic obtained in this way is more of an approximation, because it reflects the sample size of the discovery GWAS (usually for the exposure) rather than that of the outcome dataset.
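A sketch of the 't-statistic' approach follows. It assumes that SNP-exposure betas and standard errors are taken from GWAS summary statistics; the three SNPs below are hypothetical. The function returns each SNP's F = (β/SE)² and, as one common summary of average strength, their mean.

```python
import numpy as np


def t_statistic_f(betas, ses):
    """Per-SNP F = (beta / SE)^2 and their mean across SNPs.

    betas: SNP-exposure coefficients (log odds ratios for a binary exposure)
    ses:   their standard errors, from the exposure GWAS summary statistics
    """
    betas = np.asarray(betas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    if betas.shape != ses.shape or np.any(ses <= 0):
        raise ValueError("betas and SEs must align and SEs must be positive")
    per_snp_f = (betas / ses) ** 2
    return per_snp_f, per_snp_f.mean()


# Hypothetical summary statistics for three SNPs
per_snp, mean_f = t_statistic_f([0.10, 0.05, 0.08], [0.01, 0.01, 0.02])
print(per_snp, round(mean_f, 1))
```

As noted above, these values reflect the discovery GWAS sample size, so they approximate rather than reproduce the strength of the instrument in the data under analysis.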
Third, and related to our earlier point, there are some complex issues surrounding genetic instrumentation of binary disease exposures such as diabetes (76,77). When instrumenting these types of exposure, it is important to note that we are modelling an underlying continuous measure. Liability thresholds on this measure are used to separate individuals into different categories (76,78), and we should thus interpret MR using binary exposures in terms of genetic liability (78). If the MR instrumental variable assumptions are met for the underlying continuous exposure that is used to categorise individuals, then we assume that we can infer causality using the binary exposure (76). However, there may be circumstances in which researchers feel the need to genetically instrument diabetes itself, as this may prove clinically informative. We would still recommend that researchers interested in how hyperglycaemia might causally affect a range of important health outcomes take advantage of what is evidently a strong HbA1c instrument. This instrument is currently underused: we found only 22 studies that used it as an exposure in MR studies, and we therefore recommend that researchers exploit it to a much greater extent. In addition, the MAGIC Consortium GWAS do not include UKB, making this instrument very attractive for two-sample MR studies of HbA1c and important health outcomes. In terms of instrument metrics, our applied example in UKB data clearly showed that the HbA1c instrument outperformed the diabetes instrument. The HbA1c instrument can also be split by biological function into erythrocytic and glycaemic SNPs, as shown in our examples. Genetic instrumentation of a continuous exposure such as HbA1c also enables the application of non-linear MR methods (79), which are likewise somewhat underused in MR.
Using non-linear MR methods can help define levels of risk and may also help establish whether both low and high levels of HbA1c are associated with risk. Understanding the causal impact of disease status (e.g., diabetes) on a range of outcomes is both interesting and important. Nevertheless, continuous measures retain information that is lost on dichotomisation and should be used where possible.
Fourth, we recommend that, where plausible, researchers adopt an instrument selection approach such as that of Burgess et al. (57), described in Example A above.
Fifth, Example B above provides another example of an integrated approach to instrument selection, in which we sought to mitigate the issue of sample overlap in our previous MR study. To do this, we took as many of the newer diabetes variants as possible (from a more recent GWAS that overlapped with our data under study) and used the effect estimates from the earlier GWAS. The most common approach to instrument selection is simply to take the most recent, largest GWAS (which often includes UKB). This rests on the assumption that the benefits (e.g., a large number of genetic variants) outweigh the risks (e.g., sample overlap). However, we show that a diabetes instrument with 77 SNPs had a larger F-statistic (average strength) than our original 157-SNP instrument (31 vs 27). If anything, the 77-SNP instrument therefore carried a lower risk of weak instrument bias.
While our paper focuses on genetic instrument selection for MR studies of HbA1c and/or liability to diabetes, we acknowledge that MR as a method has limitations and is not a panacea for causal inference. Triangulation of findings, whereby different study designs are employed, is therefore crucial to enable robust causal statements. Key limitations of MR include confounding by ancestry, confounding by linkage disequilibrium (LD), confounding by horizontal pleiotropy, and canalisation (82).

- Confounding by ancestry, or population stratification, arises because allele frequencies of common genetic variants, as well as disease frequencies, may differ between populations. It is now common to adjust for genetic principal components in MR studies to correct for residual confounding by population structure.
- Confounding by LD occurs when the selected genetic variant(s) is/are in LD with (i.e., correlated with) another genetic variant associated with the outcome under study, which may produce a confounded causal estimate.
- Confounding by horizontal pleiotropy occurs when a genetic variant influences the outcome under study directly, rather than via the exposure being instrumented. Numerous methods have been developed to detect and correct for it (83).
- Canalisation occurs when an individual develops compensatory mechanisms in response to disruptive genetic or environmental influences, such as higher or lower levels of a risk factor (e.g., higher or lower body mass index).

Conclusions
In summary, we recommend that MR studies of glycaemia take a more integrated approach to the selection of genetic instruments. Careful consideration should therefore be given to the following:

i) whether novel approaches, such as those described here from the literature, might be used;

ii) which GWAS is used to select the instrument for the exposure;

iii) whether a continuous, as opposed to a binary, exposure can be instrumented; and

iv) reporting of both the variance explained (R², the total strength of the instrument) and the F-statistic (average strength).

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.