By the turn of the 21st century it had become evident that type 2 diabetes has a strong genetic basis (1), but very little was known about the specific loci causing the disease. Sixteen years on, hundreds of loci have been discovered at which DNA variation and cardiometabolic traits robustly correlate (2). But, as we well know, correlation and causation are not synonymous, and there is much that must now be done to characterize the mechanisms through which genetic variants raise disease risk.
Although no single approach will be sufficient to achieve this, the use of advanced molecular assays has helped elucidate the key subphenotypes that underlie diseases such as type 2 diabetes. However, a key barrier to the application of these methods is that genetic effects are usually of small magnitude, making them difficult to estimate, and the simple solution of studying very large sample collections can prove prohibitively expensive. This has motivated studies that maximize power by selecting biosamples or participants with starkly contrasting genetic characteristics from much larger cohorts and undertaking detailed phenotyping in these subsamples (Fig. 1). This approach, termed genotype-based recall (GBR) (3), can be considerably more powerful than randomly sampling participants for substudies of similar size and potentially more cost-effective than phenotyping the entire cohort.
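The power argument above can be illustrated with a rough calculation. The sketch below is not taken from any of the cited studies; it assumes a simple additive model in which a continuous phenotype changes by `beta` per allele, a normal approximation to the test statistic, and a GBR design that recalls equal numbers of the two opposite homozygotes. The function names, allele frequency, effect size, and sample size are all hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def genotype_variance_random(maf):
    """Variance of allele count (0, 1, 2) in a random sample under Hardy-Weinberg."""
    return 2 * maf * (1 - maf)


def genotype_variance_gbr():
    """Variance of allele count when half the sample carries 0 and half carries 2 copies."""
    return 1.0


def power_per_allele(beta, sigma, n, var_g, alpha=0.05):
    """Approximate two-sided power to detect a per-allele effect (normal approximation)."""
    z = NormalDist()
    standard_error = sigma / (n * var_g) ** 0.5
    ncp = abs(beta) / standard_error
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Hypothetical inputs: minor allele frequency 0.1, effect of 0.2 SD per allele,
# and a budget of 100 deeply phenotyped samples.
maf, beta, sigma, n = 0.1, 0.2, 1.0, 100

power_random = power_per_allele(beta, sigma, n, genotype_variance_random(maf))
power_gbr = power_per_allele(beta, sigma, n, genotype_variance_gbr())
efficiency = genotype_variance_gbr() / genotype_variance_random(maf)

print(f"power, random sample: {power_random:.2f}")
print(f"power, GBR sample:    {power_gbr:.2f}")
print(f"relative efficiency:  {efficiency:.2f}")
```

Under these assumptions the genotype variance is 1 in the recalled sample against 0.18 in a random sample, so the same 100 assays carry roughly five to six times as much information about the per-allele effect. The gain depends on the parent cohort being large enough: at a minor allele frequency of 0.1, only about 1% of a cohort in Hardy-Weinberg equilibrium is homozygous for the minor allele, so recalling 50 such participants requires a cohort of at least about 5,000.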
In the U.K., for example, UK Biobank (N ∼500,000) (4), INTERVAL (N ∼50,000) (5), and National Institute for Health Research BioResource (N ∼20,000–100,000) (6) are recruiting participants in whom consent for GBR is obtained. These initiatives provide platforms for studies in which the phenotypes of people at the extremes of genetic risk distributions can be compared and contrasted, either by physically recalling participants for new studies or by deep-phenotyping their stored biosamples with state-of-the-art assays. GBR is especially appealing when the focus is on the effects of relatively rare genotypes, as the frequency of the rare genotype within the recalled subpopulation can be increased to match that of the common genotype, provided the sampling frame is large enough. It is also attractive for structurally complex loci: where genotype is defined by haplotype or copy number variation (CNV), the comparison is between contrasting haplotype or CNV groups.
Human amylase exists as isoenzymes secreted from the pancreas and salivary glands. AMY1 encodes the salivary isoform of amylase, which catalyzes the initial step in dietary starch and glycogen digestion, and emerged in recent human evolution following duplication of an ancestral pancreatic amylase gene. The evolutionary factor postulated to have caused population-specific selection of AMY1 variants is dietary starch, which was historically abundant in agricultural societies subsisting in arid environments but less so in circumpolar and rainforest populations (7). High AMY1 copy number, which correlates with higher salivary amylase protein concentrations, is more frequent in high-starch than in low-starch populations (7). Thus, positive selection for higher AMY1 copy number is likely a consequence of pervasive dietary exposures throughout recent evolution.
Accordingly, the AMY1 locus represents a strong biological candidate for gene–diet interactions in diseases and traits linked to the metabolism of carbohydrates, such as obesity, diabetes, and dyslipidemia, which has motivated many association studies testing these hypotheses. Nevertheless, it remains unclear whether AMY1 CNV causally affects health, and a recent comprehensive and adequately powered assessment of multiple structural forms of the locus concluded that technical biases are the most likely explanation for many prior findings of association with obesity (8), and by extension with other diseases too. A further challenge is that AMY1 CNV correlates with ethnicity (7), which might cause relationships between AMY1 CNV and disease traits, specifically those that vary by ethnicity, to be confounded (population stratification). Hence, there is much work to be done to resolve whether AMY1 CNVs play a causal role in diseases related to carbohydrate metabolism.
In this issue of Diabetes, Arredouani et al. (9) report a GBR study (scenario A, Fig. 1) of AMY1 CNV and metabolism using stored biosamples from the French D.E.S.I.R. (Data from an Epidemiological Study on the Insulin Resistance Syndrome) cohort. Serum samples from a total of 100 young adult (aged 30–40 years), normal-weight (BMI 18.5–24.9 kg/m²) women were retrieved, in equal numbers from participants predicted to carry <5 copies or >7 copies of AMY1, and metabolite profiles were determined in these samples. Total and salivary serum amylase protein concentrations were substantially lower in women with low versus high copy number. Long- and medium-chain fatty acid concentrations were lower and dicarboxylic fatty acid and 2-hydroxybutyrate concentrations were higher in samples from women with low copy number, which the authors postulate is indicative of glucose malabsorption following starch ingestion.
The study by Arredouani et al. (9) is a rare example of how statistical power and cost-effectiveness can be maximized in genetic association studies by undertaking GBR in a large, extant, and well-characterized bioresource. If the results of this study are correct, they provide important novel insights into the role of AMY1 variation in substrate metabolism. However, the study has important limitations: the genotyping method used to predict CNV may, as described elsewhere (8,10), have significant technical flaws, and neither this limitation nor population stratification is addressed in the study. The authors also undertook group-based, rather than 1:1, matching to control for confounding; thus, residual confounding by the matched variables may persist. Moreover, should any of the matching variables be outcomes of both the AMY1 genotype and metabolite concentrations, there is a risk of collider bias.
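The collider-bias concern can be made concrete with a small simulation. The sketch below uses entirely simulated, hypothetical data rather than anything from the D.E.S.I.R. cohort: a binary genotype group and a metabolite level are generated independently, and a third variable (standing in for a selection or matching variable such as BMI) is generated as an outcome of both. Restricting the sample to a narrow range of that third variable then induces a genotype-metabolite association that does not exist in the full population. The variable names, effect sizes, and restriction window are assumptions made for illustration.

```python
import random
from statistics import mean


def simulate(n=20000, seed=1):
    """Return (genotype_group, metabolite, collider) tuples for n simulated people."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        genotype = rng.randint(0, 1)        # 0 = low copy number, 1 = high copy number
        metabolite = rng.gauss(0.0, 1.0)    # independent of genotype by construction
        # The collider is caused by both genotype and metabolite, plus noise.
        collider = genotype + metabolite + rng.gauss(0.0, 0.5)
        rows.append((genotype, metabolite, collider))
    return rows


def mean_difference(rows):
    """Mean metabolite level in the high group minus the low group."""
    high = [m for g, m, _ in rows if g == 1]
    low = [m for g, m, _ in rows if g == 0]
    return mean(high) - mean(low)


rows = simulate()
overall = mean_difference(rows)
# Mimic selection on the collider by keeping only a narrow range of its values.
restricted = mean_difference([r for r in rows if 0.0 < r[2] < 1.0])

print(f"difference in full population:     {overall:+.2f}")
print(f"difference after restricting on C: {restricted:+.2f}")
```

In the full simulated population the group difference is close to zero, as it must be given how the data were generated, whereas the restricted sample shows a clear spurious difference. Whether anything comparable applies to the matching variables used by Arredouani et al. depends on whether those variables are in fact influenced by both AMY1 genotype and the metabolites measured, which this sketch does not address.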
The current study is intriguing, predominantly because it is a rare demonstration of how GBR can be used, but these results need to be validated in other studies that account for the important limitations outlined above. As others have highlighted (8,11), reconstructing AMY1/AMY2 haplotypes, rather than merely counting gene copies, might help characterize variation at this locus more accurately; thus, future GBR studies might focus on haplotype-based rather than CNV-based recall. Such studies would also benefit from the physical recall of participants who subsequently undertake carbohydrate tolerance tests and diet interventions, as a study of this nature might yield causal evidence of gene–diet interactions at this intriguing locus. A very small, published study of this nature (N = 7 low vs. 7 high AMY1 CNV carriers) provides preliminary evidence to support such an interaction.
See accompanying article, p. 3362.
Duality of Interest. P.W.F. has been a member of advisory panels for Eli Lilly and Sanofi and has received research support provided by Novo Nordisk, Eli Lilly, Sanofi, Johnson & Johnson, Servier, and Boehringer Ingelheim. No other potential conflicts of interest relevant to this article were reported.