We investigated the patterns and extent of linkage disequilibrium (LD) in the vicinity of the type 2 diabetes gene calpain-10 (CAPN10) in Mexican Americans, European Americans, African Americans, and Chinese Americans. We found that CAPN10 occurs within a single block of high LD and that LD decays rapidly outside of the gene. This reduces the likelihood that associations between CAPN10 polymorphisms and type 2 diabetes could be attributed to variation at some distance from CAPN10. We also consistently observed that cases have more extensive LD than control subjects and that cases from families with evidence for linkage have more extensive LD than cases from families without evidence for linkage. These observations further suggest that there are one or more relatively common alleles increasing risk of type 2 diabetes in this local region.
Linkage studies in Mexican Americans from Starr County, Texas, localized a susceptibility gene for type 2 diabetes to 2q37 (NIDDM1) (1,2), and subsequent positional cloning analyses suggested that calpain-10 (CAPN10; OMIM 605286) was the gene associated with type 2 diabetes. Individual studies have yielded mixed results, but pooled analyses and meta-analyses of large numbers of case and control subjects (>5,000 each) have provided strong support for the involvement of two single nucleotide polymorphisms (SNPs), SNP-43 and -44, in affecting type 2 diabetes risk (3,4).
Studies subsequent to the original fine mapping and association study have typically genotyped only the three or four markers (SNP-43, -44, and -63 and Indel-19) showing prior association to type 2 diabetes, without regard for the underlying pattern of linkage disequilibrium (LD) in the region. An allele that causes an increase in disease risk can generate evidence of association (i.e., will be found more frequently in case than in control subjects), but even markers that do not directly affect susceptibility to disease may be associated with disease if the associated marker allele is in LD with the causal allele. LD is usually detectable only between alleles at tightly linked markers, and it decays at a rate that depends on the recombination frequency between the sites. Thus, it is possible that the 43-44-19-63 alleles and haplotypes in the CAPN10 gene are not the variants affecting susceptibility to type 2 diabetes but are simply in strong LD with a true disease-causing locus elsewhere in the NIDDM1 region. To rigorously assess this possibility, we undertook a thorough examination of the patterns and extent of LD in the NIDDM1 region in a variety of populations.
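The dependence of LD decay on recombination can be illustrated with the standard population-genetic relation D_t = D_0 (1 − c)^t, where c is the recombination fraction between two sites and t is the number of generations of random mating. The following is a minimal sketch only; the function name and the numerical values are illustrative assumptions and are not estimates from this study.

```python
def ld_after_generations(d0: float, recomb_fraction: float, generations: int) -> float:
    """Expected LD coefficient after random mating: D_t = D_0 * (1 - c) ** t."""
    if not 0.0 <= recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - recomb_fraction) ** generations


# Illustrative values only (not estimates from this study).
tightly_linked = ld_after_generations(0.25, 0.0001, 100)  # LD is largely retained
loosely_linked = ld_after_generations(0.25, 0.01, 100)    # LD has decayed substantially
```

Under this relation, markers separated by more recombination lose LD faster, which is why a marker far from CAPN10 is less likely to explain an association observed within the gene.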
The pattern of LD in 108 Mexican-American cases and a random sample of 112 control subjects, all from Starr County, Texas, is shown in Fig. 1. Over the densely typed portion of the 2q region surrounding CAPN10, two regions of high LD are discernible. The larger contains all of CAPN10 extending 5′ to include the 3′ end of RNPEPL1, and the smaller contains GPR35 extending 3′ of it. LD decays rapidly in both directions from CAPN10, clearly reducing the likelihood that associations between CAPN10 polymorphisms and type 2 diabetes could be attributed to variation at some distance from CAPN10. The high LD within CAPN10, however, suggests that it will be difficult to distinguish the variants at CAPN10 that affect risk of type 2 diabetes from those that are merely in strong LD with them. The observations on the association of SNP-44 with type 2 diabetes illustrate this problem. SNP-44 is in perfect LD (r² = 1) with a missense mutation Thr504Ala (SNP-110) and several other polymorphisms in Mexican Americans and other populations that have been studied (5). Determining the unique relationship of each of these polymorphisms to the etiology of type 2 diabetes will require functional studies.
It was apparent from the beginning of our studies on LD in this region that cases had more extensive LD than control subjects. Figure 1 captures this subtle difference, as the proportion of pairwise comparisons with significant evidence for LD across the 70-kb region is significantly greater in case than control subjects (18.0 and 12.7%, respectively; Pearson’s χ² = 7.686, 1 df, P = 0.006). To localize the observation, we calculated two related measures of LD using a sliding-window approach. The mean significance of LD between the markers “within” the window captures local LD patterns, while the mean significance of LD between any marker within the window and any marker “outside” of the window informs longer-range LD between the window in question and other nearby regions.
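A comparison of two proportions of this kind can be sketched as a Pearson χ² test on a 2 × 2 table with 1 degree of freedom. The number of pairwise comparisons behind the reported percentages is not given here, so the counts below are hypothetical and are not expected to reproduce the reported statistic; the function names are likewise illustrative.

```python
import math


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs a nonzero total")
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi_square: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Hypothetical counts of significant / nonsignificant marker pairs per group.
cases_sig, cases_nonsig = 90, 410        # 18.0% of 500 pairs (assumed total)
controls_sig, controls_nonsig = 64, 436  # 12.8% of 500 pairs (assumed total)

chi_square = pearson_chi_square_2x2(cases_sig, cases_nonsig,
                                    controls_sig, controls_nonsig)
p_value = p_value_1df(chi_square)
```

As a consistency check on the reported values, `p_value_1df(7.686)` rounds to 0.006.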
For the within-window significance of LD (Fig. 2A), cases generally have more LD than control subjects across the region, although the difference is significant only in the large intron (intron 12) between exons 12 and 13 of CAPN10. For the outside-window significance of LD (Fig. 2B), cases again generally have more LD than control subjects across the region, but for this measure, the difference is significant throughout much of CAPN10, including the regions containing SNP-44, SNP-43, Indel-19, and SNP-63. The maximal difference is again located in intron 12. No bias in the number of pairwise comparisons was detected (i.e., no significant correlation exists between the number of markers in the window and the case-control LD difference).
The sliding-window analysis of the cases partitioned by nonparametric linkage (NPL) scores (Fig. 3) shows an even greater difference between the two groups. The subset of cases from families with NPL ≥0.7 at NIDDM1 have more significant LD within the 10-kb windows in CAPN10 than cases from families with NPL ≤−0.7. The greatest difference between the two groups is again found in intron 12. The difference in the extent of LD between case and control subjects is thus also observed within the cases alone when they are partitioned by the evidence for linkage. This allows us to rule out the possibility that the case-control difference is simply an artifact of different amounts of admixture in the case and control subjects, a conclusion consistent with previous studies indicating that genome-wide microsatellite data do not reveal any population substructure in this population (6).
In each of these analyses, the largest LD difference occurs in the middle, or toward the 3′ end, of intron 12. None of the polymorphisms in this region yielded significant allelic associations between cases and the random sample or between cases partitioned for the evidence of linkage to NIDDM1 and random control subjects. Therefore, the LD signal observed here indicates something other than a simple allelic frequency difference between the two groups (either case versus control subjects or NPL ≥0.7 cases vs. NPL ≤−0.7 cases).
Multilocus LD mapping methods can provide increased power and resolution over single-locus methods in association studies often conducted for the fine-mapping stage of a positional cloning effort (7). Such methods have been applied to NIDDM1, implicating the same general region we highlight here (8). However, these approaches are quite computationally intensive. Since we could detect differences in the extent of LD between the case and random samples using even simple pairwise measures of LD, we examined additional approaches utilizing multiple markers simultaneously. LD unit plots are a particularly useful way to visualize and quantify differences in the extent of LD. Figure 4A summarizes the extent of LD in the same Mexican-American case and random samples for the 2q region using the LDMAP approach, which considers LD between all SNP pairs simultaneously. The LD unit scale on the y-axis increases as the extent of LD decreases and is a cumulative unit (as are units of genetic map distance, such as centimorgans) (9). The plot is characterized by a series of plateaus and steps. The former represent regions of high LD and low haplotype diversity (e.g., the CAPN10 LD block), and the latter are areas of low LD and high recombination. LD drops off dramatically in both directions from this CAPN10 LD block, indicating that the 43-44-19-63 disease-associated polymorphisms are in high LD with markers within this block but are not in LD with other markers in the NIDDM1 region outside of CAPN10. Also, we again observe that cases possess dramatically more LD and therefore less haplotype diversity than the random samples. The significance (P = 0.032) of this magnitude of case-control LD unit difference was assessed by permuting the case and control labels.
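The label-permutation procedure can be sketched generically as follows. This is an illustration under stated assumptions, not the implementation used for the LD unit analysis: the statistic is passed in as a function, and the difference in means used in the example is only a stand-in for the case-control LD unit difference, which would have to be recomputed with LDMAP for each permuted data set. The function names, the number of permutations, and the toy values are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def permutation_p_value(cases: Sequence[float],
                        controls: Sequence[float],
                        statistic: Callable[[Sequence[float], Sequence[float]], float],
                        n_perm: int = 1000,
                        seed: int = 0) -> float:
    """One-sided permutation P value for statistic(cases, controls)."""
    rng = random.Random(seed)
    observed = statistic(cases, controls)
    pooled: List[float] = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        if statistic(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return (hits + 1) / (n_perm + 1)


def mean_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in statistic: mean of the first group minus mean of the second."""
    return sum(a) / len(a) - sum(b) / len(b)


# Toy per-individual values, for illustration only.
toy_cases = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
toy_controls = [0.0, 1.0, 2.0, 3.0, 4.0, 4.5]
p = permutation_p_value(toy_cases, toy_controls, mean_difference)
```

Fixing the group sizes while shuffling labels keeps the permuted data sets comparable to the observed one, so the P value reflects only how unusual the observed difference is under random label assignment.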
The observations on LD in case and control subjects in the NIDDM1 region are, perhaps, not surprising given the very strong evidence for linkage in this region and the fact that case and control subjects are known to have significantly different distributions of haplotype frequencies in the local region. To determine whether our observations on the NIDDM1 region in Mexican Americans could readily be generalized to other populations, the same markers were typed in cases with type 2 diabetes and control subjects in a variety of populations with no reported evidence for linkage to type 2 diabetes in the NIDDM1 region. The pattern of LD across this region of 2q in these populations is in excellent agreement with that observed in similar populations in the HapMap (data not shown). Figure 4B–E summarizes the extent of LD difference between case and control samples for each of these populations. Although the magnitude of the difference varies, all populations appear to cumulatively have more extensive LD in case than control subjects in this overall region, even though the loci have relatively modest effects on population-specific risk (insufficient to generate evidence for linkage in the population generally and insufficient in the population to generate significant evidence for association at any of the polymorphisms considered individually). Although no single population generated a significant case-control difference in LD units, collectively these observations are significant (P < 0.001) for the area between the case and control LD unit plots as assessed by permuting the case and control labels, even when the Mexican-American subjects are excluded (P = 0.034).
Our results are consistent with the hypothesis that there are one or more relatively common alleles increasing risk of type 2 diabetes in the region and that the increased frequency of these alleles in cases, relative to random or control samples, reduces the haplotype diversity in cases (a high proportion of the haplotypes are historically related to each other in the vicinity of the alleles increasing risk) and thus increases the extent of LD. The difference in extent of LD between the case and random samples is readily quantified using simple pairwise measures of LD. The use of the LD units approach also clearly highlights CAPN10 as a region with more extensive LD in case than in control subjects in multiple populations. This difference in the extent of LD likely contributes to the localization of a susceptibility gene for type 2 diabetes to this region using more computationally intensive multipoint LD mapping methods (8). The results also suggest that it may be possible to exploit differences in the extent of LD to identify regions harboring susceptibility genes for complex phenotypes.
RESEARCH DESIGN AND METHODS
The Mexican-American case and random samples include individuals from the vicinity of Starr County, Texas. The 108 case samples represent one affected individual chosen at random from each affected sibpair from the original linkage and positional cloning studies (2,10). The unrelated random samples (n = 112) were collected from the same location, and the same random sample was used in the original linkage and positional cloning studies. The European-American, African-American, and Chinese-American case samples (n = 87, 66, and 89, respectively) and unaffected control samples (n = 73, 56, and 53, respectively) are from individuals residing in the San Francisco, California, area (11,12).
In the original studies in Mexican Americans (10), all polymorphisms were identified through resequencing 10 subjects; once the extent of LD in the region was noted, all polymorphisms with unique patterns and minor allele frequencies >0.2 were genotyped (also by resequencing) in the larger case-control set. Additional markers were chosen for the studies in other populations based on physical coverage and informativeness across the populations and were typed by direct sequencing and TaqMan-based assays.
We calculated pairwise measures of LD (D′ and r²) for 69 biallelic Indels/SNPs spanning 1.5 Mb in the Mexican-American case and random control subjects using GOLD (http://www.sph.umich.edu/csg/abecasis/GOLD/) and Haploview (http://www.broad.mit.edu/mpg/haploview/index.php). We limited our investigation to polymorphisms that, for both the case and control samples, were 1) in Hardy-Weinberg equilibrium, 2) ≥0.10 in minor allele frequency, and 3) genotyped in ≥70% of the individuals. Some analyses were further restricted to include only those markers (n = 38; rs2975769 through UCSNP-31) meeting these criteria from the most densely typed 70-kb region spanning the 3′ end of RNPEPL1, CAPN10, and GPR35 (online appendix Table A1 [available at http://diabetes.diabetesjournals.org]).
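For readers unfamiliar with the two pairwise measures, the sketch below computes |D′| and r² for a pair of biallelic markers from counts of the four two-locus haplotypes. It assumes phased haplotype counts are already available; GOLD and Haploview estimate haplotype frequencies from unphased genotype data, so this is a simplified illustration and not a description of either program. The function name and argument names are hypothetical.

```python
from typing import Tuple


def pairwise_ld(n11: int, n12: int, n21: int, n22: int) -> Tuple[float, float]:
    """Return (|D'|, r^2) from counts of haplotypes A1B1, A1B2, A2B1, and A2B2."""
    n = n11 + n12 + n21 + n22
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = (n11 + n12) / n  # frequency of allele A1
    p_b = (n11 + n21) / n  # frequency of allele B1
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both markers must be polymorphic")
    d = n11 / n - p_a * p_b  # raw disequilibrium coefficient D
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    # D' scales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    return abs(d) / d_max, r2


# Toy counts: only three of the four haplotypes are observed.
d_prime, r_squared = pairwise_ld(40, 10, 0, 50)
```

In the toy example |D′| is 1 because one haplotype is absent, whereas r² is below 1 because the two markers have different allele frequencies; this is why the two measures can rank the same marker pair differently.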
Within this smaller region, we used sliding windows (10-kb window length sliding 1 kb) to compare the difference between mean LD values of the case and control subjects. Mean significance of LD (r² = χ²/N) was calculated for all pairwise comparisons within a window and for all pairwise comparisons in which one marker was in the specified window and the other was outside the specified window. An identical approach was used to examine the LD patterns within the cases partitioned by the NPL score in the family of the affected case from the original genome scan. We focused on cases from families in which the maximum NPL score for the NIDDM1 region was ≥0.7 or ≤−0.7, as 0.7 is half the maximum possible NPL score for an affected sibpair.
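The sliding-window summary can be sketched as follows, given marker positions in base pairs and a symmetric matrix of pairwise LD values for one group. Several details are assumptions made for illustration and are not specified in the text: windows are half-open intervals, the first window starts at the first marker, and sliding stops once a window extends past the last marker. The function name and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import List, Optional, Sequence, Tuple

WindowResult = Tuple[int, Optional[float], Optional[float]]


def sliding_window_ld(positions: Sequence[int],
                      ld: Sequence[Sequence[float]],
                      window: int = 10_000,
                      step: int = 1_000) -> List[WindowResult]:
    """Return (window start, mean within-window LD, mean inside-vs-outside LD)."""
    if not positions:
        return []
    results: List[WindowResult] = []
    last = max(positions)
    start = min(positions)
    while True:
        # Markers falling in the half-open window [start, start + window).
        inside = [i for i, pos in enumerate(positions)
                  if start <= pos < start + window]
        inside_set = set(inside)
        outside = [i for i in range(len(positions)) if i not in inside_set]
        within_vals = [ld[i][j] for i, j in combinations(inside, 2)]
        across_vals = [ld[i][j] for i in inside for j in outside]
        results.append((start,
                        mean(within_vals) if within_vals else None,
                        mean(across_vals) if across_vals else None))
        if start + window > last:  # this window already reaches the last marker
            break
        start += step
    return results


# Toy data: three clustered markers and one distant marker.
toy_positions = [0, 2000, 4000, 20000]
toy_ld = [[1.0, 0.8, 0.6, 0.1],
          [0.8, 1.0, 0.4, 0.2],
          [0.6, 0.4, 1.0, 0.3],
          [0.1, 0.2, 0.3, 1.0]]
windows = sliding_window_ld(toy_positions, toy_ld)
```

Running the function separately on the case and control LD matrices and subtracting the per-window means gives the kind of case-control difference profile described above; windows with no marker pairs return `None` and would be skipped in such a comparison.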
LD was also compared between the case and control subjects in several additional ethnic populations using LDMAP (http://cedar.genetics.soton.ac.uk/pub/PROGRAMS/LDMAP). These analyses were conducted using the full set of 69 markers in the Mexican Americans and with a set of 12 polymorphisms meeting the three criteria above for case and control subjects in all four populations studied (online appendix Table A1).
L.D.B.-P. is currently affiliated with the Instituto Nacional de Medicina Genómica, Mexico, D.F., Mexico. T.T. is currently affiliated with the Department of Medicine and Molecular Science, Gunma University Graduate School of Medicine, Maebashi, Gunma, Japan.
Additional information for this article can be found in an online appendix at http://diabetes.diabetesjournals.org.
This research was supported in part by U.S. Public Health Services Grants DK-20595, DK-47486, DK-47487, DK-55889, DK-58026, and HL-007605.