We have identified a region on chromosome 1q21-q24 that was significantly linked to type 2 diabetes in multiplex families of Northern European ancestry and also in Pima Indians, Amish families, and families from France and England. We sought to narrow and map this locus using a combination of linkage and association approaches by typing microsatellite markers at 1.2 and 0.5 cM densities, respectively, over a region of 37 cM (23.5 Mb). We tested linkage by parametric and nonparametric approaches and association using both case-control and family-based methods. In the 40 multiplex families that provided the previous evidence for linkage, the highest parametric, recessive logarithm of odds (LOD) score was 5.29 at marker D1S484 (168.5 cM, 157.5 Mb) without heterogeneity. Nonparametric linkage (NPL) statistics (P = 0.00009), SimWalk2 Statistic A (P = 0.0002), and sib-pair analyses (maximum likelihood score = 6.07) all mapped to the same location. The one LOD CI was narrowed to 156.8–158.9 Mb. Under recessive, two-point linkage analysis, adjacent markers D1S2675 (171.5 cM, 158.9 Mb) and D1S1679 (172 cM, 159.1 Mb) showed LOD scores >3.0. Nonparametric analyses revealed a second linkage peak at 180 cM near marker D1S1158 (163.3 Mb, NPL score 3.88, P = 0.0001), which was also supported by case-control (marker D1S194, 178 cM, 162.1 Mb; P = 0.003) and family-based (marker ATA38A05, 179 cM, 162.5 Mb; P = 0.002) association studies. We propose that the replicated linkage findings actually encompass at least two closely spaced regions, with a second susceptibility region located telomeric at 162.5–164.7 Mb.
Type 2 diabetes (MIM125853) likely encompasses a diverse set of diseases marked by elevated levels of plasma glucose. Among Caucasian populations, individuals with type 2 diabetes, individuals with the intermediate phenotype of impaired glucose tolerance, and likely individuals at risk of diabetes are all characterized by variable degrees of both decreased insulin action, particularly resistance to insulin-mediated muscle glucose uptake, and impaired insulin secretion in response to that decreased insulin action (1). Defects of both insulin action and insulin secretion among individuals with normal glucose tolerance predict later onset of diabetes (2). Despite the diverse phenotypic nature of type 2 diabetes, monozygotic and dizygotic twin studies, family studies, and marked differences in disease prevalence across populations all provide convincing evidence for an important role of genetic susceptibility loci in type 2 diabetes pathogenesis (1). Based on epidemiological data, the total sibling relative risk (λs) has been estimated at 3–4 (3), although the number of loci that contribute to this risk is unclear.
Based on these data supporting type 2 diabetes susceptibility genes, genome scans for both type 2 diabetes and type 2 diabetes-related traits have been undertaken by multiple laboratories in Caucasian, Pima Indian, African-American, and Asian populations (1,4,5), among others. These scans have identified possible susceptibility loci throughout the genome, but to date only the NIDDM1 locus on chromosome 2q in Mexican-American subjects has been mapped to a single gene, the calpain 10 gene (6). Calpain 10 plays a small role in most other populations, however, and has been inconsistently replicated by linkage and association. Other regions with evidence for replication include chromosome 12q (7–9) and chromosome 20 (10–12). A region on chromosome 1q21-q23 was identified independently among Pima Indian sib-pairs discordant for type 2 diabetes or Pima Indian sib-pairs with onset of diabetes before age 25 years (13) and in studies from our laboratory of 42 multiplex kindreds of Northern European ancestry ascertained in Utah (14). Subsequent studies in French families (15), English sib-pairs (16), and Amish families (17) and in preliminary studies of Chinese sib-pairs (18) have identified linkage of type 2 diabetes to this same region, very near the original Pima and Utah linkage peaks. Furthermore, this region was linked to HbA1c in the Framingham Offspring Study (19), to metabolic syndrome traits in nuclear families from Hong Kong (20), and to the possibly related phenotype of familial combined hyperlipidemia (21,22). Given the difficulty in replicating linkage in complex diseases, the finding of linkage to diabetes and related traits in at least 10 studies from diverse populations is striking. However, the exact map location of the linkage peaks, the specific trait or disease definition for the study, and the subgroup providing the evidence for linkage differ among studies.
In previous studies from our laboratory (23), the most significant linkage peak (logarithm of odds [LOD] = 4.3) was found using pedigrees trimmed to fit into the Genehunter program (24) under a partially penetrant recessive parametric model. The linkage peak was quite broad, with a 1 LOD CI that extended from between D1S305 and CRP to D1S196, or ∼20 cM. A similar location, albeit with lower significance, was identified with both sib-pair analysis (Mapmaker/Sibs) and nonparametric linkage (NPL). Studies in Pima Indians and in French families placed the linkage peak within 5 cM of our data, although initial Amish and English studies placed the peak centromeric or telomeric, respectively. In post hoc analyses from our laboratory, the LOD score was reduced when full families were used for fewer markers, when unaffected individuals were removed from the analysis, and when individuals with intermediate diagnoses were removed. In contrast, removal of two families that segregated hepatocyte nuclear factor 1α variants increased the LOD score to 4.87 in the remaining 40 families (23,25). Finally, we found no linkage to chromosome 1 in either 21 smaller replication families or when all 63 families were analyzed together without heterogeneity (23). The goal of the present study was to localize the well-replicated type 2 diabetes susceptibility gene in this region using a dense microsatellite map across a 37-cM region for linkage, case-control association studies, and family-based association studies.
RESEARCH DESIGN AND METHODS
We performed a number of analyses using both family-based and case-control studies to narrow the regions of susceptibility genes on chromosome 1q. For linkage analyses, we first attempted to replicate the earlier analyses showing linkage under a recessive model (23), but using a dense marker map. Although software is now available that permits multipoint analysis of full families, we included recessive analysis using Genehunter v. 2.1 and Genehunter-sized families to be comparable with our earlier analysis. While our highest linkage peak was under a recessive model, based on the variable location of the linkage peak from other laboratories and unpublished data from our laboratory suggesting associations in multiple locations, we considered the possibility that multiple susceptibility loci might be present and that these loci might have different modes of inheritance. To test this hypothesis, we included two nonparametric (model independent) analyses, one using the statistics implemented in SimWalk 2 (26) and the other using the sib-pair analysis that provided the highest nonparametric score in our previous study (23). By using multiple analytical methods, we were also able to assess whether the localization of the linkage peaks was robust to model assumptions. Finally, based on the success of microsatellite association studies in mapping other complex disease genes (27,28), we included both case-control and family-based studies of a dense microsatellite map as a framework for mapping genes by association.
We included two closely related study populations. Both linkage and family-based association studies were conducted in samples from previously described families (23). Briefly, the primary studies were conducted on 618 members of 42 families (526 nonfounders). The mean number of individuals tested was 13.3 per family, and the mean number of affected individuals per family was 4.0, with a mean age of onset of 50.6 years. An additional 27 smaller families (mean number of individuals tested: 6.6), which included six families that were not previously typed, were used as a replication set and were typed for all markers in the present study. The replication families were ascertained under the same criteria as the initial families, but the families were smaller and had fewer available members for testing. All families were ascertained for at least two siblings with type 2 diabetes diagnosed before the age of 65 years and with no more than one parent known to have type 2 diabetes. All subjects were ascertained in Utah for Northern European ancestry. All available parents and siblings of the index sib-pair, as well as all available offspring of diabetic siblings, were studied. All nondiabetic individuals underwent a 75-g oral glucose tolerance test. Subjects were classified as affected if they had a previous diagnosis of type 2 diabetes and were on medical therapy. To incorporate young-onset impaired glucose tolerance into the affection status, individuals were considered affected if the fasting glucose exceeded 7.8 mmol/l or if the 2-h postchallenge glucose was >7.8 mmol/l for participants under age 45 years, 11.1 mmol/l for participants aged 45–64 years, or 13.3 mmol/l for those over age 64 years. All other individuals with abnormal glucose tolerance tests were considered to be of unknown affection status. 
This scheme closely follows the World Health Organization criteria for impaired glucose tolerance (under age 45 years) and type 2 diabetes (age 45–64 years) but raises the postchallenge glucose for elderly subjects based on epidemiological data. All diagnoses were the same as in our previous study (23). Uncertainty was programmed into parametric models for individuals considered affected but who did not meet the criteria for type 2 diabetes.
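The age-dependent classification described above can be written out as a small decision rule. The following is a minimal sketch, not the code used in the study: the function and parameter names are hypothetical, ages are assumed to be completed years, and whether a glucose tolerance test counts as "abnormal" is assumed to be decided upstream and passed in as a flag, because the text does not give that cutoff.

```python
def two_hour_threshold(age_years: float) -> float:
    """2-h postchallenge glucose cutoff (mmol/l) by age band, as stated in the text."""
    if age_years < 45:
        return 7.8
    if age_years < 65:  # assumption: "aged 45-64 years" means completed years
        return 11.1
    return 13.3


def affection_status(age_years: float,
                     fasting_mmol: float,
                     two_hour_mmol: float,
                     treated_diabetes: bool,
                     abnormal_ogtt: bool) -> str:
    """Return 'affected', 'unknown', or 'unaffected'.

    treated_diabetes: previous diagnosis of type 2 diabetes on medical therapy.
    abnormal_ogtt: hypothetical flag, supplied by the caller, marking any
                   abnormal glucose tolerance test (cutoff not given in the text).
    """
    if treated_diabetes:
        return "affected"
    if fasting_mmol > 7.8 or two_hour_mmol > two_hour_threshold(age_years):
        return "affected"
    if abnormal_ogtt:
        return "unknown"
    return "unaffected"
```

Under this sketch, a 2-h glucose of 8.0 mmol/l classifies a 40-year-old as affected but leaves a 50-year-old with unknown status, which is the intended effect of the age-dependent cutoffs.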
Case-control association studies were conducted on 150 unrelated individuals with known type 2 diabetes and 150 ethnically matched, unrelated control individuals. Of the type 2 diabetic individuals, 70 were selected from the linkage families and 80 additional individuals were selected from the same population for type 2 diabetes and a family history of type 2 diabetes in a first-degree relative. Control individuals included spouses from linkage families who had normal glucose tolerance tests (108 subjects) and Caucasian individuals ascertained in Utah or Arkansas (42 subjects) who had normal glucose levels or glucose tolerance tests and no family history of diabetes in a sibling, parent, or grandparent.
All individuals provided written informed consent under protocols approved by the University of Utah Institutional Review Board (diabetic kindreds and case-control population) or the University of Arkansas for Medical Sciences Institutional Review Board (additional case-control samples).
Marker selection and typing.
For linkage studies of chromosome 1, we added 37 microsatellite markers to the 38 markers previously typed (23), with 29 new markers in the region between D1S305 and D1S212, where previous linkage signals were found. Marker order and spacing were derived from published maps (29,30) with reference to the physical map to establish the order and distance for closely spaced markers (National Center for Biotechnology Information [NCBI] build 33). The average marker distance between D1S305 and D1S212 was 1.17 cM. For the population-based case-control association study, we typed 46 microsatellite markers between markers D1S305 and D1S212, with an average inter-marker distance of 0.52 Mb.
Microsatellite markers were amplified in the presence of universal M13 forward primers that were labeled with LI-COR IR700 and IR800 dyes, and the products were separated and detected on LI-COR 4200 sequencers using standard methods (LI-COR, Lincoln, NE). Genotypes were scored either automatically using SAGAGT software (31) (LI-COR) or semiautomatically using GeneImage IR 3.56 software (Scanalytics, Fairfax, VA). All readings were reviewed independently, and between 30 and 50 blinded duplicate samples were included for all markers for both linkage and association studies. All gels included at least two additional samples from selected grandparents of CEPH (Centre d’Etude du Polymorphisme Humain) families as an additional quality control. Before linkage analysis, all data were checked for inconsistencies in size, inconsistencies between duplicates, and inconsistencies in Mendelian inheritance using the PEDCHECK program (v. 1.1) (32). All blinded duplicates were in agreement with the exception of four samples that were consistently incorrect and appeared to be incorrectly identified duplicate samples. We identified 0.98% genotyping errors (251 of 28,095 genotypes that were automatically read without reference to pedigree data) that resulted in noninheritance and were changed to unknown before analysis.
The marker map used for all multipoint studies was derived from primary reference to the Marshfield map (http://research.marshfieldclinic.org/genetics), which included all of the typed markers. To properly space markers that were too close to be resolved on the Marshfield linkage map, we set the distance between markers with 0 recombination fractions to 0.5 cM, with marker order based on the physical map. Consequently, our map over the region from D1S305 to D1S212 was expanded by 3 cM from the Marshfield map and by 4 cM from the recently published DeCode map (33). Thus, exact locations used in the current study differ slightly from those cited in the most recent Marshfield map.
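One way to picture the map adjustment is as a pass over the markers in physical order that replaces every unresolved interval with 0.5 cM and shifts all downstream markers accordingly. The sketch below illustrates that idea under stated assumptions; it is not the procedure actually used to build the study map, and the function name and marker names are hypothetical.

```python
def expand_zero_intervals(markers, min_gap_cm=0.5):
    """Return marker positions with unresolved intervals set to min_gap_cm.

    markers: list of (name, reference_cM) tuples, already sorted by
             physical-map order.
    An interval is treated as unresolved when the reference map places the
    two markers at the same position (or, as an assumption here, in the
    opposite order to the physical map).
    """
    if not markers:
        return []
    expanded = [markers[0]]
    for (_, prev_cm), (name, cm) in zip(markers, markers[1:]):
        gap = cm - prev_cm
        if gap <= 0:
            gap = min_gap_cm
        expanded.append((name, expanded[-1][1] + gap))
    return expanded


# Hypothetical markers: M2 sits at the same reference position as M1.
example = [("M1", 10.0), ("M2", 10.0), ("M3", 12.0)]
print(expand_zero_intervals(example))
```

With this approach each unresolved interval lengthens the map by 0.5 cM, so a handful of such intervals is enough to produce an expansion of a few centimorgans relative to the reference map.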
Despite careful quality control and retyping of markers with excess recombination events, recombination between closely linked markers exceeded expectations for many intervals. Inspection of genotypes failed to identify errors leading to increased recombination. Consequently, before multipoint analysis we used a mistyping analysis implemented in SimWalk2 (v. 2.82) (26) to remove all genotypes that had a 25% or greater posterior probability of error based on excess recombination. These genotypes were considered missing for all multipoint analyses. We removed a total of 882 of 48,017 genotypes for all 62 families (1.8%). Expected and observed recombination rates for each interval are shown in the online supplemental data (Table 1).
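The filtering step amounts to thresholding a per-genotype posterior probability of error. The sketch below assumes those posteriors have already been extracted from the mistyping output into a dictionary; the data layout and function name are hypothetical and do not reflect the SimWalk2 file format.

```python
def mask_probable_errors(genotypes, error_posteriors, threshold=0.25):
    """Set genotypes with posterior error probability >= threshold to missing.

    genotypes:        dict mapping (individual_id, marker) -> genotype
    error_posteriors: dict mapping (individual_id, marker) -> posterior
                      probability of mistyping (absent keys count as 0.0)
    Returns (masked_genotypes, number_removed); missing is encoded as None.
    """
    masked = {}
    removed = 0
    for key, genotype in genotypes.items():
        if error_posteriors.get(key, 0.0) >= threshold:
            masked[key] = None
            removed += 1
        else:
            masked[key] = genotype
    return masked, removed


# Proportion removed in the study, from the counts given in the text.
print(round(100 * 882 / 48017, 1))  # 1.8
```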
We conducted multipoint linkage analysis under a recessive parametric model that provided the maximum LOD score in our previous studies using Genehunter version 2.1_r3 beta (24,34) and families trimmed to fit this program. Nonparametric analyses were performed using statistics A through E in SimWalk 2 (v. 2.82) (26). Additionally, based on previous results showing the highest LOD score under a sib-pair analysis, we performed sib-pair linkage analysis using Genehunter (v. 2.1_r3) under models of dominance variance and no dominance variance (35). The recessive parametric model set the disease allele frequency at 0.25 and included a linear, age-dependent penetrance function that varied from 0.02 below age 30 years to 0.60 over age 65 years (23). The allele frequency of each microsatellite marker used for linkage analysis was estimated from unrelated pedigree members, assuming Hardy-Weinberg equilibrium. Linkage studies were conducted on the full 69-family set (original families and replication families) and on the 40 families that provided the maximum evidence for linkage in our previous study. These 40 families were selected from the 42 families of the previous study but excluded two families that segregated hepatocyte nuclear factor 1α variants (25). To fit the large families into Genehunter Plus, individuals who were unaffected or of unknown affection status were trimmed before analysis as described previously (23). The location score was also calculated in SimWalk2 using full families. Parametric recessive LOD scores were calculated assuming homogeneity (α = 1) and allowing for heterogeneity. The maximum likelihood estimate of alleles shared identical by descent (IBD) among sib-pairs from the 40 kindreds that were primarily responsible for earlier linkage findings was calculated both with and without weighting to correct for multiple sib-pairs and both with and without dominance variance (λs = λo).
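The age-dependent penetrance in the recessive model can be illustrated with a short function. This is a sketch under assumptions the text does not spell out: that the penetrance applies to individuals homozygous for the disease allele, that the linear segment runs exactly between ages 30 and 65 years, and that the values are constant outside that range. The function name is hypothetical.

```python
def recessive_penetrance(age_years, low=0.02, high=0.60,
                         age_low=30.0, age_high=65.0):
    """Linear age-dependent penetrance: `low` up to age_low, `high` from age_high."""
    if age_years <= age_low:
        return low
    if age_years >= age_high:
        return high
    fraction = (age_years - age_low) / (age_high - age_low)
    return low + fraction * (high - low)


# Under Hardy-Weinberg equilibrium, a disease allele frequency q = 0.25
# gives a homozygote frequency of q**2.
q = 0.25
print(q ** 2)  # 0.0625
```

Under these assumptions a 47.5-year-old homozygote would be assigned a penetrance midway between the two extremes.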
Because of the increased recombination observed in this study despite elimination of clear genotyping errors and to minimize the impact of map errors, particularly between closely spaced markers, we supplemented the multipoint analyses with a two-point linkage analysis of the 40-family set under the recessive model using the FASTLINK program (36). To further minimize the errors in recombination fractions resulting from sex-averaged estimates of recombination, we incorporated sex-specific recombination fractions in these analyses.
Tests of association.
Population association tests for microsatellite alleles were conducted for 43 markers using CLUMP v. 1.9 software (37). We report the maximized χ2 test (T4 statistic), which calculates the maximum χ2 value found by collapsing the contingency tables over each allele in turn to form 2 × 2 contingency tables. The significance was assessed using a Monte Carlo approach with 10,000 simulations. Family-based associations with type 2 diabetes were tested in 69 families using a modification of the transmission disequilibrium test (TDT) (38), as implemented in the Pedigree Analysis Package (39) and described previously (40). This analysis tests the probability that a heterozygous parent transmits an allele to an affected offspring more often than expected by chance, similar to the gamete-competition model described by Sinsheimer et al. (41). Increased transmission from parents to affected offspring was tested by maximum likelihood analysis against equal transmission of the alleles. All alleles at a marker were tested simultaneously with k-1 df, where k represents the number of alleles. The pedigree was analyzed as an intact unit, so that trios and nuclear families were not examined separately. Because linkage in this region was established, this likelihood test was a test of association. Data are presented without correction for the number of markers tested. In a case-control study of a two-allele marker, our power for a test of allelic association with 150 individuals in each group exceeds 80% for differences in allele frequency of 12% or greater. Linkage disequilibrium between adjacent microsatellite markers was calculated from the case-control study of unrelated individuals (both case and control subjects included) as a multilocus D′ value using the expectation maximization algorithm as implemented in the 2LD program (http://linkage.rockefeller.edu).
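The maximized χ2 statistic with Monte Carlo significance can be sketched as follows. This is an illustration of the description given above (each allele in turn against all others pooled), not a reimplementation of CLUMP, whose internal algorithm may differ. The permutation scheme, which shuffles alleles between case and control subjects while keeping the marginal totals fixed, and the +1 correction in the empirical P value are assumptions of this sketch.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def max_allele_chi2(case_counts, control_counts):
    """Largest chi-square over 2 x 2 tables of one allele versus all others."""
    n_case, n_ctrl = sum(case_counts), sum(control_counts)
    return max(
        chi2_2x2(a, n_case - a, c, n_ctrl - c)
        for a, c in zip(case_counts, control_counts)
    )


def monte_carlo_p(case_counts, control_counts, n_sim=10000, seed=0):
    """Empirical P value for the maximized statistic with fixed table margins."""
    rng = random.Random(seed)
    observed = max_allele_chi2(case_counts, control_counts)
    k = len(case_counts)
    n_case = sum(case_counts)
    totals = [a + c for a, c in zip(case_counts, control_counts)]
    # One entry per observed allele copy, labeled by allele index.
    pool = [i for i, t in enumerate(totals) for _ in range(t)]
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pool)
        sim_case = [0] * k
        for allele in pool[:n_case]:
            sim_case[allele] += 1
        sim_ctrl = [t - s for t, s in zip(totals, sim_case)]
        if max_allele_chi2(sim_case, sim_ctrl) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)
```

Because the same maximization is applied to every simulated table, the empirical P value already accounts for testing several alleles at one marker; it does not correct for the number of markers tested.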
Haplotype estimation and haplotype sharing analysis.
We examined a total of 75 microsatellite markers on chromosome 1, including 38 markers previously reported (23) and 37 markers newly typed. We typed a total of 33 markers in the region of the previously described linkage peak from marker D1S305 (159 cM) to marker D1S212 (196 cM), with all locations referenced to the Marshfield map (http://research.marshfieldclinic.org/genetics) (29). In our earlier analysis, we considered two family sets: the 42 families that constituted our primary genome-wide scan, and 21 smaller replication families. For the present study, we considered all available families (69 families; the original 42 families and 27 replication families, including 6 families not considered in the earlier study) and the 40 families from the original 42 families for which we had not identified another potential diabetes susceptibility gene. Based on our earlier data, we chose the recessive parametric model that provided the best evidence for linkage previously as the primary tool for narrowing the linkage peak. However, to determine whether that localization was robust to model assumptions, we also analyzed the linkage data under nonparametric models.
Parametric linkage analysis.
As in our previous report of 21 replication families (23), we found no evidence for linkage in the 27 replication families despite the dense map. Using the full available pedigree set (69 families), we found evidence for linkage only under models that incorporated heterogeneity, with a maximum heterogeneity LOD (HLOD) score of 1.42 with 25% of families linked at position 168.5 cM (marker D1S484). When the 40 families from the original linkage study that did not segregate hepatocyte nuclear factor 1α variants were tested, the maximum LOD score using families trimmed to fit Genehunter requirements was 5.28 at the same location (position 168.5 cM; marker D1S484), which was increased from 4.89 in our previous study. In contrast to the full family set, we found little evidence for heterogeneity (HLOD = 5.29; α = 0.96, 168.5 cM) using the 40 families that were trimmed of many unaffected individuals. As in our previous analysis (23), inclusion of all unaffected individuals using the SimWalk2 program dropped the nonheterogeneity location score to 2.98 and the heterogeneity LOD score to 4.07 (α = 0.65) without moving the location of the peak (marker D1S484; 168.5 cM) (Fig. 1). Based on the Genehunter analysis, the one LOD CI was narrowed to 167.6–170.6 cM, corresponding to locations 156.8–158.9 Mb on the physical map (NCBI Build 33).
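A one-LOD support interval is conventionally read off the multipoint curve as the contiguous region around the peak in which the LOD score stays within 1.0 of its maximum. The sketch below shows that calculation on a grid of map positions; it does not interpolate between grid points, the function name is hypothetical, and the example values are invented for illustration and are not the study's data.

```python
def one_lod_interval(positions_cm, lod_scores, drop=1.0):
    """Return (left_cM, right_cM, peak_cM) for the LOD-drop support interval.

    positions_cm and lod_scores are parallel lists ordered along the map.
    The interval extends outward from the peak while LOD >= max(LOD) - drop.
    """
    peak = max(range(len(lod_scores)), key=lod_scores.__getitem__)
    cutoff = lod_scores[peak] - drop
    left = peak
    while left > 0 and lod_scores[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lod_scores) - 1 and lod_scores[right + 1] >= cutoff:
        right += 1
    return positions_cm[left], positions_cm[right], positions_cm[peak]


# Invented illustrative curve, not taken from the study.
positions = [166, 167, 168, 169, 170, 171]
lods = [3.0, 4.5, 5.2, 5.0, 4.3, 3.9]
print(one_lod_interval(positions, lods))  # (167, 170, 168)
```

Because the interval is defined on the grid where the LOD score was evaluated, a denser marker map or finer evaluation grid gives a more precise boundary.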
To determine whether localization of the chromosome 1q type 2 diabetes susceptibility locus was robust to model assumptions, we tested linkage also using nonparametric approaches (Fig. 2). Our primary nonparametric analyses used SimWalk2, which could handle full families (Fig. 2), and affected sib-pair analysis using the Genehunter program (Fig. 3). Although these analyses corroborated the location of the first peak at 168.5 cM (marker D1S484, 157.5 Mb; Genehunter NPL score 4.30; P = 0.00009), they showed a prominent second peak not seen in the parametric analysis ∼12 cM telomeric to the first peak at 180 cM, between markers D1S1158 (163.3 Mb) and D1S2762 (163.6 Mb; NPL score 3.88; P = 0.0001) (Fig. 2). Using no weighting for sibships and assuming dominance variance, the highest maximum likelihood score (MLS) was 6.07 at 168.5 cM and 5.25 at 180 cM (Fig. 3). Additionally, a third peak was evident on sib-pair analysis (unweighted; MLS = 2.98) centromeric to the larger peaks at 152.8 cM and just proximal to marker D1S305 (151.0 Mb) and near candidate genes liver- and red cell-type pyruvate kinase (PKLR) (152.0 Mb), retinoid-related orphan receptor γ (RORC), and interleukin 6 receptor (IL6R; 151.1 Mb). Nonparametric statistics examined in SimWalk2, which did not permit simultaneous consideration of the full map region, nonetheless showed similar trends for location (Fig. 2). The most significant SimWalk2 results were seen with statistic A, which is strongest under recessive models, at P = 0.0002 and location 168.5 cM at marker D1S484. The second peak was less obvious with the SimWalk2 statistics but was most significant near marker D1S433 (184 cM, 165.0 Mb; P = 0.001 for statistic C, P = 0.002 for statistic D) (Fig. 2), which was ∼4 cM or 1.4 Mb telomeric to the Genehunter NPL and sib-pair analyses.
When the full family set (69 families) was examined together, the highest MLS scores on sib-pair analysis were 1.73 at 170 cM (APOA2; 158.0 Mb) and 2.46 at 180 cM (D1S1158; 163.3 Mb). Thus, when all families were considered, the proximal peak moved slightly telomeric and the distal peak slightly centromeric but retained approximately the same locations.
Two-point LOD score.
We observed unexpectedly high recombination fractions between closely spaced markers despite retyping several markers and careful scrutiny of recombination events (online Supplemental Data, Table 1). To reduce the effect of these potential errors and to incorporate sex-specific recombination fractions, we calculated two-point parametric LOD scores using the recessive parametric model described above. As shown in Table 2 of the online Supplemental Data, LOD scores exceeded 3.0 for markers D1S2675 (LOD 3.06) and D1S1679 (3.45), which are located at 171.5 cM (158.9 Mb) and 172 cM (159 Mb), respectively, just telomeric to the recessive multipoint linkage peak near markers D1S484 and D1S2705 (168.5 cM or 157.5 Mb and 169 cM or 157.6 Mb, respectively).
To further localize the type 2 diabetes susceptibility locus, we tested association in a case-control population comprising diabetic case subjects and nondiabetic control subjects ascertained in Utah or Arkansas for 46 microsatellite markers. We also tested the 33 markers used in the linkage studies for excess transmission of any allele from parents to affected offspring using maximum likelihood methods. In case-control studies, markers D1S194 (178 cM, 162.1 Mb) and D1S1677 (176 cM, 160.2 Mb) were nominally significant at P = 0.003 and P = 0.012, respectively, based on Monte Carlo assessment of significance tested using the CLUMP statistic T4 to examine all alleles simultaneously (37). Marker ATA38A05 at 179 cM (162.5 Mb) was most strongly associated by TDT (P = 0.002). These markers fall under the second linkage peak, with both D1S194 and ATA38A05 falling within the 1 LOD CI for the sib-pair analysis. The data for all 46 microsatellites are shown in Table 2 of the online Supplemental Data. Multipoint linkage disequilibrium between adjacent pairs of markers ranged from not significantly different from 0 to the highest D′ value of 0.483 (Table 3 of online Supplemental Data).
We followed the methods of Saarela et al. (42) to establish shared haplotypes for the 33 markers that spanned the 37-cM region between markers D1S305 and D1S212. Haplotypes were inferred in SimWalk2 and were examined manually for sharing among the 58 sibships that had two or more affected individuals from the 40 families. Although no single haplotype was shared by all sibships, a 1.16-cM region centered on the first linkage peak and flanked by markers D1S2771 and D1S2705 was shared by 32 of 58 sibships (Table 4 of online Supplemental Data).
Multiple genome-wide scans for type 2 diabetes have implicated a large number of regions for possible susceptibility genes. To date, only a single gene has been cloned, NIDDM1 or calpain 10 on chromosome 2q, but the at-risk haplotype at this locus is rare outside of Hispanic populations. Other regions with evidence for replication include chromosomes 12q and 20, but the replication has generally been at some distance from the original description. Chromosome 1 has now been identified in Pima Indians (13), our studies described here, Amish Caucasians (17), British Caucasians (16), French Caucasians (15), and in preliminary reports of both Chinese and African Americans (5). Furthermore, a syntenic region was identified in the GK rat (43). The location of these linkage peaks is remarkably consistent but nonetheless spans the three peaks observed in the present study (5). Thus, among Amish with both type 2 diabetes and impaired glucose homeostasis, the peak was near 159 cM (marker D1S2858), with a second peak that was centromeric on the p arm. This first peak falls just centromeric to our primary peak at 169 cM. Among Pima Indians, the highest scores were at 175 cM (sib-pairs discordant for diabetes) and 200 cM (sib-pairs with onset before age 25 years), and thus fall more into our second linkage peak. Initial reports from Wiltshire et al. (16) placed their linkage in the region of our second peak at 181 cM (D1S196), although additional markers are reported to have moved the highest score more centromeric to the location of our first and largest linkage peak. The results of Vionnet et al. (15) place their chromosome 1 peak in nearly the same location as our first peak, albeit only in lean (BMI <27 kg/m2) individuals. The loci reported in other studies are not precisely localized (5). These studies thus support the possibility that several susceptibility loci account for the apparent replication across studies, as suggested by our distinct linkage peaks.
We focused the current study on the 40 multiplex families that provided the majority of the evidence for linkage in our initial report and that did not segregate other known mutations. Unlike the original study, the dense map of approximately one marker every centiMorgan has resolved the broad linkage peak observed initially into at least two narrow peaks. The first of these peaks has moved slightly centromeric from the original peak at APOA2 (170 cM) to the present location of D1S484 (168.5 cM). With additional markers, the LOD score has increased to 5.28 under the recessive model and using Genehunter-sized pedigrees. Similarly, using multipoint sib-pair analysis, the MLS has increased from 2.98 in the original study to 6.07 in the present study. Despite the large variation in significance levels for the first peak with different analytical methods, the location of this peak was remarkably consistent. Both the SimWalk2 statistic A and the parametric analysis continue to support a recessive-like mode of inheritance for the susceptibility gene or genes that account for the first peak. Based on the present analyses, we have narrowed the 1 LOD CI for this peak to a region from 167.6 cM (156.8 Mb) to 170.6 cM (158.9 Mb). This peak includes at least 60 RefSeq genes, including a number of strong candidate genes, many of which have been evaluated by our laboratory and others. Among the candidate genes previously evaluated in this region are apolipoprotein A2 (APOA2) at 170 cM (157.9 Mb) (44); phosphoprotein enriched in astrocytes (PEA15), which may be involved in insulin action (45); C-reactive protein, which may be involved in inflammation (46); and two inwardly rectifying potassium channel genes, KCNJ9 and KCNJ10 (47,48). None of the reported associations of single nucleotide polymorphisms (SNPs) in these genes can convincingly account for the strong linkage signal in our families, however.
In contrast, we have identified two regions within the 1 LOD support interval in which a cluster of SNPs shows strong associations with type 2 diabetes in case-control studies. These associations thus appear to support the linkage findings. Neither region falls close to a strong candidate gene, but work is in progress to identify additional polymorphisms within these regions and to evaluate nearby coding genes. Additional support for an association under this peak has come from other groups with linkage in this region (17).
Unlike our original report, the present study suggests a second peak at 180 cM, ∼10 cM from the first peak at 169 cM. Based on the 40-family sib-pair analysis, the 1 LOD support interval is 177.7–181.6 cM, or ∼162.5 to 164.7 Mb. Unlike the first peak, this second region is much less prominent using the recessive parametric models and is most prominent using multipoint sib-pair analysis, under which this peak nearly equals the first peak with an MLS of 5.247. These data suggest that the susceptibility locus accounting for the second peak acts less like a recessive locus. Furthermore, this peak has a higher MLS than the first peak when all 69 families are considered, thus suggesting that the susceptibility allele accounting for this peak may be more prevalent than that accounting for the first peak. The most prominent candidate genes for type 2 diabetes in the 1 LOD support interval are RXRγ (49), for which we found an association with lipid abnormalities but a less prominent association with type 2 diabetes, and the overlapping homeobox transcription factor LMX1A (50). The microsatellite associations found in the present study also support one or more susceptibility genes that account for this peak. Marker D1S194, which was associated with type 2 diabetes in the case-control study, lies just telomeric to RXRγ (162.06 Mb), whereas marker D1S1677 lies nearly 2 Mb telomeric to the 1 LOD CI (160.2 Mb). However, the only marker identified as overtransmitted in family members in a TDT-like test, marker ATA38A05, also lies within this second peak (162.5 Mb). We cannot exclude the possibility that one or more of the associations are spurious, particularly given the modest P values and the span of nearly 2.5 Mb between associated microsatellite markers. Additional SNP typing in these regions will be needed to confirm these associations and to narrow the genes responsible for these associations.
Although this study narrowed the most prominent linkage and association signals to the region between 156 and 168 Mb, we have previously demonstrated an association of multiple noncoding SNPs within the PKLR gene with type 2 diabetes (51), which is centromeric to the first linkage peak. This association would fall under the most centromeric linkage peak, which was observed only in the unweighted sib-pair analysis (Fig. 3). The physical distance encompassed by this peak might extend from 117 Mb to at least 152.5 Mb. Among possible candidates in this region besides PKLR are RORC (52), α-endosulfine (ENSA) (53), and interleukin-6 receptor (S.C.E., unpublished data). Of these candidates, a prominent association in this population was observed only with PKLR. Because of unusually strong linkage disequilibrium extending for large distances in this centromeric region, the actual genes accounting for the linkage peak and the association may lie at some physical distance from the observed association.
Were a single variant responsible for our linkage signal on 1q21-q24, we would expect to identify one haplotype of the microsatellite markers across the linkage peak that was shared among affected individuals. In contrast, in the region between D1S305 and D1S212, no single haplotype was shared. This finding is consistent with the existence of at least two and possibly three linkage peaks, suggesting more than one susceptibility gene in this region. The finding of several association peaks in this region offers further support for multiple susceptibility loci. We did identify a 1.16-cM region flanked by markers D1S2771 and D1S2705 in which the affected siblings in 55% of the 58 sibships from the 40 families shared the same haplotype, but no single haplotype was shared across sibships, even in this narrow region. This finding is consistent with other common disease susceptibility genes and suggests that even within this first linkage peak, multiple at-risk haplotypes contribute to the linkage signal.
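The distinction drawn here, between affected siblings sharing a haplotype within a sibship and one haplotype being common to many sibships, can be illustrated with a small tabulation. This is a sketch only: it assumes haplotypes have already been phased and given comparable labels across families, the function name and labels are hypothetical, and it is not the procedure used in this study.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def sharing_summary(
    sibships: Dict[str, List[Set[str]]]
) -> Tuple[float, Counter]:
    """Summarize haplotype sharing among affected siblings.

    `sibships` maps a sibship ID to one set of haplotype labels per affected
    sibling (assumed already phased and comparably labeled across families).
    Returns the fraction of sibships in which all affected siblings share at
    least one haplotype, and a count of how many sibships share each haplotype.
    """
    shared_by_sibship = {
        sid: (set.intersection(*sibs) if sibs else set())
        for sid, sibs in sibships.items()
    }
    n_sharing = sum(1 for shared in shared_by_sibship.values() if shared)
    fraction = n_sharing / len(sibships) if sibships else 0.0
    counts = Counter(h for shared in shared_by_sibship.values() for h in shared)
    return fraction, counts


# Hypothetical haplotype labels for four sibships (not data from the study).
example = {
    "A": [{"h1", "h2"}, {"h1", "h3"}],
    "B": [{"h4", "h5"}, {"h4", "h6"}],
    "C": [{"h1", "h7"}, {"h8", "h9"}],
    "D": [{"h2", "h3"}, {"h2", "h5"}],
}
example_fraction, example_counts = sharing_summary(example)
```

In this made-up example, three of four sibships show within-sibship sharing, yet each shared haplotype appears in only one sibship, which is the pattern expected when several at-risk haplotypes contribute to a linkage signal.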
In summary, using combined linkage mapping, haplotype sharing, and association studies with a dense marker map, we were able to confirm our original peak of linkage and narrow it to a 3.3-cM region, or ∼2.1 Mb. We have resolved a second linkage peak that is ∼10 cM telomeric to our largest peak and that lies in a region of both association and linkage in other studies. Our analysis strongly suggests that the replication in this region comes in part from the coalescence of several susceptibility loci in a region that could not be resolved on a 10-cM genome scan. The region harbors many strong candidate genes for type 2 diabetes, as well as a large number of poorly characterized transcripts that may also be good candidates. International collaborative efforts are underway to map these loci using positional candidate and linkage disequilibrium approaches in the populations with linkage to this region.
Additional information for this article can be found in an online appendix at http://diabetes.diabetesjournals.org.
This work was supported by grant DK39311 from the National Institutes of Health/NIDDK. Subject ascertainment was supported in part by the Research Service of the Department of Veterans Affairs, by the American Diabetes Association, and by National Institutes of Health/NCRR support of the General Clinical Research Centers of University of Arkansas for Medical Sciences (M01RR14288) and the University of Utah (M01RR03655). We thank Demond Williams and Winston Chu for technical assistance, Terri Hale and Judith Cooper for assistance with subject ascertainment, and the GCRC nursing and laboratory staff for assistance with subject assessment.