HNF1α (TCF1) is a key transcription factor that is essential for pancreatic β-cell development and function. Rare mutations of HNF1α cause maturity-onset diabetes of the young. A common variant, G319S, private to the Oji-Cree population, predisposes to type 2 diabetes, but the role of common HNF1α variation in European populations has not been comprehensively assessed. We determined the linkage disequilibrium and haplotype structure across the HNF1α gene region using 29 single nucleotide polymorphisms (SNPs). Eight tagging SNPs (tSNPs) that efficiently capture common haplotypes and the amino acid–changing variant, A98V, were genotyped in 5,307 subjects (2,010 type 2 diabetic case subjects, 1,643 control subjects, and 1,654 members of 521 families). We did not find any evidence of association between the tSNPs or haplotypes and type 2 diabetes. We could exclude odds ratios (ORs) >1.25 for all tSNPs. The rare V98 allele (∼3% frequency) showed possible evidence of association with type 2 diabetes (OR 1.23 [95% CI 0.99–1.54], P = 0.07), a result that was supported by meta-analysis of this and published studies (OR 1.31 [1.08–1.59], P = 0.007). Further studies are required to investigate this association; the remaining uncertainty illustrates the difficulty of defining the role of rare (<5%) alleles in type 2 diabetes risk.
The HNF1α gene (TCF1) is an excellent candidate gene for type 2 diabetes. It codes for a key member of a transcription factor network that is essential for the development and function of the pancreatic β-cell (1,2). Rare severe mutations of the HNF1α gene are the commonest cause of maturity-onset diabetes of the young, a young-onset monogenic subtype of diabetes (3,4). The private HNF1α variant, G319S, predisposes to type 2 diabetes in the Oji-Cree population of Canada (5), and a number of genome-wide scans have found evidence of linkage of type 2 diabetes to 12q24 (6–8), where the HNF1α gene is located.
Recent studies suggest that comprehensive analysis of monogenic diabetes genes can be a productive approach for identifying type 2 diabetes susceptibility variants: rare mutations in Kir6.2, PPARγ, and HNF4α cause monogenic diabetes, and common variants of these genes have been reproducibly associated with type 2 diabetes using large-scale studies and meta-analysis (9–13). The role of common HNF1α variation in type 2 diabetes susceptibility has not been comprehensively assessed. A few studies (14–22), each consisting of a few hundred subjects, have previously examined the role of the amino acid–changing variants I27L, S487N, and A98V but with conflicting results in type 2 diabetes, although there is good evidence that the A98V variant affects β-cell function in nondiabetic subjects. In this study, we analyzed 5,307 samples that gave us >80% power to detect odds ratios (ORs) for common variants (minor allele frequency [MAF] >5%) of 1.2–1.4 at P < 0.01.
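The power statement above can be illustrated with a rough normal-approximation calculation. This is only a sketch: it uses the case-control subjects alone (the study also counted family probands and parents), the function name and the example MAF of 0.30 are illustrative assumptions, and it is not the calculator the authors used.

```python
from statistics import NormalDist

def allelic_power(n_cases, n_controls, maf, odds_ratio, alpha=0.01):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    # Each subject contributes two alleles to the comparison.
    alleles_cases, alleles_controls = 2 * n_cases, 2 * n_controls
    # Convert the control allele frequency and the OR into the
    # allele frequency expected in case subjects.
    odds = odds_ratio * maf / (1 - maf)
    p_case = odds / (1 + odds)
    # Standard error of the difference in allele frequencies.
    se = (p_case * (1 - p_case) / alleles_cases
          + maf * (1 - maf) / alleles_controls) ** 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in the expected direction;
    # the negligible contribution of the opposite tail is ignored.
    return NormalDist().cdf(abs(p_case - maf) / se - z_alpha)

# Hypothetical example: MAF 0.30, OR 1.2, P < 0.01.
power = allelic_power(2010, 1643, maf=0.30, odds_ratio=1.2)
print(f"approximate power: {power:.2f}")
```

Under these assumptions the power rises with both the OR and the allele frequency, which is consistent with a detectable OR range rather than a single threshold.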
To identify variants and determine linkage disequilibrium (LD) structure across the HNF1α gene, we sequenced all 10 exons, 5 conserved noncoding regions, and 9 amplicons containing more than one single nucleotide polymorphism (SNP) for at least 24 U.K. Caucasian subjects. This gave 91% power to detect alleles of 5% frequency. We identified 29 SNPs, covering 69,396 bp (28 kb 5′ and 17 kb 3′ of HNF1α) with an MAF >5%. These included the three nonsynonymous variants: I27L, S487N, and A98V. We found strong LD and limited haplotype diversity across the region. In this subsample, five haplotypes that were tagged by three SNPs occurred at >5% frequency and accounted for 68% of all haplotypes (82% of all haplotypes when limited to intragenic SNPs). Supplementary Fig. 1 in the online appendix (available at http://diabetes.diabetesjournals.org) shows a schematic of LD across the HNF1α gene. Supplementary Table 1 provides information on all 29 variants (online appendix).
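The 91% detection figure follows from a simple calculation: an allele of frequency 0.05 is missed only if it is absent from all 48 chromosomes carried by 24 subjects. A minimal sketch, assuming chromosomes are sampled independently (the function name is illustrative):

```python
def detection_probability(n_subjects, allele_freq):
    """Probability of seeing an allele at least once among 2n chromosomes."""
    chromosomes = 2 * n_subjects
    # The allele is missed only if every sampled chromosome lacks it.
    return 1 - (1 - allele_freq) ** chromosomes

p = detection_probability(24, 0.05)
print(f"{p:.3f}")  # 0.915, i.e. about 91%
```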
The three tagging SNPs (tSNPs) identified were supplemented by an additional five defined by Winckler et al. (23) after both groups had exchanged information on our initial analysis of this gene. Winckler et al. had performed a similar LD and haplotype analysis of the HNF1α gene but across a larger region (118 kbp) and at a greater SNP density than our own study. The eight tSNPs captured a large proportion of identified variation in the region (51% of SNPs with MAF >10% had an r2 value of >0.8, with at least one of the eight tSNPs, and 76% had an r2 value of >0.5) (23). A98V was not efficiently captured by the tSNPs because of its relatively low minor allele frequency (∼3%). We therefore also typed this SNP, as it is a nonsynonymous variant, and some, but not all, previous studies have suggested that it is associated with type 2 diabetes and measures of β-cell function (14,15,17,20–22).
We genotyped the eight tSNPs and A98V in a case-control study of 2,010 U.K. type 2 diabetic subjects, 1,643 U.K. control subjects, and a type 2 diabetic family association study (1,654 individuals from 521 families). Clinical details of the subjects used are given in Table 1. The results of our case-control study are shown in Table 2. Two SNPs, A98V (OR 1.30 [95% CI 1.01–1.68], P = 0.04) and rs2071190 (OR 1.12 [1.01–1.25], P = 0.04), demonstrated nominal evidence of association with type 2 diabetes. There were no significant deviations from expected transmission rates in our family-based study (Table 2). Combining the family-based results with the case-control results attenuated the A98V (OR 1.23 [0.99–1.54], P = 0.07) and rs2071190 (OR 1.09 [0.99–1.20], P = 0.06) type 2 diabetes associations. Individual genotype and allele counts are available in supplementary Table 2 (online appendix).
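To show why adding the family data pulls the A98V estimate toward the null, the two ORs reported in Table 2 can be pooled on the log scale. The sketch below uses inverse-variance weighting as a stand-in; it is not the Mantel-Haenszel procedure described in the methods, and the family OR in Table 2 comes from the discordant allele test alone, so the output is close to but need not match the reported combined OR of 1.23.

```python
import math

def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the log OR and its standard error from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return math.log(odds_ratio), se

def pool_fixed_effect(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of (OR, low, high) tuples."""
    total_weight, weighted_sum = 0.0, 0.0
    for odds_ratio, ci_low, ci_high in estimates:
        log_or, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        weight = 1 / se ** 2          # more precise studies count more
        total_weight += weight
        weighted_sum += weight * log_or
    pooled = weighted_sum / total_weight
    se_pooled = math.sqrt(1 / total_weight)
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled))

# A98V: case-control and family-based estimates from Table 2.
a98v = [(1.30, 1.01, 1.68), (1.06, 0.64, 1.76)]
pooled_or, ci_low, ci_high = pool_fixed_effect(a98v)
print(f"pooled OR {pooled_or:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```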
Our study provides possible evidence of an association between A98V and type 2 diabetes. Previous smaller association studies (14,15,17,20–22) of this variant in type 2 diabetes have produced conflicting results. We therefore performed a meta-analysis of all published case-control studies and our new study. The meta-analysis plot is shown in Fig. 1 and suggests that the V allele predisposes to type 2 diabetes with OR 1.31 (95% CI 1.08–1.59), P = 0.007. However, there was little support of an association of A98V with type 2 diabetes in the accompanying article from Winckler et al. (23). Additionally, meta-analysis is prone to publication bias, such that the association we have seen of A98V with type 2 diabetes needs to be replicated in further much larger studies before it can be confirmed as a type 2 diabetes variant. This shows that establishing whether a variant <5% MAF predisposes to type 2 diabetes is likely to be difficult, even with sample sizes of several thousand subjects.
The eight tSNPs defined six haplotypes with frequency >5%, which accounted for 84% of all haplotypes across the HNF1α region in the case-control groups. No haplotype differed significantly in frequency between case and control subjects (Table 3). In particular, there was no evidence that the haplotype capturing the Leu27 allele (which is in almost perfect LD with rs3830659, r2 = 0.97) (haplotype A) predisposes to type 2 diabetes. No haplotype differed significantly from expected familial transmission ratios in the familial haplotype association study (Table 3).
In our analysis of the role of common variation of HNF1α in type 2 diabetes susceptibility in U.K. Caucasians, we have found no evidence that common HNF1α haplotypes are associated with type 2 diabetes. We were able to exclude ORs >1.25 for all tSNPs; however, even with this large sample size, certain methodological considerations may have led us to miss real associations. We have not typed all SNPs across HNF1α and therefore may not have adequately captured disease variants across the gene. We note, however, the high level of LD in the region (supplementary Fig. 1 of the online appendix) and the fact that we have captured each of the individual SNPs that showed nominal evidence for association (P < 0.05) in the initial cohort of Winckler et al. (23), with r2 values ranging from 0.66 to 1.0. We clearly cannot exclude the possibility that multiple rare alleles (<5% MAF) are associated with type 2 diabetes. A further potential source of error is population stratification, which can cause false-positive or false-negative results. However, all subjects are of U.K. Caucasian origin, and the family-based and case-control results are consistent with each other. A further source of false-negative results is that our control subjects are a younger population-based resource rather than age matched to case subjects. As ∼5% of the control subjects will go on to develop diabetes, this would slightly reduce the power of our study compared with the use of the same number of control subjects who were age matched to the case subjects.
In conclusion, using a tSNP and comparative genomics approach, we have analyzed the HNF1α gene for association of common variation with type 2 diabetes. No variants or haplotypes were associated with type 2 diabetes in the U.K. population.
RESEARCH DESIGN AND METHODS
Variant detection and haplotype definition.
We used a minimum of 24 unrelated U.K. diabetic subjects for identifying variants and 62 diabetic subjects for haplotype construction.
Case-control subjects.
The clinical characteristics of the subjects in our study are given in Table 1. All participants gave informed consent and were U.K. Caucasians. All type 2 diabetic subjects had diabetes defined either by World Health Organization criteria or by being treated with medication for diabetes. Known subtypes such as maturity-onset diabetes of the young or mitochondrial-inherited diabetes and deafness were excluded by clinical criteria and/or genetic testing. Subjects in the type 2 diabetic case group were recruited from three sources, as previously described (13). In brief, these were a collection of young-onset type 2 diabetic subjects (n = 277), probands from type 2 diabetic sibships from the Warren 2 repository (n = 541), and a new collection of type 2 diabetic subjects from the Warren 2 repository (n = 1,192). Patients who had GAD autoantibodies at recruitment were excluded from the first two groups of case subjects but not the third. Population control subjects were recruited from two sources: 1) parents from a consecutive birth cohort (Exeter Family Study) with normal (<6.0 mmol/l) fasting glucose and/or normal HbA1c (A1C) levels (<6%) (n = 1,176) and 2) a nationally recruited population control sample of U.K. Caucasians obtained from the European Cell Culture Collection (n = 467). Excluding the young-onset type 2 diabetic subjects, who are younger at diagnosis and more obese, from the combined analysis did not affect the overall association results.
Family-based subjects.
The clinical characteristics of the affected probands in our family-based study are shown in Table 1. All subjects are independent of those from the case-control study. Families fitting the following criteria were included: an affected proband with both parents (404 “trios” families) or one parent and at least one unaffected sibling (117 “duos” families). The characteristics of some of these families have been previously described (13,24).
Comparative genomics to identify conserved noncoding sequences.
We downloaded the human, mouse, and rat sequences of HNF1α, including 50 kb upstream and 500 bp downstream from the Santa Cruz genome browser builds from November 2002, February 2003, and January 2003, respectively. Further sequence 3′ was not analyzed due to the presence of another gene starting at 536 bp 3′ of HNF1α. Pairwise alignments were performed using BLAST2. A cutoff of >100 bp showing >75% matches was used to define conserved noncoding sequences. Regions sequenced that contained conserved noncoding sequences occurred at base pairs (NCBI34) 119,816,898–119,817,223 (5′); 119,828,047–119,828,501 (promoter); 119,829–119,829,732 (intron 1); 119,830,033–119,831,037 (intron 1); and 119,850,821–119,851,107 (3′ untranslated region).
Variant detection.
To ascertain all common variation in the coding and conserved noncoding sequence, we sequenced the 10 exons and conserved noncoding sequences in a minimum of 24 subjects. In addition, we sequenced a further nine amplicons, five 5′ and four 3′, of the HNF1α gene, selected as having more than one SNP present in the SNP database. Sequencing was performed on ABI-377 platforms using standard protocols. Where SNPs were identified in the initial samples, further samples were sequenced so that 62 subjects in total had been analyzed, except where the MAF was <5% in the initial sequencing.
Genotyping and quality control.
Genotyping was performed by Kbiosciences (Herts, U.K.) using modified TAQMAN assays (available at http://www.kbioscience.co.uk). Each 1,536-well plate contained a mixture of case, control, and negative control subjects. The presence of 10% duplicate samples revealed that the genotyping accuracy was 99.7%. The genotyping success rate was 99% for control subjects, 97% for case subjects, and 95% for families. After excluding families with obvious relationship inconsistencies (as determined by the genotyping of an additional 43 SNPs), the Mendelian inconsistency error rate was 0.0004. All SNPs were in Hardy-Weinberg equilibrium in both case and control subjects (χ2 P > 0.01). The three groups that made up the case subjects and the two groups that made up the control subjects did not differ significantly in frequency (all individual SNP/study genotype and allele frequencies are presented in supplementary Table 2 of the online appendix).
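The Hardy-Weinberg check mentioned above can be sketched as a 1-df χ2 test on genotype counts. The counts in the example are hypothetical (the real counts are in the supplementary material), the function name is illustrative, and the code assumes the SNP is polymorphic so that no expected count is zero.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts for one SNP in 1,643 control subjects.
chi2, p_value = hwe_chi_square(940, 620, 83)
print(f"chi2 = {chi2:.2f}, P = {p_value:.3f}")
```

A SNP would pass the threshold stated above when the returned P value exceeds 0.01.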
Haplotype construction and assessment of LD.
Pairwise LD estimates were calculated using GOLD (25). Haplotype construction was performed using SNPHAP (version 1.2; available at http://www-gene.cimr.cam.ac.uk/clayton/software/snphap.txt) and PHASE (26).
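As an illustration of the pairwise LD measure reported in this study (r2), the sketch below computes it from the four two-locus haplotype frequencies. It assumes phased haplotype frequencies are already available, which is a simplification of what GOLD does with genotype data; the function name and example frequencies are hypothetical.

```python
def ld_r_squared(f_11, f_12, f_21, f_22):
    """r^2 between two biallelic loci from haplotype frequencies.

    f_11 is the frequency of the haplotype carrying allele 1 at both
    loci, f_12 allele 1 at the first locus and allele 2 at the second,
    and so on. The four frequencies must sum to 1.
    """
    p_a = f_11 + f_12          # allele 1 frequency at the first locus
    p_b = f_11 + f_21          # allele 1 frequency at the second locus
    d = f_11 * f_22 - f_12 * f_21   # disequilibrium coefficient D
    return d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical pair of SNPs in strong but incomplete LD.
print(round(ld_r_squared(0.30, 0.02, 0.01, 0.67), 2))
```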
Association study statistical analysis.
ORs and P values were determined for our case-control analyses using χ2 tests. We used COCAPhase (27) to estimate haplotype frequencies and perform tests of haplotype association. PHASE (26) produced very similar haplotype frequencies. To analyze our family data, we used the FBAT program (27) (available at http://www.biostat.harvard.edu/∼fbat/default.html). The TDT/sibTDT program (28) produced very similar results. To estimate the ORs for the families with parents and unaffected siblings available, we used the discordant allele test (29). To estimate the ORs for the case-control and family-based combined analysis we combined ORs from the discordant allele, TDT, and case-control studies using Mantel-Haenszel meta-analysis (13). Approximate power for allelic ORs for each of the SNPs was calculated using the power calculator at the University of California, Los Angeles website (available at http://calculators.stat.ucla.edu/powercalc/). To include the family-based subjects in the power approximations, we included probands as case subjects and parents as control subjects. The significance level was set to 0.01 for a range of allele frequencies.
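The case-control statistics described above can be sketched for a single SNP as follows: an allelic OR with a log-scale (Woolf) 95% CI and a Pearson χ2 test on the 2 × 2 table of allele counts. The counts are hypothetical, the function name is illustrative, and the choice of CI method is an assumption, since the text does not say how the intervals were computed.

```python
import math

def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Allelic OR, 95% CI, chi-square statistic, and P value (1 df)."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    odds_ratio = (a * d) / (b * c)
    # Woolf standard error of the log OR.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds_ratio) - z * se),
          math.exp(math.log(odds_ratio) + z * se))
    # Pearson chi-square for a 2 x 2 table, no continuity correction.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, ci, chi2, p_value

# Hypothetical allele counts for a low-frequency variant.
odds_ratio, ci, chi2, p_value = allelic_association(160, 3860, 100, 3186)
print(f"OR {odds_ratio:.2f} ({ci[0]:.2f}-{ci[1]:.2f}), P = {p_value:.3f}")
```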
A98V meta-analysis methods.
To identify all relevant published studies and abstracts, we searched Pubmed, ISI Web Of Science, and Google using combinations of the following key words: A98V, Ala98Val, 98, TCF1, HNF1, diabetes, type 2 diabetes, genetic association, maturity-onset diabetes of the young, and SNP. Because there were zero alleles in the case group from the Rissanen et al. (20) study, we used the Haldane correction to plot the 95% CIs and exact methods for pooled OR estimation and significance testing (30). Although a funnel plot suggested that publication bias was not a major problem, we cannot exclude the possibility that such bias exists. We used the Q statistic to test for heterogeneity and the Mantel-Haenszel method to combine the studies.
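The three meta-analysis ingredients named above can be sketched on hypothetical 2 × 2 allele-count tables: a Haldane-corrected OR for a study with a zero cell, a Mantel-Haenszel pooled OR, and the Q statistic for heterogeneity. The study counts are invented for illustration, the function names are assumptions, and the exact methods used for the pooled estimate in the paper are not reproduced here.

```python
import math

# Each table is (case_minor, case_major, control_minor, control_major).

def haldane_or(a, b, c, d, z=1.96):
    """OR and 95% CI after adding 0.5 to every cell (Haldane correction)."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled OR across studies (uncorrected counts)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def cochran_q(tables):
    """Q statistic and its degrees of freedom, using corrected log ORs."""
    stats = []
    for a, b, c, d in tables:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
        weight = 1 / (1 / a + 1 / b + 1 / c + 1 / d)   # inverse variance
        stats.append((math.log(a * d / (b * c)), weight))
    total = sum(w for _, w in stats)
    mean = sum(w * t for t, w in stats) / total
    q = sum(w * (t - mean) ** 2 for t, w in stats)
    return q, len(tables) - 1

# Hypothetical studies; the first has no minor alleles among case subjects.
studies = [(0, 200, 6, 194), (12, 388, 9, 391), (30, 970, 22, 978)]
print(haldane_or(*studies[0]))
print(mantel_haenszel_or(studies))
print(cochran_q(studies))
```

Q is compared with a χ2 distribution on the returned degrees of freedom; a large value relative to that distribution suggests the studies do not share a common OR.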
Table 1

| | Case subjects | | | Control subjects | | Families | |
|---|---|---|---|---|---|---|---|
| | Warren 2 case subjects | Warren 2 probands | Young-onset type 2 diabetic subjects | Population control 1 (Exeter Family Study) | Population control 2 (European Cell Culture Collection) | Warren 2 trios* | Warren 2 duos* |
| n | 1,192 | 541 | 277 | 1,176 | 467 | 404 families | 117 families |
| Male (%) | 62 | 53 | 55 | 49 | 52 | 58 | 57 |
| Median age (years)† | 51 (45–58) | 56 (50–61) | 40.0 (35–44) | 31 (28–35) | NA | 40.0 (35.0–46.0) | 45.0 (39.3–49.0) |
| Median BMI (kg/m2) | 30.6 (27.1–34.9) | 28.0 (25.3–31.5) | 32.2 (27.8–36.6) | 26.3 (24.1–29.0)‡ | NA | 33.0 (29.1–37.6) | 32.5 (28.2–36.6) |
| Treatment (diet/oral hypoglycemic agents/insulin) (%) | 8/66/26 | 18/67/15 | 9/38/53 | § | § | 21/60/19 | 15/57/28 |
Data are median (interquartile range), unless otherwise indicated. Only successfully genotyped subjects are included.
*Proband details only are given.
†Age at diagnosis for case subjects and age at study for control subjects. No clinical details were available for the European Cell Culture Collection population control samples, but percent male was determined by XY-PCR.
‡BMI measurement for males only, as females were pregnant at time of study.
§Control subjects were not on treatment. There were significant differences in age at diagnosis, BMI, and treatment between the three case groups (all P < 0.05). This was mainly explained by the young-onset type 2 diabetic subjects having a younger age at onset, higher BMI, and a higher proportion using insulin.
Table 2

| SNP | Alleles | MAF | Observed transmissions | Expected transmissions | OR (95% CI) | P |
|---|---|---|---|---|---|---|
| Case-control analysis (n = 2,010 case and 1,643 control subjects) | | | | | | |
| rs959398 | A/G | 0.08 | | | 1.03 (0.88–1.22) | 0.70 |
| rs1920792 | C/T | 0.47 | | | 1.01 (0.92–1.11) | 0.77 |
| A98V | T/C | 0.03 | | | 1.30 (1.01–1.68) | 0.04 |
| GE117884_349 | A/G | 0.30 | | | 0.98 (0.88–1.08) | 0.70 |
| rs3830659 (r2 = 0.97 with I27L) | ins/del | 0.32 | | | 0.96 (0.87–1.06) | 0.40 |
| rs1169292 | T/C | 0.32 | | | 0.93 (0.84–1.03) | 0.15 |
| rs2071190 | A/T | 0.24 | | | 1.12 (1.01–1.25) | 0.04 |
| P288P | C/G | 0.30 | | | 0.98 (0.88–1.08) | 0.64 |
| rs1169306 | T/C | 0.37 | | | 0.94 (0.85–1.03) | 0.18 |
| Family association analysis (n = 521 families) | | | | | | |
| rs959398 | A/G | 0.09 | 74 | 68 | 1.18 (0.81–1.71) | 0.32 |
| rs1920792 | C/T | 0.46 | 292 | 295 | 0.94 (0.77–1.15) | 0.78 |
| A98V | T/C | 0.04 | 38 | 38 | 1.06 (0.64–1.76) | 0.95 |
| GE117884_349 | A/G | 0.30 | 202 | 213 | 0.89 (0.72–1.11) | 0.21 |
| rs3830659 (r2 = 0.97 with I27L) | ins/del | 0.32 | 212 | 214 | 0.97 (0.78–1.20) | 0.67 |
| rs1169292 | T/C | 0.31 | 209 | 209 | 1.06 (0.85–1.33) | 0.99 |
| rs2071190 | A/T | 0.25 | 177 | 183 | 0.96 (0.75–1.21) | 0.63 |
| P288P | C/G | 0.30 | 209 | 212 | 0.98 (0.79–1.22) | 0.58 |
| rs1169306 | T/C | 0.37 | 240 | 247 | 0.95 (0.87–1.04) | 0.93 |
For case-control results, ORs are given for the first allele in the “Alleles” column. For family-based results, expected statistics and P values obtained from FBAT (27) and ORs from discordant alleles sib test are given. MAF = MAF in control subjects (case-control analysis) and founders (family analysis).
Table 3

| Haplotype* | Control frequencies | Case frequencies† | Observed transmissions | Expected transmissions | P |
|---|---|---|---|---|---|
| Case-control analysis (n = 2,010 case and 1,643 control subjects) | | | | | |
| A (GTGITTGT) | 0.28 | 0.27 | | | 0.22 |
| B (GCADCTCC) | 0.16 | 0.18 | | | 0.17 |
| C (GCGDCTGC) | 0.16 | 0.15 | | | 0.49 |
| D (GCADCAGC) | 0.11 | 0.10 | | | 0.13 |
| E (ATGDCACC) | 0.08 | 0.08 | | | 0.56 |
| F (GTGDCTGT) | 0.06 | 0.05 | | | 0.86 |
| Family association analysis (n = 521 families) | | | | | |
| A (GTGITTGT) | | 0.27 | 154 | 162 | 0.35 |
| B (GCADCTCC) | | 0.17 | 98 | 101 | 0.60 |
| C (GCGDCTGC) | | 0.14 | 97 | 90 | 0.26 |
| D (GCADCAGC) | | 0.10 | 57 | 65 | 0.19 |
| E (ATGDCACC) | | 0.08 | 57 | 49 | 0.15 |
| F (GTGDCTGT) | | 0.08 | 52 | 51 | 0.83 |
Haplotype results obtained using the HBAT option in the FBAT program (26).
*The order of SNPs making up the haplotypes is HNF1_rs959398, HNF1_rs1920792, HNF1_GE117884_349, HNF1α_rs3830659, HNF1_rs1169292, HNF1_rs2071190, HNF1_P288P, and HNF1_rs1169306.
†Affected proband frequency for family analysis. D, deletion; I, insertion.
Additional information for this article can be found in an online appendix available at http://diabetes.diabetesjournals.org.
Article Information
We thank Diabetes U.K. for funding this research. A.T.H. is a Wellcome Trust Research Leave Fellow.
We thank Wendy Winckler, David Altshuler, and colleagues, who kindly shared information on their analysis of HNF1α throughout this study.