Common and rare variants of the hepatocyte nuclear factor 4α (HNF4A) gene have been associated with type 2 diabetes and related traits in several populations suggesting the involvement of this transcription factor in diabetes pathogenesis. Single nucleotide polymorphisms (SNPs) within a large haplotype block surrounding the alternate P2 promoter, located ∼45 kb upstream from the coding region, have been investigated in several populations of varying ethnicity with inconsistent results. Additionally, SNPs located within the P1 promoter and coding region have also been inconsistently associated with type 2 diabetes. Characterization of variation across this gene region in Mexican-American populations has not been reported. We therefore examined polymorphisms across the HNF4A gene in a cohort of Mexican-American pedigrees and assessed their association with type 2 diabetes. We observed evidence for association of SNPs in the P2 promoter region with type 2 diabetes (P = 0.003) and its age at diagnosis (P = 0.003). The risk allele frequency (53%) was intermediate to that reported in Caucasian populations (20–27%) and Pima Indians (83%). No other SNPs were associated with either trait. These results support the possibility that a variant in the P2 promoter region of HNF4A, or variants in linkage disequilibrium within this region, contributes to susceptibility to type 2 diabetes in many ethnic populations including Mexican Americans.
Hepatocyte nuclear factor 4α (HNF4A) is a transcription factor expressed in a number of tissues including liver, pancreas, kidney, and intestine. HNF4A regulates genes involved in glucose and fatty acid metabolism, as well as insulin secretion, and is therefore critical for maintaining lipid and glucose homeostasis (1,2). A number of mutations in HNF4A have been identified that cause maturity-onset diabetes of the young type 1, and others have been associated with a few cases of early-onset type 2 diabetes (3). More recently, common variants in HNF4A have been shown to account for the reported linkage of typical late-onset type 2 diabetes to an overlapping region on chromosome 20q13 in Finnish and Ashkenazi Jewish populations (4,5). Single nucleotide polymorphisms (SNPs) within a large haplotype block surrounding the alternate P2 promoter, located ∼45 kb upstream from the coding region, were significantly associated in both populations; however, association of SNPs located near the coding region and immediate proximal P1 promoter varied between the two studies. Following these reports, common variants in HNF4A have been tested for association with type 2 diabetes and related traits in a number of other populations (6–11). The results of these replication efforts have not been completely consistent with a role for HNF4A variation in type 2 diabetes risk; however, the inconsistencies across studies may be due to various factors, such as lack of power to detect an effect size that was overestimated in the original reports, variation in other genetic or environmental modifiers, or lack of linkage disequilibrium (LD) with an as yet unidentified true functional variant.
To determine whether reported variants of HNF4A affect the risk for type 2 diabetes in Mexican Americans, we investigated the haplotype block structure of those variants in HNF4A and its promoters and tested whether these variants were associated with type 2 diabetes in a Mexican-American cohort: the San Antonio Family Diabetes and Gallbladder Studies (SAFDGS).
RESEARCH DESIGN AND METHODS
Subjects used in this study were participants of the SAFDGS, which consists of extended pedigrees of Mexican-American descent and has been described in detail elsewhere (12,13). Diabetes was defined as a fasting plasma glucose level ≥7.0 mmol/l (126 mg/dl) (14,15) or reported physician-diagnosed diabetes and reported current therapy with either oral antidiabetes agents or insulin. SAS was used to model age of diabetes diagnosis as a proxy for age of diabetes onset with a Cox proportional hazards model (16). The Martingale residual from the Cox proportional hazards model, a quantitative trait, was used in the subsequent genetic analyses, using variance components methodology as implemented in the software package SOLAR (12,17). To analyze type 2 diabetes as a discrete phenotype in the variance components framework, we assumed an unobserved underlying quantitative liability, with individuals above a threshold considered diseased (18,19). The liability is assumed to have an underlying multivariate-normal distribution. Since the SAFDGS families were ascertained on the basis of a single type 2 diabetic proband per family, our analyses included ascertainment correction (20). The institutional review board of the University of Texas Health Science Center at San Antonio approved all procedures, and all subjects gave informed consent.
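The liability-threshold assumption described above can be illustrated with a minimal sketch. It is not the SOLAR implementation: the function names `liability_threshold` and `is_affected` are hypothetical, the liability is taken to be standard normal, and any prevalence passed in is an illustrative input rather than an estimate from the SAFDGS.

```python
from statistics import NormalDist


def liability_threshold(prevalence: float) -> float:
    """Return the threshold t on a standard-normal liability scale
    such that P(liability > t) equals the given prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - prevalence)


def is_affected(liability: float, threshold: float) -> bool:
    """An individual is scored as diseased when liability exceeds the threshold."""
    return liability > threshold


# Illustrative prevalence only (an assumption, not a SAFDGS estimate).
example_threshold = liability_threshold(0.20)
```

A higher assumed prevalence lowers the threshold, so a larger share of the liability distribution is classified as affected.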
Genotyping.
Genotyping of all SNPs was completed using the Applied Biosystems (ABI, Foster City, CA) TaqMan Allelic Discrimination methodology on an ABI Prism 7900HT Sequence Detection System according to the manufacturer’s instructions. The discrepancy rate on duplicate genotyping was <0.3%, and the call rate was 99%. Further, no Mendelian inconsistencies were observed.
Statistical analyses.
LD between each pair of SNPs was calculated as the direct correlation (r) between SNP genotype vectors, in which individual SNP genotypes were scored as 0, 1, or 2 according to the number of copies of the rarer allele an individual carried. Haplotypes were estimated using the computer program SimWalk2 (21). Haplotype score vectors were then generated with elements containing a 0, 1, or 2, depending on the number of copies of a specific haplotype that an individual carried.
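The correlation-based LD measure can be sketched as a plain Pearson correlation between two 0/1/2 genotype vectors. This is a minimal illustration under stated assumptions: the function name `genotype_correlation` is hypothetical, missing calls are represented as `None` and dropped pairwise, and no adjustment is made for relatedness among pedigree members.

```python
from math import sqrt


def genotype_correlation(x, y):
    """Pearson r between two genotype vectors coded 0/1/2 (copies of the
    rarer allele). Pairs in which either call is None are skipped."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two individuals typed at both SNPs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a monomorphic SNP")
    return sxy / sqrt(sxx * syy)
```

Two SNPs in complete LD give r = 1 when the rarer alleles travel together and r = −1 when the allele coding is reversed, so the sign depends on the coding convention while the magnitude reflects the strength of LD.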
To test the association between each SNP or haplotype and the phenotypic traits, a measured genotype approach (22) was used, with the allele counts serving as the measured genotypes. This method accounts for the relatedness among family members. Parameter estimates were obtained by maximum likelihood methods, and the significance of association was tested by likelihood ratio tests. Within each model, we simultaneously estimated the effects of age and sex. The measured genotype method was implemented using SOLAR (17). To address possibilities of hidden population stratification in the SAFDGS population, we used a pedigree test of transmission disequilibrium, specifically the quantitative trait disequilibrium test, as described in Abecasis et al. (23). The model of Abecasis et al. (23) was used to partition the total association into either within (βw) or between (βb) family components, using allelic transmission scores in extended pedigrees. The parameters βw and βb are modeled as the fixed effects, within a variance components framework. Given that βb could be confounded by population stratification, this approach is used to address the issue of population stratification by testing whether βb = βw. In the absence of population stratification, βb = βw.
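The partition of genotype scores into between-family and within-family components can be sketched in a simplified form. The version below uses the family mean genotype as the between-family component, which is an assumption made for illustration; the model of Abecasis et al. (23) derives this component from parental genotypes where available and handles extended pedigrees, and the function name `between_within_components` is hypothetical.

```python
from collections import defaultdict


def between_within_components(genotypes, family_ids):
    """Split 0/1/2 allele counts into a between-family part (the family
    mean) and a within-family part (deviation from that mean)."""
    if len(genotypes) != len(family_ids):
        raise ValueError("genotypes and family_ids must have equal length")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, fam in zip(genotypes, family_ids):
        totals[fam] += g
        counts[fam] += 1
    between = [totals[fam] / counts[fam] for fam in family_ids]
    within = [g - b for g, b in zip(genotypes, between)]
    return between, within
```

Regressing the trait on both components yields separate coefficients corresponding to βb and βw. Only the within-family component is protected from stratification, which is why a difference between the two coefficients is taken as a signal of stratification.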
RESULTS
We genotyped eight SNPs located near the P2 and P1 promoters and the coding region of HNF4A, selected from the LD blocks defined in the Ashkenazi Jewish and Finnish populations. The pairwise correlations among these SNPs in this Mexican-American population are shown in Fig. 1. The haplotype block structure is similar to that reported in Caucasian populations in that a block is observed near the P2 promoter region, and LD decays between the P2 promoter and the rest of the gene. Because of the high correlation (r > 0.99) among the four SNPs located near the P2 promoter, only two were included in further analyses. We genotyped six SNPs (rs1884613, rs2144908, rs2425640, rs3212183, rs1885088, and rs3818247) in all members of our cohort for whom lymphoblastoid cell lines and updated phenotypic data were available (n = 697). All SNPs except rs2425640 (nominal P value = 0.048) conformed to Hardy-Weinberg equilibrium expectations. After accounting for multiple testing, however, this SNP no longer deviated significantly from Hardy-Weinberg equilibrium and, hence, was analyzed further.
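A Hardy-Weinberg check of the kind referred to above can be sketched with the classic one-degree-of-freedom chi-square test. The text does not state which test was used, so this choice is an assumption. The function name `hwe_chi_square` is hypothetical, any genotype counts supplied are illustrative inputs, and the test treats individuals as unrelated, which pedigree data do not strictly satisfy.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square goodness-of-fit test (1 df) of observed genotype counts
    against Hardy-Weinberg proportions. Returns (chi2, p_value)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("test is undefined for a monomorphic SNP")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variate with 1 df.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value
```

If a Bonferroni adjustment over the six SNPs is assumed, the per-SNP threshold becomes 0.05/6 ≈ 0.0083, and a nominal P value of 0.048 no longer counts as a deviation.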
Table 1 summarizes the allele frequencies for these six SNPs and the results of genotypic association analyses using an additive model for each SNP. The P values presented are not corrected for multiple testing. Using a conservative Bonferroni correction, which does not account for nonindependence between SNPs, a P value <0.004 would be required to obtain an experiment-wide P value ≤0.05. Using this level of correction for multiple testing, both SNPs genotyped in the P2 promoter region were significantly associated with type 2 diabetes (P = 0.003) and the quantitative trait age at diabetes onset (P = 0.003). No other SNPs were associated with these traits. The allele A of SNP rs2144908 and allele G of SNP rs1884613 were associated with increased risk for type 2 diabetes, as well as a lower age at diagnosis. The characteristics of the subjects by their genotype at SNP rs2144908 are shown in Table 2. Since SNPs rs1884613 and rs2144908 were in near complete LD (only one individual differed at these two SNPs), only the results for SNP rs2144908 are shown.
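The Bonferroni threshold quoted above can be reproduced with a short calculation. The count of 12 tests (six SNPs by two traits) is an assumption; it is not stated in the text but is consistent with the reported cutoff of approximately 0.004. The function name is hypothetical, and the P values are the type 2 diabetes results from Table 1.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the experiment-wide
    type I error at or below alpha under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


N_SNPS = 6    # SNPs carried into the association analyses
N_TRAITS = 2  # type 2 diabetes and age at diagnosis (assumed test count)
threshold = bonferroni_threshold(0.05, N_SNPS * N_TRAITS)

# Uncorrected type 2 diabetes P values as listed in Table 1.
t2d_p_values = {
    "rs1884613": 0.003,
    "rs2144908": 0.003,
    "rs2425640": 0.070,
    "rs3212183": 0.619,
    "rs1885088": 0.499,
    "rs3818247": 0.529,
}
significant = sorted(snp for snp, p in t2d_p_values.items() if p < threshold)
```

Under this assumption only the two P2 promoter SNPs fall below the corrected threshold, matching the result reported in the text.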
As indicated in Table 1, the allele frequencies for five of the SNPs examined in this study are markedly different from those reported in other ethnic groups. The minor alleles reported in the Ashkenazi Jewish and Finnish populations are the major alleles in this Mexican-American population. The direction of effect of the alleles appears the same, however. For example, the at-risk allele A for SNP rs2144908 had a frequency of 0.20–0.27 in the Caucasian populations (4,5) and 0.83 in the Pima Indian population (9) but an intermediate frequency of 0.53 in the SAFDGS and was associated with an increased risk for diabetes in all three populations. Since the Mexican Americans are an admixed population of European and Native-American origin, this difference in allele frequencies may reflect evidence of admixture. A test for hidden population stratification (i.e., admixture), however, was not significant for any of the individual SNPs (P value range of 0.27–0.99).
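To show why an intermediate allele frequency is compatible with admixture, the following back-of-envelope sketch solves a simple two-source linear mixing model for the mixing proportion. This is purely illustrative and is not an analysis performed in this study: it assumes the Caucasian and Pima Indian frequencies stand in for the two ancestral populations, it uses a single SNP, and the function name `implied_admixture_fraction` is hypothetical.

```python
def implied_admixture_fraction(p_admixed: float,
                               p_source_a: float,
                               p_source_b: float) -> float:
    """Solve p_admixed = (1 - m) * p_source_a + m * p_source_b for m,
    the fraction of ancestry attributed to source B."""
    if p_source_a == p_source_b:
        raise ValueError("source frequencies must differ")
    return (p_admixed - p_source_a) / (p_source_b - p_source_a)


P_SAFDGS = 0.53  # rs2144908 allele A frequency in the SAFDGS
P_PIMA = 0.83    # frequency reported in Pima Indians (9)

# Range of frequencies reported in Caucasian populations (4,5).
for p_caucasian in (0.20, 0.27):
    m = implied_admixture_fraction(P_SAFDGS, p_caucasian, P_PIMA)
    print(f"Caucasian frequency {p_caucasian:.2f}: implied fraction {m:.2f}")
```

Under these assumptions the implied fraction lies roughly between 0.46 and 0.52. A single-locus figure of this kind is only a consistency check and should not be read as an ancestry estimate.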
We further tested whether haplotypes were associated with diabetes and age at diagnosis (Table 3). We elected to conduct a haplotype analysis similar to that of Silander et al. (5) for comparative purposes. The haplotypes defined by SNPs rs2144908, rs2425640, rs3212183, rs1885088, and rs3818247 were compared, as shown in Table 3. The results largely reflect those of the single SNP analyses; i.e., a SNP in the P2 block that is in LD with rs2144908 is associated with type 2 diabetes and age at diagnosis. All haplotypes containing the rs2144908 risk allele A had an estimated relative risk ≥1.00, although only one was statistically significant. The most common haplotype contained the risk allele A at rs2144908 and was not associated with type 2 diabetes or age at diagnosis. In contrast, a less common haplotype (Hap 9 in Table 3) bearing the rs2144908 A allele, and the previously implicated risk alleles at rs2425640 and rs3212183, was modestly associated with type 2 diabetes (P = 0.019) and age at diagnosis (P = 0.004). The haplotype bearing all “protective” alleles (Hap 3 in Table 3) was present at a frequency of 7.8% and was not associated.
Two previous studies (7,10) have identified rare haplotypes within the disequilibrium block encompassing the P2 promoter region, which are significantly associated with type 2 diabetes, although the individual SNPs themselves were not associated. In each study, the associated haplotypes involve rare recombination events between SNP rs2144908 and either rs1884614 or rs1884613. In our study, we observed only four recombination events between these sets of SNPs in our preliminary genotyping dataset (n = 441). The results of haplotype analysis confined to the P2 promoter variants are shown in Table 3. The most common haplotype bears all risk alleles and is significantly associated with increased risk for type 2 diabetes (P = 0.0002, n = 441), while the second most common haplotype bears all protective alleles and is associated with decreased risk for type 2 diabetes (P = 8 × 10⁻⁵, n = 441). None of the rare haplotypes were significantly associated with type 2 diabetes or its age at diagnosis. As these rare haplotypes are present in only two to three subjects (one copy only), the point estimate of relative risk is unstable; thus, here we report only the direction of the trend that we observed.
In conclusion, we found that SNPs within the P2 promoter region haplotype block of HNF4A are significantly associated with type 2 diabetes and its age at diagnosis in the Mexican-American cohort of SAFDGS. The alleles associated with increased risk for type 2 diabetes are highly prevalent in this population, which may have aided in the detection of association. The differences in allele frequencies between the Mexican-American, Native-American, and Caucasian populations may also indicate evidence of admixture, which could be confounding the apparent associations but could also account for the stepwise increase in diabetes prevalence from Caucasians to Mexican Americans to Native Americans. Although the results of the quantitative trait disequilibrium test do not indicate any evidence for hidden population stratification (i.e., admixture) in our population, we cannot fully exclude this possibility. Conversely, it is widely accepted that admixture mapping may well be a means to identify disease-associated genes, and our results may indicate that this region may bear a variant that accounts for the increased risk in this ethnic group.
No evidence for linkage to type 2 diabetes or to age at onset of diabetes was observed on chromosome 20q in the prior genome-wide linkage scans of SAFDGS (12,19); however, the association results from this study provide supportive evidence that a variant in the P2 promoter region of HNF4A, or variants in strong LD with this region, contribute to the susceptibility to type 2 diabetes.
Table 1. Allele frequencies and association analyses of individual SNPs in the HNF4A region in the SAFDGS (n = 697 including 188 diabetic subjects)
kb position* | SNP | Major/minor allele | Minor allele frequency (n typed) | Type 2 diabetes: P value | Type 2 diabetes: relative risk | Age at diagnosis: P value | Age at diagnosis: effect on age |
---|---|---|---|---|---|---|---|
−9.422 | rs4810424† | C/G‡ | 0.48 (427) | | | | |
−4.030 | rs1884613 | G/C‡ | 0.47 (697) | 0.003 | 1.31 | 0.003 | Decrease |
−3.926 | rs1884614† | T/C‡ | 0.48 (429) | | | | |
1.272 | rs2144908 | A/G‡ | 0.47 (697) | 0.003 | 1.31 | 0.003 | Decrease |
43.592 | rs2425640 | G/A | 0.28 (693) | 0.070 | 1.14 | 0.157 | Decrease |
50.693 | rs3212183 | T/C | 0.36 (695) | 0.619 | 1.04 | 0.576 | Decrease |
54.595 | rs1885088 | G/A | 0.19 (697) | 0.499 | 0.94 | 0.976 | Increase |
73.035 | rs3818247 | T/G‡ | 0.37 (697) | 0.529 | 1.23 | 0.725 | Decrease |
*Kilobase position from HNF4A P2 promoter translation initiation site at chromosome 20 as reported by Love-Gregory et al. (4) and Silander et al. (5). Risk alleles identified in either Finnish or Ashkenazi Jewish populations are indicated by underline. P values are not adjusted for multiple comparisons. Association P values <0.05 are in bold font. Relative risk is the risk of bearing two copies of major alleles versus the risk for not bearing major alleles when using an additive model.
†SNPs not further analyzed due to LD as shown in Figure 1.
Table 2. Characteristics of SAFDGS subjects by genotype at SNP rs2144908
Genotype | AA | AG | GG | P value |
---|---|---|---|---|
Male (%) | 37.3 | 41.0 | 44.5 | 0.1219 |
With diabetes (%) | 34.3 | 26.4 | 19.4 | 0.0026 |
Mean age (years) | 47.6 ± 18.0 | 46.0 ± 16.6 | 44.4 ± 16.7 | 0.1147 |
BMI (kg/m²) | 31.2 ± 7.1 | 31.4 ± 7.6 | 29.7 ± 6.6 | 0.9778 |
Data are mean ± SD or percent.
Table 3. Haplotype frequencies and evidence for association in SAFDGS subjects
Haplotypes across genomic region (n = 693)
Hap* | rs2144908 | rs2425640 | rs3212183 | rs1885088 | rs3818247 | Frequency in SAFDGS | Type 2 diabetes: P value | Type 2 diabetes: relative risk | Age at diagnosis: P value | Effect on age at diagnosis |
---|---|---|---|---|---|---|---|---|---|---|
1 | A | G | T | G | T | 0.271 | 0.292 | 1.08 | 0.379 | Decrease |
2 | G | G | C | G | G | 0.098 | 0.949 | 0.99 | 0.896 | Increase |
3 | G | A | T | G | G | 0.078 | 0.343 | 0.88 | 0.555 | Increase |
4 | A | A | T | G | T | 0.067 | 0.929 | 1.00 | 0.466 | Decrease |
5 | G | A | T | G | T | 0.062 | 0.168 | 0.84 | 0.187 | Increase |
6 | G | G | T | G | T | 0.055 | 0.732 | 0.94 | 0.393 | Increase |
7 | G | G | C | A | G | 0.053 | 0.697 | 0.94 | 0.391 | Increase |
8 | A | G | C | A | T | 0.051 | 0.991 | 1.00 | 0.899 | Increase |
9 | A | G | C | G | G | 0.042 | 0.019 | 2.09 | 0.004 | Decrease |
10 | G | G | T | G | G | 0.035 | 0.413 | 0.87 | 0.848 | Increase |
11 | G | A | C | G | G | 0.029 | 0.238 | 0.83 | 0.192 | Increase |
12 | A | G | T | G | G | 0.025 | 0.533 | 1.24 | 0.668 | Decrease |
13 | A | G | C | G | T | 0.022 | 0.294 | 1.52 | 0.343 | Decrease |
14 | G | G | C | G | T | 0.021 | 0.011 | 0.74 | 0.108 | Increase |
15 | G | G | C | A | T | 0.020 | 0.759 | 1.09 | 0.983 | Decrease |
Haplotypes in P2 promoter block (n = 441)
Hap* | rs4810424 | rs1884613 | rs1884614 | rs2144908 | Frequency in SAFDGS | Type 2 diabetes: P value | Type 2 diabetes: relative risk | Age at diagnosis: P value | Effect on age at diagnosis |
---|---|---|---|---|---|---|---|---|---|
1 | C | G | T | A | 0.516 | 2 × 10⁻⁴ | 1.23 | 1 × 10⁻⁴ | Decrease |
2 | G | C | C | G | 0.472 | 8 × 10⁻⁵ | 0.56 | 3 × 10⁻⁶ | Increase |
3 | G | C | T | A | 0.004 | 0.326 | Increased | 0.1815 | Decrease |
4 | C | G | C | G | 0.003 | 0.326 | Increased | 0.1819 | Decrease |
5 | C | G | T | G | 0.002 | 0.156 | Increased | 0.1128 | Decrease |
6 | G | G | C | A | 0.002 | 0.751 | Decreased | 0.856 | Increase |
*Observed haplotypes (Hap) numbered by decreasing frequency. Risk alleles as identified in the Finnish and Ashkenazi Jewish populations are underlined. P values are not adjusted for multiple comparisons. Association P values <0.05 are in bold font. Relative risk is the ratio of the risk for carrying two copies of the haplotype to risk when not having the haplotype; for rare haplotypes in P2 promoter block, only risk direction is given. Decrease: age at diagnosis is lower in subjects bearing the haplotype than in subjects not carrying the haplotype.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Article Information
This research was supported by grants from the National Institutes of Health (NIH) (R01-DK-42273, R01-DK-47482, R01-DK-53889, MH-59490, and P50DK061597 from the George O’Brien Kidney Research Center and K01-DK064867 [to K.J.H.]) and a Junior Faculty Award from the American Diabetes Association (to D.M.L.).
Genome-scan genotyping services were provided by the Center for Inherited Disease Research (CIDR). CIDR is fully funded through a federal contract from NIH to Johns Hopkins University (contract no. N01-HG-65403).
We thank the family members of SAFDGS and are grateful for their participation and cooperation. We also thank Jeanette Hamlington and Korri Weldon for technical assistance on this project.