Type 2 diabetes (T2D) is a complex disorder in which both genetic and environmental risk factors contribute to islet dysfunction and failure. Genome-wide association studies (GWAS) have linked single nucleotide polymorphisms (SNPs), most of which are noncoding, in >200 loci to islet dysfunction and T2D. Identification of the putative causal variants and their target genes and whether they lead to gain or loss of function remains challenging. Here, we profiled chromatin accessibility in pancreatic islet samples from 19 genotyped individuals and identified 2,949 SNPs associated with in vivo cis-regulatory element use (i.e., chromatin accessibility quantitative trait loci [caQTL]). Among the caQTLs tested (n = 13) using luciferase reporter assays in MIN6 β-cells, more than half exhibited effects on enhancer activity that were consistent with in vivo chromatin accessibility changes. Importantly, islet caQTL analysis nominated putative causal SNPs in 13 T2D-associated GWAS loci, linking 7 and 6 T2D risk alleles, respectively, to gain or loss of in vivo chromatin accessibility. By investigating the effect of genetic variants on chromatin accessibility in islets, this study is an important step forward in translating T2D-associated GWAS SNP into functional molecular consequences.
Type 2 diabetes (T2D) is a complex disease resulting from the combined effects of an individual’s genetic predisposition and environmental exposures (1,2). It ultimately manifests when islets cannot secrete sufficient insulin to compensate for insulin resistance in peripheral tissues (3). Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) in >200 loci that confer genetic susceptibility to T2D and/or alter quantitative measures of islet (dys)function (4,5). These SNPs are predominantly noncoding (∼90%) and enriched within islet-specific cis-regulatory elements (cis-REs) (6–9), implicating perturbed islet transcription in T2D etiology (2). However, identifying the causal variants in each T2D-associated GWAS locus, their molecular effects, and the genes and pathways they affect remains critical to translate genetic associations into mechanistic understanding and treatments.
Quantitative trait locus (QTL) analyses have linked common genetic variants to in vivo gene expression changes (eQTL) for multiple cell types (10), including islets (8,11,12). However, eQTLs cannot pinpoint the causal variants among the multiple SNPs in linkage disequilibrium (LD) with each other. QTL approaches have recently been applied in cell lines to link genetic variation to epigenomic changes, such as DNaseI sensitivity (13), chromatin accessibility (caQTLs) (14–16), and histone modification levels (17). However, little is known about how genetic variation affects epigenomes of clinically relevant primary tissues such as islets.
In this study, we used the Assay for Transposase-Accessible Chromatin-sequencing (ATAC-seq) (18) to profile genome-wide chromatin accessibility in islets from 19 individuals (14 without diabetes [ND] and 5 with T2D). Using caQTL analysis, we identified genetic variants altering in vivo chromatin accessibility in islets and exhibiting concordant effects on in vitro luciferase reporter activity. Finally, we identified putative causal variants altering islet chromatin accessibility in 13 distinct T2D-associated GWAS loci. Together, this study provides a road map for translating T2D-associated GWAS SNPs into functional molecular effects.
Research Design and Methods
Study Subjects and Islet Culture
Fresh human cadaveric pancreatic islets were procured from Prodo Laboratories or the Integrated Islet Distribution Program (Supplementary Table 1) and processed according to institutional review board–approved protocols. Upon receipt, cells were transferred into PIM(S) media supplemented with PIM(ABS) and PIM(G) (Prodo Laboratories) and incubated overnight in a T-150 nontissue culture–treated flask (VWR) at 37°C and 5% CO2 overnight. The following day, nuclei and total RNA were isolated for ATAC-seq and RNA-seq library preparation and analysis (8). Genomic DNA was isolated from islet explant cultures using Qiagen DNeasy Blood & Tissue Kit as previously described (8).
Islet ATAC-seq libraries were prepared as previously described (8) from 22 donors. Per donor, three replicates, each consisting of 50–100 islet equivalents (50,000–100,000 cells), were transposed. Libraries were barcoded, pooled into three-donor batches (corresponding to nine barcoded transposition reactions), and sequenced using 2 × 75 bp Illumina NextSeq 500 to an average depth of 62.6 (± 18.6) million paired-end reads per donor (Supplementary Table 2). Low quality portions of reads were trimmed using Trimmomatic (19) and aligned to the hg19 human genome assembly using Burrows-Wheeler Aligner-MEM (20). For each replicate, reads were shifted as previously described and duplicate reads were removed (21,22). Technical replicates were merged using SAMtools after confirming high correlation between them. Open chromatin regions (OCRs) were called for each islet sample using MACS2 (23) (with parameters -callpeak --nomodel -f BAMPE) at a false discovery rate (FDR) of 1%. Islet ATAC-seq samples with less than 30,000 OCRs were excluded from further analyses, yielding data for 19 individuals. OCRs on sex chromosomes and those overlapping low mapability regions (blacklisted regions available from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeMapability/) were excluded, resulting in 154,437 autosomal OCRs detected in at least one individual using R package DiffBind (24). deepTools was used to generate bedgraph files for UCSC Genome Browser sessions (with parameters --normalizeUsingRPKM --centerReads --scaleFactor = 1 -bs = 25).
OCR Chromatin State Annotations
Previously described chromatin states for islets (8), ENCODE, and National Institutes of Health Roadmap Epigenomics (25) cells/tissues were used to annotate islet OCRs and visualized using ggplot2 (26). OCRs overlapping ≥2 different chromatin states were assigned a single state using the following hierarchy: Active transcription start site (TSS) > Bivalent TSS > Weak TSS > Flanking TSS > Active Enhancer-1 > Active Enhancer-2 > Weak Enhancer > Genic Enhancer > Strong Transcription > Weak Transcription > Repressed Polycomb > Weak Repressed Polycomb > Quiescent. Previously described stretch enhancer (SE) regions (6,8) were overlapped with islet OCRs and tested for enrichment using the Fisher exact test. For each tissue-specific test, the background set comprised SEs from all other tissues (n = 30).
Genotyping, Imputation, and caQTL Analysis
Each islet donor was genotyped using Illumina Infinium Omni2.5Exome (n = 11) or Omni5Exome (n = 8) BeadChip arrays (Supplementary Table 1). We mapped Illumina array probe sequences to the hg19 genome assembly using Burrows-Wheeler Aligner. SNPs with ambiguous probe alignments, 1000 Genomes (1000G) phase 1 variants with minor allele frequency of ≥1% within 7 bp of the 3′ end of probes, or call rates <95% were excluded. All alleles were oriented relative to the reference. Genotype calls were merged using vcftools/0.1.12a suite (vcf-merge command). After removing SNPs with missing data (--max-missing 1), ∼2.4 million SNPs were used for imputation (1000G phase 3 version 5 ) and phasing (Eagle version 2.3 ) using the Michigan Imputation Server (https://imputationserver.sph.umich.edu/index.html) (29).
VerifyBamID (30) was used to match ATAC-seq bam files to individuals’ genotypes to eliminate the possibility of a sample swap. Islet OCRs overlapping only monomorphic SNPs were removed from caQTL analyses, yielding 84,499 OCRs. Allele-specific counts were obtained for 195,207 SNPs within these OCRs, and caQTLs were detected using RASQUAL (15). To minimize confounding factors such as batch effects, we adopted the strategy described in Kumasaka et al. (15) and used the first five principal components as covariates in the RASQUAL model. Significant caQTLs were identified using a two-stage multiple hypothesis testing correction (15): 1) correcting for the multiple SNPs tested within each OCR using Bonferroni correction, and 2) then correcting for the number of OCRs tested genome wide by controlling FDR at 10% using RASQUAL’s permutation test (“--random-permutation”) 50 times.
To visualize chromatin accessibility patterns at caQTLs, first we calculated the number of ATAC-seq reads (normalized with respect to library size) spanning each base pair for all 19 samples using BEDTools (“genomecov”). Next, islet donors were grouped based on their genotypes for each displayed caQTL; average read counts were calculated for each genotype group and plotted using the “polygon” function in R.
Differential OCR Analyses
Differential chromatin accessibility analyses were conducted between islet ATAC-seq profiles of five T2D and five ND individuals with the most comparable demographics (Supplementary Table 3). To identify statistically significant T2D disease state–associated chromatin accessibility changes, only OCRs meeting the following criteria were used for differential analyses (n = 52,387): 1) present in ≥3 islet donors and 2) present in ≥1 T2D and ≥2 ND (or ≥1 ND and ≥2 T2D) individuals. Race, sex, and significant surrogate variables (n = 2) from surrogate variable analysis (SVA) (31) were used as covariates to minimize confounding factors. edgeR (32) R package was used to identify differentially accessible OCRs.
GWAS SNP Enrichment in Islet caQTLs
The NHGRI-EBI GWAS Catalog of GWAS index SNPs for 184 diseases/traits was retrieved on 19 January 2017 (https://www.ebi.ac.uk/gwas/) and LD-pruned using PLINK (33) version 1.9 (parameters --maf 0.05 --clump --clump-p1 0.0001 --clump-p2 0.01 --clump-r2 0.2 --clump-kb 1000) to avoid testing enrichment for multiple SNPs representing the same genetic association signal/locus per trait. For index SNPs exhibiting pairwise correlation r2 >0.2, only the SNP with the more significant P value was retained. We used GREGOR (34) on this LD-pruned list to determine whether GWAS index or linked SNPs (r2 >0.8, LD window size = 1 Mb, minimum neighbor number = 500) were enriched in islet caQTLs or differentially accessible OCRs.
Transcription Factor Motif Enrichments
Homer (35) findMotifsGenome.pl script (parameters: hg19, –size given) was used to identify transcription factor (TF) motifs enriched in islet OCRs. We compared motifs in OCRs that are accessible only in islets (n = 40,271 islet-specific OCRs) to motifs in OCRs that are also accessible in adipose, CD4+ T, GM12878, or peripheral blood mononuclear cells (PBMCs) (n = 41,639 shared/background OCRs) (Fig. 1C). Motifs enriched in caQTL-containing OCRs (Fig. 2D) were identified by comparing caQTL OCRs (n = 2,949) to all islet OCRs (n = 154,437). TFs were clustered based on the similarity of their position weight matrices (PWMs) using Kullback-Leibler divergence method implemented in TFBSTools (36). Motif enrichments for differential OCRs (n = 1,515) were calculated against all OCRs used in differential analysis (n = 52,387).
Total RNA was isolated from each islet sample using TRIzol (8). Stranded RNA-seq libraries were prepared from total RNA using the TruSeq Stranded mRNA kit (Illumina) for the 19 individuals with high-quality ATAC-seq data; External RNA Controls Consortium (ERCC) Mix 1 or Mix 2 spike-ins were randomly added to each sample (Thermo Fisher, catalog #4456740) (Supplementary Table 4). RNA-seq from 10 islet samples used in differential analyses were sequenced together on an Illumina NextSeq 500 to minimize batch effects, whereas the remaining nine samples were sequenced on Illumina HiSeq 2500, each to an average sequencing depth of 87.2 (±27.8) million paired-end reads (Supplementary Table 4). Paired-end RNA-seq reads were trimmed to remove low-quality base calls using Trimmomatic (19). Bowtie2 (37) and RSEM (38) (rsem-calculate-expression) were used to determine fragments per kilobase of transcript per million mapped reads (FPKM) and expected read counts for all Ensembl hg19 Release 70 transcripts.
Differential Gene Expression Analyses
RNA-seq data from 10 islet samples (Supplementary Table 3) were used for differential expression analysis. Expected read counts for autosomal genes with FPKM >5 in ≥3 RNA-seq samples (n = 10,116) were used in differential analyses based on edgeR (32) models (FDR 10%). Race, sex, ERCC spike-in, and significant surrogate variables (n = 1) from SVA were used as covariates to minimize the impact of confounding factors on T2D disease state–associated gene expression changes.
RSEM expected read counts (38) for 9,656 expressed genes (median FPKM >5) were used to identify islet eQTLs from 19 donors using RASQUAL (15). Only SNPs within the gene body or within 50 kb flanking the gene body were tested. To minimize potential batch effects, we adopted the strategy described in Kumasaka et al. (15) and used the first four principal components, in addition to age, sex, race, T2D status, and sequencing date as covariates in the RASQUAL model. A two-stage multiple hypothesis testing correction (15) was used to determine significant eQTLs similar to caQTLs, where only 10 permutation tests were used in step two.
Islet caQTL-eQTL Overlaps
Quantile-quantile (QQ) plots for caQTL P values were generated against the expectation of a uniform P value distribution between 0 and 1. The QQ plot was generated for islet eQTL SNPs from 112 individuals (8) and caQTL SNPs from 19 individuals by conditioning on lead caQTL SNPs that were either statistically significant at FDR 10% or background sets of randomly selected nonsignificant ones. Random sets of nonsignificant SNPs (n = 2,545) were generated 10 times to eliminate sampling bias; a representative result from one random set is shown in Fig. 2F and Supplementary Fig. 2G.
Gateway Cloning of Selected Islet caQTL Sequences and Alleles
Islet genomic DNA from individuals homozygous for the reference or alternate allele was used as templates to PCR amplify sequences containing each allele for 13 islet caQTLs (Supplementary Table 5). The corresponding 26 PCR amplicons were cloned into the pDONR201 vector using BP Clonase (Invitrogen). Sequences were validated by Sanger sequencing. Each islet caQTL sequence was transferred from pDONR201 into the Gateway-modified pGL4.23F plasmid (39) with LR Clonase.
Luciferase Reporter Assays
MIN6 cells were seeded in 96-well plates at a density of 60,000 cells per well 24 h prior to transfection as previously described (39). Gateway-modified Firefly (0.072 pmol) (pGL4.23, Promega) plasmid containing each islet caQTL sequence (Supplementary Table 5) and 2 ng Renilla (pRL-TK, Promega) plasmid were cotransfected in triplicate using Lipofectamine 2000 Transfection reagent (Life Technologies). The Dual Luciferase Reporter Assay system (Promega) was used to determine Firefly and Renilla luciferase activity in each sample. Cells were lysed with 1× passive lysis buffer 38–40 h after transfection. Luminescence was measured on a Synergy 2 Microplate Reader (BioTek). Firefly values were normalized to Renilla to control for differences in cell number or transfection efficiency.
Human Pancreatic Islet Chromatin Accessibility Maps
To determine the genome-wide location of cis-REs in human islets, we generated high-quality ATAC-seq profiles from 19 islet donors (Supplementary Fig. 1A, Supplementary Tables 1 and 2). Investigating chromatin accessibility near the NKX6.1 locus, a well-characterized β-cell–specific TF, revealed both OCRs unique to islet samples (Fig. 1A, orange and green rectangles) and OCRs shared with other cell types (22,40) (Fig. 1A, gray rectangle). Overall, chromatin accessibility profiles from 19 islets were highly correlated to each other and to those from sorted islet α- and β-cells (Fig. 1B and Supplementary Fig. 1B) (41). Notably, ATAC-seq profiles from T2D donors (n = 5; Fig. 1B, asterisks) did not cluster separately from ND donors, suggesting that the T2D disease state does not lead to global remodeling of human islet chromatin accessibility.
Collectively, we identified 154,437 islet OCRs accessible in at least one of the 19 individuals (Supplementary Table 6). Comparison with reported chromatin state annotations in human islets (6,8) assigned 12.9% and 23.14% of these OCRs as putative promoters and enhancers, respectively (Supplementary Fig. 1C). Putative promoter OCRs were shared with several of 30 tissues profiled by the National Institutes of Health Roadmap Epigenomics project (25). Putative enhancer OCRs were more specific to islets, consistent with previous observations of cell type specificity of enhancers (42). To further assess the islet specificity of detected OCRs, we compared them to SEs, which are long (>3 kb) contiguous enhancer chromatin states that govern cell-specific functions and often harbor disease-associated SNPs relevant to the cognate cell type (6). The majority (90%) of islet SEs overlapped islet OCRs (Supplementary Fig. 1D), significantly greater than overlaps observed between islet OCRs and SEs in other tissues (Fisher exact test P < 2.2 × 10−16). As anticipated, DNA sequence binding motifs of islet-specific TFs, such as PDX1 and NKX6.1, were significantly enriched in OCRs that are accessible in islet samples and not in GM12878, PBMCs, CD4+ T cells, skeletal muscle, or adipose tissues (Fig. 1C). Together, these observations indicate that high-quality chromatin accessibility maps of islets from multiple individuals reveal cis-REs (OCRs) important for islet-specific gene regulation.
Only 10% (n = 15,917) of islet OCRs were detected in all 19 individuals (Fig. 1D), which were overwhelmingly annotated as promoters (Fig. 1E, red bars). In contrast, 39.3% (n = 60,713) of OCRs were detected in only 1 out of 19 individuals (Fig. 1D) and were found predominantly (45%) in quiescent/low signal chromatin states (Fig. 1E, white bars). Though we cannot eliminate the possibility of false positives in OCR detection, these might also represent individual-specific enhancers missed in reference islet chromatin states, as references were based on data from 2–3 individuals. OCRs detected in 2–18 individuals (Fig. 1D) were mostly enhancers (Fig. 1E, orange/yellow bars), suggesting that genetic differences (i.e., SNPs) between individuals may alter the chromatin accessibility, and therefore the activity, of human islet enhancers.
Genetics of Chromatin Accessibility in Human Islets
To identify genetic variants (SNPs and small insertions/deletions) that alter chromatin accessibility of islet OCRs in which they reside (Fig. 2A), we genotyped islet samples and conducted caQTL analysis using RASQUAL (15), a method that can discover QTLs from small sample sizes. Using RASQUAL, we identified 2,949 SNPs associated with increased or decreased chromatin accessibility at FDR 10% (Supplementary Fig. 2A, Supplementary Table 7) from 19 islet samples. For example, the rs488797 C allele was associated with reduced OCR accessibility in an islet SE in the intron of CELF4 (Fig. 2B), a gene selectively expressed in islets (8,40). CC homozygous islet donors exhibited dramatically lower accessibility than CT or TT genotypes (Fig. 2B, compare blue CC, pink CT, and green TT profiles). Moreover, ATAC-seq sequences overlapping rs488797 in CT heterozygous samples almost exclusively contained the T allele (Fig. 2B, inset), reinforcing genetics as a strong determinant of chromatin accessibility at this OCR.
The rs488797 C allele is predicted to disrupt FOXA2 binding (Fig. 2B, compare top sequence between brackets to FOXA2 binding motif below). To test this, we analyzed FOXA2 ChIP-seq data from two islet donors (HI101 and HI32) (7) (Fig. 2C). We leveraged FOXA2 ChIP-seq reads and genetic LD to infer genotypes of these individuals in this region. As the caQTL SNP rs488797 alters in vivo islet chromatin accessibility, we imputed its genotype using a linked SNP rs648005 (T/C) (r2 = 0.99 with rs488797). rs648005 overlaps a distinct OCR and a FOXA2 binding site 8,178 nucleotides away but is neither an islet caQTL nor is predicted to disrupt a FOXA2 motif. In HI101, FOXA2 ChIP-seq reads overlapping rs648005 contained both C and T alleles (Fig. 2C, top), indicating that HI101 is heterozygous at rs648005 and, by extension, at rs488797 with high probability. However, FOXA2 ChIP-seq reads overlapping the caQTL SNP rs488797 exclusively contained the T allele, consistent with the islet caQTL analysis and supporting FOXA2 motif disruption predictions. In HI32, FOXA2 ChIP-seq reads at rs648005 contained only the T allele, suggesting that this individual is a TT homozygote at rs648005, and therefore a CC homozygote at rs488797 with high probability. Notably, no FOXA2 binding is observed at rs488797 for HI32, providing further support that the C allele disrupts FOXA2 binding. Supplementary Table 8 provides predicted motif disruptions from HaploReg (43) for all islet caQTL including rs488797.
Islet caQTLs were uniformly distributed across the autosomal chromosomes (Supplementary Fig. 2B), and the majority (>98%) were located within 200 kb flanking the TSS of the nearest islet-expressed gene (Supplementary Fig. 2C). Twelve percent of islet caQTLs were in promoters, whereas 30% overlapped enhancers (Supplementary Fig. 2D). Islet caQTLs were exclusively enriched in islet SEs compared with SEs in other tissues (Supplementary Fig. 2E). Finally, sequence motifs for islet-specific TFs, such as FOXA2, NKX6.1, and PDX1, were enriched in caQTL OCRs (Fig. 2D). To validate this, we overlapped caQTL OCRs with ChIP-seq data from human islets for islet-specific TFs and ubiquitous CTCF (7). We found that FOXA2, NKX6.1, and PDX1 binding (i.e., ChIP-seq peaks) were enriched at caQTLs (Supplementary Fig. 2F), in contrast to CTCF, whose binding sites were not enriched at islet caQTLs. Together, these results suggest that motif enrichment analyses likely reflect actual binding of these TFs at caQTL OCRs. Surprisingly, sequence motifs of oxidative stress-responsive TFs, such as BACH1, BACH2, and NRF2, were also enriched in caQTL OCRs, suggesting that some caQTLs may modulate stress/stimulus-responsive cis-RE activity.
To determine if caQTL alleles altering in vivo chromatin accessibility elicit concordant effects on in vitro enhancer activity, we selected a subset of caQTLs (n = 13) that were nearby genes exhibiting islet-specific expression (8) (e.g., Fig. 2B). We cloned DNA sequences containing each islet caQTL allele (Supplementary Table 5) and measured their enhancer activity using luciferase reporter assays in MIN6 mouse β-cells. We observed allelic effects on luciferase activity for 8 of the 13 caQTLs tested (Fig. 2E). Importantly, for all 8 caQTLs, the allele that increased in vivo chromatin accessibility also increased in vitro enhancer activity (Fig. 2E). Finally, we studied whether caQTL variants were also associated with variability in islet gene expression levels using islet eQTL data from this cohort (n = 19). As shown in Fig. 2F, caQTL variants exhibited more significant allelic effects on islet gene expression than randomly selected variants in OCRs (noncaQTLs). We observed the same trend comparing these caQTLs to eQTLs detected in a larger independent cohort (n = 112) (Supplementary Fig. 2G) (8). Importantly, for 84% of caQTL-eQTL pairs in our cohort (37/44) (Supplementary Fig. 2H), we observed a concordant direction of effect (Pearson r = 0.691), i.e., higher chromatin accessibility is associated with increased gene expression and vice versa (Supplementary Fig. 2H; Q1 and Q3), linking chromatin accessibility effects of these variants to downstream changes in islet gene expression.
Chromatin Accessibility Changes in T2D Versus ND Islet Samples
To assess potential environmental effects of T2D disease state on the islet epigenome, we compared chromatin accessibility between five T2D donors and five ND donors with comparable demographics, e.g., age, race, sex) (Supplementary Fig. 3A, Supplementary Table 3). After completing SVA (31) to remove unwanted variation in the data, e.g., batch effect, sex, postmortem interval, drug treatment (Supplementary Fig. 3B), we identified 1,515 of 52,387 (2.8%) OCRs that were differentially accessible between T2D and ND islet samples at FDR 10% (see research design and methods, Fig. 3A, and Supplementary Fig. 3C; 609 at FDR 5%, and 79 at FDR 1%), where 714 have increased (opening OCRs) and 801 have decreased (closing OCRs) accessibility in T2D compared with ND samples (Fig. 3A, Supplementary Table 9). There was a remarkable difference in the chromatin state annotation of opening and closing OCRs. Closing OCRs, e.g., the one highlighted near BHLHE41 (Fig. 3B, gray rectangle), mostly overlapped enhancers (48%), whereas opening OCRs extensively overlapped promoters (70%) (Fig. 3C). This difference was also reflected in TF motif enrichments, where opening and closing OCRs were enriched for distinct motifs (Supplementary Fig. 3D). Interestingly, motifs for PDX1 and TFs that regulate stress responses, such as ATF3/JUNB, AP-1, and BACH1, were enriched in closing OCRs, which may represent epigenomic signatures of previously described molecular perturbations in dysfunctional and T2D islets, including PDX1 export from the nucleus (44), perturbation of oxidative stress responses (45,46), and inactivation of β-cell survival pathways (47).
The overwhelming majority (>99%) of T2D disease state–associated changes in chromatin accessibility were quantitative, not qualitative, i.e., OCRs did not completely appear/disappear with T2D disease state (Fig. 3D). A total of 654 genes were associated with opening OCRs (predominantly enhancers), and 622 genes were associated with closing OCRs (predominantly promoters). Differentially accessible OCRs at gene promoters exhibited modest positive correlation with the corresponding gene’s expression (Supplementary Fig. 3E). T2D disease state–associated OCRs were not enriched for any GO terms or KEGG/Wiki pathways. Differential gene expression analyses from the same ND and T2D samples revealed few significant changes (Supplementary Table 9), where only 120 and 54 genes were up- or downregulated, respectively, with T2D disease state at FDR 10% (Supplementary Table 10).
Finally, given the significant impact of genetics on islet chromatin accessibility, we asked which T2D disease state–associated chromatin accessibility changes may be driven by genetic differences. Interestingly, 6% of the differentially accessible OCRs overlapped islet caQTLs (39 opening OCRs, 51 closing OCRs) (Fig. 3D, Supplementary Fig. 3F), including the opening OCR that contains the caQTL variant rs1463768. Four of five T2D islet samples were heterozygous or homozygous for opening G allele for this variant, whereas all five ND donors were homozygous for the closing T allele (Fig. 3E). rs1463768-containing sequences did not show allelic differences in luciferase reporter activity in MIN6 cells (Fig. 2E). Therefore, it remains uncertain whether genotype, environment (i.e., T2D disease state), or genotype–environment interactions are responsible for islet chromatin accessibility changes at this and other overlapping loci.
T2D-Associated GWAS SNPs Altering Islet Chromatin Accessibility
The vast majority (>90%) of GWAS SNPs associated with T2D (4,48) and metabolic measures of islet (dys)function (49,50) are noncoding and overlap islet SEs (6,7). To test whether T2D- and islet (dys)function–associated GWAS SNPs alter chromatin accessibility in islets, we assessed overlaps of GWAS index and linked SNPs (see research design and methods) with islet caQTLs. Among 184 diverse trait- and disease-associated SNP sets tested, only those associated with T2D (2.97 fold), fasting plasma glucose (13.46 fold), and BMI-adjusted fasting glucose–related (7.43 fold) traits were significantly enriched in islet caQTLs (Fig. 4A; P < 5.43 × 10−04, FDR 5%). In contrast, DNaseI sensitivity QTLs (13) in lymphoblastoid cell lines were enriched for mostly autoimmune disease–associated GWAS SNPs (Supplementary Fig. 4A), emphasizing the specificity of T2D-associated GWAS SNP enrichments in islet caQTLs.
We identified SNPs in 13 T2D-associated loci that alter islet chromatin accessibility, thereby nominating these as putative causal/functional SNPs (Fig. 4B, Supplementary Fig. 4B). caQTL SNP alleles for 4 of 13 T2D-associated loci (ADCY5, ZMIZ1, MTNR1B, RNF6) were previously linked to altered in vitro enhancer activity (51), in vivo chromatin accessibility (52), or in vivo steady state gene expression in islets (8,11,12,53). Importantly, T2D-associated risk alleles for these four loci exhibit concordant effects on chromatin accessibility and gene expression in islets, i.e., same direction of effect (Fig. 4B). For 6 of 13 T2D-associated caQTLs, the risk allele decreased chromatin accessibility, designated as loss of function (Fig. 4B). This included the T2D-associated caQTL SNP rs11708067 in the third intron of ADCY5, which overlaps an islet SE. The risk allele for this variant is associated with reduced chromatin accessibility (Fig. 4C), consistent with recent reports linking the rs11708067 risk allele to decreased transcriptional reporter activity in rodent β-cells (MIN6 and 832/13) and to decreased ADCY5 expression in ND human islets in vivo (12,51). The T2D risk allele was associated with increased chromatin accessibility for the remaining 7 of 13 islet caQTLs (Fig. 4B), designated as gain of function. For example, the T2D risk allele A at rs6937795 increased in vivo islet chromatin accessibility in the IL20RA locus (Fig. 4D) and conferred 2.5-fold higher transcriptional reporter activity than the nonrisk C allele (Fig. 2E). Although targeted approaches are required to establish causality, our analyses nominate rs6937795 as a strong candidate for causal SNP in the T2D-associated IL20RA locus.
In this study, we integrated ATAC-seq data and genotypes from 19 islet donors to link 2,949 SNPs with altered in vivo chromatin accessibility. Allelic effects on in vivo chromatin accessibility correlated well with effects on in vitro enhancer activity; 8 of 13 caQTLs tested showed concordant allelic effects in luciferase reporter assays. Although we cannot eliminate the possibility of false-positive associations for the remaining 5 caQTLs, these loci may also represent 1) α-cell–specific, 2) species-specific, or 3) poised/primed cis-REs (16), which need to be tested in future studies in human cells.
The data suggest that islet caQTL variants modulate regulatory programs important for islet cell identity and function. They were enriched in islet-specific TF motifs, TF ChIP-seq peaks (7), and islet SEs (6). They were specific to islets, as only 2.3% (68/2,949) of the islet caQTL variants altered chromatin accessibility in induced pluripotent stem cells or macrophages (data not shown) (16,54). Islet caQTL SNPs were linked to more significant effects on islet gene expression levels than variants that do not significantly impact chromatin accessibility (i.e., noncaQTL SNPs). Increasing the cohort size and separating islet cell types in future studies should lead to increased convergence between islet caQTLs and islet eQTLs. Furthermore, studying islets under stress conditions could identify and link primed enhancers and response eQTLs, which have been reported in other cell types (16,55).
T2D disease state–associated changes in chromatin accessibility were limited and quantitative, i.e., few OCRs completely lost or gained accessibility with T2D, suggesting that the T2D disease state does not lead to extensive remodeling of steady-state chromatin accessibility in islets. However, we acknowledge that T2D disease state–associated epigenetic changes may be masked by multiple factors, including 1) relatively low HbA1c values for T2D donors (5.3–7.4%); 2) cell type–specific changes hidden by whole-islet measurements; 3) steady-state, normoglycemic culture conditions of the islets that may mask changes elicited by the diabetic state; and 4) limited power due to cohort size (n = 10) and genetic diversity. We found that 6% of differentially accessible OCRs associated with T2D disease state overlapped caQTLs. Future studies integrating genotype and environment and their interaction in larger, genetically stratified cohorts should contribute to more precise understanding of epigenomic changes associated with T2D disease state.
This study demonstrates the utility of using islet caQTL analyses to identify and prioritize putative functional variants among hundreds of linked, “credible set” T2D-associated SNPs (4,9,48). Even with a relatively small cohort (n = 19), we identified putative causal variants at 13 T2D GWAS loci, based on their chromatin accessibility effects. These include four loci (ADCY5, MTNR1B, RNF6, and ZMIZ1) in which the same or linked (r2 > 0.8) SNP functions as an islet eQTL (8,11,12,53). Importantly, the risk allele exhibited concordant effects on islet chromatin accessibility and gene expression for each locus. Finally, we identified allelic effects on both in vivo and in vitro islet enhancer activity for multiple new loci, such as rs6937795 in the IL20RA locus, and linked the risk alleles at each locus to increased or decreased activity. This study provides new understanding of genetic variant effects on islet chromatin accessibility and enumerates targets for site-specific and hypothesis-driven investigation.
Acknowledgments. The authors are indebted to the anonymous pancreatic islet organ donors and their families, without whom this entire study would not be possible. A subset of human pancreatic islets was provided by the National Institute of Diabetes and Digestive and Kidney Diseases–funded Integrated Islet Distribution Program at City of Hope (grant 2UC4DK098085). The authors thank Jane Cha, Jackson Laboratory for Genomic Medicine, for help generating artwork for the figures; members of the Stitzel and Ucar laboratories for helpful discussion and critiques during study design and execution; and Taneli Helenius, Jackson Laboratory for Genomic Medicine, and anonymous reviewers, whose comments, questions, and suggested edits greatly improved the quality and clarity of the manuscript.
Funding. This study was made possible by generous financial support from The Jackson Laboratory startup funds (to M.L.S. and D.U.); the Doug Coleman Research Fund at The Jackson Laboratory; the National Institute of Diabetes and Digestive and Kidney Diseases under award number DK092251 (to M.L.S.); the Assistant Secretary of Defense for Health Affairs, through the Peer Reviewed Medical Research Program under award number W81XWH-16-1-0130 (to M.L.S.); and the American Diabetes Association Pathway to Stop Diabetes Accelerator Award (1-18-ACE-15) (to M.L.S.).
Opinions, interpretations, conclusions, and recommendations are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health, Department of Defense, or American Diabetes Association.
Duality of Interest. No other potential conflicts of interest relevant to this article were reported.
Author Contributions. S.K., D.U., and M.L.S. conceived the study and designed experiments. R.K. and M.L.S. collected and prepared each islet sample for genotyping and sequencing. S.K. analyzed the data. A.Y., N.L., and E.J.M. contributed to bioinformatics and statistical analyses of the data. S.K. and A.J. cloned and tested caQTL allelic effects using luciferase reporters. S.K., D.U., and M.L.S. wrote the manuscript. All authors reviewed and edited the final manuscript. M.L.S. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Data Availability. The accession number for human islet ATAC-seq and RNA-seq data reported in this article is NCBI Sequence Read Archive: SRP117935.
Prior Presentation. Parts of this study were presented at the Pancreatic Diseases Gordon Research Conference, Waterville Valley, NH, 18–23 June 2017, and the Boston Ithaca Islet Club Meeting, Worcester, MA, 28–29 April 2018.