Alternative splicing (AS) within the β-cell has been proposed as one potential pathway that may exacerbate autoimmunity and unveil novel immunogenic epitopes in type 1 diabetes (T1D). We used a computational strategy to prioritize pathogenic splicing events in human islets treated with interleukin-1β plus interferon-γ as an ex vivo model of T1D and coupled this analysis with a k-mer–based approach to predict RNA-binding proteins involved in AS. In total, 969 AS events were identified in cytokine-treated islets, with a majority (44.8%) involving a skipped exon. ExonImpact identified 129 events predicted to affect protein structure. AS occurred with high frequency in MHC class II–related mRNAs, and targeted quantitative PCR validated reduced inclusion of exon 5 in the MHC class II gene HLA-DMB. Single-molecule RNA fluorescence in situ hybridization confirmed increased HLA-DMB splicing in β-cells from human donors with established T1D and autoantibody positivity. Serine/arginine-rich splicing factor 2 was implicated in 37.2% of potentially pathogenic events, including exon 5 exclusion in HLA-DMB. Together, these data suggest that dynamic control of AS plays a role in the β-cell response to inflammatory signals during T1D evolution.
Introduction
During the evolution of type 1 diabetes (T1D), islet-infiltrating immune cells are primed to release a variety of chemokines and cytokines, such as interferon-α (IFN-α), IFN-γ, and interleukin-1β (IL-1β). While these cytokines have been shown to induce β-cell death, they also activate transcriptional and signaling pathways within the β-cell that have been linked with increased immunogenicity through the generation of neoantigens (1), increased inflammatory gene expression, and changes in HLA gene expression patterns (2). More than 90% of human genes undergo alternative splicing (AS), which generates mRNA and protein diversity for cellular identity and homeostasis in multicellular life (3). AS is regulated in a cell-, tissue-, and stimulus-specific manner through a complex interaction network that involves cis elements within both DNA and precursor mRNAs as well as trans-acting factors that bind specific cis element sequences (4). Aberrant patterns of AS have been implicated in several autoimmune conditions, including multiple sclerosis (5), systemic lupus erythematosus (6), and rheumatoid arthritis (7), and AS has been proposed as one potential pathway that may unmask novel immunogenic protein epitopes or change cellular function to exacerbate autoimmunity.
In models of T1D, exposure of human β-cells to proinflammatory cytokines, such as IL-1β plus IFN-γ or IFN-α, have been linked with changes in AS (8–10). However, only a small subset of AS events are pathogenic, with a large majority ultimately classified as inconsequential passenger events (11). Defining whether an AS event affects protein structure can be tested directly using a variety of experimental approaches, such as X-ray crystallography or nuclear magnetic resonance. A limitation of these approaches is that they examine only a targeted region of the protein and are therefore impractical to perform on a genome-wide scale (12). We previously validated a novel machine learning–based algorithm, known as ExonImpact, which uses prediction-based methods for deriving protein structure–related features from RNA sequencing data sets of AS events through the integration of 31 curated features of protein structure (12). Here, we applied this method to RNA sequencing data sets generated from human islets treated with IL-1β and IFN-γ to identify potentially pathogenic AS events relevant to T1D pathophysiology (12). In parallel, we used a k-mer–based RNA-binding protein motif enrichment analysis to identify candidate RNA-binding proteins (RBPs) predicted to bind mRNA splice sites leading to activation or repression of splice site use under inflammatory conditions.
Research Design and Methods
Human Islets
Human islets from 10 cadaveric organ donors (male n = 6; female n = 4; average age 33 ± 11 years; average BMI 28.89 ± 4.68 kg/m2) were used for RNA sequencing (13). Human islets from 12 cadaveric organ donors (male n = 9; female n = 3; average age 47 ± 14 years; average BMI 27.72 ± 4.68 kg/m2) were used for quantitative PCR (qPCR), immunofluorescence, and immunoblot experiments. Islets were obtained from the Integrated Islet Distribution Program or the University of Alberta (Supplementary Table 1). Upon receipt, islets were cultured overnight in standard Prodo medium (Prodo Laboratories, Aliso Viejo, CA) and then treated with or without 50 units/mL IL-1β and 1,000 units/mL IFN-γ for 24 h followed by RNA isolation and downstream analysis (Fig. 1).
Schematic representation of the experimental and analysis workflow. Human islets from 10 cadaveric human organ donors were obtained from the Integrated Islet Distribution Program. Following overnight recovery, islets were cultured with or without 50 units/mL IL-1β and 1,000 units/mL IFN-γ for 24 h and then harvested for RNA isolation and downstream analysis, including identification of differential splicing events, ExonImpact functional annotation, GO enrichment analysis, and RBP motif prediction. Targeted qPCR and smFISH were used for validation.
Schematic representation of the experimental and analysis workflow. Human islets from 10 cadaveric human organ donors were obtained from the Integrated Islet Distribution Program. Following overnight recovery, islets were cultured with or without 50 units/mL IL-1β and 1,000 units/mL IFN-γ for 24 h and then harvested for RNA isolation and downstream analysis, including identification of differential splicing events, ExonImpact functional annotation, GO enrichment analysis, and RBP motif prediction. Targeted qPCR and smFISH were used for validation.
RNA Sequencing and AS Analysis
Total RNA was isolated using the SeraMir RNA Isolation Kit (System Biosciences). All sequenced libraries were mapped to the human genome (GENCODE GRCh37) using the STAR RNA sequencing aligner (14) (Supplementary Material). AS events were analyzed using replicated multivariate analysis of transcript splicing (rMATS) (15), which computes a percentage splicing index (PSI) and false discovery rate (FDR) for five different types of splicing events, including skipped exons, mutually exclusive exons, retained introns, and 5′ and 3′ AS sites. Events with more than half of the replicates from each comparison group having a sum of the inclusion junction counts and skipping junction counts per sample ≥10 were kept. To be considered significantly changed, a cutoff of 10% for ΔPSI and a 5% FDR were used. A heatmap and locus-by-locus volcano plot were generated using R packages. Two external data sets were analyzed for additional validation (Supplementary Material). ExonImpact, a machine learning–based tool, was used to prioritize and evaluate the functional consequences of AS events (12). Gene Ontology (GO) enrichment was performed with DAVID (https://david.ncifcrf.gov/tools.jsp) (16) using the biological processes category.
RBP Motif Enrichment Analysis
RBP motif enrichment analysis was performed on seven sequence regions around differentially spliced skipped exon events. We searched for intronic splicing regulators within introns 300 bases upstream and downstream of a skipped exon, 300 bases downstream of the upstream splice junction, and 300 bases upstream of the downstream splice junction. We searched for exonic splicing regulators within exons 150 bases upstream of the upstream splice junction, within the skipped exon itself, and within 150 bases downstream of the downstream splice junction (Fig. 1). Sequences of those seven regions were extracted from human reference genome (GENCODE GRCh37).
To calculate k-mer enrichment, Kvector (https://github.com/olgabot/kvector) was used to count k-mers in the extracted sequences. z scores of k-mer enrichment were calculated for each region, defined by the inclusion level direction (exclusion or inclusion) against total k-mer counts in the same region for skipped-exon events in the whole genome–wide scale. To calculate motif enrichment, the CISBP-RNA binding database (version 0.6) (catalog of inferred sequence binding preferences for RNA) (https://cisbp-rna.ccbr.utoronto.ca/) was used. Each position-weight matrix was transformed into a Boolean vector of k-mers with no mismatches. The motif k-mer matrix was used to calculate motif k-mer enrichment using t tests, by comparing each motif k-mer with all k-mers of that group. Principal component analysis (PCA) was performed on the resulting motif t-statistics using the R package. Motifs were presented as circles, and those with a PCA distance >2.5 squared SDs from the center were colored in red and labeled with the motif name (3).
Pairwise and Network Analysis of RBPs
To calculate the presence of motifs in the seven sequence regions around differentially spliced skipped-exon events, we used FIMO with a custom library of all TRANSFAC (and Jaspar) motifs at a P value threshold of 10−4 (17). A network diagram was constructed using Cytoscape (version 3.7.2) (18) to visualize an AS regulatory network with key known splicing factors by applying a yFiles Organic layout (details provided in the Supplementary Material).
Genome-Wide Association Study and Splicing Quantitative Trait Locus Colocalization Analysis
A total of 220 T1D disease-associated variants were retrieved from the GWAS Catalog (https://www.ebi.ac.uk/gwas). We extended our analysis to variants in high linkage disequilibrium (European populations [EUR] R2 > 0.8) with genome-wide association study (GWAS) hits, analyzing a total of 4,009 variants. Genotype-Tissue Expression (version 7) splicing quantitative trait locus (sQTL) summary statistics were obtained from 305 pancreas samples (19). The coordinates of these two data sets were overlapped to obtain a functional annotation per variant, in order to analyze colocalization of pancreas-specific sQTLs and T1D-associated loci. GO biological process term enrichment of sGenes was performed using DAVID (16), with an FDR threshold of 0.1 to identify significantly enriched terms.
qPCR Validation of HLA-DMB Skipped-Exon Events
To demonstrate the presence of HLA-DMB transcripts with and without exon 5, two primers were designed in exon 4 and exon 6, respectively (Supplementary Table 2) (details provided in the Supplementary Material). The inclusive and skipped-exon transcripts of HLA-DMB were quantified using SYBR green–based real-time qPCR.
Quantitative Single-Cell Fluorescence In Situ Hybridization Analysis
RNA fluorescence in situ hybridization (FISH) was performed to quantify expression of HLA-DMB total RNA and the HLA-DMB splice variant (sHLA-DMB) in human pancreatic sections from the Network of Pancreatic Organ Donors With Diabetes. Expression of both isoforms was quantified using an in-house algorithm (Supplementary Fig. 1) that allowed quantification at the single-cell level with single-molecule resolution (details provided in the Supplementary Material).
Data and Resource Availability
All sequence data were deposited in the Gene Expression Omnibus under the accession number GEO GSE169221.
Results
Cytokine Treatment Influences AS Patterns in Human Islets
The overall study workflow is illustrated in Fig. 1. To quantify AS events induced by cytokine treatment, we applied rMATS to RNA sequencing data sets generated from cadaveric human donor islets treated with or without 50 units/mL IL-1β and 1,000 units/mL IFN-γ for 24 h. To restrict our events to those with sufficient read coverage to increase reliability, we required at least 10 counts per junction in more than half of each comparison group replicate, |ΔPSI| ≥10%, and a statistically significant FDR ≤0.05 by rMATS. In aggregate, 969 differentially spliced exons were identified in 753 mRNAs following cytokine treatment. Of these, 499 splice variants (in 416 mRNAs) showed higher inclusion levels following cytokine treatment and 479 splice variants (in 393 mRNAs) showed higher exclusion levels. A majority (44.8%) involved a skipped exon; 38.9% of events were categorized as a mutually exclusive exon, 6.3% involved a 3′ AS site, 5.8% involved a 5′ AS site, and 4.2% resulted from a retained intron (Fig. 2A and B and Supplementary Fig. 2). Among AS event categories, intron retention has been most closely linked with nonsense-mediated decay, which is a quality control mechanism the leads to degradation of abnormal transcripts (20). Among the 40 significantly differentially spliced retained-intron events, 24 were annotated with known transcripts from the human genome reference repository (GENCODE GRCh37), and 21 (87.5%) of 24 events were predicted as nonsense-mediated decay target candidates according to 50-nucleotide rule (Supplementary Table 3).
Summary of AS events. A: Summary of AS event numbers for each category of event. B: Volcano plot of all identified splicing events (requiring at least 10 counts per junction in more than half of each comparison group replicate) and dot plot of inclusion level differences in control (Cont) and cytokine-treated (Cyto) human islets for representative genes. The x-axis shows inclusion level differences (Cyto vs. Cont), and the y-axis shows −log10 FDR of differential splicing analysis (Cyto vs. Cont). C: GO analysis of differential AS genes with increased inclusion levels. D: GO analysis of differential AS genes with repressed inclusion levels.
Summary of AS events. A: Summary of AS event numbers for each category of event. B: Volcano plot of all identified splicing events (requiring at least 10 counts per junction in more than half of each comparison group replicate) and dot plot of inclusion level differences in control (Cont) and cytokine-treated (Cyto) human islets for representative genes. The x-axis shows inclusion level differences (Cyto vs. Cont), and the y-axis shows −log10 FDR of differential splicing analysis (Cyto vs. Cont). C: GO analysis of differential AS genes with increased inclusion levels. D: GO analysis of differential AS genes with repressed inclusion levels.
PCA indicated that the majority of variance in splicing inclusion level differences was due to cytokine treatment and was not attributable to donor-related differences (Supplementary Fig. 3). The top splicing variant for an exon-inclusion event (ranked by statistical significance with FDR P < 1 × 10−230) was WARS, which encodes a tryptophanyl-tRNA synthetase that has been previously reported as a T2D-related GWAS hit (21). The top variant for a splicing exclusion event was DDR1, a receptor tyrosine kinase with known tissue-specific isoform expression (22,23) that has been previously associated with T1D risk in GWAS (24) (FDR P = 3.09 × 10−58).
Differentially spliced mRNAs with increased AS inclusion levels were enriched in GO terms for processes including exocytosis, post-Golgi vesicle-mediated transport, glycerolipid metabolic process, and phosphorylation. Exclusion events included GO terms for functions such as protein transport, protein phosphorylation, G2/M transition of mitotic cell cycle, and cell division (16) (Fig. 2C and D).
Extensive Alternative Exon Use Is Observed in Islets From Donors With T1D
To compare AS events from our ex vivo model of T1D with those observed in islets isolated from donors with and without T1D, we analyzed RNA sequencing data from bulk-sorted β-cells from four donors with T1D and 12 donors without diabetes (GSE121863) (Supplementary Table 4) (25). Using the same analysis pipeline, 1,461 alternative exon changes were identified with an FDR P ≤ 0.05. We cross-checked events with an FDR P ≤ 0.05 with those identified in our RNA sequencing data set (Supplementary Fig. 4A). Overlapping events were defined as those exhibiting the same trend in splicing difference changes between the two data sets (Supplementary Material). For 80 overlapping splicing exons (74 unique mRNAs), the ΔPSIs were significantly correlated (r = 0.72; P = 8.8 × 10−14) (Supplementary Fig. 4B). Using DAVID GO algorithms, we performed functional annotation of overlapping events and found that the top enriched pathways were transcriptional initiation, mRNA splicing via spliceosome, and cell-cell adhesion (Supplementary Fig. 4C).
ExonImpact Can Be Used to Prioritize Pathogenic AS Events in Cytokine-Treated Human Islets
To evaluate how individual AS events might influence protein structure, we used ExonImpact, which is a validated machine learning–based tool that integrates 31 curated features of protein structure (12). ExonImpact identified a total of 342 AS events. When applying a cutoff functional impact score of ≥0.5, which suggests high probability of a pathogenic AS event, 129 events in 118 mRNAs were retained. Details of these events are provided in Table 1 and Supplementary Table 5. The top-ranked AS event with an ExonImpact prediction score of 0.91 was SMURF2 (Smad ubiquitination regulatory factor-2) (Table 1). To compare differences in protein structural features, events were divided into low- and high-score groups using a functional impact score cutoff of 0.5. When using a Wilcoxon rank sum test, a majority of features showed significant differences between the two groups. Supplementary Figure 5A–D shows comparisons between the low- and high-score groups for selected key structural features. Of note, we used the APPRIS database (26) to annotate the principal isoforms of mRNAs in human islets (Supplementary Table 5). In this analysis, 167 (49.7%) of 336 AS events were annotated with the APPRIS database and 111 of 167 (66.5%) events were called as the principal isoform.
ExonImpact prediction of differential splicing variants
Gene ID . | Gene . | Gene name . | Chromosome . | Strand . | P . | FDR . | Inclusion level difference (cyto vs. cont) . | Disease probability . | phyloP score . | Type . |
---|---|---|---|---|---|---|---|---|---|---|
ENSG00000108854 | SMURF2 | SMAD specific E3 ubiquitin protein ligase 2 | 17 | − | 1.48E−12 | 1.27E−10 | −0.131 | 0.91 | 0.47 | Skipped exon |
ENSG00000155329 | ZCCHC10 | Zinc finger CCHC-type containing 10 | 5 | − | 1.39E−04 | 3.16E−03 | −0.108 | 0.87 | 0.34 | Skipped exon |
ENSG00000189292 | FAM150B | Family with sequence similarity 150 member B | 2 | − | 1.25E−03 | 2.24E−02 | 0.103 | 0.87 | 0.34 | Skipped exon |
ENSG00000144741 | SLC25A26 | Solute carrier family 25 member 26 | 3 | + | 4.87E−11 | 3.68E−09 | −0.15 | 0.84 | 0.41 | Skipped exon |
ENSG00000067082 | KLF6 | Kruppel-like factor 6 | 10 | − | 1.35E−03 | 2.40E−02 | 0.104 | 0.83 | 0.44 | Skipped exon |
ENSG00000174628 | IQCK | IQ motif containing K | 16 | + | 1.99E−03 | 3.38E−02 | 0.134 | 0.82 | 0.41 | Skipped exon |
ENSG00000054392 | HHAT | Hedgehog acyltransferase | 1 | + | 1.75E−13 | 1.63E−11 | −0.149 | 0.81 | 0.37 | Skipped exon |
ENSG00000177225 | PDDC1 | Parkinson disease 7 domain containing 1 | 11 | − | 4.66E−06 | 1.75E−04 | 0.121 | 0.81 | 0.18 | Skipped exon |
ENSG00000068354 | TBC1D25 | TBC1 domain family member 25 | X | + | 1.19E−04 | 2.74E−03 | −0.116 | 0.81 | 0.36 | Skipped exon |
ENSG00000184939 | ZFP90 | ZFP90 zinc finger protein | 16 | + | 1.24E−03 | 2.23E−02 | 0.109 | 0.81 | 0.24 | Skipped exon |
Gene ID . | Gene . | Gene name . | Chromosome . | Strand . | P . | FDR . | Inclusion level difference (cyto vs. cont) . | Disease probability . | phyloP score . | Type . |
---|---|---|---|---|---|---|---|---|---|---|
ENSG00000108854 | SMURF2 | SMAD specific E3 ubiquitin protein ligase 2 | 17 | − | 1.48E−12 | 1.27E−10 | −0.131 | 0.91 | 0.47 | Skipped exon |
ENSG00000155329 | ZCCHC10 | Zinc finger CCHC-type containing 10 | 5 | − | 1.39E−04 | 3.16E−03 | −0.108 | 0.87 | 0.34 | Skipped exon |
ENSG00000189292 | FAM150B | Family with sequence similarity 150 member B | 2 | − | 1.25E−03 | 2.24E−02 | 0.103 | 0.87 | 0.34 | Skipped exon |
ENSG00000144741 | SLC25A26 | Solute carrier family 25 member 26 | 3 | + | 4.87E−11 | 3.68E−09 | −0.15 | 0.84 | 0.41 | Skipped exon |
ENSG00000067082 | KLF6 | Kruppel-like factor 6 | 10 | − | 1.35E−03 | 2.40E−02 | 0.104 | 0.83 | 0.44 | Skipped exon |
ENSG00000174628 | IQCK | IQ motif containing K | 16 | + | 1.99E−03 | 3.38E−02 | 0.134 | 0.82 | 0.41 | Skipped exon |
ENSG00000054392 | HHAT | Hedgehog acyltransferase | 1 | + | 1.75E−13 | 1.63E−11 | −0.149 | 0.81 | 0.37 | Skipped exon |
ENSG00000177225 | PDDC1 | Parkinson disease 7 domain containing 1 | 11 | − | 4.66E−06 | 1.75E−04 | 0.121 | 0.81 | 0.18 | Skipped exon |
ENSG00000068354 | TBC1D25 | TBC1 domain family member 25 | X | + | 1.19E−04 | 2.74E−03 | −0.116 | 0.81 | 0.36 | Skipped exon |
ENSG00000184939 | ZFP90 | ZFP90 zinc finger protein | 16 | + | 1.24E−03 | 2.23E−02 | 0.109 | 0.81 | 0.24 | Skipped exon |
cont, control (no cytokine treatment); cyto, cytokine treated.
RBP Motif Enrichment Analysis
Skipped exons were the most frequent AS category identified in our data set, comprising 44.8% of all events. Based on known RNA 6-mers identified previously as splicing regulatory elements in cell-based screens (27), we analyzed whether there were cis regulatory elements within these flanking intron and exon sequences that could be used to identify enriched RBPs. We applied a k-mer–based computational strategy to identify RBPs potentially responsible for exon-skipping events. Using PCA on RBP motif–enriched 6-mer z scores, our analysis showed that exclusion events were enriched for G-rich motifs and inclusion events for A-rich motifs (Fig. 3A).
RBP motif enrichment analysis. A: PCA of motif-derived k-mer z scores, with each point as a motif and the vector components as the regulation of splicing (inclusion or exclusion). Motifs with PCA distance >2.5 SDs away from zero are plotted in red and labeled with the motif ID and RBP name from CISBP (version 6.0). Enriched motif sequences in each splicing regulation are conserved with distinct nucleotides. B: Heatmap showing differential mRNA expression levels of RBPs with PCA distance >2.5 SDs from zero. The heatmap represents the z score of RPKM values, with each row representing a gene. Each column represents a sample: Cont indicates human islets without cytokine treatment, and Cyto indicates human islets with cytokine treatment, ordered the same as the Cont group. C: Pairwise correlation between the 16 binding motifs for all 12 differentially expressed RBPs. D: Shown is the network of RBPs and the genes harboring splicing events. Nodes in the network correspond to the RBP and significant splicing exons with an ExonImpact prediction score ≥0.5. The color of the node represents differential splicing ratios. Edges represent the prediction count by FIMO, which measures the probability that an RBP binds to a target gene. The line thickness increases with number of predicted counts as indicated. Node sizes increase with significance (−log10 FDR) of the splicing events as indicated. The left panel depicts the network for all 12 RBPs, and the right panel depicts the SRSF2 network only.
RBP motif enrichment analysis. A: PCA of motif-derived k-mer z scores, with each point as a motif and the vector components as the regulation of splicing (inclusion or exclusion). Motifs with PCA distance >2.5 SDs away from zero are plotted in red and labeled with the motif ID and RBP name from CISBP (version 6.0). Enriched motif sequences in each splicing regulation are conserved with distinct nucleotides. B: Heatmap showing differential mRNA expression levels of RBPs with PCA distance >2.5 SDs from zero. The heatmap represents the z score of RPKM values, with each row representing a gene. Each column represents a sample: Cont indicates human islets without cytokine treatment, and Cyto indicates human islets with cytokine treatment, ordered the same as the Cont group. C: Pairwise correlation between the 16 binding motifs for all 12 differentially expressed RBPs. D: Shown is the network of RBPs and the genes harboring splicing events. Nodes in the network correspond to the RBP and significant splicing exons with an ExonImpact prediction score ≥0.5. The color of the node represents differential splicing ratios. Edges represent the prediction count by FIMO, which measures the probability that an RBP binds to a target gene. The line thickness increases with number of predicted counts as indicated. Node sizes increase with significance (−log10 FDR) of the splicing events as indicated. The left panel depicts the network for all 12 RBPs, and the right panel depicts the SRSF2 network only.
For all RBP motifs >2.5 SDs from the central zero point, we checked differential expression patterns of the predicted RBPs from our RNA sequencing data set. Twelve RBPs displayed statistically significant differential expression patterns in response to cytokine treatment (Fig. 3B). Of these, several proteins, including SRSF2, SRSF7, and HNRNPH2, have been previously identified as splicing factors (28,29). FIMO was used to annotate target regions for these 12 RBPs. From this analysis, SRSF2 was predicted to bind 190 (43.7%) significant skipped-exon events, and the complete list of all SRSF2 target mRNAs and their accompanying skipped exons is provided in Supplementary Table 6. Notably, in the subset of events with ExonImpact scores ≥0.5, SRSF2 was predicted to target 37.2% of these mRNAs. We noted an evident cooccurrence correlation and cluster with motifs in the same RBPs (Fig. 3C), suggesting robustness in prediction. Finally, a network was built to visualize 299 significant alternative skipped exons (in 102 unique genes) with ExonImpact prediction scores ≥0.5 and their interactions with these 12 key RBPs (Fig. 3D).
AS Maps to HLA Class II Genes in Cytokine-Treated Islets
Population-based transcriptome studies have suggested that AS may be a causal mechanism underlying GWAS signals of complex disease (30,31). Previous studies have shown that a majority of T1D candidate genes are expressed in human islets (8). To investigate the relevance of regulatory variation affecting splicing in the pathogenesis of T1D, we examined loci with genome-wide significant association with T1D in EUR and considered all variants in high linkage disequilibrium (EUR R2 > 0.8) with a lead single nucleotide polymorphism (SNP) reported in the National Human Genome Research Institute-European GWAS catalog (https://www.ebi.ac.uk/gwas) (32). We found that T1D risk variants were significantly enriched in pancreas tissue–specific sQTLs (19) compared with non-T1D risk genes (18.19 vs. 7.02%; P = 2.56 × 10−96) (Supplementary Table 7), suggesting that SNPs affecting splicing regulation may be a causal mechanism underlying a substantial fraction of GWAS signals for T1D. GO enrichment analysis of 16 sGenes (genes with an sQTL) highlighted a variety of immune-related biological processes, including antigen processing and presentation, the IFN-γ–mediated signaling pathway, and antigen processing and presentation of peptide or polysaccharide antigen via MHC class II (Supplementary Fig. 6).
Genetic risk of T1D is highly associated with variation within MHC class II genes (33), prompting us to categorize further AS events in this pathway. Interestingly, we identified 98 AS events in class II genes, including events in HLA-DRB1, HLA-DMA, and HLA-DPB1 (FDR P ≤ 0.05) (Supplementary Table 8), indicating that AS occurs with a high frequency within the MHC class II pathway, with a hypergeometric P = 0.005 when compared with background. Of these events, one 36-nucleotide AS exon in HLA-DMB was prioritized with an ExonImpact score of 0.75, suggesting high probability of a pathogenic event. This was a skipped-exon event occurring in exon 5 of the mRNA. Notably, this conserved exon encodes a lysosomal targeting signal in the protein, which includes a tyrosine-based motif, YTPL, whose first residue begins at the start of the exon (Fig. 4A and B) (23). In our data set, cytokine treatment resulted in a higher frequency of the skipped-exon event (|ΔPSI| = 27.6%; FDR P = 2.48 × 10−16) (Fig. 4C and D). Notably, SRSF2 protein was predicted to bind the flanking intronic region of this exon (Supplementary Table 6).
Increased exclusion of exon 5 of HLA-DMB occurs in cytokine-treated human islets. A: Genomic structure of the HLA-DMB gene. Exons are shown as filled boxes, with the four alternative isoforms of HLA-DMB mRNA inferred from previous reports. The lysosomal targeting signal domain is denoted as LT (23). B: Exon 5 is conserved across representative placental mammals. C: Sashimi plot for quantitative visualization of the expression of HLA-DMB alternative exon 5 and its flanking exons 4 and 6 in one human islet and cytokine-treated pair (n = 10). D: RNA sequencing results from the in-house data set (n = 10 human islet donors) revealed an increased exclusion ratio of exon 5 following cytokine treatment. E: RNA sequencing from an external independent data set (GSE108413) validated increased exclusion of exon 5 in cytokine-treated human islets (n = 5 cadaveric human donors) (|ΔPSI| = 14.4%; FDR P = 1.21 × 10−23). F: A cartoon illustrating the primers used for qPCR. Primer set (Inc-F and Inc-R) was used to detect products containing exon 5, and primer set (Ex-F and Ex-R) was used to detect products without exon 5. Primer pairs (HLA-DMB-F and HLA-DMB-R) for qPCR were used to detect HLA-DMB transcripts with and without exon 5 (184 and 149 bp, respectively). G: qPCR using specific PCR primer sets: Inc-F and Inc-R to detect the product with exon 5, and Ex-F and Ex-R to detect the product without exon 5. Analysis revealed increased exclusion of exon 5 in cytokine-treated human islets (n = 6) (|ΔPSI| = 26.4%; P = 0.001). H: Agarose gel electrophoresis of qPCR-amplified products using primer pair (HLA-DMB-F and HLA-DMB-R) detecting the increased exon 5 exclusion in cytokine-treated human islets (n = 6). I: Immunoblot of HLA-DMB and β-tubulin protein in human islets (n = 4) treated with (Cyt) or without cytokines (Con). The bar plot displays relative protein levels for HLA-DMB. Shown are means ± SDs (fold change [cyt vs con] = 3.237; P = 0.029).
Increased exclusion of exon 5 of HLA-DMB occurs in cytokine-treated human islets. A: Genomic structure of the HLA-DMB gene. Exons are shown as filled boxes, with the four alternative isoforms of HLA-DMB mRNA inferred from previous reports. The lysosomal targeting signal domain is denoted as LT (23). B: Exon 5 is conserved across representative placental mammals. C: Sashimi plot for quantitative visualization of the expression of HLA-DMB alternative exon 5 and its flanking exons 4 and 6 in one human islet and cytokine-treated pair (n = 10). D: RNA sequencing results from the in-house data set (n = 10 human islet donors) revealed an increased exclusion ratio of exon 5 following cytokine treatment. E: RNA sequencing from an external independent data set (GSE108413) validated increased exclusion of exon 5 in cytokine-treated human islets (n = 5 cadaveric human donors) (|ΔPSI| = 14.4%; FDR P = 1.21 × 10−23). F: A cartoon illustrating the primers used for qPCR. Primer set (Inc-F and Inc-R) was used to detect products containing exon 5, and primer set (Ex-F and Ex-R) was used to detect products without exon 5. Primer pairs (HLA-DMB-F and HLA-DMB-R) for qPCR were used to detect HLA-DMB transcripts with and without exon 5 (184 and 149 bp, respectively). G: qPCR using specific PCR primer sets: Inc-F and Inc-R to detect the product with exon 5, and Ex-F and Ex-R to detect the product without exon 5. Analysis revealed increased exclusion of exon 5 in cytokine-treated human islets (n = 6) (|ΔPSI| = 26.4%; P = 0.001). H: Agarose gel electrophoresis of qPCR-amplified products using primer pair (HLA-DMB-F and HLA-DMB-R) detecting the increased exon 5 exclusion in cytokine-treated human islets (n = 6). I: Immunoblot of HLA-DMB and β-tubulin protein in human islets (n = 4) treated with (Cyt) or without cytokines (Con). The bar plot displays relative protein levels for HLA-DMB. Shown are means ± SDs (fold change [cyt vs con] = 3.237; P = 0.029).
To validate our findings, we analyzed one external RNA sequencing data set of human islets from five donors treated or not with IL-1β and IFN-γ for 48 h (1). Using the same computational workflow, we found |ΔPSI| of 14.4% (FDR P = 1.21 × 10−23) for this event within HLA-DMB, which was similar to results obtained from our internal data set (Fig. 4E and Supplementary Table 4). Next, targeted PCR was performed in separate cytokine-treated islets. This analysis confirmed expression of HLA-DMB and sHLA-DMB with |ΔPSI| of 26.4% (P = 0.001) for the splicing event, which was comparable to RNA sequencing results (Fig. 4F–H). While antibodies that distinguish different HLA-DMB variants are not available, immunoblot analysis revealed a 3.23-fold upregulation of total HLA-DMB protein expression in cytokine-treated human islets (Fig. 4I). In addition, immunofluorescence analysis indicated reduced HLA-DMB colocalization with the lysosomal marker LAMP-1 following cytokine treatment (Supplementary Fig. 7).
Finally, we used human pancreatic tissue sections from the Network of Pancreatic Organ Donors with Diabetes repository and single molecular RNA FISH (smFISH) to quantify expression patterns of HLA-DMB and sHLA-DMB in tissues from 11 human organ donors, including donors without diabetes (control) (n = 4), with autoantibody positivity (AAb+) (n = 3) and with established T1D (n = 4). (Fig. 5A and B and Supplementary Table 1). Classification of cell types was enabled through partitioning of the glucagon and insulin channels (details provided in the Supplementary Material), and only insulin+ β-cells were included for analysis (Fig. 5C). Comparisons between the groups revealed the highest expression ratios of the splice variant versus full-length HLA-DMB mRNA in β-cells from donors with established T1D, with intermediate levels observed in β-cells from AAb+ donors and the lowest levels observed in donors without diabetes (Fig. 5D). Notably, β-cells from AAb+ donors and donors with T1D showed significantly increased nuclear localization of the spliced mRNA (Fig. 5E).
Quantification of the expression patterns of HLA-DMB splice variants in human pancreas using smFISH. A–B: smFISH was performed to quantitate the expression and spatial distribution of full-length HLA-DMB (red) (A) and spliced HLA-DMB mRNA (red) (B) in pancreatic tissue sections obtained from Network for Pancreatic Organ Donors With Diabetes (nPOD) organ donors without diabetes (control upper) and nPOD organ donors with AAb+ but no diabetes (AAb+ middle) and nPOD organ donors with established diabetes (T1D bottom). Immunofluorescent staining of insulin (green), glucagon (white), and DAPI (blue) was performed in parallel to determine the cell-specific expression patterns of full-length and spliced HLA-DMB mRNAs in pancreatic tissue sections obtained from nPOD donors. C: Single cells were classified based on glucagon and insulin expression. Classification details can be found in the Supplementary Methods, and the green lines indicate the β-cell boundaries. D: Expression ratios of spliced (exon 5 exclusion) HLA-DMB mRNA versus full-length HLA-DMB mRNA in single β-cells were compared among control, AAb+, and T1D donors. The copy number of the spliced (exon 5 exclusion) mRNA of each β-cell was normalized by the mean copy number per cell of the full-length mRNA in each respective category (control, AAb+, and T1D). Data for panel D were acquired from 78 β-cells from the control donors, 286 β-cells from the AAb+ donors, and 76 β-cells from donors with established T1D. E: Quantification of single-cell mRNA spatial distribution for full-length HLA-DMB and spliced (exon 5 exclusion) HLA-DMB mRNA in the nucleus and cytoplasm. Fractions were determined as the copy number in the nucleus or cytoplasm divided by the total copy number counts. Shown are means ± SEMs. Differences between groups were analyzed using one-way ANOVA.
Quantification of the expression patterns of HLA-DMB splice variants in human pancreas using smFISH. A–B: smFISH was performed to quantitate the expression and spatial distribution of full-length HLA-DMB (red) (A) and spliced HLA-DMB mRNA (red) (B) in pancreatic tissue sections obtained from Network for Pancreatic Organ Donors With Diabetes (nPOD) organ donors without diabetes (control upper) and nPOD organ donors with AAb+ but no diabetes (AAb+ middle) and nPOD organ donors with established diabetes (T1D bottom). Immunofluorescent staining of insulin (green), glucagon (white), and DAPI (blue) was performed in parallel to determine the cell-specific expression patterns of full-length and spliced HLA-DMB mRNAs in pancreatic tissue sections obtained from nPOD donors. C: Single cells were classified based on glucagon and insulin expression. Classification details can be found in the Supplementary Methods, and the green lines indicate the β-cell boundaries. D: Expression ratios of spliced (exon 5 exclusion) HLA-DMB mRNA versus full-length HLA-DMB mRNA in single β-cells were compared among control, AAb+, and T1D donors. The copy number of the spliced (exon 5 exclusion) mRNA of each β-cell was normalized by the mean copy number per cell of the full-length mRNA in each respective category (control, AAb+, and T1D). Data for panel D were acquired from 78 β-cells from the control donors, 286 β-cells from the AAb+ donors, and 76 β-cells from donors with established T1D. E: Quantification of single-cell mRNA spatial distribution for full-length HLA-DMB and spliced (exon 5 exclusion) HLA-DMB mRNA in the nucleus and cytoplasm. Fractions were determined as the copy number in the nucleus or cytoplasm divided by the total copy number counts. Shown are means ± SEMs. Differences between groups were analyzed using one-way ANOVA.
Discussion
AS is a tightly regulated mechanism whereby precursor mRNA transcripts are processed to generate structurally distinct mRNA and protein isoforms with potentially different biological functions (30). AS is regulated in a cell-specific manner and can be heavily influenced by extrinsic cues. During the evolution of insulitis, islet-infiltrating immune cells produce a variety of cytokines, which are known to induce β-cell death and dysfunction while stimulating a feed-forward cycle of tissue inflammation (34). Consistent with this, islets from individuals with recent-onset T1D obtained via biopsy showed a clear cytokine-exposure signature (35), while a high degree of concordance between gene expression signatures is observed between human islets exposed to IFN-γ plus IL-1β (8,36,37) and islets from donors with T1D (25). Previous reports have demonstrated significant changes in the β-cell transcriptome and AS splicing patterns in cytokine-treated human islets, and differential splicing events have been proposed as one potential mechanism responsible for inflammatory-induced β-cell dysfunction as well as the generation of neoantigens in T1D (8). However, while AS has been documented in >90% of human genes, not all events will have obvious biological consequences (30). Here, we used a computational strategy to prioritize potentially pathogenic splicing events in cytokine-treated human islets.
rMATS was used to detect differential splicing events in our data set. This is an exon-centric analysis tool specifically developed to improve detection of AS in paired-replicate study designs (15). The output from the rMATS algorithm was subsequently analyzed using ExonImpact, a machine learning–based algorithm developed to assesses the functional consequence of AS events (12). Our results showed that widespread alternative exon use occurs in human islets treated with proinflammatory cytokines, including in a number of genes linked to T1D risk and pathophysiology, while ∼38% of AS events were predicted to affect protein structure. To compare how results from this ex vivo model compared to islets from human donors with T1D, we applied the same computational strategy to a published RNA sequencing data set from human donors with and without T1D that was generated using bulk-sorted β-cells (GSE121863) (38). We observed consistency within overlapping genes, as well as enrichment in biological functions that included translational initiation, mRNA splicing, and cell-cell adhesion pathways.
Genetic risk of T1D is highly associated with variation within the MHC region (33). HLA class I molecules, which are recognized by CD8+ T cells, are ubiquitously expressed, and β-cell class I overexpression is a well-accepted hallmark of T1D (39). In contrast, MHC class II molecules, which are recognized by CD4+-expressing T cells (25), are highly expressed in professional antigen-presenting cells (i.e., dendritic cells). The notion that β-cells express class II genes and thus could serve as antigen-presenting cells has been controversial. Recently, RNA sequencing analysis of bulk-sorted β-cells from individuals with T1D identified the presence of mRNAs for HLA class II as well as several molecular components of the class II antigen-presentation pathway (25). Interestingly, through colocalization analysis between T1D GWAS loci and pancreas-specific sQTLs, we found that T1D risk variants and SNPs in high linkage disequilibrium with lead T1D SNPs were significantly enriched in pancreas tissue–specific sQTLs, while sQTLs mapped to a variety of immune-related biological processes, including antigen processing and presentation of peptide or polysaccharide antigen via the MHC class II pathway. In addition, we identified an overall increase in alternative exon use within MHC class II molecules in our data set. For validation, we focused on HLA-DMB, a class II gene identified to be upregulated following cytokine treatment and found to have an ExonImpact score of 0.78, indicating high likelihood of a change in protein structure (25). HLA-DMB catalyzes the exchange and loading of peptides onto MHC class II molecules, thereby shaping MHC class II immunopeptidomes (40). This process is thought to occur in early lysosomal compartments, and interestingly, exon 5 of HLA-DM encodes the lysosomal targeting signal, which includes the tyrosine-based motif YTPL, whose first residue begins at the start of the exon (41). Thus, removing this targeting signal may alter the kinetics and conditions under which it first encounters class II MHC (23). We were able to validate reduced inclusion of exon 5 in HLA-DMB using PCR and in a second RNA sequencing data set, while smFISH revealed an increase in the ratio of the splice variant to full-length HLA-DMB mRNA in β-cells from donors with T1D and AAb+, compared with donors without diabetes. smFISH provides information on both the expression levels as well as the spatial distribution of individual HLA-DMB RNA species at a single-cell level (42,43). We observed increased β-cell nuclear localization of sHLA-DMB mRNA in sections from both AAb+ and T1D donors. While cytoplasmic mRNAs are associated with active translation, splicing is known to occur cotranscriptionally. Thus, increased nuclear expression of the spliced variant could reflect increased transcription or reduced nuclear export of the variant (44). A limitation of our study is that we were not able to distinguish these possibilities or measure protein levels of the splice variant; we observed increased expression of total HLA-DMB protein as well as reduced colocalization of HLA-DMB with a lysosomal marker in cytokine-treated islets.
Lastly, we used a k-mer–based RBP motif enrichment analysis to predict regulatory RBPs responsible for changes in splicing patterns following cytokine treatment. This analysis revealed that inclusion events were enriched for A-rich motifs and exclusion events were enriched for G-rich motifs. In this regard, G-rich sequences have been reported to lead to G-quadruplexes that increase splicing efficiency (3,45). Serine/arginine-rich splicing factors SRSF2, SRSF7, and HNRNPH2 proteins have been reported to regulate AS events in a context-dependent manner (28) within multiple diseases (46), including diabetes (47), with SRS proteins more likely to enhance splicing and hnPNPs more likely to inhibit splicing, which is consistent with results from our analysis. Here we observed a general perturbation of expression and, importantly, activation of several splicing factors upon cytokine treatment, including SRSF2, SRSF3, and SRSF7. In agreement with these findings, a recent study in EndoC-β1H cells reported upregulation of several splicing regulatory proteins in response to treatment with cytokines, hypoxia, and high glucose and lipids (47). Interestingly, SRSF2 was predicted to target 37.2% of AS events with an ExonImpact score of ≥0.5 and was predicted to bind the flanking intron region of HLA-DMB exon 5. A limitation of our study is that we were not able to link modulation of SRSF2 expression or activity with a functional phenotype, but this question will be prioritized in future studies.
In summary, our data indicate that alternative use of exons is a prominent response of the β-cell to cytokine-mediated stress. Here we used a machine learning–based approach to help prioritize potentially pathogenic splicing events. We uncovered a previously unappreciated role for AS in the regulation of MHC class II molecules and also identified SRSF2 as a potential RBP responsible for a large subset of inflammatory-induced splicing events. Taken together, these findings suggest that stimulus-specific isoform expression and changes in protein function may control how the β-cells respond to inflammatory signals in T1D. Additional mechanistic experiments are needed to define fully the functional impact of AS events in T1D pathophysiology; however, splicing inhibitors have shown utility in other diseases, including cancer (48), suggesting that similar strategies may merit testing in T1D.
This article contains supplementary material online at https://doi.org/10.2337/figshare.16828255.
Article Information
Funding. This work was supported by National Institutes of Health grants R01 DK093954 and DK127308 (C.E.-M.), R01 DK060581 and R01 DK105588 (R.G.M.), UC4 DK 104166 (C.E.-M. and R.G.M.), and U01 DK127786 (C.E.-M., D.L.E., and R.G.M.); VA Merit Award I01BX001733 (C.E.-M.); JDRF grant 2-SRA-2018-493-A-B (C.E.-M. and R.G.M.); and gifts from the Sigma Beta Sorority, the Ball Brothers Foundation, and the George and Frances Ball Foundation (C.E.-M.). D.L.E. received support from WELBIO and Fonds National de la Recherche Scientifique (Brussels, Belgium) grants CR-2015A-06 and CR-2019C-04 and Innovate2CureType1–Dutch Diabetes Research Foundation and startup funding from the Indiana Biosciences Research Institute. This research was performed with the support of the Network for Pancreatic Organ Donors With Diabetes (nPOD) (RRID:SCR_014641), a collaborative type 1 diabetes research project sponsored by JDRF (nPOD 5-SRA-2018-557-Q-R), and the Leona M. & Harry B. Helmsley Charitable Trust (grant 2018PG-T1D053). The authors acknowledge the support of the Translation Core of the Indiana Diabetes Research Center (P30DK097512).
The content and views expressed are the responsibility of the authors and do not necessarily reflect the official views of nPOD. Organ procurement organizations partnering with nPOD to provide research resources are listed at https://www.jdrfnpod.org/for-partners/npod-partners/.
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
Author Contributions. W.W., Y.L., and C.E.-M. conceived and designed the study. F.S. and C.-C.L. performed experiments. W.W., E.S., J.L., G.C., C.D., C.S., and Y.L. performed computational analyses. W.W., F.S., C.C.-L., J.L., D.L.E., R.G.M., Y.L., and C.E.-M. interpreted data. W.W. and C.E.-M. wrote the manuscript. All authors provided critical revisions and edits to the manuscript, and all authors read and approved the final manuscript. C.E.-M. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.