Predisposition to type 1 diabetes and juvenile obesity is influenced by the susceptibility locus IDDM2 that includes the insulin gene (INS). Although the risk conferred by IDDM2 has been attributed to a minisatellite upstream of INS, intragenic variants have not been ruled out. We examined whether INS polymorphisms affect pre-mRNA splicing and proinsulin secretion using minigene reporter assays. We show that IVS1-6A/T (−23HphI+/−) is a key INS variant that influences alternative splicing of intron 1 through differential recognition of its 3′ splice site. The A allele resulted in an increased production of mature transcripts with a long 5′ leader in several cell lines, and the extended mRNAs generated more proinsulin in culture supernatants than natural transcripts. The longer mRNAs were significantly overrepresented among β-cell expressed sequence tags containing the A allele as compared with those with T alleles. In addition, we show that a rare insertion/deletion polymorphism IVS1+5insTTGC (IVS-69), which is exclusively present in Africans, activated a downstream cryptic 5′ splice site, extending the 5′ leader by 30 bp. These results indicate that −23HphI and IVS-69 are the most important INS variants affecting pre-mRNA splicing and suggest that −23HphI+/− is a common functional single nucleotide polymorphism at IDDM2.
Type 1 diabetes results from the autoimmune destruction of the insulin-producing pancreatic β-cells. In addition to a major susceptibility locus in the HLA region, termed IDDM1, genetic predisposition to this disease is conferred by a locus on chromosome 11, designated IDDM2 (1). The genetic risk at IDDM2 has been attributed to the INS minisatellite (2–6), which is composed of a variable number of tandem repeat sequences. However, reanalysis of allelic association data in type 1 diabetes did not rule out intragenic variants, including a single nucleotide polymorphism (SNP) −23HphI (7), which is located in position −6 relative to the 3′ splice site of intron 1 (IVS1-6A/T). This SNP has been used as a surrogate marker for INS genotyping in a large number of studies to infer minisatellite haplotypes (class I/III) in disease susceptibility (2,3,6 and refs. therein). IVS1-6A/T is located in the polypyrimidine tract (PPT), a splicing signal of central importance for vertebrate 3′ splice site recognition, and in a position exhibiting a great depletion of purine residues in the PPT (8). Since uridine is the preferred PPT nucleotide in pre-mRNA (9), the A allele, which reduces the Shapiro-Senapathy matrix score and other splicing prediction scores for the acceptor splice site of intron 1 (online appendix Table 1, available at http://diabetes.diabetesjournals.org), is likely to weaken efficient splicing of this intron. As intron 1 separates noncoding and coding exons, differential utilization of its 3′ splice site by the uridine- and adenosine-containing pre-mRNAs would be predicted to result in distinct representation of mature transcripts with short and long 5′ untranslated regions (UTRs) with potential effects on translation.
To test whether the IVS1-6A→T base change promotes PPT recognition and splicing of intron 1, we transiently transfected the wild-type and mutated INS minigenes (Fig. 1A) into several cell lines and examined their splicing pattern. Nucleotide sequencing of the resulting mRNAs identified six alternatively spliced isoforms designated 1–6 (Fig. 1B). Isoforms 2, 4, and 6 have correctly spliced the 3′ splice site of intron 2, whereas isoforms 1, 3, and 5 lack the 5′ part of exon 3 due to a cryptic 3′ splice site activation (Fig. 1A and B), confirming an earlier observation (10). As predicted, minigenes with the IVS1-6A→T mutation produced a significantly lower proportion of isoforms lacking exon 2 and isoforms retaining intron 1, which was confirmed by selective amplification of naturally spliced transcripts with primer B upstream of the cryptic 3′ splice site (Fig. 1A, C, and D). In an attempt to remove the cryptic 3′ splice site activation in exon 3, we truncated both the wild-type and mutated minigenes with primer B and transfected shorter reporter constructs into 293T cells. We found that skipping of exon 2 and retention of intron 1 were reduced for both pre-mRNAs, suggesting that the deleted segment contains splicing regulatory sequences (Fig. 1E).
To test the influence of other intragenic polymorphisms (11,12) on splicing, we examined constructs mutated in all common SNPs and in less prevalent INS variants that are located close to the 3′ or 5′ splice site (Fig. 1A). Interestingly, a 4-nt insertion IVS-69+ (IVS1+5insTTGC) in the vicinity of the authentic 5′ splice site of intron 1 activated a downstream cryptic 5′ splice site, extending the 3′ end of noncoding exon 1 (Figs. 1B–D and 2A). The insertion allele, which is present in ∼25% of Africans but absent in Caucasians (12), introduces a G-to-T mutation in position +5 of the 9-nt 5′ splice site consensus, thus reducing the strength of the authentic 5′ splice site and increasing relative expression of intron 1-containing transcripts (Fig. 1B–D). Identical 5′ splice site alterations in other genes have been previously shown to cause splicing defects that resulted in genetic disease (13), suggesting that this variant may affect proinsulin expression. The remaining SNPs had no major influence on splicing of reporter pre-mRNAs (Fig. 1B and C), although we could not exclude minor effects.
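The way a 4-nt insertion can change position +5 of the 5′ splice site can be illustrated with a short sketch. This is an illustration under stated assumptions, not the INS sequence: it uses the generic vertebrate 5′ splice site consensus (CAG|GTAAGT), and it assumes that IVS1+5insTTGC places TTGC immediately before intron position +5. The function names are hypothetical.

```python
# Illustration only: generic 5' splice site consensus, NOT the actual INS sequence.
EXON_END = "CAG"         # last three exonic nucleotides (positions -3 to -1)
INTRON_START = "GTAAGT"  # first six intronic nucleotides (positions +1 to +6)


def insert_before(intron: str, position: int, insertion: str) -> str:
    """Insert `insertion` immediately before the 1-based intron `position`."""
    return intron[:position - 1] + insertion + intron[position - 1:]


def base_at(intron: str, position: int) -> str:
    """Return the nucleotide at the 1-based intron `position`."""
    return intron[position - 1]


def splice_site_9mer(intron: str) -> str:
    """Return the 9-nt 5' splice site window: 3 exonic + 6 intronic nucleotides."""
    return EXON_END + intron[:6]


wild_type = INTRON_START
# Assumed placement of the TTGC insertion: between intron positions +4 and +5.
mutant = insert_before(INTRON_START, 5, "TTGC")

for name, intron in (("deletion allele", wild_type), ("insertion allele", mutant)):
    print(name, splice_site_9mer(intron), "position +5 =", base_at(intron, 5))
```

Under these assumptions, the 9-nt window of the insertion allele differs from the consensus only at position +5, where G is replaced by T.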
To examine the splicing pattern of INS in the natural context, we transfected our reporter constructs into a cell line derived from β-cells (Fig. 2B). We observed a similar decrease of intron 1-retaining transcripts for the IVS1-6T allele in insulin-producing rat INS-1E cells, as well as an activation of the cryptic 5′ splice site of intron 1 for pre-mRNAs carrying the 4-nt insertion. Similar splicing patterns were also observed for pancreatic ductal adenocarcinoma PANC1 cells, HeLa cells, and H1299 cells derived from small-cell lung carcinoma.
The more efficient splicing of T-containing intron 1 observed in both β- and non-β-cells (Figs. 1B–D and 2B) would be predicted to result in a biased representation of INS isoforms in gene sequencing databases. We therefore determined distribution of transcripts that lack intron 2 in expressed sequence tag (EST) libraries created from pancreatic β-cells. Of 19 INS EST libraries, two contained transcripts with complete retention of intron 1 (Table 1). Analysis of aligned cDNA sequences in both libraries indicated that the HR85 library contained only class I haplotypes (IVS1-6A alleles), whereas the human insulinoma library had only class III haplotypes (IVS1-6T alleles), suggesting that they were prepared from homozygous donors. Comparison of correctly spliced transcripts and transcripts retaining intron 1 showed that the latter mRNAs were significantly overrepresented in the HR85 library as compared with a human insulinoma library (Table 1) (χ² = 62.2, 1 df, P = 3.0 × 10⁻¹⁵).
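The contingency-table comparison can be sketched from the isoform 4 and isoform 6 counts listed in Table 1. The sketch assumes a Pearson χ² test without continuity correction, since the text specifies only a standard contingency-table χ² test. The function names are illustrative and are not taken from the authors' scripts.

```python
import math

# Counts from Table 1: (isoform 4 ESTs, isoform 6 ESTs) for each library.
table = {
    "HR85 islet (A/class I)": (47, 15),
    "Human insulinoma (T/class III)": (825, 25),
}


def pearson_chi2(rows):
    """Pearson chi-square statistic (no continuity correction) for an r x c table."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


rows = list(table.values())
chi2 = pearson_chi2(rows)
p = p_value_1df(chi2)

for library, (iso4, iso6) in table.items():
    print(f"{library}: isoform 6 = {100 * iso6 / (iso4 + iso6):.0f}% of informative ESTs")
print(f"chi2 = {chi2:.1f} (1 df), P = {p:.1e}")
```

With these counts the statistic and P value agree with the reported values to the precision given in the text, which suggests that no continuity correction was applied.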
A search for transcripts spliced to the cryptic 3′ splice site in exon 3 revealed one such transcript in the HR85 library (0.52%) and nine clones in the insulinoma library (0.29%), indicating that these transcripts are rare in β-cells in vivo. In addition, the insulinoma library contained several transcripts (3/3,026 [0.10%]) spliced to another cryptic 3′ splice site in exon 3 located 36 bp downstream of the authentic 3′ splice site of intron 2 (online appendix Table 2). We found such transcripts by sequencing a rare RT-PCR product amplified from a cDNA sample prepared from pancreatic mRNA (Fig. 2C), suggesting that this splice site is occasionally used in normal β-cells in vivo. RNA products spliced to the cryptic 5′ splice site in intron 1 (Fig. 2A) were present only in the insulinoma library (974/3,026 [32%]). Of 974 transcripts, 688 were informative at IVS-69, and all of them had the TTGC insertion, consistent with cryptic 5′ splice site utilization of our mutated reporters (Fig. 1B and C) and the presence of this insertion in the donor’s germline. Five of these transcripts (0.72%) were spliced to cryptic 3′ splice site E3 + 37 (Fig. 2C), further expanding the complexity of INS mRNAs.
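The EST proportions quoted in this paragraph follow directly from the clone counts and the library sizes in Table 1. The short sketch below recomputes them. The labels and the `percent` helper are illustrative, and 688 is taken as the denominator for the E3 + 37 transcripts because that is the number of informative clones given in the text.

```python
# Library sizes from Table 1; clone counts as quoted in the text.
HR85_TOTAL = 192
INSULINOMA_TOTAL = 3026

counts = {
    "cryptic 3' splice site in exon 3, HR85": (1, HR85_TOTAL),
    "cryptic 3' splice site in exon 3, insulinoma": (9, INSULINOMA_TOTAL),
    "second cryptic 3' splice site, insulinoma": (3, INSULINOMA_TOTAL),
    "cryptic 5' splice site in intron 1, insulinoma": (974, INSULINOMA_TOTAL),
    "E3 + 37 among IVS-69-informative clones": (5, 688),
}


def percent(n: int, total: int) -> float:
    """Return n as a percentage of total."""
    return 100.0 * n / total


for label, (n, total) in counts.items():
    print(f"{label}: {n}/{total} = {percent(n, total):.2f}%")
```

The recomputed values agree with the percentages in the text to within 0.01 percentage points; the small differences reflect rounding versus truncation of the last digit.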
Unlike isoform 2, isoforms 4 and 6 encode a functional peptide, but the two differ by the length of the 5′ UTR, which often contains sequences that influence translation (14). We therefore cloned cDNAs representing each isoform and measured levels of secreted proinsulin in culture supernatants after their transient transfection into 293T cells. Significantly, equimolar amounts of plasmids containing isoform 6 repeatedly generated higher proinsulin levels in culture supernatants than those with isoform 4 (Fig. 3A). Transfection experiments with shorter reporters prepared with primers D and E to create exact replicas of previously used minigenes (15) gave a similar difference (data not shown).
Serine/arginine-rich (SR) proteins are a family of RNA-binding proteins that regulate splicing, mRNA export, translation, and other cellular processes. To investigate how SR proteins affect the splicing pattern of INS reporter pre-mRNAs, we cotransfected our reporter constructs with plasmids expressing SC35, SRp40, SRp55, and SRp75. Reporter pre-mRNAs coexpressed with SR proteins in 293T cells increased the proportion of transcripts retaining intron 1 (Fig. 3B), and this increase correlated with proinsulin levels measured in culture supernatants of cotransfected cells (r = 0.90, P < 0.05), further supporting a causal link between the enhanced isoform 6 expression and higher proinsulin secretion.
Our results identified INS variants that affect pre-mRNA splicing and illustrated how transcript complexity of a small gene could be significantly influenced by intragenic DNA polymorphism. They also showed marked differences in the proinsulin production generated by these splice variants and suggested that proinsulin production is controlled by a subset of SR proteins. Substitutions of uridine at PPT position −6 have been shown to increase exon skipping and retention of weakly spliced introns (16), most likely by diminishing interaction between the PPT and the 65-kDa subunit of the U2 small nuclear RNA auxiliary factor or competing polypyrimidine binding proteins. INS intron 1 is weakly spliced in both the rat (17) and the mouse gene, where transcripts retaining this intron have been reported with higher translational efficiency (18). Absence of introns reduced the cytoplasmic level of rat insulin mRNA and protein expression by approximately sixfold (19), which is comparable with our observation (Fig. 3A) and the mouse data (18). Conversely, extension of the rat 5′ UTR by inserting unstructured oligonucleotides resulted in repeat-dependent enhancement of translation (14). Chicken embryonic proinsulin is also regulated through an extended 5′ UTR and upstream open reading frames (20). It will therefore be interesting to examine translation efficiency of mutated constructs to identify underlying cis-acting elements in the first introns. Assuming a direct action of coexpressed SR proteins, the observed reduction of canonical transcripts in cotransfected cells (Fig. 3B) could be explained by the promotion of proximal splicing and/or reduction of distal splicing (16 and refs. therein) or altered mRNA export of intron 1-retaining transcripts.
The existence of functional INS variants that are more frequent or unique in Africans raises the hypothesis that they may have been subject to selection following a shift of the out-of-Africa ancestral population to primitive agriculture with consequent increased carbohydrate intake. This could be the case for both the IVS1-6T allele, which has a frequency of ∼82% in Africans and ∼19% in non-Africans, and the insertion allele at IVS-69, with a frequency of ∼25 and 0%, respectively (12). Utilization of the cryptic 5′ splice site in intron 1 was found to be increased in human insulinomas, but the ethnic origin of donors was not provided and primers amplifying the extended mRNA were designed to amplify the deletion allele, not the insertion allele (21). Both our EST data and transfection experiments (Figs. 1B–D and 2B) indicate that utilization of the cryptic 5′ splice site is allele dependent. Future studies should therefore confirm that use of this splice site in insulinomas is indeed increased both in IVS-69− and IVS-69+ individuals and that β-cells of the IVS-69+ donors produce higher amounts of transcripts spliced to the cryptic 5′ splice site of intron 1.
Marked differences in the proinsulin secretion observed for isoforms 4 and 6 in humans (Fig. 3A) and mice (18) well illustrate the power of alternative pre-mRNA splicing to regulate gene expression. One can speculate that even minute differences in the ratios of these isoforms in pancreatic β-cells may account for a small but significant increase in fasting insulin levels and more rapid weight gain in late childhood and adolescence observed in carriers of the −6A allele (6). Since previous studies correlating INS genotypes and steady-state mRNA expression used primers in exon 3/3′ UTR (3,4) or bridging intron 1 (5), which could not measure the relative ratios of INS isoforms, future work should attempt to determine the expression of alternatively spliced mRNAs in β-cells of AA, AT, and TT carriers.
Finally, the allele-dependent alternative splicing described here may affect “promiscuous” intrathymic selection processes and tolerance to the central antigenic determinant of diabetic autoimmunity. Such a hypothesis is supported by the recent identification of oligoclonally expanded T-cells from diabetic subjects with DR4, a susceptibility molecule for type 1 diabetes. These cells have been shown to recognize the DR4-restricted insulin epitope A1-15 (22), which is encoded by the 5′ part of exon 3, corresponding exactly to a segment absent in transcripts spliced to the downstream cryptic 3′ splice site (Fig. 1A). Smaller transcripts that may represent mRNAs lacking this segment were seen in developing thymi (Fig. 3 in ref. 23), although this finding was not discussed by the authors. Pre-mRNAs spliced to this acceptor site observed in our reporter system (Fig. 1B) are most likely to be preferentially degraded in vivo by “nonstop” mRNA surveillance mechanisms as these transcripts lack termination codons. The existence of noncanonical transcripts in transfected cells (Figs. 1B and 2A and C) and in EST libraries (online appendix Table 2) would be consistent with this concept, which appears to be further supported by the observed genetic linkage and allelic association of INS with type 1 diabetes (1) but not type 2 diabetes or other metabolic traits (24), except in childhood obesity (6). Subsequent studies should therefore examine each of these possibilities in greater detail, since alternative splicing and associated pathways might become suitable targets for therapeutic approaches in the future.
RESEARCH DESIGN AND METHODS
Splicing reporter constructs.
The wild-type INS reporters (Fig. 1A) were cloned into the HindIII/XbaI restriction sites of the mammalian expression vector pCR3.1 (Invitrogen) using primers A (5′ ATC TAA GCT TGG GAG ATG GGC TCT GAG ACT A) and C (GTC ATC TAG ATG GTT CAA GGG CTT TAT TCC A). Minigenes lacking the 3′ part of exon 3 to avoid the cryptic 3′ splice site were subcloned by amplifying the full-length clones with primer B (ATA ATC TAG ACA CAA TGC CAC GCT TCT GC). In addition, we cloned splicing reporters using primers D (ACC AAG CTT AGC CCT CCA GGA CAG GCT) and E (ACC TCT AGA GGC TGC GTC TAG TTG CAG TA) to obtain minigenes with shorter UTRs as described (15). Mutated minigene constructs were prepared by overlap-extension PCR using Pfu (Promega) or Pwo (Roche) polymerases. Mutagenic oligonucleotide primers are available on request. The wild-type and mutated reporters were fully sequenced as described (16) to confirm intended mutations and exclude clones with undesired alterations. Plasmids expressing SR proteins were obtained as described (16).
Cell lines and transfections.
The human embryonic kidney 293T, HeLa, PANC1, and H1299 cells were grown under standard conditions in RPMI-1640 supplemented with 10% (vol/vol) FCS (Gibco) as described (16). INS-1E cells were grown in RPMI-1640 supplemented with 10% FCS, 10 mmol/l Hepes, 50 μmol/l 2-mercaptoethanol, and 1 mmol/l sodium pyruvate. Transient transfections were performed in 12-well plates or Falcon T25 using FuGENE 6 (Roche) or GeneCarrier-1 (Epoch-Biolabs). The plating density was 10⁵ cells 17–24 h before transfection. The medium was changed 2 h before adding a DNA mixture prepared by combining 2–6 μl transfection reagent and 50 μl serum-free medium, followed by the addition of 500 ng purified plasmid DNA (Wizard Plus SV Minipreps; Promega). The DNA mixture was incubated for 20 min at room temperature before transfection. Cells were harvested 48 h post-transfection.
Detection of spliced products.
Total RNA was extracted as described (16), treated with DNase I (Ambion), and reverse transcribed using oligo(dT)₁₅ primers and Moloney murine leukemia virus reverse transcriptase (Promega) according to the manufacturer’s recommendations. Three microliters of the first-strand cDNA reaction were amplified with vector-specific primers PL3 (GGG AGA CCC AAG CTG GCTA) and PL4 (AGTC GAG GCT GAT CAG CGG) and primers PL3/B (Fig. 1) to validate the ratios of RNA products in independent PCRs. The number of PCR cycles was 29 or lower to maintain an approximately linear relationship between the RNA input and signal. DNA fragments were extracted from the gel using the BIO101 Geneclean Kit (Q-BIOgene) and sequenced to confirm the identity of each fragment. RNA products were measured with FluorImager 595 using ImageQuant and Phoretix software (Nonlinear Dynamics, Sunnyvale, CA) as described (16).
Quantification of secreted proinsulin.
Total and intact proinsulin levels were measured in culture supernatants (48 h post-transfection) by dissociation-enhanced lanthanide fluoroimmunoassay using monoclonal antibodies 3B1 (capture), CPT3F11 (total), and A6 (intact) as described (25).
Analysis of INS EST libraries.
INS cDNA sequences (online appendix Table 2) were obtained from EST libraries available from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/dbEST/). The statistical significance of their distribution was estimated by a standard contingency-table χ² test. The Python scripts to analyze INS isoforms from EST libraries and their haplotypes are available in online appendix Table 3. Unique cDNA clones were aligned with a contig assembly program (CAP) available at http://bio.ifom-firc.it/ASSEMBLY/assemble.html, and haplotypes/isoforms were reexamined manually.
EST library* | Total number of available ESTs | HphI allele/minisatellite† | Isoform 4 (%) | Isoform 6 (%) |
---|---|---|---|---|
HR85 islet | 192 | A/class I | 47 | 15 (24)‡ |
Human insulinoma | 3,026 | T/class III | 825 | 25 (3)§ |
*GenBank accession numbers for each database entry and corresponding sequences are available in online appendix Table 2. The proportion of informative ESTs was similar in both libraries (∼30%).
†INS haplotypes and isoform structure were determined using Python scripts (www.python.org) available in online appendix Table 3.
‡Of 15 ESTs, 10 were informative for the IVS1+5insTTGC polymorphism (IVS-69+/−), and all of them had the deletion allele.
§Of 25 ESTs, 10 were informative for the IVS1+5insTTGC polymorphism, and all of them had the insertion allele, suggesting that the insulinoma donor was of African origin.
Additional information for this article can be found in an online appendix at http://diabetes.diabetesjournals.org.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Article Information
This work was supported by the University of Southampton, the Wessex Medical Trust, and the Medical Research Council of the U.K. T.R.G. is a British Heart Foundation Intermediate Fellow (FS/05/065/19497). This work is subject to U.K. patent application 0510555.6.
We thank Professor C. Wollheim, Dr. P. Maechler (University of Geneva), and Dr. J. Blaydes (University of Southampton) for providing cell lines; Dr. J. Caceres (University of Edinburgh) and Dr. G. Screaton (University of Oxford) for providing plasmids expressing SR proteins; and Ms. C. Glenn and D. Smith for technical help with dissociation-enhanced lanthanide fluoroimmunoassay.