Next-generation targeted sequencing of HLA-DRB1 and HLA-DRB3, -DRB4, and -DRB5 (abbreviated as DRB345) provides high resolution of functional variant positions to investigate their associations with type 1 diabetes risk and with autoantibodies against insulin (IAA), GAD65 (GADA), IA-2 (IA-2A), and ZnT8 (ZnT8A). To overcome exceptional DR sequence complexity as a result of high polymorphisms and extended linkage disequilibrium among the DR loci, we applied a novel recursive organizer (ROR) to discover disease-associated amino acid residues. ROR distills disease-associated DR sequences and identifies 11 residues of DRB1, sequences of which retain all significant associations observed by DR genes. Furthermore, all 11 residues locate under/adjoining the peptide-binding groove of DRB1, suggesting a plausible functional mechanism through peptide binding. The 15 residues of DRB345, located respectively in the β49–55 homodimerization patch and on the face of the molecule shown to interact with and bind to the accessory molecule CD4, retain their significant disease associations. Further ROR analysis of DR associations with autoantibodies finds that DRB1 residues significantly associated with ZnT8A and DRB345 residues with GADA. The strongest association is between four residues (β14, β25, β71, and β73) and IA-2A, in which the sequence ERKA confers a risk association (odds ratio 2.15, P = 10−18), and another sequence, ERKG, confers a protective association (odds ratio 0.59, P = 10−11), despite a difference of only one amino acid. Because motifs of identified residues capture potentially causal DR associations with type 1 diabetes, this list of residuals is expected to include corresponding causal residues in this study population.
Introduction
Genetic studies, from classical HLA studies or genome-wide association studies (1–3), have confirmed that the strongest genetic associations of type 1 diabetes (T1D), potentially causal, are with the HLA-DR and -DQ genes on chromosome 6, dwarfing ∼50 non-HLA genetic associations (1,4–6). While many studies attempted to distinguish two closely located HLA-DR and -DQ associations (7–10), extending to the HLA-DP locus (11), a recent effort aims to investigate individual amino acid residues (or residues for short) and their associations with T1D, following the shared epitope hypothesis in rheumatoid arthritis research (12). One approach is to scan individual residues by testing differential frequencies of individual residues between patients and their control subjects (13), but this fails to account for linkage disequilibrium (LD) among residues. An alternative approach is conditional analysis (14), which repeats residue-specific association tests while adjusting for selected residues. Because of the extensive LD within the HLA genes, it is nearly impossible to deconvolute empirically observed associations into biological versus LD-induced associations; results from this recent study lead to an ongoing controversy. At issue is whether identified residues through conditional analysis are “causal” or just the best surrogate markers for defining HLA genotype because they contain the most discriminative ability (15). Additionally, HLA genotypes used in the conditional study (14) are imputed at an intermediate resolution with an average accuracy of 96.7%, but their accuracy for the majority of HLA alleles (that are less frequent) is substantially lower (16). Genotyping quality could negatively affect the precision of deducing causal residues.
Using next-generation targeted sequencing (NGTS) technology, we recently sequenced 962 pediatric patients with T1D and 636 control subjects on four DR genes (HLA-DRB1, -DRB3, -DRB4, and -DRB5), explored their associations with T1D, and reported their association results earlier with a subset of 448 control subjects (17). For clarity, we use DRB1 to denote the gene and DRβ1 to denote the corresponding protein sequence. A similar convention is applicable to HLA-DRB3, -DRB4, and -DRB5 genes. Unlike diploidic DRB1, each chromosome carries at most one of the DRB3, DRB4, and DRB5 genes (18,19). For simplicity, we use DRB345 to denote the three separate genes. Additionally, we developed a novel method, known as recursive organizer (ROR), to achieve statistical parsimony by likelihood-based associations while retaining the sequence-specific LD among all residues and sequence motifs of multiple selected amino acids that retain DR-T1D associations (20). In essence, ROR shared the same analytic objective with the conditional analysis (14), but avoided the influence of local LD on eventually selected residues, assuring inclusion of all possible causal variants in the discovery process.
Here, we applied ROR to the case-control data of T1D and high-quality HLA gene sequences to identify potentially causal residues in DRB1 and DRB345. Among all patients with T1D, the study also measured a panel of autoantibodies against GAD65 (GADA), insulin (IAA), IA-2 (IA-2A), and the three variants (residue R, W, or Q on position 325) of ZnT8 autoantibodies (ZnT8A) (21–23) at the time of T1D diagnosis. This availability also allowed us to investigate HLA-DR associations with increased titer of these autoantibodies at the time of the disease diagnosis.
Research Design and Methods
Study Design and Population
This case-control study involved 962 patients from the nationwide Swedish Better Diabetes Diagnosis (BDD) study (21,22,24) and included 636 nationwide and geographically representative control subjects (25), which has been detailed earlier (17). The Karolinska Institute Ethics Board approved the BDD study (2004/1:9).
DNA Extraction
The Plasmid Maxi Isolation Kit (QIAGEN, Bothell, WA) was used to isolate DNA according to the manufacturer’s instructions from frozen whole-blood samples of patients and control subjects.
HLA NGTS Analysis
HLA typing was carried out using the ScisGo HLA v6 typing kit (Scisco Genetics, Seattle, WA). Briefly, the method uses an amplicon-based two-stage PCR, followed by sample pooling and sequencing using a MiSeq v2 PE 500 (Illumina, San Diego, CA). The protocol yields three-field coverage of all HLA loci, including exons 1–4 for DRB1 and DRB345, and genotypes of two-field coverage are used here. Phase is determined in part by overlapping sequences for HLA class I and database lookup for HLA class II (18). These types were converted into amino acid sequences corresponding to amino acids in the β-sheet of codon β1–237 as well as those residues in the signal peptide from −29 to −1. The mature protein started from codon β1, extending to codon β237.
Islet Autoantibodies
Statistical and Molecular Simulation Analysis
By the case-control design, we assessed allelic and haplotypic association analysis of DR genes (27) (using the haplo.stats package of R) and then applied the ROR (20) to discover associated amino acids. To examine potential roles of these amino acids, we performed molecular simulation analysis. Details are provided in the Supplementary Data.
Results
HLA-DR Associations at the Sequence Resolution
The disease association analysis of high-resolution DRB1 and DRB345 in this case-control study (636 control subjects and 962 patients) uncovered detailed DR associations with T1D, most of which are consistent with reported associations on DR-T1D. It is worth noting that associations are impressive, for example, of DRB1*03:01 (odds ratio [OR] 2.09, P = 2.71 * 10−20) and DRB1*04:01 (3.64, P = 3.19 * 10−64) (see Supplementary Table 1A for details). DRB1 had a total of 25 alleles with more than five copies in this study. Five risk alleles (P values highlighted in green) were found to have an OR ranging from 1.58 to 5.95, in which some P values were as low as 3.19 * 10−64. Meanwhile, 12 protective alleles (P values highlighted in red) had an OR ranging from 0.03 to 0.33 (P value was as low as 3.99 * 10−60). Note that if P < 0.05 (without correcting for multiple comparisons), it is declared to be significant and is color-coded for risk/positive (green) or protective/negative (red) associations. Furthermore, some exceptionally small P values are approximated from Gaussian normal distribution and could best be interpreted qualitatively as extremely significant.
DRB345 had nine alleles (Supplementary Table 1B). Two risk alleles (DRB3*01:01 and DRB4*01:03) had significant associations (ORs 1.36 and 2.00, P = 3.26 * 10−5 and 9.94 * 10−49, respectively). Five protective alleles associated with T1D had an OR ranging from 0.04 to 0.80.
Haplotyping DRB1 and DRB345 genes resulted in 28 haplotypes (Supplementary Table 1C). Six haplotypes, colored green, associated with an increased T1D risk, while 14 haplotypes were protective. Most haplotypes are uniquely formed by one DRB1 and DRB345 allele, and hence, corresponding haplotypic associations are attributed to both genes. There are three haplotype pairs (4, 5), (13, 14), and (21, 22) (highlighted in light blue font), in which one DRB1 allele shares haplotypes with two DRB345 alleles. Such pairs provide an opportunity to investigate the independent and interactive role of DRB345 on the corresponding DRB1 allele. For example, for individuals carrying DRB1*03:01, the DRB3*01:01 conferred an OR of 1.95, and DRB 3*02:02 had an OR of 2.94, implying that DRB 3*02:02 adds to the risk. Yet, by itself, DRB 3*02:02 was supposedly protective (OR 0.48).
The current study included seven DR4 alleles (DRB1*04:01, *04:02, etc., in Supplementary Table 1A). Despite that they had the same antigen (DR4), their associations with T1D were influenced by polymorphic residues; three DR4 alleles were associated with T1D risk, three were protective, and one had a neutral association (Supplementary Table 1A). Hence, dissecting responsible residues would provide insights into how genetic polymorphisms influence T1D susceptibility.
Observed associations with individual alleles/haplotypes of DRB1 and/or DRB345 are largely consistent with reported associations in the literature. With the better precision of NGTS and large sample size, association results presented here (Supplementary Table 1) constitute a synthesis of DR-T1D associations in a European population.
Parsimonious Sequences of 11 Residues Capture Disease Associations With DRB1
Recognizing DRB1 polymorphisms, extensive LD, and complex associations with T1D, ROR sought to achieve a parsimony by recursively eliminating residues without altering the observed disease associations with full DRB1 sequence alleles (Supplementary Table 1A). Starting with 266 residues, in the HLA-DRB1 and DRB345 signaling peptides and mature protein structures, ROR recursively eliminated residues and eventually led to a subset of residues (see ROR methodology in the Supplementary Data). Through the ROR process, we computed an overall P value and incremental P values that measure significances of all and incrementally deleted residues (Fig. 1A). Precisely which residues were deleted step by step, eventually leading to a group of 11 residues selected at ROR step 12, is shown in Fig. 1B. Identified residues were β14, β25, β47, β57, β60, β71, β73, β74, β78, β85, and β86 (Table 1). These 11 residues formed 22 unique sequence alleles (Table 1). Each unique sequence allele corresponded to one or more classical alleles and is referred to as an allele group (see Allele Group column in Table 1). Allelic association analysis of the 11 amino acid sequences revealed 5 risk alleles and 12 protective alleles (Table 1), capturing the full allelic associations listed in Supplementary Table 1A, while several uncommon alleles were merged into their respective common alleles (e.g., DRB1*15:03 with DRB1*15:01 and DRB1*11:02 and *11:03 with DRB1*13:01).
The allelic frequencies are n (%) in 636 control subjects and 936 patients along with their estimated ORs, haplotype scores (HSs), and P values. These amino acid positions in the Protein Data Bank are listed above the individual amino acids. The dots are used to denote the same amino acid as the referenced one highlighted in red. Highlighted amino acids, ERKA, in blue, are involved in associations with IA-2A.
Most of the 11 Residues Fall in the Peptide-Binding Grove of DRB1, While the Remainder Are Nearby
As shown in Fig. 1C, all 11 residues were located between positions 14 and 86. When we examined the crystal-resolved structure of DRB1*04:01 (Fig. 1D), we found that all 11 residues were nearby or in the peptide-binding grove of DR. Notably, β86 was in pocket 1, β71 was at the border separating pockets 4 and 7 while being part of both pockets, β74 and β78 were in pocket 4, β47 was in pocket 7, and β57 was in pocket 9. Of the remaining residues, β14 and β25 were underneath the groove below pocket 4, β60 was just over pocket 9, and β73 was adjacent to pocket 4.
Four Residues of DRB1 Capture the Risk, Protective, or Neutral Associations With DR4 Subtypes
Upon eliminating seven monomorphic residues across all DR4 subtypes, associated with seven common DR4 subtypes (Supplementary Table 1A) with an uncommon subtype (DRB1*04:06) merged with DRB1*04:03 (Supplementary Table 2), one had seven alleles with four residues (β57, β71, β74, and β86) corresponding to DR4 subtypes. Evidently, the presence of amino acid E at β74 is associated with the strong protection against T1D, an unusual protection exhibited by the otherwise high-risk set of DR4 alleles. We conducted a molecular simulation of the structure of the DRB1*04:03 molecule in complex with two human insulin A-chain peptides that could bind with high affinity to this molecule (27,28) (Fig. 2A and B, Supplementary Fig. 2, details of the method in the Supplementary Data). Figure 2A and B show the tight fitting of the insulin A8–19 insulin A-chain peptide with p4Leu, p6Gln, and p9Asn anchors, where the latter two anchors are not common for DRB1*04 alleles (29–31) (Fig. 2A and B). Indeed, it was evident that the combination of β74Glu with β13His, β26Phe, β70Gln, β71Arg, and β78Tyr of pocket 4 in the DRB1*04:03 protein excludes acidic anchors at this pocket, accommodating Leu. The proteins DRB1*04:04 and B1*04:05, variably conferring susceptibility to T1D, also accommodate p4Leu in the epitope of GAD65 555–567 (32).
Fifteen Residues Account for Disease Association With DRB345
Processing alleles associated with HLA-DRB345, the ROR procedure removed 195 monomorphic amino acids and recursively organized the remaining 72 residues with 14 unique sequence alleles. After the first step, ROR eliminated 49 amino acids, leaving 23 residues forming 14 unique alleles. Following four additional iterative steps, ROR achieved a parsimony at ROR step 5, where the P value exceeded 5% (Fig. 3A). Visually, the ROR procedure selected 15 residues at positions β18, β25, β40, β41, β44, β48, β51, β70, β76, β81, β104, β135, β180, β181, and β187 (Fig. 3B). The allelic association analysis of DRB345 showed a total of two risk amino acid sequences and four protective amino acid sequences (Table 2), which captured the full DRB345 allelic associations (Supplementary Table 1B).
These 15 Residues Have Different Functional Roles in Their Causal Association With Disease
Of these residues, the ones from β18–81 are in the α1β1 antigenic peptide-binding domain, while the remainder are in the α2β2 accessory domain of the DRβ345 molecules. We have placed all 15 residues onto the crystal structure of HLA-DRB3*03:01, which has a nearly identical secondary structure to the now-established MHC class II (MHC II) fold (33) (Fig. 3C). Yet, the great majority of the residues identified with HLA-DRB345 linkage to susceptibility to or protection from T1D project from the β1β2 domain surfaces. According to one hypothesis, these surfaces form the homodimerization surface of the so-called (αβ)2 homodimer of αβ heterodimers, loaded with identical antigenic peptides and, consequently, the two neighboring surfaces onto which may fit two accessory CD4 molecules from the T cell bearing the cognate T-cell receptor (TCR) (33). Only two of the first 10 residues that belong to the α1β1 domain (β70 and β81) point into the groove and participate in the formation of pockets 4/7 and 1, while the rest are either pointing directly away from the groove, being on the β-sheet floor, or to the side of the groove. It turns out, nonetheless, that, of the remaining thirteen residues, one is β25, also seen in linkages to DRB1 alleles. β104 is at the presumed interphase of the two αβ heterodimers forming the (αβ)2 homodimer, while the other remaining 11 residues are all located on the face of the β1β2 surface that is supposed to interact productively with the accessory molecule CD4 (Fig. 3C). Of these residues, β135 is within the β134–148 binding site of the CD4 molecule, while most of the others are on the surface that presumably interacts with CD4 (33,34). Residue β51 is located near the antigenic peptide C-terminus opening of the peptide-binding groove right next to β52Glu, a residue considered pivotal with β55Arg, in the formation of cognate TCR-mediated homodimerization of HLA-DRαβ heterodimers in the so-called β49–55 homodimerization patch (33,35).
Haplotype-Based Additive Associations and Interactions Between HLA-DRB1 and -DRB345
Local LD between two genes in proximity dictated that DRB1 alleles shared haplotypes with none or one of the DRB345 genes (Table 3). Twenty-seven haplotypes were formed between, respectively, 22 and 6 alleles of DRB1 and DRB345. Full allelic associations of individual genes were captured by five risk haplotypes and nine protective haplotypes. To gain an insight into the potential modifying effects of DRB345 on DRB1 effects, haplotype sets were sorted according to DRB1 sequences, and sorted haplotype sets (4, 5), (6, 7), (16, 17, 18), and (26, 27) were colored in purple or red. Consistent disease associations of haplotypes between two haplotypes suggest that DRB345 may have limited modifying effects in these pairs. In Supplementary Table 3, the same 27 haplotypes and their association statistics are listed by sorting DRB345 first. Examination of DRB1 sequence variations across groups of DRB345 sequences suggested that DRB1 probably drove the haplotype associations with T1D.
Specific Residues of DR Associate With Islet Autoantibodies Among Patients
Elevated titers of islet autoantibodies against various autoantigens were long suspected to associate with polymorphisms of DR genes. Table 4 lists 20 HLA-DR haplotypes and their associations with IAA, GADA, IA-2A, and ZnT8RA, ZnT8WA, and ZnT8QA among all 962 patients. Evidently, DR haplotypes had strong associations with alterations of IA-2A (minimum P = 5.28 * 10−17) and of GADA (minimum P = 6.58 * 10−5), while their associations with IAA or with ZnT8A were relatively weaker.
Four Residues of DRB1 Are Responsible for IA-2A Positivity
By applying the ROR procedure to DRB1 association with six islet autoantibodies, we established that associations of DRB1 with IAA, GADA, ZnT8RA, or ZnT8WA were not significant by the threshold P < 0.01 (Supplementary Fig. 1A). DRB1 was found to be strongly associated with IA-2A positivity among patients, with four detected residues at positions β14, β25, β71, and β73 with the successive log likelihood P < 10−10 (Supplementary Fig. 1A and Table 5). The respective amino acid sequence (ERKA) was found to be highly significant and positively associated with IA-2A positivity (OR 2.15, P = 8.83 * 10−17). Meanwhile, two other amino acid sequences (EREA and ERKG) associated with protection from developing IA-2A (OR 0.70 and 0.59, P = 0.0461 and 7.84 * 10−11, respectively). The motif (ERKA) captures three DRB1 alleles (DRB1*04:01, *04:13, and *13:03). In contrast, the motif ERKG captures two alleles (DRB1*03:01 and *03:05), while EREA captures six alleles (DRB1*01:03, *04:02, *11:02, *11:03, *13:01, *13:02).
The statistical significance at the 1% level is demonstrated for HLA-DRB1 with IA-2A, with four amino acids at β14, β25, β71, and β73.
1Frequency counts of [neg]ative and [pos]itive IA-2A islet autoantibody and their percentages (%).
2OR, z score statistic, and P value computed under the null hypothesis.
DRB1*ERKA Is More Associated With IA-2A–Positive Onset, and DRB1*ERKG Is More Associated With IA-2A–Negative Onset
DRB1*ERKA was found to be positively associated with IA-2A positivity (OR 2.15, P = 8.83 * 10−18), while DRB1*ERKGY was negatively associated with IA-2A positivity (OR 0.59, P = 7.84 * 10−11) (Table 5). When contrasting with population-based controls, both DRB1*ERKAY and *ERKGY were associated with IA-2A–positive and –negative patients, except that their association magnitudes were reversed (OR 3.77 vs. 1.75 for ERKAY and 1.85 vs. 3.14 for ERKGY) (Table 6). DRB1*ERKGY uniquely corresponded with DRB1*03:01:01, while DRB1*ERKAY corresponded with DRB1*04:01:01 and *13:03:01. In Table 1, ERKA is highlighted in light blue. It is also clear that sequences of these five DRB1 residues caught major genetic associations with T1D risks, indicating that IA-2A elevation is an important seroconversion event leading to the disease onset.
DRB1*ERKA and *ERKG Genotypes Have Additive Associations With Type 1 Diabetes Risks
Homozygous carrier of DRB1*ERKA appeared to have a much stronger association than its allelic association (OR 8.71, P = 5.50 * 10−13), indicating an additive association with IA-2A positivity (Supplementary Table 4). On the other hand, the homozygous DRB1*ERKG had an additive association with patients with T1D whose IA-2A association tended to be with T1D with a normal IA-2A.
Discussion
Recursively organizing complex amino acid sequences reveals that sequences of 11 residues capture all risk/protective associations of DRB1 full sequence alleles and, likewise, so do sequences of 15 DRB345 residues. All 11 residues of DRB1 are located either under (β14 and β25 and interacting with each other as well as neighboring amino acids) or within the peptide-binding groove (β47, β57, β60, β71, β73, β74, β78, β85, and β86), suggesting that specific patterns of these residues recognize specific peptides from autoantigens as the functional mechanism. This list includes β71, which is one of two residues discovered in a previous study (14). On the other hand, residues β18, β25, β40, β41, β44, β48, β51, β70, β76, and β81 of DRB345 are all in the α1β1 domain, with β70 participating in the formation of pockets 4 and 7 and β81 as part of pocket 1 and hydrogen bond acceptor from the antigenic peptide backbone (33,35). The remaining residues are either under or to the side of the antigen-binding groove and all to the side that constitutes the presumed CD4 binding surface (33–36) (Fig. 3C). Likewise, residues β135, β180, β181, and β187 are in the α2β2 domain and are also in the visible side of Fig. 3C. Remarkably, β135 is part of the β134–148 stretch directly involved in CD4 binding (33,36), although one work a long time ago (albeit with no follow-up since) showed the polymorphisms in β179–191 of DRB345 to be responsible for poor CD4 binding and consequent activation (37). Only two amino acids, β51 and β104, could conceivably influence the presumed TCR-induced HLA-RA(αβ)2 homodimerization: The first is actually within the HLA-DR homodimerization patch, next to near invariant β52Glu that participates in this process by salt bridges to β55Arg from the other homodimerized HLA-DR molecule (35,36); the other faces into the position of the other HLA-DRαβ–presumed heterodimer (33,35). The homodimerization and subsequent CD4 binding hypothesis have been contested, and evidence has been produced by several laboratories that is either consistent with or against this hypothesis (33,36–43). One matter complicating all analyses is the considerable in vitro instability of the human CD4 molecule, necessitating in many cases the use of mutant forms that form stable complexes with HLA-DR molecules to obtain crystallization of CD4-HLA-DR complexes (41,42). It is certain, however, that F(ab′)2-fragment antibody-mediated homodimerization of HLA-DP or -DQ or -DR on monocytes leads to distinct downstream activation patterns (44).
Further examination of the role of given amino acids shows five residues of DRB1 (β14, β25, β71, β73, and β78) to have strong associations with elevated IA-2A titers for patients with T1D at the time of diagnosis. In particular, the sequence ERKA, found in DRB1*04:01 and *13:03, associates with IA-2A–positive patients with T1D, while ERKG, uniquely found in DRB1*03:01, associates more with IA-2A–negative patients with T1D. Despite the difference of only one amino acid, A versus G at position β73, this difference indicates a potential effect on peptide binding or subsequent cognate TCR binding mechanisms as well as possible effector outcomes (thymocyte deletion or T-helper or T-regulatory cell selection) with respect to IA-2A (45). Finally, effects of specific amino acid sequences ERKA or ERKG appear to be additive. A recent study underscored the importance of HLA-DRB4*01:01 allele for T1D pathogenesis over HLA-DR4 alleles in LD with the former (46).
Our list of 11 residues in DRB1 points to these residues’ involvement in four different pockets. Both β85 and β86 are within pocket 1, with the Gly/Val dimorphism of the latter position determining the aromatic/aliphatic crucial p1 anchor for HLA-DR molecules (30,35). Furthermore, β71 is part of both pockets 4 and 7, while β74 and β78 are within pocket 4, and β47 is within pocket 7. Finally, β57 is within pocket 9, with Asp/non-Asp occupancy at this position determining the type of anchor residue at this pocket (mostly small aliphatic vs. acidic, respectively) (30,35). This residue has also been shown to contribute immensely to the stability of the peptide-HLA-DR complex (40,47).
Also of interest are DRB1 residues β14, β25, β60, and β73, which do not locate in any one of known pockets. The given residues are either under (β14 and β25) or adjoining (β60 and β73) the groove or within it and as parts of pockets (β47, β57, β71, β74, β78, β85, and β86). To our knowledge, the variations in positions 14, 25, 60, and 73 have never been tested in any alleles with regard to peptide binding, cognate TCR recognition, or accessory CD4 binding, and one could well speculate on what effect such substitutions might have (e.g., β73Ala→Gly would make the MHC II molecule slightly more flexible and peptide binding weaker or stronger depending on the anchor residue at adjacent pocket 4; it could, of course, also affect TCR recognition because of the extra flexibility involved [45]). Because there may certainly be interactions of pMHC II complexes with other molecules in the course of immune tolerance formation and immune response, one cannot easily speculate any further.
It has been shown that T1D-susceptible and -resistant HLA-DR alleles bind to different peptide epitopes within the same protein autoantigen and with different binding affinities and, even more importantly, with slower off-kinetics for the resistant alleles (28,31). This work also demonstrates how tight the binding can be for the resistant alleles, such as DRB1*04:03, because of just a few key-positioned amino acids, among them β74Glu. The sum of knowledge generated regarding T1D pathogenesis through particular HLA-DR alleles binding to specific autoantigenic epitopes could lead to the design of antigen-specific interventions (perhaps coupled with nonspecific ones) either in the pre- or in the new-onset T1D phase. The targets would probably have to include effector as well as regulatory CD4/CD8 T cells (48–52).
One by-product from the current analysis is that many uncommon alleles of HLA genes are now naturally grouped with common alleles on the basis of ROR-selected amino acids. Following the selection of 11 residues by the ROR procedure, we have now merged several common alleles with several uncommon alleles, netting 22 common amino acid sequences and 4 uncommon ones (data not shown). Grouping common alleles with similar disease associations achieves analytic efficiency. More importantly, grouping uncommon alleles into common ones allows us to assess genetic risks for those individual carriers of uncommon alleles, risks for which would otherwise be unknown.
Another practical benefit of reducing complex HLA gene sequences and related nomenclature to a representation with a minimum set of residues is its stable representation, despite ever-evolving sequencing technologies and increasing numbers of uncommon alleles. When a minimum set of amino acids, referring to tagging residues (patterning after tagging single nucleotide polymorphisms), can be identified for each disease-associated HLA gene, it would be stable, regardless of how many novel nucleotide sequences and related nucleotide/amino acid sequences are added with better genotyping technologies or with more diverse populations.
In comparison with the early discovery of β13 and β71, our list includes one but not the other (14). The consistently discovered amino acid β71, which is expected, has one of four possible residues (Lys, Arg, Ala, and Glu), with Lys being the most common, and is indicative of DRB1*03:01 and *04:01, two of the most common risk alleles (Table 1). Also, because of their high allelic frequencies, imputations of HLA genotypes from single nucleotide polymorphisms tend to be accurate. On the other hand, it is somewhat surprising that β13, also a member of pocket 4, was not picked up by ROR. The selection process of ROR (Fig. 1B) indicates that β13 is deselected quite early on, probably because of limited association or limited polymorphisms in the current HLA sequence data. Interestingly, however, there are two residues, β14 and β25, both underneath β13, where the electrostatic interactions between the first two, as well as among them and neighboring residues, add to the stability of the MHC II-peptide complex (Fig. 1C). Given the nature of conditional analysis (14), it is possible that the former analysis may not find β14 or β25, possibly because of local LD. However, if there was strong local LD between β13 and β14/β25 and both were significantly associated with T1D, ROR would retain β13 because its analysis is not conditional. One possible explanation is that imputations may introduce extra variations in calling uncommon HLA alleles tagged by β13. Another possible scenario is that this occurs as a result of limited population variation, as our study population is restricted only to the Swedish population. Of course, such conjectures would not be addressable without obtaining HLA genotypes at the nucleotide sequence resolution, more so when we acquire knowledge of specific HLA-DR-peptide combinations being responsible for susceptibility or resistance to T1D.
Besides the difference in HLA genotyping technology, another important difference is the analytic method (i.e., the conditional analysis vs. ROR). While both share the same analytic goal of achieving a parsimony with respect to amino acids, ROR recursively eliminates amino acids if they are monomorphic or their deletion does not alter the association profile. ROR avoids extraction of any information from LD between residues in the selection process. In contrast, the conditional analysis uses the stepwise procedure to search for significantly associated amino acids with T1D while conditioning on those selected amino acids. It uses both disease associations with residues and LD associations between amino acids to achieve the parsimony. Consequently, the conditional analysis can achieve a greater reduction in the number of selected residues, but selected residues should be viewed as best biological surrogate markers that encapsulate both disease association and LD associations. Indeed, it is well suited for developing risk prediction models when parsimonious models achieve greater statistical stability in predictions.
Before concluding, it is important to recognize three limitations of this study. First, our analysis focuses on T1D association with HLA-DR genes without including HLA-DQ genes. If a particular HLA-DR allelic association with T1D is mediated by LD between DR and DQ genes, and hence is not causal, the corresponding motif of amino acids should be treated as a surrogate marker for T1D. For the same reason, discovered amino acids may not include causal residues. The remedy is to investigate HLA-DQ genes and their amino acids jointly with HLA-DR genes, which will be pursued and reported in a separate study. Second, the current investigation is restricted to genetic variants in protein-coding exons, ignoring genetic variants in noncoding regions. Given the available high-resolution HLA genotypes, we are positioned to investigate regulatory roles of noncoding genetic variants, results from which will be reported elsewhere. Finally, the Swedish population is relatively homogeneous, and hence, HLA-DR diversity is limited. Thus, identified residues and related motifs are inevitably limited. To be comprehensive, it is important to conduct such studies in other ethnic populations or, even better, to carry out an international consortium study with multiple ethnic populations. Indeed, our discovery should be thought of as the first stepping stone toward fully understanding T1D associations with class II genes.
B.X. and M.K. were summer interns at Fred Hutchinson Cancer Research Center during the summer months of 2018.
G.P.B. is an adjunct member of the Laboratory.
Article Information
Acknowledgments. The authors would like to thank Dr. Gun Forsander (Department of Pediatrics, the Queen Silvia Children’s Hospital, Sahlgrenska University Hospital, Gothenburg, Sweden) and Dr. Sten A. Ivarsson (Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden) for clinical expertise in the conduct of the BDD study. Equally, the authors thank Anita Ramelius, Linda Faxius, Karl Moritz, Petra Moritz, Ingrid Wigheden, Ida Jönsson, and Rasmus Håkansson (Department of Clinical Sciences, Lund University/Clinical Research Centre, Malmö, Sweden) for expert technical assistance.
Funding. The study was supported by National Institute of Diabetes and Digestive and Kidney Diseases grants R56-DK-117276, RO1-DK-117276, DK-63861, and DK-26190 and by a grant from the European Foundation for the Study of Diabetes Clinical Research Grants Programme 2013. The study also was supported in part by the Swedish Child Diabetes Foundation (Barndiabetesfonden); a Swedish Research Council Linné grant to Lund University Diabetes Centre; the Skåne County Council for Research and Development; the Swedish Association of Local Authorities and Regions; and Forskningsrådet om Hälsa, Arbetsliv och Välfärd. The Silicon Graphics Fuel instrument and the accompanying software were obtained through a grant from the Epirus Regional Development Program to the Technological Educational Institute of Epirus (MIS 91949) through the 3rd Community Support Framework of the European Union (80% European Union funds, 20% Hellenic state funds).
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
Author Contributions. L.P.Z., G.K.P., W.W.K., D.E.G., and Å.L. researched and analyzed the data and wrote the manuscript. B.X., M.K, R.W., C.-W.P., and W.C.N. contributed to the NGTS, researched data, and reviewed the manuscript. A.K.M. and G.P.B. carried out molecular simulations and graphical representations of select HLA-DR molecules. A.C., H.E.-L., J.L., C.M., M.P., and U.S. designed the BDD study, researched data, contributed to the discussion, and reviewed the manuscript. Å.L. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Data and Resource Availability. Interested researchers are encouraged to contact Å.L. about accessing the underlying data set, following institutional review of human subject research and institutional policies.