Genetic risk scores (GRS) aid classification of diabetes type in White European adult populations. We aimed to assess the utility of GRS in the classification of diabetes type among racially/ethnically diverse youth in the U.S.
We generated type 1 diabetes (T1D)- and type 2 diabetes (T2D)-specific GRS in 2,045 individuals from the SEARCH for Diabetes in Youth study. We assessed the distribution of genetic risk stratified by diabetes autoantibody positive or negative (DAA+/−) and insulin sensitivity (IS) or insulin resistance (IR) and self-reported race/ethnicity (White, Black, Hispanic, and other).
T1D and T2D GRS were strong independent predictors of etiologic type. The T1D GRS was highest in the DAA+/IS group and lowest in the DAA−/IR group, with the inverse relationship observed with the T2D GRS. Discrimination was similar across all racial/ethnic groups but showed differences in score distribution. Clustering by combined genetic risk showed DAA+/IR and DAA−/IS individuals had a greater probability of T1D than T2D. In DAA− individuals, genetic probability of T1D identified individuals most likely to progress to absolute insulin deficiency.
Diabetes type–specific GRS are consistent predictors of diabetes type across racial/ethnic groups in a U.S. youth cohort, but future work needs to account for differences in GRS distribution by ancestry. T1D and T2D GRS may have particular utility for classification of DAA− children.
Introduction
There is increasing recognition that various types of diabetes can occur at all ages and that classification of diabetes type can be challenging. Type 1 diabetes (T1D) is the most common diabetes type in children, accounting for >85% of pediatric diabetes in White populations. However, other types of diabetes, such as type 2 diabetes (T2D) and maturity onset diabetes of the young, also occur during this period, and treatment differs by diabetes type. In T1D, rapid destruction of β-cells necessitates insulin replacement to avoid acute, severe metabolic complications and to aid long-term glycemic control. In T2D, the gradual decline of insulin secretion in the presence of insulin resistance (IR) often allows for initial treatment with lifestyle and oral medications. Youth with T2D occasionally present with severe hyperglycemia or diabetic ketoacidosis requiring insulin therapy. The increasing prevalence of obesity has led to an increase in youth-onset T2D (1) and an increase in the proportion of people with T1D who are overweight or obese. A particular problem in the U.S. population is the increased frequency of T2D, especially among people of non-White race and ethnicity (2–5). Because of the overlap in presenting features of all diabetes types, there is no single biomarker or clinical feature that can perfectly distinguish diabetes type (6).
SEARCH for Diabetes in Youth (SEARCH) is a population-based, multicenter, and multiethnic study of youth-onset (<20 years) diabetes in the U.S. With the aim of rigorously characterizing the epidemiology of diabetes in youth, this study collected detailed clinical and pathophysiological data on a large number of youth. SEARCH described four etiologic categories of diabetes type in youth across a bidimensional spectrum of autoimmunity (defined by islet autoantibody positivity) and insulin sensitivity (IS) (using an equation validated against hyperinsulinemic-euglycemic clamps) (7,8). While most individuals fell into two categories that aligned with traditional features of either T1D or T2D, 29% fell into intermediate categories, raising a question about diabetes etiology in these individuals.
We have previously demonstrated that a T1D genetic risk score (GRS), comprising HLA and non-HLA T1D-associated single nucleotide polymorphisms (SNPs), can discriminate between T1D and T2D in adults between the ages of 20 and 40 years (9) in addition to having utility in prediction of T1D onset (10–13). We also showed that accurate classification of T1D can be achieved by combining variables to aid classification (9,14). Since then, we have developed an improved T1D GRS that incorporates a greater diversity of HLA variation, including alleles more commonly found in non-White populations, and additional loci not independently associated with T2D (13). Advances in the genetic understanding of T2D mean that a T2D-specific GRS containing >400 associated variants may further increase power in classification of diabetes type (15). There are very few studies of the utility of T1D or T2D GRS in classification of pediatric diabetes and in non-White, non-European ancestry populations. In addition, it is not known whether diabetes-type GRS can be a useful tool for classification of pediatric diabetes, whether GRS derived from cohort studies of White Europeans are useful for diabetes classification in other ethnicities, and whether diagnostic models integrating diabetes-type GRS could aid classification.
In this study, we used the previously described T1D GRS and a T2D GRS to assess genetic associations within SEARCH etiologic types and across U.S. racial and ethnic groups represented in the SEARCH study (13,15). We also assessed whether GRS were able to aid classification of diabetes type in cases of difficult-to-classify intermediate SEARCH etiologic types.
Research Design and Methods
We performed analyses on data from SEARCH, a multicenter prospective U.S. cohort study (16). SEARCH investigators conducted population-based ascertainment of youth with incident diabetes diagnosed at <20 years of age from 2002 through 2020. Participants were recruited from four geographically defined populations in Ohio, Colorado, South Carolina, and Washington; Indian Health Service beneficiaries from several American Indian populations; and enrollees in a managed health care plan in California. Participants with newly diagnosed diabetes in 2002–2006 and 2008 were invited to participate in a baseline research visit (median duration of diabetes 8 months, interquartile range [IQR] 4.5–14.5 months). Participants who had a baseline research visit and diabetes duration of at least 5 years were then invited to participate in a follow-up visit, at which time median diabetes duration was 8 years. At each research visit, fasting blood samples were obtained from metabolically stable participants (defined as no episode of diabetic ketoacidosis during the previous month), physical measurements taken, and questionnaires administered. The study was reviewed and approved by the local institutional review boards that had jurisdiction over the local study population, and all participants provided informed consent and/or assent. Study visits occurred after an 8-h overnight fast. Participants did not take diabetes medications on the morning of the visit; if needed, long-acting insulin was administered on the evening before the visit and then discontinued.
Inclusion criteria for these analyses were participation in the SEARCH study, the availability of DNA for generation of SNP array data, and the availability of C-peptide and of the measures used to define SEARCH etiologic type (autoantibodies, lipid profile, HbA1c). We used data from SEARCH baseline visits and follow-up visits for which key data elements were available.
Blood Analyses
Fasting blood samples were used to analyze diabetes autoantibodies (DAAs), HbA1c, lipids, fasting C-peptide, and fasting plasma glucose. Assays were performed at the Northwest Lipid Metabolism and Diabetes Research Laboratories, University of Washington, the central laboratory for SEARCH. GAD antibody 65 (GADA), IA-2 antigen (IA-2A), and zinc transporter 8 (ZnT8) autoantibodies were analyzed using a standardized protocol and a common serum calibrator developed by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)-sponsored harmonization group (17). The cutoff values for positivity were 33 NIDDK units (NIDDKU)/mL for GADA, 5 NIDDKU/mL for IA-2A, and 0.02 NIDDKU/mL for ZnT8. ZnT8 testing was added to the study protocol during the study, and only a subset of individuals had ZnT8 testing. HbA1c was measured in whole blood within an automated nonporous ion-exchange high performance liquid chromatography system (model G-7; Tosoh Bioscience, San Francisco, CA). Measurements of plasma cholesterol, triacylglycerols, and HDL cholesterol were performed on a Roche Modular-P autoanalyzer (Roche Diagnostics, Indianapolis, IN). Serum C-peptide concentration was determined by a two-site immunoenzymetric assay (Tosoh 1800; Tosoh Bioscience) with a sensitivity of 0.05 ng/mL. A subset of participants did not fast for the study visit, and so these individuals had nonfasting C-peptide and glucose measurements. We defined a surrogate of severe insulin deficiency as fasting or nonfasting C-peptide <0.24 ng/mL. Individuals who had undetectable C-peptide at their baseline visit were assumed to have undetectable C-peptide at follow-up, even if no measurement was taken.
Weight and height were measured using standardized procedures and used to calculate BMI z score. Waist circumference was measured using the National Health and Nutrition Examination Survey (NHANES) III protocol (18).
Race/Ethnicity
Data on self-reported race and ethnicity were collected using 2000 U.S. Census questions and classified as Hispanic, non-Hispanic White, non-Hispanic Black, American Indian, and Asian/Pacific Islander (19). We defined four racial subgroups based on self-reported race and ethnicity where we had sufficient power to study diabetes classification: Hispanic, non-Hispanic individuals self-reporting as White, non-Hispanic individuals self-reporting as Black, and all other races/ethnicities.
Definition of Diabetes Type
For this report, SEARCH etiologic diabetes type was used and assigned from the baseline study visit. Data on diabetes type assigned by the participant’s health care provider was extracted from the participant’s medical record but was not used in these analyses. Etiologic diabetes type was previously defined by the SEARCH study across a bidimensional spectrum of autoimmunity (defined by islet autoantibody positivity) and IS (7,8). DAA+ was defined as being positive for any GADA, IA-2A, or ZnT8 autoantibodies and DAA− as negative for all three.
IS was estimated using a previously published equation that was developed and validated using direct measurements of glucose disposal rate from euglycemic-hyperinsulinemic clamps (7). IR was defined as an IS score below the 25th percentile (IS <8.15) of 2,860 nondiabetic youth from multiple racial/ethnic groups aged 12–20 years participating in the U.S. National Health and Nutrition Examination Survey (NHANES) in 1999–2004. SEARCH participants were categorized by DAA+/− status and IS or IR into four mutually exclusive groups: DAA+/IS, DAA+/IR, DAA−/IS, and DAA−/IR.
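The four-group assignment described above can be sketched as follows. This is a minimal illustration, not the SEARCH study code: the function names are hypothetical, the cutoffs are the ones quoted in the text, and treating a value exactly at a cutoff as positive is an assumption of the sketch.

```python
from typing import Optional

# Cutoffs quoted in the text (NIDDK units/mL for autoantibodies).
# Assumption: a value equal to the cutoff counts as positive.
GADA_CUTOFF = 33.0
IA2A_CUTOFF = 5.0
ZNT8_CUTOFF = 0.02
IS_CUTOFF = 8.15  # 25th percentile of IS in the NHANES reference youth


def is_daa_positive(gada: float, ia2a: float, znt8: Optional[float] = None) -> bool:
    """True if any measured autoantibody meets its cutoff.

    ZnT8 was measured in only a subset, so it may be None (not tested).
    """
    positive = gada >= GADA_CUTOFF or ia2a >= IA2A_CUTOFF
    if znt8 is not None:
        positive = positive or znt8 >= ZNT8_CUTOFF
    return positive


def etiologic_group(gada: float, ia2a: float, is_score: float,
                    znt8: Optional[float] = None) -> str:
    """Return one of the four mutually exclusive groups, e.g. 'DAA+/IS'."""
    daa = "DAA+" if is_daa_positive(gada, ia2a, znt8) else "DAA-"
    sensitivity = "IR" if is_score < IS_CUTOFF else "IS"
    return f"{daa}/{sensitivity}"


if __name__ == "__main__":
    # Hypothetical participant: GADA positive, insulin sensitive.
    print(etiologic_group(gada=50.0, ia2a=1.0, is_score=10.2))
```

Passing `znt8=None` reflects that a participant without ZnT8 testing is classified on GADA and IA-2A alone, which is why some DAA− assignments may be less certain than others.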
Genotyping
We performed genotyping using the Infinium Multi-Ethnic Global Array (MEGA; Illumina) with 1,697,069 genotyped variants, including 748,291 with minor allele frequency <0.01. Genotyping and preliminary quality control checks were performed at the Colorado Center for Personalized Medicine. After additional quality control, 2,238 samples and 900,743 variants remained. The cohort genotyped on MEGA was categorized using SEARCH etiologic type and consisted of predominantly T1D cases (n = 2,051) as well as those of other diabetes (n = 133 T2D; n = 52 other diabetes, including monogenic diabetes [genetic confirmation with either a genetic clinical test or test performed as an ancillary study to SEARCH]). The median reported age at DNA collection was 11.2 years (IQR 7.6–14.1 years) with a minimum age of 1.9 years and maximum of 21.9 years.
We used additional data genotyped on the Affymetrix 500K imputation scaffold chip with 239,279 genotyped variants. After additional quality control, 537 samples and 235,967 variants remained. This cohort consisted of predominantly T2D cases (n = 417) as well as those of T1D (n = 104) and other diabetes types (n = 16). The median reported age at collection was 11.2 years (IQR 8.1–14.2 years) with a minimum of 2.0 years and a maximum of 21.1 years.
Combined Genetic Data
A subset of samples (n = 137 after quality control) was genotyped in both batches to assess concordance between the data sets. The data sets had 230,228 genotyped variants in common, and concordance between the genotypes was high (mean correlation r2 for SNPs used in GRS = 0.95). We used the 1000 Genomes reference panel to impute each data set separately, resulting in a total of 34.5 and 27.8 million well-imputed variants (r2 >0.8) for the MEGA and Affymetrix data sets, respectively. We combined high-quality imputed variants for analysis.
GRS
Two previously published T1D GRS were generated for analyses, including a 30-SNP score used in many studies (9) and a more recent 67-SNP score with a greater diversity of HLA variation, including 18 interactions between major HLA class 2 alleles and additional independent variants. We generated these GRS as per previous publications (9,13). Additionally, we generated a T2D GRS from summary statistics of a previously published comprehensive genome-wide association study in individuals of White European ancestry (15). We used all genome-wide–associated variants from this study where we had either directly genotyped or well-imputed SNP results available (r2 >0.8). Details of variants used are in the Supplementary Material, and the code to generate the scores used in this article can be found on an open GitHub repository (https://github.com/sethsh7/hla-prs-toolkit). To allow for comparison of T1D and T2D genetic risk, we generated standardized z scores within the cohort (Supplementary Fig. 5).
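As a rough illustration of how a score of this kind is built and standardized, the sketch below computes a purely additive weighted score and converts it to within-cohort z scores. It is not the published scoring code: the variant names and weights are hypothetical, and the HLA interaction terms of the 67-SNP T1D GRS are deliberately omitted.

```python
from statistics import mean, pstdev
from typing import Dict, List


def additive_grs(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of risk-allele dosage (0 to 2) times a per-variant weight.

    Weights would typically be log odds ratios from published summary
    statistics; imputed dosages may be fractional.
    """
    return sum(weights[snp] * dosages[snp] for snp in weights)


def z_scores(scores: List[float]) -> List[float]:
    """Standardize scores within the cohort (mean 0, SD 1)."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]


if __name__ == "__main__":
    # Hypothetical variants and weights, for illustration only.
    weights = {"snp_a": 0.5, "snp_b": 0.2}
    cohort = [
        {"snp_a": 0, "snp_b": 0},
        {"snp_a": 1, "snp_b": 1},
        {"snp_a": 2, "snp_b": 2},
    ]
    raw = [additive_grs(person, weights) for person in cohort]
    print(z_scores(raw))
```

Standardizing within the cohort puts the T1D and T2D scores on a common scale, but it also means a z score is relative to the cohort's ancestry mix rather than an absolute measure of risk.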
Clustering by Genetic Risk
We included the T1D and T2D GRS in a multivariate clustering model. We used Gaussian mixture modeling with the assumption that polygenic risk across a population is well represented by a mixture of Gaussian distributions. We used unsupervised clustering, specifying two clusters with DAA+/IS and DAA−/IR individuals to identify two distinct groups of genetic risk. This enabled unbiased comparison with intermediate groups.
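Once a two-component mixture has been fitted, the probability of belonging to the T1D-like component follows from Bayes' rule. The sketch below shows only that posterior calculation for a bivariate model in pure Python; it does not perform the fitting, and all parameter values (weights, means, covariances) are hypothetical placeholders rather than estimates from SEARCH.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]                       # (T1D GRS z score, T2D GRS z score)
Cov = Tuple[Tuple[float, float], Tuple[float, float]]


def bivariate_normal_pdf(x: Vec, mean: Vec, cov: Cov) -> float:
    """Density of a 2D Gaussian, using the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^-1 (x - mean).
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def t1d_probability(x: Vec, weight_t1d: float,
                    mean_t1d: Vec, cov_t1d: Cov,
                    mean_t2d: Vec, cov_t2d: Cov) -> float:
    """Posterior probability that x belongs to the T1D-like component."""
    p1 = weight_t1d * bivariate_normal_pdf(x, mean_t1d, cov_t1d)
    p2 = (1.0 - weight_t1d) * bivariate_normal_pdf(x, mean_t2d, cov_t2d)
    return p1 / (p1 + p2)


if __name__ == "__main__":
    identity: Cov = ((1.0, 0.0), (0.0, 1.0))
    # Hypothetical components: high T1D/low T2D risk versus the inverse.
    prob = t1d_probability((0.8, -0.5), 0.5,
                           (1.0, -1.0), identity,
                           (-1.0, 1.0), identity)
    print(round(prob, 3))
```

Because the output is a continuous probability rather than a hard label, individuals in the intermediate groups can be placed anywhere between the two reference distributions.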
Statistical Methods
We assessed the validity of the 30-SNP and 67-SNP T1D GRS among SEARCH participant samples stratified by self-reported race/ethnic group, with SEARCH etiologic type as an outcome. We used SEARCH etiologic types with concordant features (DAA+/IS and DAA−/IR as reference T1D and T2D groups, respectively) for comparison. In regression analyses, we compared receiver operating characteristic areas under the curve, using the DeLong algorithm, to assess the power of the GRS as a classifier of diabetes type. We compared differences in GRS and other continuous variables using parametric and nonparametric tests as appropriate. Basic statistical analysis was performed in Stata and R software. Gaussian mixture model clustering was performed using the scikit-learn package for Python.
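The area under the receiver operating characteristic curve is the probability that a randomly chosen reference T1D individual has a higher score than a randomly chosen reference T2D individual. A minimal sketch of that calculation is shown below; the function name and example scores are hypothetical, and the DeLong comparison of two correlated AUCs is not implemented here.

```python
from typing import Sequence


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is an O(n*m) pairwise version
    chosen for clarity, not speed.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical T1D GRS z scores for the two reference groups.
    daa_pos_is = [1.2, 0.8, 0.3, 1.5]
    daa_neg_ir = [-0.9, 0.4, -1.3]
    print(roc_auc(daa_pos_is, daa_neg_ir))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two reference groups.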
Results
A total of 2,045 individuals had baseline assessments and associated genetic data sufficient to generate all GRS described and sufficient phenotype information to define SEARCH etiologic type. Cohort characteristics are described in Supplementary Table 1.
A 67-SNP T1D GRS Incorporating Greater HLA Diversity Outperforms Previous GRS in Non-White Race/Ethnicity
We tested our original 30-SNP T1D GRS against a recently published 67-SNP score incorporating greater HLA diversity by comparing the discriminative power between the two most clearly defined etiologic groups as a gold standard: DAA+/IS versus DAA−/IR. Overall, the 67-SNP score was more discriminative than the 30-SNP score (receiver operating characteristic area under the curve 0.903 [95% CI 0.886–0.920] vs. 0.852 [0.830–0.873], P < 0.0001) (Supplementary Fig. 1). This was particularly evident in Hispanic participants (0.935 vs. 0.825, P < 0.0001) and to a lesser extent in Black participants (0.851 vs. 0.807, P = 0.11) (Supplementary Figs. 1 and 2). We therefore used the 67-SNP score for the remainder of the analyses.
T1D and T2D GRS in SEARCH Etiologic Types
We analyzed the T1D and T2D GRS across all SEARCH etiologic categories. The T1D GRS was highest in the DAA+/IS group and lowest in the DAA−/IR group (Fig. 1A and B). The mean GRS in the DAA+/IR and DAA−/IS groups was between these (Fig. 1B). Both groups had a mean score closer to the DAA+/IS group (Fig. 1B). T2D GRS showed an inverse pattern to this (Fig. 1A and B), and T2D GRS in the DAA+/IR and DAA−/IS groups had a closer mean score to the DAA+/IS group.
T1D and T2D GRS overlap across SEARCH etiologic types but show an inverse relationship when comparing T1D and T2D genetic risk. A: The standardized distribution of the 67-SNP T1D GRS and 397-SNP T2D GRS stratified by SEARCH etiologic type: DAA+/IS, DAA+/IR, DAA−/IS, and DAA−/IR. Boxes represent median and IQR of each score, with whiskers representing the range, excluding outliers (defined by 1.5 × IQR outside the upper or lower quartile). B: The mean T1D GRS z score plotted against mean T2D GRS z score for SEARCH etiologic types. Error bars are the SEM.
The Impact of Race/Ethnicity on T1D and T2D GRS Across Diabetes Types
We examined patterns of T1D and T2D GRS across SEARCH etiologic type, stratified by self-reported ethnicity and genetically defined race/ethnicity. Self-reported ethnicity was highly concordant with genetically defined ethnicity. (Supplementary Fig. 3 shows principal component plots stratified by self-reported ethnicity, and Supplementary Fig. 4 shows concordance of genetically defined most likely ethnicity against self-reported ethnicity.) We had sufficient numbers for analysis of participants self-reporting as either non-Hispanic White, Hispanic, or Black, with the remainder of participants grouped together as other race/ethnicity. Our primary analysis focused on self-reported ethnicity, as this is routinely available to clinicians.
We observed a similar pattern of high T1D GRS and low T2D GRS in the DAA+/IS group and the inverse for the DAA−/IR group in all races/ethnicities (White, Black, Hispanic, and other). However, within each racial/ethnic group, the distribution and mean GRS in each type differed (Fig. 2A). Non-Hispanic White children had a very close clustering of T1D and T2D GRS in DAA+/IS, DAA+/IR, and DAA−/IS. Black, Hispanic, and other children had a higher overall T2D GRS. DAA+/IS Hispanic children had higher T1D and T2D GRS than White children (Fig. 2A). We stratified all children by progression to insulin deficiency (fasting or nonfasting C-peptide <0.24 ng/mL), an alternate outcome that can be used to define T1D (Fig. 2B). This demonstrated a similar trend of genetic risk, with the highest T1D GRS in those with severe insulin deficiency and the lowest T2D GRS in those with persistent endogenous insulin. As highlighted in Fig. 2A and B, even if GRS are discriminative of diabetes type across major ethnic groups, population stratification is still important to consider.
T1D and T2D GRS show ethnic stratification but similar patterns across race/ethnicity in the SEARCH study. A: T1D and T2D GRS z scores stratified by SEARCH etiologic type: DAA+/IS, DAA+/IR, DAA−/IS, and DAA−/IR. B: T1D and T2D GRS z scores stratified by progression to insulin deficiency defined by fasting or nonfasting C-peptide (C-pep) <0.24 ng/mL and self-reported race/ethnicity. Data are mean ± SEM.
Combined Clustering Shows an Increased Probability of T1D in Intermediate Etiologies
We combined both T1D and T2D GRS in an unsupervised multivariate Gaussian mixture model. Restricting the data to the well-defined etiologic types (DAA+/IS and DAA−/IR), we trained two distinct genetic clusters of high T1D risk versus high T2D risk (Fig. 3A and B, modeled distributions represented by red and blue). We used the model to generate a probability of T1D as the probability of being in the DAA+/IS cluster. Within intermediate groups, the DAA+/IR group clustered most like the well-defined DAA+/IS group (median [IQR] probability DAA+/IS 0.959 [0.782–0.991] vs. DAA+/IR 0.937 [0.638–0.987]) (Fig. 3D). The DAA−/IS intermediate group clustered with a lower overall probability but still had higher T1D probability (mean DAA−/IS 0.876 [0.140–0.979]) (Fig. 3C).
A Gaussian mixture model (GMM) was trained on the assumption that DAA−/IR and DAA+/IS were two groups defined by the relationship of T2D GRS and T1D GRS. A and B: Black dots represent individuals in these groups, with the blue cloud representing the modeled DAA−/IR probable T2D distribution and the red cloud representing the DAA+/IS probable T1D distribution. The model was trained on these individuals. C and D: Where individuals in the DAA−/IS and DAA+/IR groups lay relative to these distributions.
Stratification by Genetic Probability of T1D Within Etiologic Types
We investigated how the probability of being in the T1D cluster was distributed among the four etiologic groups and how this probability related to progression to C-peptide deficiency within groups (Fig. 4 and Supplementary Fig. 6A and B). In both DAA+/IS and DAA+/IR, irrespective of IR, there was very little relationship between progression to insulin deficiency and model probability of T1D, with the majority of children progressing to severe insulin deficiency, and those with persistent endogenous insulin also having high T1D probability (Fig. 4A). It is therefore likely that autoantibody-positive children with persistent C-peptide have T1D but less β-cell destruction. However, genetically defined probability of T1D was strongly associated with progression to insulin deficiency in DAA−/IS and DAA−/IR. The majority of children with persistent endogenous insulin had low genetic T1D probability, and the majority who progressed to insulin deficiency had high T1D probability, independent of etiologic type (Fig. 4). This finding suggests that the greatest clinical utility of T1D and T2D genetic risk for classification may be in classifying autoantibody-negative children.
T1D and T2D GRS Gaussian mixture model (GMM)-derived probability of type 1 diabetes associates with progression to severe insulin deficiency, particularly in autoantibody-negative individuals. We compared GMM probability of being in the T1D (DAA+/IS) distribution (defined by T1D and T2D GRS) with progression to insulin deficiency measured by follow-up fasting C-peptide at a median of 8 years of follow-up in SEARCH. A: Association of T1D probability in autoantibody-positive individuals (random noise was added to overlapping points to highlight that the majority of points lie at high probability of T1D). B: Association in autoantibody-negative individuals.
Conclusions
Our study builds on previous work by showing that a T1D GRS is discriminative of T1D in a racially and ethnically diverse youth cohort from the U.S. The finding that a 67-SNP T1D GRS is similarly discriminative in non-White U.S. race/ethnicity highlights that incorporating greater diversity of genetic variation in the HLA region may be key in generating autoimmune disease GRS with multiethnic utility. The strong association of T1D and T2D GRS with SEARCH etiologic types supports the notion that the majority of youth with intermediate etiologic types (DAA+/IR, DAA−/IS) are likely to have T1D, particularly in non-Hispanic White children. By studying an independent outcome of progression to C-peptide deficiency defined by follow-up fasting C-peptide measurements, we were able to show that a model combining T1D and T2D genetic risk may predict progression to insulin deficiency in autoantibody-negative children.
We and others have previously demonstrated that a T1D GRS was discriminative of T1D in both prediction and diagnosis (9,10). Winkler et al. (10) used HLA alleles and 40 non-HLA SNPs to quantify T1D genetic risk and demonstrated improved prediction of T1D compared with HLA alone. In our 67-SNP score, we captured more T1D HLA risk alleles and interactions and included additional independent loci from more recent studies (13). A limitation across genetic association studies is a focus on White European populations. Perry et al. (20) described the utility of a T1D GRS in discriminating diabetes type in Hispanic individuals but found less benefit in people self-reporting as Black (independent of Hispanic status). Onengut-Gumuscu et al. (21) highlighted genetic similarities in T1D associations between African Americans and Europeans, as well as differences in SNPs at associated risk loci, with improved discrimination of T1D using an African American–specific GRS. The 67-SNP T1D GRS used in this study outperformed our original risk score perhaps because it more robustly assesses HLA alleles that carry a similar risk independent of ethnicity. Importantly, T1D and T2D GRS showed ethnic stratification so that even if scores aid in diabetes discrimination, it is clear that an adjustment for ethnicity is needed to apply any cutoffs for classification. Increased population admixture may make racial and ethnic groups less well defined by self-report or genetically, and therefore, an approach that can incorporate this, or account for ethnic differences, is important for future work.
We were able to assess genetic risk for both T1D and T2D across the major etiologic types previously defined by SEARCH (8). We previously demonstrated that people in the DAA+/IS, DAA+/IR, and DAA−/IS groups were enriched for T1D HLA risk (8), suggesting that the majority likely had T1D. Our study supports this conclusion, with elevated T1D genetic risk in the DAA+/IS, DAA+/IR, and DAA−/IS groups (Fig. 1) and an inverse trend for T2D. Both DAA+/IR and DAA−/IS groups had lower T1D GRS than the DAA+/IS group, particularly in participants who were Black or Hispanic, likely reflecting a subset of individuals within these groups who have T2D (Fig. 2).
We used an unsupervised clustering approach incorporating both T1D and T2D GRS (Fig. 3). We identified two clusters of T1D and T2D risk and evaluated the continuous distribution of model probability ranging from high to low T1D probability. This enabled us to examine the distribution of genetic risk within the intermediate etiologic groups compared with the DAA+/IS and DAA−/IR groups, which we assumed represented T1D and T2D, respectively. Interestingly, the utility of genetics to predict insulin deficiency was most evident in the DAA− participants, who formed 27% of the cohort. The DAA−/IS group showed a lower mean probability of T1D but still had, on average, a higher T1D than T2D probability. Larger cohorts with heterogeneous diabetes types, across all U.S. ethnicities, are needed to validate this approach and allow generation of ethnicity-specific or ethnicity-normalized cutoffs to aid classification. Further analysis of clustering methods is required to improve multivariate classification and investigation of intermediate etiologies, including the study of individual genetic associations when sufficient power exists.
A reference standard for classification is elusive in diabetes research and clinical care, as no single metric is a perfect classifier or gold standard for diabetes type. Unlike many diseases, biopsy to define diabetes etiology is not possible. It is therefore difficult to identify individuals who may be potentially better classified. Our study is limited to highlighting individuals who could be considered for reclassification but has not investigated whether this would be practical at an individual level or would inform treatment changes. Assessment of progression to β-cell failure may have potential implications for treatment (e.g., the choice of physiological replacement doses of insulin, oral agents that rely on β-cell function). While not available at diagnosis, progression to C-peptide deficiency is one approach that we have previously taken to validate diabetes classification models (14,22). Our study is unique in its collection of data from close to diagnosis with follow-up C-peptide testing >3 years (median 8 years) postdiagnosis on a subset of the sample, thus allowing assessment of progression to insulin deficiency.
Our study has limitations. We performed a primary analysis using self-reported race/ethnicity, but genetically defined race/ethnicity is a potentially more precise method. However, genetically defined race/ethnicity is not routinely available to clinicians, and our primary analysis and stratification focused on variables that could be available to clinicians at the time of diagnosis. Our study was also limited to race/ethnicities presenting to the recruiting centers for this study, of which the majority was non-Hispanic White with a smaller sample size of non-Hispanic Black and Hispanic individuals, and cannot necessarily be extrapolated to other populations. There was a delay between diabetes diagnosis and the assessment of autoantibody status. It is possible that some individuals who were autoantibody negative in this study may have been autoantibody positive at diagnosis; however, the prevalence of autoantibody positivity in SEARCH is very similar to previously reported autoantibody-positive proportions at diabetes onset, and the mean diabetes duration at the time of the initial visit was <1 year. Autoantibody testing for ZnT8 autoantibodies was not available in all individuals; it is possible that some individuals who were autoantibody negative would have been positive for ZnT8 autoantibodies or had autoimmunity to other autoantigens not measured. We used T1D and T2D GRS primarily derived from White European populations, but as more transancestry analyses of larger case-control cohorts emerge, it is likely that more discriminative polygenic scores will become available for clinical application. We assumed that the variants contributing to the GRS were acting together, but it is possible that individual variants within these scores act differentially across groups. More detailed association analyses may reveal differences in the impact of individual variants, but we were not able to analyze this as part of this study.
We performed the cluster analysis using the whole cohort. In the future, it will hopefully be possible to perform analyses stratified by ethnicity in larger samples. We were not able to assess individual genetic loci differences between diabetes types or other molecular differences between the diabetes types or individuals of different genetic risk. Further molecular characterization of individuals across the various etiologic types and of opposing genetic risk may help to explain the mechanism of these differences. Currently, there is no facility to routinely generate a T1D or T2D GRS in clinical care. However, we and others (23,24) have now developed inexpensive and accurate genotyping assays for a T1D GRS that could be used for this purpose (24). Commercial availability and popularity of SNP genotyping and genome sequencing data may also make this more easily accessible and effective in the near future.
In conclusion, in a large U.S. study of youth with diabetes diagnosed before age 20 years, we have shown that an improved T1D GRS incorporating greater HLA diversity is discriminative of diabetes type across self-reported White, Hispanic, and Black race/ethnicity. We have confirmed that the majority of youth with intermediate etiologic diabetes types likely have T1D. While our results imply that autoantibody positivity in pediatric diabetes is strongly associated with progression to insulin deficiency, we also highlight that the significant number of children who are “atypical” for pediatric T1D by being autoantibody negative (27%) or who have intermediate biomarker and clinical features (30%) may benefit from incorporating GRS into clinical classification. A combined model incorporating information from T1D and T2D GRS may allow identification of children most likely to have T1D, and therefore progress to insulin deficiency, but needs further study in larger numbers of individuals, particularly across all non-White U.S. ethnicities.
R.A.O. and S.A.S. contributed equally to this work.
This article contains supplementary material online at https://doi.org/10.2337/figshare.19145813.
Article Information
Acknowledgments. The SEARCH study investigators are indebted to the many youth and their families and health care providers whose participation made this study possible.
Funding. The authors acknowledge the involvement of the Kaiser Permanente Southern California Marilyn Owsley Clinical Research Center (funded by Kaiser Foundation Health Plan and supported in part by the Southern California Permanente Medical Group), the South Carolina Clinical and Translational Research Institute at the Medical University of South Carolina (National Institutes of Health [NIH]/National Center for Advancing Translational Sciences [NCATS] grants UL1 TR000062 and UL1 TR001450), Seattle Children’s Hospital and the University of Washington (NIH/NCATS grant UL1 TR00423), University of Colorado Pediatric Clinical and Translational Research Center (NIH/NCATS grant UL1 TR000154), the Barbara Davis Center at the University of Colorado at Denver (Diabetes and Endocrinology Research Center NIH grant P30 DK57516), the University of Cincinnati (NIH/NCATS grants UL1 TR000077 and UL1 TR001425), and the Children with Medical Handicaps program managed by the Ohio Department of Health. The SEARCH 4 study is funded by the NIH NIDDK (grants 1R01DK127208-01 and 1UC4DK108173) and supported by the Centers for Disease Control and Prevention. The Population Based Registry of Diabetes in Youth Study is funded by the Centers for Disease Control and Prevention (DP-15-002) and supported by the NIH NIDDK (grants 1U18DP006131, U18DP006133, U18DP006134, U18DP006136, U18DP006138, and U18DP006139). The SEARCH 1–3 studies are funded by the Centers for Disease Control and Prevention (PA no. 00097, DP-05-069, and DP-10-001) and supported by NIDDK. 
NIH funding supported the Kaiser Permanente Southern California (grants U48/CCU919219, U01 DP000246, and U18DP002714), University of Colorado Denver (grants U48/CCU819241-3, U01 DP000247, and U18DP000247-06A1), Cincinnati Children’s Hospital Medical Center (grants U48/CCU519239, U01 DP000248, and 1U18DP002709), University of North Carolina at Chapel Hill (grants U48/CCU419249, U01 DP000254, and U18DP002708), Seattle Children’s Hospital (grants U58/CCU019235-4, U01 DP000244, and U18DP002710-01), and Wake Forest University School of Medicine (grants U48/CCU919219, U01 DP000250, and 200-2010-35171). R.A.O. is funded by a Diabetes UK Harry Keen Fellowship (16/0005529). S.A.S. is funded by a Diabetes UK PhD studentship (16/0005529). R.A.O., W.A.H., and L.F. are funded by a JDRF strategic research agreement (3-SRA-2019-827-S-B).
This study includes data provided by the Ohio Department of Health, which should not be considered an endorsement of this study or its conclusions.
Duality of Interest. R.A.O. previously had U.K. Medical Research Council “confidence in concept” funding to develop a T1D GRS biochip with Randox Laboratories R&D and reports research funding from Randox Laboratories R&D. No other potential conflicts of interest relevant to this article were reported.
Author Contributions. R.A.O., S.A.S., C.P., W.A.H., J.D., and D.D. researched data, wrote the manuscript, and contributed to the discussion. L.F. contributed to the data analysis. M.N.W. contributed to GRS generation, reviewed and edited the manuscript, and contributed to the discussion. G.I., A.W., M.J.R., L.W., and L.M.D. reviewed and edited the manuscript and contributed to the discussion. R.D. analyzed data, reviewed and edited the manuscript, and contributed to the discussion. J.M.L. collected the data, reviewed and edited the manuscript, and contributed to the discussion. R.A.O. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Prior Presentation. Parts of this study were presented in abstract form at the 79th Scientific Sessions of the American Diabetes Association, San Francisco, CA, 7–11 June 2019.