African Americans (AAs) have been underrepresented in polygenic risk score (PRS) studies. Here, we integrated genome-wide data from multiple observational studies on type 2 diabetes (T2D), encompassing a total of 101,987 AAs, to train and optimize an AA-focused T2D PRS (PRSAA) using a Bayesian polygenic modeling method. We further tested the score in three independent studies with a total of 7,275 AAs and compared the PRSAA with other published scores. Results show that a 1-SD increase in the PRSAA was associated with a 40–60% increase in the odds of T2D (odds ratio [OR] 1.60, 95% CI 1.37–1.88; OR 1.40, 95% CI 1.16–1.70; and OR 1.45, 95% CI 1.30–1.62) across the three testing cohorts. These models captured 1.0–2.6% of the variance (R2) in T2D on the liability scale. The positive predictive values for three calculated score thresholds (the top 2%, 5%, and 10%) ranged from 14 to 35%. The PRSAA, in general, performed similarly to existing T2D PRS. Larger data sets are still needed to continue evaluating the utility of within-ancestry scores in the AA population.

Article Highlights
  • This study aimed to better understand the performance of existing polygenic risk scores (PRS) and a novel PRS for type 2 diabetes in African American (AA) populations.

  • A PRS was developed using only genetic data from AA populations (PRSAA) and compared with scores developed using genetic data from other ancestral populations.

  • The performance metrics of the PRSAA were comparable to a published multiancestry PRS developed using training data from much larger study samples.

  • The utility of single-ancestry PRS in AAs should be reevaluated when larger AA training data sets are available.

Type 2 diabetes (T2D) is a multifactorial disease increasing in prevalence likely due to genetic risk activated in the presence of obesogenic environments (1–3). Genome-wide association study (GWAS) discovery efforts in T2D have been substantial and, most recently, have accrued data on >1 million individuals. Replicated variants from these studies have given important insight into the genetic background of disease, but results generally have not been useful to predict individual disease risk. Overall, since T2D is preventable, the need remains for better screening tools.

Polygenic risk scores (PRS) that capture variation across thousands to millions of disease-associated markers may prove promising for predicting individual level T2D risk by leveraging results from existing, large GWAS efforts (4). For instance, a study by Vujkovic et al. (5) reported that European ancestry individuals in the top decile of a score derived from a large consortia GWAS data set (∼900,000 participants) had a 5.21-fold higher odds (95% CI 4.9–5.5) of developing diabetes compared with those in the lowest decile (4,5). However, the score in that study was not applied to other ancestry groups, and evidence exists of the poor transferability of European-derived scores to other populations (6), especially individuals of African ancestry. PRS trained on within-ancestry data could improve performance in this group. For instance, Chikowore et al. (7) reported that an African American (AA)-tuned PRS for T2D outperformed a multiancestry score in continental Africans. Overall, more work is needed to ensure that individuals of African ancestry equally benefit from PRS clinical translation efforts to avoid further exacerbating existing health disparities (6,8).

While several PRS for T2D have been described, they have either excluded or underrepresented AA populations (4,5,9,10). Here, we present the construction and evaluation of an AA-focused T2D PRS and assess its performance compared with other scores created in European and multiancestry cohorts. Specifically, we integrated T2D GWAS summary statistics from the Electronic Medical Records and Genomics (eMERGE) Network, the Reasons for Geographic and Racial Differences in Stroke Study (REGARDS), the MEta-analysis of type 2 DIabetes in African Americans (MEDIA) Consortium, and the Million Veteran Program (MVP) using state-of-the-art Bayesian polygenic modeling methods to derive a novel T2D PRS (PRSAA) (11–13). We optimized the score in the Genetics of Hypertension Associated Treatment Study (GenHAT) and evaluated the score in an additional three independent cohorts. This effort represents one of the first steps toward addressing the need for developing PRS that focus on representation of individuals of AA ancestry.

Overview of Study Design

To train the novel PRSAA, we used GWAS summary statistics from the African ancestry individuals belonging to four large-scale population cohorts: 1) 12,472 individuals from eMERGE III (2,688 case subjects and 9,784 control subjects) (14); 2) 6,745 individuals from REGARDS (1,659 case subjects and 5,086 control subjects) (12); 3) a meta-GWAS of T2D in 23,827 individuals performed by the MEDIA Consortium (8,284 case subjects and 15,543 control subjects); and 4) 53,445 individuals from the MVP (23,305 case subjects and 30,140 control subjects) (13). Next, using fixed-effect meta-analyzed summary statistics from the four above-mentioned studies, PRS were derived using the Bayesian polygenic modeling continuous shrinkage (PRS-CS) method (15). Scores were then optimized in 2,776 case subjects and 2,722 control subjects from the GenHAT study (16) and tested separately in three additional cohorts, including the Hypertension Genetic Epidemiology Network (HyperGEN) (17), Warfarin Pharmacogenetics Cohort (WPC), and the BioMe BioBank (18,19), totaling 1,937 case subjects and 5,338 control subjects. Figure 1 summarizes the construction and evaluation of the PRSAA constructed using the PRS-CS software. The sample characteristics of the studies making up the summary statistics, validation, and testing samples are presented in the Supplementary Material. Below, we briefly describe the sample characteristics and processing of genetic data in each data set.

Figure 1

Study overview.

Case Subject-Control Subject Definitions

T2D case subjects were defined either by T2D ICD-9 and ICD-10 codes under study-specific definitions (eMERGE [9], BioMe [19], WPC, and MVP) (20) or by a single measurement of glucose (fasting glucose ≥126 mg/dL [7 mmol/L] or random glucose ≥200 mg/dL [11.1 mmol/L]) or use of any glucose-lowering medications (REGARDS [21] and HyperGEN [17]). GenHAT used a single fasting glucose of ≥140 mg/dL or use of any glucose-lowering medications for the baseline definition of T2D, based on the guidelines in place when the data were collected (22). T2D definitions for the cohorts making up the MEDIA Consortium have been described previously (13).

Meta-analysis of Summary Statistics

GWAS was conducted for T2D separately in eMERGE and REGARDS using logistic regression models adjusted for age, sex, and principal components (PCs) of ancestry in PLINK 1.9 (23). We used inverse variance–weighted fixed-effect meta-analysis to combine the summary statistics from MEDIA, eMERGE, REGARDS, and MVP AAs (totaling 96,489 participants) using the METASOFT meta-analysis framework. After the meta-analysis, summary effect size estimates were retained for the PRS weight-derivation process only for variants with genotype data from at least two of the four studies.
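The inverse variance-weighted fixed-effect combination described above can be illustrated with a minimal Python sketch. It is not the METASOFT implementation; the function names, the input layout, and the example effect sizes are hypothetical and chosen only to show the arithmetic and the "at least two studies" filter.

```python
import math

def fixed_effect_meta(betas, ses):
    """Pool per-study log odds ratios for one variant.

    Each study is weighted by the inverse of its variance (1 / se^2).
    Returns (pooled_beta, pooled_se, z_score).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se, pooled_beta / pooled_se

def meta_analyze(variant_stats, min_studies=2):
    """variant_stats maps a variant ID to a list of (beta, se) tuples,
    one per study that reported the variant. Variants reported by fewer
    than min_studies studies are dropped."""
    results = {}
    for variant_id, stats in variant_stats.items():
        if len(stats) < min_studies:
            continue
        betas, ses = zip(*stats)
        results[variant_id] = fixed_effect_meta(betas, ses)
    return results

# Hypothetical per-study estimates (not taken from the study data).
example = {
    "rsA": [(0.10, 0.05), (0.20, 0.10)],  # seen in two studies: kept
    "rsB": [(0.30, 0.08)],                # seen in one study: dropped
}
pooled = meta_analyze(example)
```

Because the weights are inverse variances, the more precise study dominates the pooled estimate, which is why the largest cohort contributes most to the final weights.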

PRS Construction and Optimization

We used PRS-CS, a Bayesian polygenic modeling method, to construct the PRSAA (15). We applied PRS-CS to the T2D AA meta-GWAS summary statistics and used the 1000 Genomes African linkage disequilibrium reference panel for PRS construction. Given a global shrinkage parameter (φ), the PRS-CS output included 1,218,275 HapMap3 variants and their posterior weights. To optimize the hyperparameter φ, we applied the single nucleotide polymorphism (SNP) weights to the validation sample, GenHAT, across four φ values (1.0, 1 × 10−02, 1 × 10−04, and 1 × 10−06) and used a combination of R2 (variation explained in T2D status) and area under the receiver operating characteristic curve (AUC) to select the best-performing model. For comparison, in GenHAT, we also constructed a score using the pruning and thresholding method implemented in PRSice2 (24). The results were not superior to those from PRS-CS, and thus, we did not apply that score to the testing cohorts.
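As a rough illustration of the tuning step, the sketch below picks the shrinkage value whose score gives the highest AUC in a validation sample. It is a simplification: the study combined R2 and AUC, whereas this example uses AUC alone, and the function names and toy scores are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen case has a higher
    score than a randomly chosen control (ties count as one half).
    Quadratic in sample size; adequate for a small illustration."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def pick_phi(scores_by_phi, labels):
    """scores_by_phi maps each candidate phi to the validation-sample
    scores computed with that phi. Returns (best_phi, auc_by_phi)."""
    auc_by_phi = {phi: auc(s, labels) for phi, s in scores_by_phi.items()}
    best_phi = max(auc_by_phi, key=auc_by_phi.get)
    return best_phi, auc_by_phi

# Toy validation data: 1 = case, 0 = control (hypothetical values).
labels = [1, 1, 0, 0, 1, 0]
scores_by_phi = {
    1e-2: [0.9, 0.4, 0.5, 0.1, 0.8, 0.3],
    1e-6: [0.9, 0.7, 0.5, 0.1, 0.8, 0.3],
}
best_phi, auc_by_phi = pick_phi(scores_by_phi, labels)
```

Tuning in a sample that is separate from the testing cohorts, as done here with GenHAT, keeps the reported test performance from being inflated by the choice of φ.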

Evaluation of PRS

We assessed the optimized PRS, denoted as PRSAA, in three AA cohorts (i.e., HyperGEN, WPC, and BioMe). For each of these data sets, we calculated the PRS for each individual by multiplying the number of risk alleles by the algorithm-inferred weight for each variant and summing across the genome using the --score function in PLINK 1.9 (23). For comparison, we calculated six other PRS: a single-ancestry PRS constructed using European T2D GWAS (4) and PRS-CS; a single-ancestry PRS constructed using African T2D GWAS from the MEDIA Consortium and PRS-CS (13); a multiancestry genome-wide PRS (9) constructed using GWAS from Mahajan (European) + MEDIA (AA) + Biobank Japan (BBJ; East Asian) and PRS-CSx (weights from the PGS Catalog applied using the --score function in PLINK 1.9; PGS Publication ID PGP000331); and three significance-restricted scores based on 582 SNPs from the Vujkovic et al. GWAS (5), tuned to three ethnic strata (multiancestry, African, and European) and published by Polfus et al. (10) (weights likewise applied from the PGS Catalog using the --score function in PLINK 1.9; PGS Publication ID PGP000193). As a sensitivity analysis, we additionally constructed another genome-wide multiancestry score using PRS-CSx (cross-population continuous shrinkage priors) based on the European and Asian components of the summary statistics from Ge et al. (9) and applied the score to GenHAT, HyperGEN, and WPC.
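The per-individual score described above is a weighted sum of allele counts. The following Python sketch shows that calculation and the standardization that underlies a "per SD" odds ratio. It is an illustration rather than the PLINK implementation; the variant IDs, weights, and dosages are hypothetical.

```python
import statistics

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele counts for one individual.

    dosages: {variant_id: count of the effect allele (0, 1, 2, or an
              imputed dosage in between)}
    weights: {variant_id: posterior effect size for that allele}
    Variants absent from the individual's data are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)

def standardize(scores):
    """Rescale scores to mean 0 and SD 1 so that a regression
    coefficient can be read as the effect of a 1-SD increase."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

# Hypothetical weights and genotypes.
weights = {"rs1": 0.5, "rs2": -0.2, "rs3": 0.1}
person = {"rs1": 2, "rs2": 1}          # rs3 not genotyped for this person
raw_score = polygenic_score(person, weights)
z_scores = standardize([1.0, 2.0, 3.0])
```

Skipping missing variants, as in this sketch, is one reason the SNP overlap between a published score and a target data set matters when scores are compared across cohorts.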

The predictive performance of the PRSAA was assessed via a range of metrics. To measure the overall prediction accuracy, we calculated 1) the proportion of variation in T2D case subject-control subject status explained by the PRSAA on the liability scale (R2 liability) (25) after accounting for a basic set of covariates (age, sex, top 10 PCs, and study site); 2) the AUC for a covariates-only model (age, sex, top 10 PCs, and study site), a PRSAA-only model, and a PRSAA combined with the covariates model; 3) odds ratio (OR) per SD change in the covariate-adjusted PRSAA, as well as the OR for being in the top 2%, 5%, or 10% of the PRSAA distribution relative to the remaining participants; and 4) we calculated the sensitivity, specificity, positive predictive value (PPV; the proportion of identified high-risk individuals who are true T2D case subjects), and negative predictive value (NPV; the proportion of unidentified individuals that are T2D control subjects) to examine the clinical utility of the score. Since PPV and NPV depend on the prevalence of the disease, we report prevalence-adjusted PPV and NPV calculated as:
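Adjusted PPV = (sensitivity × prev)/[sensitivity × prev + (1 − specificity) × (1 − prev)]
Adjusted NPV = [specificity × (1 − prev)]/[specificity × (1 − prev) + (1 − sensitivity) × prev]
where the prevalence of T2D (prev) was extracted from the literature (3).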
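The prevalence adjustment can be written as two short functions. This is a minimal Python sketch of the standard Bayes-rule formulas for PPV and NPV; the sensitivity, specificity, and prevalence values in the example are hypothetical and are not the values used in the study.

```python
def adjusted_ppv(sensitivity, specificity, prev):
    """Probability of being a true case given a high-risk classification,
    at population prevalence prev."""
    true_pos = sensitivity * prev
    false_pos = (1 - specificity) * (1 - prev)
    return true_pos / (true_pos + false_pos)

def adjusted_npv(sensitivity, specificity, prev):
    """Probability of being a true control given a non-high-risk
    classification, at population prevalence prev."""
    true_neg = specificity * (1 - prev)
    false_neg = (1 - sensitivity) * prev
    return true_neg / (true_neg + false_neg)

# Hypothetical inputs: a threshold with low sensitivity and high specificity.
ppv = adjusted_ppv(sensitivity=0.20, specificity=0.90, prev=0.13)
npv = adjusted_npv(sensitivity=0.20, specificity=0.90, prev=0.13)
```

Using an external prevalence rather than the case fraction in each cohort makes the values comparable across studies with different case-to-control ratios.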

Data and Resource Availability

GWAS summary statistics from the MEDIA study are available in the database of Genotypes and Phenotypes (dbGaP; Study Accession: phs000930.v9.p1) (12). GWAS summary statistics from the MVP study are also available on dbGaP (Study Accession: phs001672.v11.p1) (26). The eMERGE phenotypic and genetic data are available on dbGaP (Study Accession: phs001584.v2.p2) (10). Individual-level phenotypic and genetic data from REGARDS (Study Accession: phs002719.v1.p1) (11), GenHAT (Study Accession: phs002716.v1.p1), WPC (Study Accession: phs000708.v1.p1), and HyperGEN (Study Accession: phs001293.v3.p1) are also on dbGaP (17). The BioMe genotype data sets used in this study were generated by Regeneron and are not publicly available. However, the data will be made available for purposes of replicating the results upon request to the corresponding author, subject to appropriate collaboration and/or data-sharing agreements (19). For scores compared with the PRSAA in Fig. 2, the genome-wide multiancestry T2D PRS (Mahajan+MEDIA+BBJ) is deposited in the PGS Catalog (https://www.pgscatalog.org; PGS ID: PGS002308) (8). The significance-restricted T2D PRS from Polfus et al. (10) are also available in the PGS Catalog (https://www.pgscatalog.org; PGS Publication ID: PGP000193) (9). PRS from Mahajan (European) and MEDIA (AA) were generated in PRS-CS using the publicly available summary statistics from the DIAGRAM consortium website (https://diagram-consortium.org) and the MEDIA GWAS on dbGaP (Study Accession: phs000930.v9.p1).

Figure 2

OR per SD, AUC difference (AUC Diff; the difference between the covariate-adjusted model and a model that adds the PRS to the covariate-adjusted model), and R2 liability (R2 Liab) of the PRS for the new AA-derived score (PRSAA) compared with six other scores, including one genome-wide AA-specific score, one genome-wide European score, one genome-wide multiancestry score, and three (multiancestry, European, and African ancestry) significance-restricted SNP scores. AFR, African; EA, European American; EAS, East Asian; HIS, Hispanic.

Baseline characteristics of the study populations making up the summary statistics, validation, and testing data sets are provided in Table 1. Manhattan and quantile-quantile plots for the fixed-effect meta-analysis of eMERGE, REGARDS, MEDIA, and MVP are provided in Supplementary Fig. 1. Women made up ∼13–63% of each sample. The average age within each testing cohort ranged from 47 to 57 years.

Table 1

Baseline demographics

Cohort       Age, years (mean ± SD)   Sex (% female)   Case subjects (n)   Control subjects (n)
eMERGE       45.80 ± 22.9             60.00            2,688               9,784
MEDIA        *                        *                8,284               15,543
REGARDS      63.75 ± 9.34             60.53            1,659               5,086
MVP          61.70 ± 12.3             12.80            23,305              30,140
Validation
 GenHAT      66.09 ± 7.52             55.31            2,776               2,722
Testing
 HyperGEN    47.01 ± 12.81            63.52            402                 1,494
 WPC         57.48 ± 15.24            57.56            300                 355
 BioMe       56.28 ± 16.02            58.98            1,235               3,489
*Average age ranged from 39.35 ± 4.1 to 77.4 ± 5.7 years, and the percentage of women ranged from 54 to 100%, across 17 studies. See Supplementary Table 2 in Ng et al. (13).

The prediction performance of the PRSAA in GenHAT across four global shrinkage parameters (1.0, 1.0 × 10−02, 1.0 × 10−04, and 1.0 × 10−06) from PRS-CS is presented in Supplementary Table 1. Results for PRSice2 in GenHAT are presented in Supplementary Table 2, with and without clumping across a range of significance thresholds from 1.0 × 10−08 to 1.0. Based on the strongest association results across both methods, we selected the PRS-CS model with shrinkage parameter 1.0 × 10−06 for further evaluation in independent testing cohorts. Results for HyperGEN, WPC, and BioMe are presented in Table 2. PRSAA was associated with T2D status in each cohort. Each 1-SD increase in the PRSAA was associated with increased odds of T2D, with an OR of 1.60 (95% CI 1.37–1.88), OR of 1.40 (95% CI 1.16–1.70), and OR of 1.45 (95% CI 1.30–1.62) in HyperGEN, WPC, and BioMe, respectively. The liability scale R2 for these models ranged from 1.0 to 2.6% across the three studies. Being in the top 10% of the score distribution relative to the bottom 90% had an OR of 1.89 (95% CI 1.32–2.69) for T2D in the HyperGEN cohort, OR of 1.75 (95% CI 1.03–3.02) in the WPC, and OR of 1.37 (95% CI 1.09–1.72) in BioMe. In BioMe, being in the top 2% of the score distribution relative to the bottom 98% was associated with an OR of 1.82 (95% CI 1.15–2.86; P = 9.5 × 10−03). The sensitivity, specificity, PPV, and NPV for these models are presented in Supplementary Table 3, which, in general, shows that the thresholded scores have high specificity (>90%) but not sensitivity (<20%). The improvement in AUC ranged from 0 to 3% for the PRSAA above and beyond a model with standard covariates only (age, sex, PCs, and study site) across the three testing cohorts (Fig. 2 and Supplementary Table 2). The prevalence-adjusted PPV (adjusted PPV) and NPV (adjusted NPV) are presented in Table 2. 
Results show that the prevalence-adjusted PPV, that is, the proportion of individuals above the evaluated “high-risk” PRSAA thresholds who are truly case subjects, was 20–35%, 14–20%, and 15–20% for HyperGEN, the WPC, and BioMe, respectively.
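The thresholded metrics reported above can be illustrated with a small Python sketch that flags the top fraction of a score distribution as high risk and derives sensitivity, specificity, and a crude odds ratio from the resulting 2 × 2 table. This is a simplification: it computes an unadjusted odds ratio, whereas odds ratios from a covariate-adjusted regression model would differ. The function name and the toy data are hypothetical.

```python
def top_fraction_metrics(scores, labels, fraction):
    """Classify the top `fraction` of scores as high risk and return
    (sensitivity, specificity, odds_ratio) against labels (1 = case)."""
    n_high = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    high = set(order[:n_high])

    tp = sum(1 for i in high if labels[i] == 1)   # high-risk cases
    fp = n_high - tp                              # high-risk controls
    fn = sum(labels) - tp                         # cases below threshold
    tn = len(labels) - n_high - fn                # controls below threshold

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    odds_ratio = (tp * tn) / (fp * fn) if fp and fn else float("inf")
    return sensitivity, specificity, odds_ratio

# Toy sample of 20 people with 5 cases; scores are already in descending order.
scores = list(range(20, 0, -1))
labels = [0] * 20
for case_index in (0, 3, 7, 11, 15):
    labels[case_index] = 1

sens, spec, crude_or = top_fraction_metrics(scores, labels, fraction=0.10)
```

When only a small top fraction is flagged, few controls can be misclassified, so specificity is high by construction, while sensitivity is capped by how many cases can fit above the threshold. This matches the pattern of high specificity and low sensitivity described for the thresholded scores.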

Table 2

Prediction accuracy of the AA T2D PRS across three AA populations

Cohort     Cutoff, %   R2 liability   AUC full model/AUC covariate-only model   OR* (95% CI)        P                Adjusted PPV&   Adjusted NPV&
HyperGEN   10          -              -                                         1.89 (1.32–2.69)    4.86 × 10−4      0.20            0.88
           5           -              -                                         2.70 (1.70–4.25)    2.09 × 10−5      0.27            0.88
           2           -              -                                         4.28 (2.15–8.56)    3.34 × 10−5      0.35            0.88
           SD          0.02582        0.74/0.72                                 1.60 (1.37–1.88)*   4.32 × 10−9      -               -
WPC        10          -              -                                         1.75 (1.03–3.02)    4.20 × 10−2      0.20            0.88
           5           -              -                                         1.28 (0.62–2.67)    5.06 × 10−1      0.17            0.88
           2           -              -                                         1.15 (0.38–3.48)    8.07 × 10−1      0.14            0.88
           SD          0.01234        0.63/0.60                                 1.40 (1.16–1.70)*   4.14 × 10−4      -               -
BioMe      10          -              -                                         1.37 (1.09–1.72)    7.23 × 10−3      0.15            0.88
           5           -              -                                         1.78 (1.31–2.39)    1.82 × 10−4      0.19            0.88
           2           -              -                                         1.82 (1.15–2.86)    9.51 × 10−3      0.20            0.88
           SD          0.01037        0.74/0.74                                 1.45 (1.30–1.62)*   2.64 × 10−11     -               -
*Per SD. &Adjusted PPV = (sensitivity × prev)/[sensitivity × prev + (1 − specificity) × (1 − prev)]. Adjusted NPV = specificity × (1 − prev)/[specificity × (1 − prev) + (1 − sensitivity) × prev], where the prevalence of T2D was extracted from the literature.

Finally, we compared the performance of the PRSAA with other published scores in the optimization (GenHAT) and testing data sets (HyperGEN, WPC, and BioMe). Figure 2 displays the ORs and 95% CIs for a 1-SD change in the score for the PRSAA (novel African PRS) and six comparison scores: one multiancestry genome-wide score, one African genome-wide score, one European genome-wide score, and three significance-restricted SNP scores (multiethnic, African, and European) (4,9,10,13). The SNP overlap of the scores (original score SNP list compared with the SNP list available in each study) was generally >90%; the exceptions were HyperGEN and the WPC, where some scores had overlap <90% (Supplementary Table 4). In two of the three testing data sets, the genome-wide multiancestry score (orange in Fig. 2) was associated with a higher odds of T2D per 1-SD change compared with the novel PRSAA (blue in Fig. 2). For example, in HyperGEN, the PRSAA had an OR of 1.60 (95% CI 1.37–1.88) per 1 SD, while the genome-wide multiancestry score had an OR of 1.75 (95% CI 1.52–2.02). In the largest testing data set, BioMe, the genome-wide multiancestry score outperformed the PRSAA and the other five scores. In the sensitivity analysis, the genome-wide multiancestry score that omitted the AA summary statistics from the training data was comparable to the original genome-wide multiancestry score in GenHAT, HyperGEN, and the WPC (Supplementary Table 5). In terms of R2 liability and AUC (Supplementary Table 3), the novel PRSAA generally performed similarly to the other PRS but was inferior to the genome-wide multiancestry PRS (with the exception of the WPC). For instance, in BioMe, the liability R2 ranged from 0.5 to 1.0% for the PRSAA and other published ancestry-specific scores but was 2.5% for the genome-wide multiancestry PRS.

T2D is a common chronic condition that disproportionately affects AA populations. Genetic background is known to play a role in the development of the disease. Genome-wide studies have identified thousands of variants associated with T2D risk. The usefulness of individual variants has been limited in the clinical setting, but now, polygenic risk scores using the cumulative effect of hundreds to millions of variants on T2D risk are being tested in clinical settings. The overrepresentation of European descent populations in existing data sets has meant African ancestry populations have been substantially underrepresented in studies aimed at developing and testing PRS. This has the potential to introduce new health disparities in at-risk populations (6). In the current study, we used data on >100,000 AAs to construct, optimize, and test a novel PRSAA including >1 million variants.

Similar to many other chronic conditions, the vast majority of GWAS for T2D have been conducted in populations of European ancestry. This has resulted in the development and testing of PRS being largely driven by this ancestral group. For instance, in one of the largest GWAS of T2D, Vujkovic et al. (5) conducted a meta-analysis of summary statistics in 228,499 T2D case subjects and 1,178,783 T2D control subjects across multiple studies and consortia (MVP, Diabetes Meta-Analysis of Trans-Ethnic association studies [DIAMANTE], Biobank Japan, and others) of which ∼80% were of European descent, ∼4% African descent, ∼15% Asian descent, and <1% Hispanic descent (5). More recent multiethnic GWAS efforts have diversified the contributing studies, but African ancestry populations are largely still underrepresented. For example, Mahajan et al. (27) included 180,834 case subjects and 1,159,055 control subjects with 48.9% of the study population being of non-European descent. However, those of African descent made up only ∼6% of the total population, with most of the rest of the non-European samples comprising participants of Asian ancestry (>35%). Newer initiatives, such as the Transomics for Precision Medicine Program (TOPMed), Population Architecture Using Genomics and Epidemiology (PAGE II), and the All of Us Research Program, are focusing on increasing the representation of diverse populations and should greatly improve representation of African ancestry individuals in future efforts (accumulating >200,000 new samples of African ancestry) (28).

Available GWAS data have informed multiple PRS for T2D via different methods. For example, Polfus et al. (10) used summary statistics from the Vujkovic et al. (5) study to train a PRS based on 582 significant T2D risk variants. The PRS were calculated for 467,951 participants, which included 44,222 from PAGE (∼35% AA) and 423,729 individuals of European ancestry from the UK Biobank. Comparing individuals in the top 10% of the score distribution with those at average genetic risk (in the 40–60% risk range), they observed significant differences in effect estimates by ancestral population (heterogeneity P = 3.85 × 10−20). In that report, the multiethnic-weighted PRS and population-specific-weighted PRS performed similarly within each ancestry group. For example, the African ancestry-weighted PRS and the multiancestry-weighted PRS were comparable when applied to the African strata in that study. Specifically, the AUC for the PRS-only model was 0.56 and 0.57 with the population-specific versus the multiancestry weights, respectively. That trend agrees with our results for the Polfus et al. (10) scores in Fig. 2.

Mahajan et al. (27) took advantage of the population diversity in the DIAMANTE study to compare the prediction performance of multiancestry and ancestry-specific T2D PRS constructed using significance-restricted scores. The investigators reported that for the ancestry groups with the smallest effective sample size (African, Hispanic, and South Asian), the predictive power of the ancestry-specific PRS was weak (pseudo R2 ≤1%). For instance, in the African test population, the pseudo R2 for the African ancestry PRS was ∼1%, while it was ∼2.5% for the transancestry PRS in the African strata. In that study, the European ancestry-specific PRS mostly outperformed the ancestry-matched PRS in the individual ancestral groups. However, similar to the report from Polfus et al. (10), the greatest predictive power from testing data in all ancestry groups was achieved by the multiancestry PRS.

The two above-mentioned efforts used stringent thresholds to select individual GWAS SNPs for inclusion in the scores; however, PRS are robust to the inclusion of more nominal findings and false-positive variants. Emerging data show better prediction of chronic diseases in target samples when scores include a much larger representation of SNPs at the genome-wide level (millions as opposed to thousands or hundreds) (29). Indeed, Fig. 2 shows that the significance-restricted scores were not the best-performing scores across our cohorts. More recent PRS methods assume that SNP effects are drawn from mixtures of effect size distributions, with the key parameters defining the genetic architectures estimated through Bayesian frameworks (e.g., PRS-CS). In our recent study, we applied a multiancestry T2D PRS, trained in >1 million European, AA, and East Asian ancestry samples, using the Bayesian polygenic modeling method PRS-CSx (30), and assessed the prediction accuracy in the multiethnic eMERGE data (11,945 case subjects; 57,694 control subjects), four AA cohorts (5,137 case subjects; 9,657 control subjects), and the Taiwan Biobank (4,570 case subjects; 84,996 control subjects) (9). The multiancestry PRS was strongly associated with T2D status across the ancestral groups examined but most strongly associated with T2D in East Asians. For instance, being in the top 2% of the PRS distribution compared with the lower 98% was associated with an OR for T2D ranging from 2.6 for AAs to 4.6 for East Asians. The discriminative ability of the multiancestry PRS estimated by AUCs supported similar conclusions. In the current study, this score (orange PRS in Fig. 2) performed better than the PRSAA in two of the three testing data sets.

It is well accepted that the transferability of PRS developed in data from Europeans to other ancestral populations has limitations due to potential differences in linkage disequilibrium patterns, allele frequencies, and genetic architecture (6,31). For that reason, PRS trained in multiancestry data have generally outperformed PRS trained in European data when applied to non-European ancestral groups. However, less is known about whether multiancestry scores are superior to within-ancestry scores, especially for populations of African descent. Overall, information is mixed, with reports favoring both within-ancestry (7) and multiancestry weights for T2D scores in African populations (10,27). Reasons that have been cited in favor of multiethnic scores include multiethnic weights being more likely to reflect the true causal effect of a variant, in addition to their being estimated from much larger sample sizes. In AA populations, the enrichment of European T2D signals among more admixed individuals compared with less admixed individuals may affect the performance of a multiancestry versus an ancestry-specific trained score. For instance, in our study, the WPC was the least admixed cohort (Supplementary Fig. 2) to which we applied the T2D scores, and the PRSAA performed slightly better than the genome-wide multiancestry score in that population. Given that the WPC was also the smallest of the testing studies, this trend needs to be further studied. An important limitation of current within-ancestry scores compared with multiancestry scores has to do with the size of the GWAS summary statistics. Importantly, our PRSAA was comparable to the genome-wide multiancestry score in each of our testing cohorts, and the PRSAA was trained in one-tenth of the data. 
As more African ancestry data become available through new consortia studies, whether larger GWAS summary statistics for PRS development pipelines will improve ancestry-specific scores in this population has yet to be determined. This type of PRS development work is essential as genetically predicted risk remains lower in AA populations compared with European populations using genome-wide multiancestry scores (9).

Our study has several limitations. Two of the testing data sets were small, and the CIs surrounding the ORs were accordingly wide. Still, these studies represent important independent resources for the study of T2D that were not represented in the training data sets. Additionally, the larger BioMe study showed robust validation of our PRSAA as well as other scores available in the PGS Catalog. Also, this report focused on the evaluation of PRS in AAs, which is not representative of the highly diverse genetic ancestries in the continental African population. Further, evaluation of the capability of T2D PRS in identifying incident case subjects (i.e., individuals at risk to develop the disease in a future time window) in prospective cohorts was beyond the scope of our study. New data collection in clinical populations can help address this gap in future studies.

Conclusions

In summary, we constructed a novel PRSAA using a Bayesian approach incorporating >1 million SNPs for risk prediction. We showed that the score is comparable to published scores but does not outperform a multiancestry score developed using a similar pipeline trained with more data. Future work should expand the scale of non-European genomic resources so that the value of ancestry-specific models can be evaluated further, given the still limited scope of African ancestry data compared with other ancestries in genomic databases. Further, additional collection of genetic data linked to incident T2D can help determine whether these scores predict future disease and help reduce the burden of T2D in African ancestry populations.

This article contains supplementary material online at https://doi.org/10.2337/figshare.25375699.

Acknowledgments. The authors used dbGaP accession number phs001672.v11.p1 to retrieve the summary statistics data. The authors thank the Million Veteran Program (MVP) staff, researchers, and volunteers who contributed to MVP, and especially the participants who previously served their country in the military and now generously agreed to enroll in the study (see https://www.research.va.gov/mvp/ for more details).

Funding. The eMERGE Network was initiated and funded by National Human Genome Research Institute through the following grants: U01HG006828 (Cincinnati Children's Hospital Medical Center and Boston Children’s Hospital), U01HG006830 (Children's Hospital of Philadelphia), U01HG006389 (Essentia Institute of Rural Health, Marshfield Clinic Research Foundation, and Pennsylvania State University), U01HG006382 (Geisinger Clinic), U01HG006375 (Group Health Cooperative and the University of Washington), U01HG006379 (Mayo Clinic), U01HG006380 (Icahn School of Medicine at Mount Sinai), U01HG006388 (Northwestern University), U01HG006378 (Vanderbilt University Medical Center), and U01HG006385 (Vanderbilt University Medical Center serving as the Coordinating Center). The eMERGE IV Mass General Brigham site was funded by the National Human Genome Research Institute through U01HG008685, the Columbia University site was funded through U01HG008680, and the University of Alabama site was funded through National Human Genome Research Institute (U01HG011167). This work was additionally supported by the Doris Duke Charitable Foundation grant 2020096 to A.L. and by the following National Institutes of Health grants: National Center for Advancing Translational Sciences, UL1TR001873; The New England Precision Medicine Consortium of the All of Us Research Program, OT2OD026553; National Heart, Lung, and Blood Institute, OT2HL161841, R01HL151855 (J.B.M.), and R01HL092173 (N.A.L.); National Institute of Arthritis and Musculoskeletal and Skin Diseases, P30AR070253, P30AR069625, R01AR063759 (E.W.K.), and R21AR078339 (E.W.K.); National Human Genome Research Institute, U01HG011723 and R01HG012354 (T.G.); and National Institute of Diabetes and Digestive and Kidney Diseases, U01DK105556 (M.C.Y.N.), R01DK066358 (M.C.Y.N.), R01DK078616 (J.B.M.), and K25DK128563 (A.K.). 
The University of Alabama at Birmingham (UAB) cohorts were funded as follows: the National Heart, Lung, and Blood Institute supported the REGARDS genetics (R01HL136666), HyperGEN (R01HL055673), GenHAT (R01HL123782), and WPC (R01HL092173 and K24HL133373) studies. The parent REGARDS study is supported by the National Institute of Neurological Disorders and Stroke cooperative agreement U01NS041588, co-funded by the National Institute on Aging, the National Institutes of Health, and the Department of Health and Human Services. M.R.I. was further supported by National Heart, Lung, and Blood Institute grant R35HL155466.

Duality of Interest. No potential conflicts of interest relevant to this article were reported.

Author Contributions. M.R.I. and B.D. contributed to writing the original draft and visualization. T.G. contributed to writing, reviewing, and editing the manuscript and to methodology and investigation. A.P. contributed to the investigation. V.S., N.D.A., A.C.J., E.P., and L.S. contributed to the methodology and investigation. M.L., E.K., R.J.F.L., M.C.Y.N., and L.A.L. contributed to writing, reviewing, and editing the manuscript. J.W.S., J.B.M., and E.W.K. contributed to writing, reviewing, and editing the manuscript, study conceptualization, methodology, and investigation. N.A.L. contributed to writing, reviewing, and editing the manuscript and to methodology, study conceptualization, and funding acquisition. H.K.T. contributed to writing, reviewing, and editing the manuscript, study conceptualization, and funding acquisition. M.R.I. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

1. Langenberg C, Lotta LA. Genomic insights into the causes of type 2 diabetes. Lancet 2018;391:2463–2474
2. Hu FB. Globalization of diabetes: the role of diet, lifestyle, and genes. Diabetes Care 2011;34:1249–1257
3. Wang L, Li X, Wang Z, et al. Trends in prevalence of diabetes and control of risk factors in diabetes among US adults, 1999-2018. JAMA 2021;326:1–13
4. Mahajan A, Taliun D, Thurner M, et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet 2018;50:1505–1513
5. Vujkovic M, Keaton JM, Lynch JA, et al.; HPAP Consortium; Regeneron Genetics Center; VA Million Veteran Program. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat Genet 2020;52:680–691
6. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet 2019;51:584–591
7. Chikowore T, Ekoru K, Vujkovic M, et al. Polygenic prediction of type 2 diabetes in Africa. Diabetes Care 2022;45:717–723
8. Doumatey AP, Ekoru K, Adeyemo A, Rotimi CN. Genetic basis of obesity and type 2 diabetes in Africans: impact on precision medicine. Curr Diab Rep 2019;19:105
9. Ge T, Irvin MR, Patki A, et al. Development and validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations. Genome Med 2022;14:70
10. Polfus LM, Darst BF, Highland H, et al. Genetic discovery and risk characterization in type 2 diabetes across diverse populations. HGG Adv 2021;2:100029
11. Stanaway IB, Hall TO, Rosenthal EA, et al.; eMERGE Network. The eMERGE genotype set of 83,717 subjects imputed to ∼40 million variants genome wide and association with the herpes zoster medical record phenotype. Genet Epidemiol 2019;43:63–81
12. Howard VJ, Cushman M, Pulley L, et al. The reasons for geographic and racial differences in stroke study: objectives and design. Neuroepidemiology 2005;25:135–143
13. Ng MC, Shriner D, Chen BH, et al.; FIND Consortium; eMERGE Consortium; DIAGRAM Consortium; MuTHER Consortium; MEta-analysis of type 2 DIabetes in African Americans Consortium. Meta-analysis of genome-wide association studies in African Americans provides insights into the genetic architecture of type 2 diabetes. PLoS Genet 2014;10:e1004517
14. Gottesman O, Kuivaniemi H, Tromp G, et al.; eMERGE Network. The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genet Med 2013;15:761–771
15. Ge T, Chen CY, Ni Y, Feng YA, Smoller JW. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 2019;10:1776
16. Armstrong ND, Srinivasasainagendra V, Chekka LMS, et al. Genetic contributors of efficacy and adverse metabolic effects of chlorthalidone in African Americans from the Genetics of Hypertension Associated Treatments (GenHAT) Study. Genes (Basel) 2022;13:1260
17. Palmieri V, Bella JN, Arnett DK, et al. Effect of type 2 diabetes mellitus on left ventricular geometry and systolic function in hypertensive subjects: Hypertension Genetic Epidemiology Network (HyperGEN) study. Circulation 2001;103:102–107
18. Tayo BO, Teil M, Tong L, et al. Genetic background of patients from a university medical center in Manhattan: implications for personalized medicine. PLoS One 2011;6:e19166
19. Kho AN, Hayes MG, Rasmussen-Torvik L, et al. Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study. J Am Med Inform Assoc 2012;19:212–218
20. Limdi NA, Brown TM, Shendre A, Liu N, Hill CE, Beasley TM. Quality of anticoagulation control and hemorrhage risk among African American and European American warfarin users. Pharmacogenet Genomics 2017;27:347–355
21. Malla G, Long DL, Judd SE, et al. Does the association of diabetes with stroke risk differ by age, race, and sex? Results from the REasons for Geographic and Racial Differences in Stroke (REGARDS) study. Diabetes Care 2019;42:1966–1972
22. Barzilay JI, Davis BR, Bettencourt J, et al.; ALLHAT Collaborative Research Group. Cardiovascular outcomes using doxazosin vs. chlorthalidone for the treatment of hypertension in older adults with and without glucose disorders: a report from the ALLHAT study. J Clin Hypertens (Greenwich) 2004;6:116–125
23. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 2015;4:7
24. Choi SW, O’Reilly PF. PRSice-2: polygenic risk score software for biobank-scale data. Gigascience 2019;8:giz082
25. Lee SH, Wray NR, Goddard ME, Visscher PM. Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet 2011;88:294–305
26. Gaziano JM, Concato J, Brophy M, et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J Clin Epidemiol 2016;70:214–223
27. Mahajan A, Spracklen CN, Zhang W, et al.; FinnGen; eMERGE Consortium. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat Genet 2022;54:560–572
28. Bentley AR, Callier SL, Rotimi CN. Evaluating the promise of inclusion of African ancestry populations in genomics. NPJ Genom Med 2020;5:5
29. Ni G, Zeng J, Revez JA, et al.; Schizophrenia Working Group of the Psychiatric Genomics Consortium; Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium. A comparison of ten polygenic score methods for psychiatric disorders applied across multiple cohorts. Biol Psychiatry 2021;90:611–620
30. Ruan Y, Lin YF, Feng YA, et al.; Stanley Global Asia Initiatives. Improving polygenic prediction in ancestrally diverse populations. Nat Genet 2022;54:573–580
31. Kachuri L, Chatterjee N, Hirbo J, et al.; Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium Methods Working Group. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet 2024;25:8–25
Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at https://www.diabetesjournals.org/journals/pages/license.