Polygenic Prediction of Type 2 Diabetes in Africa

OBJECTIVE Polygenic prediction of type 2 diabetes (T2D) in continental Africans is adversely affected by the limited number of genome-wide association studies (GWAS) of T2D from Africa and the poor transferability of European-derived polygenic risk scores (PRSs) in diverse ethnicities. We set out to evaluate if African American, European, or multiethnic-derived PRSs would improve polygenic prediction in continental Africans.
RESEARCH DESIGN AND METHODS Using the PRSice software, ethnic-specific PRSs were computed with weights from the T2D GWAS multiancestry meta-analysis of 228,499 case and 1,178,783 control subjects. The South African Zulu study (n = 1,602 case and 981 control subjects) was used as the target data set. Validation and assessment of the best predictive PRS association with age at diagnosis were conducted in the Africa America Diabetes Mellitus (AADM) study (n = 2,148 case and 2,161 control subjects).
RESULTS The discriminatory ability of the African American and multiethnic PRSs was similar. However, the African American–derived PRS was more transferable in all the countries represented in the AADM cohort and predictive of T2D in the country combined analysis compared with the European and multiethnic-derived scores. Notably, participants in the 10th decile of this PRS had a 3.63-fold greater risk (odds ratio 3.63; 95% CI 2.19–4.03; P = 2.79 × 10⁻¹⁷) per risk allele of developing diabetes and were diagnosed 2.6 years earlier than those in the first decile.
CONCLUSIONS African American–derived PRS enhances polygenic prediction of T2D in continental Africans. Improved representation of non-European populations (including Africans) in GWAS promises to provide better tools for precision medicine interventions in T2D.

(1). Therefore, urgent strategies and resources for improving screening and early identification interventions are required to help curb this pandemic in Africa.
T2D is a multifactorial disease that is hypothesized to be increasing in prevalence due to the interaction of genetic and environmental factors (3). Although the genetic factors are stable over time, the surge in diabetes prevalence over the past decades is thought to have been caused by urbanization and the adoption of westernized lifestyles characterized by consumption of energy-dense foods and physical inactivity (3,4). However, diabetes has been noted to be preventable, and its onset was delayed for 15 years by diet and exercise interventions in the Diabetes Prevention Program (5). Because diet and exercise strategies are readily accessible and relatively low cost, coupling these lifestyle interventions with approaches that identify, at an earlier stage, people who are more susceptible to developing diabetes might effectively lower the diabetes burden. The use of polygenic risk scores (PRSs) for early identification of people who are more genetically susceptible to developing T2D is such an approach (6). Recent studies conducted in Europeans have indicated that individuals in the 10th decile have a 5.21-fold higher risk (odds ratio [OR] 5.21; 95% CI 4.94–5.49) of developing diabetes compared with those in the first decile (7). However, evidence exists of the poor transferability of European-derived polygenic scores in diverse populations. For example, Martin et al. (8) reported that European PRSs had a 4.9-fold reduced predictive value in African Americans across 17 traits. There is now a concern that African ancestry and other similarly understudied population groups may not benefit from the clinical translation efforts of these PRSs, thereby exacerbating existing health disparities (8,9).
Large multiethnic cohorts such as the Million Veteran Program improve the representation of African Americans in genome-wide association studies (GWAS) and offer a promise of enhanced polygenic prediction in this group (10). However, the representation of continental Africans in GWAS is still very low, both in the number of studies and the total number of study participants. For example, T2D GWAS with >1 million European participants are being reported, while the sample sizes of continental Africans remain under 10,000 (7,11). Therefore, continental Africans face a much worse threat than do African Americans of under-representation in precision medicine efforts for T2D (9). It has been reported that multiethnic PRSs (compared with European-only PRSs) might enhance prediction in diverse populations (12,13). However, the predictive ability of multiethnic-derived PRSs, and that of PRSs derived from African Americans, who originated mainly from the western part of Africa and have approximately 80% African admixture, are yet to be evaluated in continental Africans (12,13). We set up this study to assess the predictive ability of European-, African American-, and multiethnic-derived PRSs for T2D in continental Africans.

Study Participants
Black South African case subjects from the Durban Case-Control Study (n = 1,602), who were attending a diabetes clinic in Durban, and the 981 control subjects from the cross-sectional Durban Diabetes Study, conducted in the same location, were aggregated and collectively regarded as the South African Zulu study, as indicated elsewhere (11,14). These individuals were older than 18 years, not pregnant, and from urban black African communities in Durban, South Africa (14). The World Health Organization criteria were used to define T2D status. The validation-study participants were from the Africa America Diabetes Mellitus (AADM) study, which has been described in detail elsewhere (15–17). The 2,148 case subjects and 2,161 control subjects from this study were enrolled at university medical centers in Nigeria (n = 1,325 case subjects and 1,363 control subjects), Ghana (449 case subjects and 435 control subjects), and Kenya (374 case subjects and 363 control subjects) (17). In this study, diabetes was defined based on an oral glucose tolerance test or pharmacological treatment of diabetes (17). Written informed consent was completed by the study participants. The respective studies were approved by relevant ethics committees under the following references: Durban Case-Control Study, BF078/08; Durban Diabetes Study, BF030/12; and AADM, 14/WM/1061.

Genotyping and Imputation
Participants in the South African Zulu study (Supplementary Table 1) were genotyped using the Illumina Multi-Ethnic Genotyping Array (Illumina, San Diego, CA). The Affymetrix Axiom PanAFR single nucleotide polymorphism (SNP) array (Thermo Fisher Scientific, Waltham, MA) or Illumina Multi-Ethnic Genotyping Array was used to genotype participants in the AADM study. Detailed quality control and imputation for these studies were performed using African whole genomes from the Uganda 2000 Genomes and the 1000 Genomes as reference panels, as has been described elsewhere (11,18). A minimum minor allele frequency threshold of 0.5% and an imputation information score threshold of >0.4 were applied (11).
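The variant filter described above can be illustrated with a minimal Python sketch. Only the two thresholds (minor allele frequency of 0.5% and information score >0.4) come from the text; the `Variant` record, its field names, the example variants, and the choice of an inclusive comparison for the frequency threshold are assumptions made for illustration and do not describe the pipeline actually used.

```python
from dataclasses import dataclass

MAF_MIN = 0.005   # minimum minor allele frequency of 0.5% (from the text)
INFO_MIN = 0.4    # imputation information score must exceed 0.4 (from the text)


@dataclass
class Variant:
    """Hypothetical record for one imputed variant."""
    snp_id: str
    alt_allele_freq: float   # frequency of the alternate allele, 0..1
    info: float              # imputation information score, 0..1


def minor_allele_frequency(freq: float) -> float:
    """The minor allele is whichever allele is rarer."""
    return min(freq, 1.0 - freq)


def passes_qc(v: Variant) -> bool:
    """Keep variants that are common enough and imputed well enough.

    Assumption: the frequency threshold is inclusive (>=) because the text
    calls it a minimum, and the information threshold is strict (>).
    """
    return minor_allele_frequency(v.alt_allele_freq) >= MAF_MIN and v.info > INFO_MIN


# Made-up variants, one per outcome of the filter.
variants = [
    Variant("snp_a", 0.30, 0.95),    # passes both filters
    Variant("snp_b", 0.002, 0.99),   # too rare
    Variant("snp_c", 0.10, 0.35),    # poorly imputed
    Variant("snp_d", 0.997, 0.80),   # minor allele frequency is 0.003, too rare
]
kept = [v.snp_id for v in variants if passes_qc(v)]
print(kept)
```

Taking the minimum of the allele frequency and its complement matters when the alternate allele is the common one, as in the last example variant.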

Statistical Analysis
PRSice-2 software was used to implement the clumping and threshold approach for developing PRSs. After sensitivity analysis, a clumping distance of 500 kb and an r² of 0.5 were the parameters used for computing PRSs. GWAS summary statistics from the multiancestry GWAS of T2D by Vujkovic et al. (7), comprising participants representative of Europeans, African Americans, Hispanics, and Asians, were used as the base (discovery), and genotype data from the South African Zulu study and AADM were used as the target data and validation data sets, respectively, as listed in Table 1.
In the discovery analysis, multiple PRSs were computed at P value thresholds from 1 to 5 × 10⁻⁸ of the base data set and linkage disequilibrium clumping was done using the target data set as the reference. The predictivity of these PRSs was then evaluated through linear models that adjusted for age, sex, and population stratification (five principal components). The P values of these PRSs and the Nagelkerke R² were evaluated to assess transferability and predictability, respectively (Supplementary Figs. 2–4). The best predictive multiethnic, African American, and European PRSs were then validated in the AADM study, as shown in Table 1 and Supplementary Table 2.
During the validation stage, the best predictive PRSs were assessed for transferability and predictivity through the P values and Nagelkerke R² in linear models implemented in PRSice, which corrected for age, sex, BMI, and population stratification (five principal components), as shown in Table 1. This was first done for the whole of the AADM study and then at the country level, as shown in Fig. 1B.
The best predictive PRS from the three discovery data sets was then used to assess its risk stratification and diagnostic utility. Logistic regression models for the PRS deciles as a predictor variable were computed while correcting for age, sex, BMI, and residual population structure using principal components (five principal components). A shape plot was computed to show the differences in risk of the PRS deciles from the first decile, as shown in Fig. 1A. Finally, a linear regression model was used to evaluate whether the age at which patients are diagnosed with diabetes (n = 1,031) is affected by PRS in the AADM study.
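As an illustration of the clumping and threshold approach, the following Python sketch performs a simplified greedy clumping and then scores one individual at several P value thresholds. It is not the PRSice implementation: the SNP identifiers, effect sizes, pairwise r² lookup, and dosages are made-up values, the window convention is an assumption, and the score is a plain weighted sum without any scaling that the software may apply. Only the 500 kb distance, the r² of 0.5, and the threshold range come from the text.

```python
CLUMP_KB = 500    # clumping distance in kb (from the text)
CLUMP_R2 = 0.5    # r² threshold (from the text)


def clump(snps, r2, window_kb=CLUMP_KB, r2_max=CLUMP_R2):
    """Greedy clumping: visit SNPs from most to least significant and drop
    any SNP that lies within the window of an already retained index SNP
    and is in linkage disequilibrium with it (r² at or above the threshold).
    """
    index_snps = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        tagged = any(
            kept["chrom"] == snp["chrom"]
            and abs(kept["pos"] - snp["pos"]) <= window_kb * 1000
            and r2.get(frozenset((kept["id"], snp["id"])), 0.0) >= r2_max
            for kept in index_snps
        )
        if not tagged:
            index_snps.append(snp)
    return index_snps


def polygenic_score(dosages, index_snps, p_threshold):
    """Sum of effect size times risk-allele dosage over the index SNPs
    whose base-data P value is at or below the threshold."""
    return sum(
        s["beta"] * dosages[s["id"]] for s in index_snps if s["p"] <= p_threshold
    )


# Hypothetical base (discovery) summary statistics.
base = [
    {"id": "snp1", "chrom": 1, "pos": 100_000, "p": 1e-9, "beta": 0.10},
    {"id": "snp2", "chrom": 1, "pos": 150_000, "p": 1e-4, "beta": 0.08},
    {"id": "snp3", "chrom": 1, "pos": 900_000, "p": 0.03, "beta": -0.05},
    {"id": "snp4", "chrom": 2, "pos": 50_000, "p": 0.4, "beta": 0.02},
]
# Hypothetical pairwise r² estimated in the target data; missing pairs count as 0.
ld = {frozenset(("snp1", "snp2")): 0.8}

index_snps = clump(base, ld)   # snp2 is removed because it is tagged by snp1
dosages = {"snp1": 2, "snp2": 1, "snp3": 1, "snp4": 0}   # one individual
scores = {t: polygenic_score(dosages, index_snps, t) for t in (5e-8, 0.05, 1.0)}
print([s["id"] for s in index_snps], scores)
```

Each threshold yields a different score for the same individual, which is why the best threshold has to be selected in the target data before validation.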
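For readers unfamiliar with the Nagelkerke R², the sketch below shows how it can be computed from model log-likelihoods, and how the variance attributable to a PRS can be taken as the difference between a model with covariates plus PRS and a model with covariates only. This assumes a logistic model for the binary T2D outcome; the log-likelihood values and sample size are invented for illustration and are not results from this study.

```python
import math


def nagelkerke_r2(loglik_model: float, loglik_intercept: float, n: int) -> float:
    """Nagelkerke pseudo-R²: the Cox-Snell R² divided by its maximum
    attainable value, so that a perfect model reaches 1."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_intercept - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_intercept / n)
    return cox_snell / max_cox_snell


# Hypothetical example: 1,000 participants, half of them case subjects.
n = 1000
ll_intercept = n * math.log(0.5)   # intercept-only model for a balanced sample
ll_covariates = -650.0             # assumed fit with covariates only
ll_full = -640.0                   # assumed fit with covariates plus PRS

r2_covariates = nagelkerke_r2(ll_covariates, ll_intercept, n)
r2_full = nagelkerke_r2(ll_full, ll_intercept, n)
r2_prs = r2_full - r2_covariates   # incremental variance attributed to the PRS
print(f"covariates: {r2_covariates:.4f}, full: {r2_full:.4f}, PRS: {r2_prs:.4f}")
```

Reporting the increment rather than the full-model value keeps the contribution of age, sex, BMI, and principal components out of the figure quoted for the PRS.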
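The decile comparison can be illustrated with a crude, unadjusted analogue in Python: individuals are ranked into PRS deciles and the odds of being a case subject in the 10th decile are compared with those in the first decile using a 2 × 2 table and a Woolf (log-scale) 95% CI. The analysis in this study used logistic regression with adjustment for age, sex, BMI, and principal components, which this sketch does not attempt; the scores and case labels below are synthetic.

```python
import math


def assign_deciles(scores):
    """Return a decile from 1 to 10 for each score, by rank.
    Ties are broken by input order."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def odds_ratio_top_vs_bottom(scores, is_case, z=1.96):
    """Unadjusted odds ratio of the 10th versus the first decile with a
    Woolf confidence interval. Assumes no cell of the table is empty."""
    deciles = assign_deciles(scores)
    pairs = list(zip(deciles, is_case))
    top_cases = sum(1 for d, c in pairs if d == 10 and c)
    top_controls = sum(1 for d, c in pairs if d == 10 and not c)
    bottom_cases = sum(1 for d, c in pairs if d == 1 and c)
    bottom_controls = sum(1 for d, c in pairs if d == 1 and not c)
    odds_ratio = (top_cases * bottom_controls) / (top_controls * bottom_cases)
    se_log = math.sqrt(
        1 / top_cases + 1 / top_controls + 1 / bottom_cases + 1 / bottom_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Synthetic data: 100 individuals with 3 of 10 case subjects in the first
# decile and 7 of 10 in the 10th decile.
scores = [float(i) for i in range(100)]
is_case = [
    (i % 10) < 3 if i < 10 else (i % 10) < 7 if i >= 90 else i % 2 == 0
    for i in range(100)
]
odds_ratio, lower, upper = odds_ratio_top_vs_bottom(scores, is_case)
print(f"OR {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the interval is built on the log scale, it is symmetric around the odds ratio only after taking logarithms.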

Polygenic Score Development and Validation
From the linear models of the multiple PRSs generated using the PRSice software (Supplementary Figs. 2–4), the best predictive PRSs derived from Europeans, the multiethnic group, and African Americans were each significant, with the highest variance explained indicated by Nagelkerke R² values of 0.69% (P = 5.09 × 10⁻⁶), 0.69% (P = 3.90 × 10⁻⁹), and 1.11% (P = 4.62 × 10⁻⁶), respectively (Table 1). The best PRSs were validated in the AADM study, where all remained significant with a similar trend. The African American PRS had the highest predictability, indicated by a Nagelkerke R² of 2.92% (P = 9.38 × 10⁻²⁴) in the combined analysis of the countries, as reported in Table 1.

PRS Stratification and Transferability in African Countries
The participants in the 10th decile of the African American-derived PRS had a more than threefold higher risk for developing T2D per risk allele, compared with those in the first decile in the AADM study (OR 3.63; 95% CI 2.19–4.03; P = 2.79 × 10⁻¹⁷) (Fig. 1A). On average, participants in the 10th decile of the African American PRS in the AADM study were diagnosed with T2D 2.6 years earlier (β = −2.61; P = 0.046) than participants in the first decile (Fig. 2B). The African American PRS was transferable in all countries, whereas the multiethnic PRS was not transferable in Kenya. The PRS predictability (indicated by Nagelkerke R²) varied greatly between the East African country of Kenya and the two West African countries, Ghana and Nigeria, where predictability was much higher for both the African American and the multiethnic PRSs.

Discriminatory Ability of the PRS
The model with the conventional risk factors of age, sex, BMI, and five principal components (PCs) had an area under the curve (AUC; C-statistic) of 67.9%, whereas the model with the African American PRS, five PCs, age, BMI, and sex had an AUC of 69.8% (Fig. 2), which was nearly identical to the 69.9% obtained with the multiethnic PRS. Therefore, adding the African American PRS to the conventional risk factors improved discriminatory ability by 1.9 percentage points.
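The AUC compared here is the probability that a randomly chosen case subject receives a higher predicted risk than a randomly chosen control subject. The Python sketch below computes it by direct pairwise comparison for two hypothetical sets of predicted risks (conventional risk factors alone and with a PRS added); the labels and predictions are invented and serve only to show how the increment, 69.8% − 67.9% = 1.9 percentage points in this study, is obtained.

```python
def auc(predictions, labels):
    """Area under the ROC curve by pairwise comparison: the share of
    case-control pairs in which the case subject has the higher prediction,
    counting ties as one half."""
    cases = [p for p, y in zip(predictions, labels) if y]
    controls = [p for p, y in zip(predictions, labels) if not y]
    wins = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case in cases
        for control in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks for three case and three control subjects.
labels = [1, 1, 1, 0, 0, 0]
conventional = [0.80, 0.40, 0.35, 0.60, 0.30, 0.20]   # risk factors only
with_prs = [0.90, 0.65, 0.35, 0.60, 0.30, 0.20]       # risk factors plus PRS

auc_conventional = auc(conventional, labels)
auc_with_prs = auc(with_prs, labels)
gain = auc_with_prs - auc_conventional
print(f"{auc_conventional:.3f} -> {auc_with_prs:.3f} (gain {gain:.3f})")
```

The pairwise form is quadratic in sample size, so rank-based formulas are preferable for large cohorts, but it makes the interpretation of the statistic explicit.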

CONCLUSIONS
We set out to assess the predictive value of T2D PRS in continental Africans. We compared the polygenic prediction of African American, European, and multiethnic PRSs for T2D in continental Africans. The PRS with the best prediction was derived from an African American restricted GWAS (7). Participants in the 10th decile of this PRS had a more than threefold increased risk of developing T2D.
Limited studies of candidate SNP PRS have been performed on data from continental Africans. Previously, we reported a genetic risk score with weights from Europeans that was associated with an OR of 1.21 (95% CI 1.02–1.43) for T2D in Black South Africans (19). This genetic risk score had an AUC of 0.665, together with conventional risk factors for T2D (19). However, this study was limited due to the small sample size (n = 356), the availability of only genotyped SNPs, and the use of weights that were derived from European-only studies. In the present study, we have substantially expanded the sample size (n = 2,383), enhanced genome coverage by imputing to 1000 Genomes and local African Ancestry whole genomes (18), and used a multiethnic discovery data set GWAS that included 1.4 million individuals, including people of African American ancestry. We performed a country-level analysis that showed less variable predictability within regional countries in West Africa (Ghana and Nigeria) and greater variability when comparing with countries from other regions, such as Kenya in East Africa. This phenomenon is suggestive of the usefulness of regional PRSs in Africa. However, this will need to be validated by additional studies.
Nonetheless, polygenic predictions of European-derived PRSs in Europeans are still higher than those of the African American-derived PRS in continental Africans (7). Notably, participants in the top decile of a European-derived PRS have recently been reported to have a greater than fivefold risk for developing T2D than those in the first decile in Europeans (7). In our study, failure to reach the predictions reported in Europeans might be because the African American-derived PRSs are from an admixed population group that is not representative of the genetic diversity and linkage disequilibrium patterns of continental Africans (13,20). In addition, the vast growth of the European cohorts, which now include >1 million individuals, provides substantial power compared with African diabetes cohorts, which are still below the 10,000 mark (21). More investments are required to increase the representation of continental Africans in GWAS of T2D.
Recently, it was reported that the multiancestry PRS outperforms the population-specific ones from Europeans and East Asians (22). However, this phenomenon is yet to be validated in continental Africans. Considering that 80% of GWAS have been done in Europeans, most multiancestry GWAS meta-analyses are biased toward this population group (8). Marquez-Luna et al. (12) combined the training and the target data set summary statistics to compute the PRS and then showed that the multiethnic PRSs improve prediction in diverse populations. However, this approach is not widely accepted, and more research is still required to validate if the multiethnic PRS outperforms the population-specific PRS for all the ancestries (23,24). In our study, the African American and multiethnic PRSs had similar discriminatory abilities. However, the African American PRS was slightly more predictive than the multiancestry PRS for the combined AADM study and, with improved representations of Africans, these predictions might increase in the future. In addition, the country-stratified analyses also indicated that the multiancestry PRS was not transferable to participants from Kenya. The failure to tag the causal variant due to differences in allele frequencies, linkage disequilibrium patterns, and heterogeneity of effect sizes is a potential reason for the limited predictivity of multiancestry meta-analysis of continental Africans, who have greater genetic diversity (25–27).
diabetesjournals.org/care Chikowore and Associates
The utility of PRSs is an issue of paramount importance for clinical translation (6). The African American PRS, though it was predictive for T2D in continental Africans, improved the AUC of conventional risk factors by only 1.9 percentage points: when combined with principal components and conventional risk factors, its AUC was 69.8%, whereas that of the conventional risk factors alone was 67.9%. Similarly, in a Swedish T2D study, the European-derived PRS increased the AUC by 1%, compared with conventional risk factors (28). However, the use of AUC as a measure to evaluate the clinical utility of polygenic prediction is being debated, because AUC is regarded as a less sensitive metric (29). There are ongoing efforts to develop better metrics (30). Nonetheless, findings from this study that people with T2D and a high PRS are typically diagnosed with diabetes at an earlier age and have a 3.6-fold risk of developing diabetes are of clinical importance. They may be useful in the prevention and treatment of diabetes. Our study was limited by the sparse number of T2D GWAS in continental Africans. Nonetheless, the African American-derived PRS improved disease classification in this population. The clumping and thresholding approach used to compute the genome-wide PRS did not account for environmental factors such as diet and exercise that might confound the predictive accuracy of these measures. The strengths of our study include validation of the African American PRS in the AADM study and that we used GWAS summary statistics of varied ethnicities from the same study, which minimized bias due to genotyping and GWAS designs.
In summary, an African American-derived PRS seems to be the best predictor of T2D in continental Africans compared with European and multiethnic PRSs. More studies are required to determine whether using continental African GWAS might further enhance these predictions and reach a similar accuracy as in Europeans. Although the PRS prediction of diabetes had low specificity and sensitivity, patient stratification by PRS may prove clinically useful.