Perry et al. (1) used a pathway-based approach to identify biological pathways associated with type 2 diabetes. They used genome-wide association (GWA) data from the type 2 diabetes study in the U.K. Wellcome Trust Case Control Consortium (WTCCC) for the initial analysis and validated the findings with data from the Diabetes Genetics Initiative (DGI) and Finland–United States Investigation of NIDDM Genetics (FUSION) studies. The Wnt signaling pathway was the most strongly associated, and they therefore postulated that it was the most interesting candidate pathway. However, after correction for multiple testing, none of the top-ranking pathways reached statistical significance. Perry et al. concluded that type 2 diabetes genes are likely to reside in multiple pathways.
We recently performed a comparable genome-wide pathway analysis in two of the three GWA datasets used by Perry et al. (the WTCCC and DGI) and obtained results that overlapped with, but also differed from, theirs (2). However, we encountered several problems in using these pathway methods. Our main conclusion is therefore that pathway-based approaches have many limitations that need to be addressed before these methods can provide accurate results and before conclusions can be drawn from them.
First, in classification systems such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) or BioCarta, the majority of human genes are currently not assigned to any pathway. Of the 18 type 2 diabetes susceptibility loci recently identified, only 5 (CDKN2A-2B, PPARG, NOTCH2, VEGFA, and TCF7L2) could be assigned to known biological pathways. In addition, β-cell function, one of the mechanisms suggested to underlie type 2 diabetes, has not been specifically described as a pathway in either KEGG or BioCarta. Thus, although type 2 diabetes genes may well play a role in multiple pathways, we feel that this conclusion cannot be drawn from the results of pathway-based analyses.
Second, as Perry et al. discuss, larger pathways are more likely to appear significantly overrepresented in pathway analysis. This follows from the statistical property that the power of a test increases as the numbers being compared become larger, which is the case when larger pathways are analyzed. One of the top associated pathways in both our study and that of Perry et al. is the Wnt signaling pathway, which comprises many genes. It is therefore highly likely to become statistically overrepresented in pathway analyses. We analyzed 30 randomly selected sets of genes, each encompassing around 1,500 genes. In 16 of the 30 sets the Wnt signaling pathway was among the top 10 ranked pathways, and in 5 of the 30 sets it was even ranked in the top 3.
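The effect of pathway size on test power can be illustrated with a minimal sketch. It is not the method used in our study or by Perry et al.; it assumes a one-sided hypergeometric overrepresentation test, a hypothetical background of 20,000 genes, a selected set of 1,500 genes, and the same hypothetical 1.5-fold enrichment for every pathway. The pathway sizes and the names `hypergeom_sf` and `enrichment_p` are illustrative only.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

N_GENES = 20000     # assumed number of background genes (hypothetical)
N_SELECTED = 1500   # size of the selected gene set, as in the random sets above
FOLD = 1.5          # assumed fold enrichment, identical for all pathways

def enrichment_p(pathway_size, fold=FOLD):
    """P value for a pathway whose overlap is `fold` times the expected overlap."""
    expected = pathway_size * N_SELECTED / N_GENES
    observed = round(fold * expected)
    return hypergeom_sf(observed, N_GENES, pathway_size, N_SELECTED)

# Hypothetical pathway sizes, from small to large.
for size in (20, 50, 150, 400):
    print(f"pathway size {size:4d}: P = {enrichment_p(size):.4f}")
```

Under these assumptions the P value falls steadily as the pathway grows, even though the relative enrichment is held roughly constant (up to rounding of the observed overlap). A large pathway can therefore reach nominal significance with a degree of enrichment that would go unnoticed in a small one.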
We would like to emphasize that the limitations of pathway-based analyses in GWA data should be kept in mind when drawing conclusions based on overrepresented pathways.
Acknowledgments
No potential conflicts of interest relevant to this article were reported.