Obesity is an increasingly common disorder that predisposes to several medical conditions, including type 2 diabetes. We investigated whether large and rare copy-number variations (CNVs) differentiate subjects with moderate to extreme obesity from never-overweight control subjects.
Using single nucleotide polymorphism (SNP) arrays, we performed a genome-wide CNV survey on 430 obese case subjects (BMI >35 kg/m2) and 379 never-overweight control subjects (BMI <25 kg/m2). All subjects were of European ancestry and were genotyped on the Illumina HumanHap550 arrays with ∼550,000 SNP markers. The CNV calls were generated by PennCNV software.
CNVs >1 Mb were found to be overrepresented in case versus control subjects (odds ratio [OR] = 1.5 [95% CI 0.5–5]), and CNVs >2 Mb were present in 1.3% of the case subjects but were absent in control subjects (OR = infinity [95% CI 1.2–infinity]). When focusing on rare deletions that disrupt genes, even more pronounced effect sizes were observed (OR = 2.7 [95% CI 0.5–27.1] for CNVs >1 Mb). Interestingly, obese case subjects who carried these large CNVs had moderately high BMI and did not appear to be extreme cases. Several CNVs disrupted known candidate genes for obesity, such as a 3.3-Mb deletion disrupting NAP1L5 and a 2.1-Mb deletion disrupting UCP1 and IL15.
Our results suggest that large CNVs, especially rare deletions, confer risk of obesity in patients with moderate obesity and that genes impacted by large CNVs represent intriguing candidates for obesity that warrant further study.
Obesity has become the most common health disorder worldwide. Obesity predisposes to multiple diseases, particularly diabetes, and it has been estimated that life expectancy may diminish in the next generation as a result (1). Numerous studies have shown that body weight and obesity are strongly influenced by genetic factors, with heritability estimates in the range of 65–80% (2). However, single gene mutations are quite rare, and common variation (e.g., in FTO [3] and MC4R [4]) accounts for a small percentage of familial risk. A recent large-scale meta-analysis of genome-wide association studies (GWASs) identified six additional genes that associate with BMI, but all eight genes collectively explain merely 0.84% of the BMI variation in human populations (5). Therefore, it is unlikely that expansion of sample sizes in GWASs will identify common variants with major effect sizes.
The examination of copy-number variations (CNVs) offers novel insights into the genetic architecture of common and complex human diseases. A CNV is defined as a chromosomal segment whose copy number varies across individuals in the population (6). Recurrent CNVs such as 16p11.2 deletions were reported to account for 0.7% of morbid obesity cases (7). In addition, several reports demonstrated that large and rare CNVs collectively associate with schizophrenia (8–10), extreme early-onset obesity (11), and variation in BMI (12).
In the current study, we investigated the potential role of rare variants in obesity, by performing comparative CNV analysis on obese case and control subjects who were genotyped by Illumina single nucleotide polymorphism (SNP) arrays. Case subjects had moderate to extreme obesity, and control subjects had never been overweight. Although our sample size precludes the definitive identification of specific CNVs/genes that associate with obesity, we demonstrate that large, yet rare, CNVs, as a group, are collectively associated with obesity. Furthermore, we identified previously implicated obesity candidate genes in some of these large and rare CNVs, making them especially attractive for additional follow-up studies and functional assays.
RESEARCH DESIGN AND METHODS
Obese case and control subjects.
The case subjects were obese (BMI ≥35 kg/m2) with a lifetime BMI >40 kg/m2. Independent control subjects were selected who had a current and lifetime BMI ≤25 kg/m2. All the case and control subjects who participated in the current study were part of a previous candidate gene study (13). Sample characteristics are summarized in Table 1 for 430 case and 379 control subjects passing quality control. The median age at obesity onset was 12 years, and 90% had an onset prior to age 26 years. All subjects gave informed consent, and the protocol was approved by the committee on studies involving human beings at the University of Pennsylvania.
Variable | n | Minimum | Maximum | Mean | SD |
---|---|---|---|---|---|
Control subjects | | | | | |
Age (years) | 379 | 16 | 65 | 42.80 | 8.92 |
BMI (kg/m2) | 379 | 16 | 25 | 20.72 | 1.82 |
Percent fat | 369 | 7 | 40 | 23.77 | 5.43 |
Case subjects | | | | | |
Age (years) | 430 | 18 | 64 | 40.88 | 9.34 |
BMI (kg/m2) | 430 | 35 | 97 | 49.24 | 8.79 |
Percent fat | 388 | 31 | 71 | 49.85 | 5.89 |
Onset age (years) | 344 | 0 | 55 | 13.72 | 9.00 |
SNP genotyping.
DNA was extracted from whole blood or lymphoblastoid cell lines using a high-salt method and genotyped on the Illumina HumanHap550 SNP arrays (Illumina, San Diego, CA). Standard Illumina data normalization procedures and canonical genotype clustering files were used to process the genotyping signals. All case and control subjects passed call-rate (>95%) measures and were genetically inferred to be of European ancestry, based on multidimensional scaling analysis (supplementary Fig. 1 in the online appendix, available at http://diabetes.diabetesjournals.org/cgi/content/full/db10-0192/DC1).
CNV calling.
Using log R ratio and B allele frequency measures for all markers, the CNV calls were generated by PennCNV software (Version 2009Aug27) (14). The quality-control procedure is described in detail in supplementary Fig. 2. We removed samples with low-quality signal intensity values, so that the remaining samples had a log R ratio SD <0.3, B allele frequency drift <0.01, wave factor <0.05, and <50 CNV calls. We removed CNV calls with <10 SNPs or with a confidence score <10, sparse calls (average intermarker distance >50 kb), calls in the immunoglobulin regions, and calls in centromeric and telomeric regions (within 100 kb of the start or end of a chromosome). Genes and exons overlapping CNV calls were annotated using the scan_region.pl program, based on RefSeq gene annotation (15). We compiled a set of common CNV regions (cCNVRs), which occur at >1% frequency, and then classified each CNV call as common or rare using the scan_region.pl program: if >50% of a CNV call overlaps a cCNVR, it is referred to as a common CNV; otherwise, it is considered rare. The number of CNV calls in case versus control subjects was compared by t test, and the fraction of samples with large CNVs was compared by the Fisher exact test.
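The call-level filters and the common/rare rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the PennCNV or scan_region.pl implementation: the `Region` class and the function names are hypothetical, the cCNVRs are assumed to be non-overlapping so that their overlaps can be summed, the average intermarker distance is approximated as call length divided by SNP count, and the immunoglobulin, centromere, and telomere filters are omitted.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Region:
    """A genomic interval with 1-based, inclusive coordinates (hypothetical helper)."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap_bp(a: Region, b: Region) -> int:
    """Number of base pairs shared by two regions (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def passes_call_filters(call: Region, n_snps: int, confidence: float) -> bool:
    """Keep calls with >=10 SNPs, confidence >=10, and mean marker spacing <=50 kb."""
    if n_snps < 10 or confidence < 10:
        return False
    # Approximation: call length / SNP count stands in for the
    # average intermarker distance.
    return call.length / n_snps <= 50_000


def is_common(call: Region, common_regions: Iterable[Region]) -> bool:
    """A call is 'common' if >50% of its length overlaps common CNV regions.

    Assumes the common regions do not overlap one another.
    """
    covered = sum(overlap_bp(call, region) for region in common_regions)
    return covered / call.length > 0.5
```

For example, the 2.1-Mb deletion at chr4:141598764–143656669 (403 SNPs, Table 3) passes the SNP-count and spacing filters for any confidence score of at least 10, and it would be classified as rare unless cCNVRs covered more than half of its length.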
RESULTS
CNV calling and quality control.
To examine whether CNVs represent genetic risk factors for obesity, we analyzed CNV calls on 430 obese case and 379 control subjects who were genotyped by Illumina SNP arrays and passed quality-control measures for CNV analysis. The sample characteristics are described in Table 1. We first compared the general characteristics of CNV calls between case and control subjects. The number of CNVs per subject did not differ between case and control subjects (5.8 ± 3.3 vs. 6.0 ± 3.1, P = 0.35). The number of gene-disrupting CNVs per subject was similar in case versus control subjects (3.8 ± 3.1 vs. 4.2 ± 2.7, P = 0.07), as was the number of exonic CNVs per subject (3.2 ± 2.9 vs. 3.6 ± 2.6, P = 0.06). We compiled a list of common CNV regions and found that 38.2% of CNV calls can be classified as rare CNVs. The number of rare CNVs per subject did not differ between obese case and control subjects (2.3 ± 2.4 vs. 2.2 ± 1.8, P = 0.41).
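As a rough check on the per-subject comparisons above, a two-sample t statistic can be recomputed from the reported means, SDs, and sample sizes. The sketch below assumes the unequal-variance (Welch) form and a normal approximation to the P value, neither of which is stated in the text; the function name is hypothetical, and because the inputs are rounded, the result only approximates the reported P values.

```python
from math import erfc, sqrt
from typing import Tuple


def t_test_from_summary(
    mean1: float, sd1: float, n1: int,
    mean2: float, sd2: float, n2: int,
) -> Tuple[float, float]:
    """Return (t, two-sided P) from summary statistics.

    Assumptions: unequal variances (Welch standard error) and a normal
    approximation for the P value, which is reasonable for samples of
    several hundred subjects.
    """
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / standard_error
    # Two-sided tail probability of a standard normal variate.
    p = erfc(abs(t) / sqrt(2))
    return t, p


# Total CNVs per subject: case vs. control subjects.
t, p = t_test_from_summary(5.8, 3.3, 430, 6.0, 3.1, 379)
print(f"t = {t:.2f}, P = {p:.2f}")
```

With the total CNV counts (5.8 ± 3.3 in 430 case subjects vs. 6.0 ± 3.1 in 379 control subjects), this gives t ≈ −0.89 and P ≈ 0.37, in line with the reported nonsignificant difference (P = 0.35).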
Large CNVs are overrepresented in obese case subjects.
We next performed comparative analysis on CNV calls stratified by their sizes, common/rare status, and deletion/duplication status. Interestingly, with increasing size thresholds, we observed a stronger trend of association (odds ratio [OR]) between CNVs and obesity (Table 2). Similar to previous reports in schizophrenia cases (8), we found that 5 of 430 (1.2%) case subjects but none of the control subjects carry CNVs >2 Mb (OR = infinity [95% CI 1.16 to infinity], P = 0.04). The frequency of obese case subjects carrying CNVs >2 Mb in our study is similar to that in the Kirov et al. study (16) (6 of 471, 1.3%) and the Need et al. study (8) (14 of 1,013, 1.4%) on schizophrenia cases. Among the five CNVs >2 Mb observed in our study, three are deletions and two are duplications. We list the CNVs >1 Mb found in the 16 case and control subjects who carry them in Table 3, and the signal intensity patterns are provided in supplementary Fig. 3 as a visual means of validation. We also assessed whether large and rare gene-disrupting deletion CNVs tend to be enriched in case versus control subjects. Not surprisingly, the ORs for conferring risk of obesity are even higher for this group of CNVs (2.7 [0.47–27.1] for >1 Mb CNVs, infinity for >2 Mb CNVs) (Table 2), though this does not reach statistical significance because the events are rare.
Size | Case subjects with CNVs (n) | Control subjects with CNVs (n) | OR (95% CI) | Case subjects with gene-disrupting deletion (n) | Control subjects with gene-disrupting deletion (n) | OR (95% CI) |
---|---|---|---|---|---|---|
>100 kb | 352 | 313 | 0.99 (0.65–1.39) | 88 | 66 | 1.22 (0.84–1.77) |
>500 kb | 34 | 30 | 1.01 (0.46–1.37) | 9 | 6 | 1.33 (0.42–4.58) |
>1 Mb | 10 | 6 | 1.49 (0.48–5.00) | 6 | 2 | 2.67 (0.47–27.1) |
>2 Mb | 5 | 0 | Infinity (1.16 to infinity) | 3 | 0 | Infinity (0.69 to infinity) |
>5 Mb | 1 | 0 | Infinity (0 to infinity) | 0 | 0 | Infinity |
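The ORs and the Fisher exact test P value in Table 2 can be approximately reproduced from the carrier counts and the sample sizes (430 case and 379 control subjects). The following is a minimal sketch using only the Python standard library; the function names are hypothetical, the OR is the simple sample odds ratio (the estimator behind the published values is not stated), and the test is assumed to be one-sided, since that choice matches the reported P = 0.04.

```python
from math import comb


def odds_ratio(case_yes: int, case_total: int, ctrl_yes: int, ctrl_total: int) -> float:
    """Sample odds ratio for carriers vs. noncarriers in a 2x2 table."""
    numerator = case_yes * (ctrl_total - ctrl_yes)
    denominator = (case_total - case_yes) * ctrl_yes
    if denominator == 0:
        # Infinite when only case subjects carry the CNV; undefined for 0/0.
        return float("inf") if numerator > 0 else float("nan")
    return numerator / denominator


def fisher_one_sided(case_yes: int, case_total: int, ctrl_yes: int, ctrl_total: int) -> float:
    """P(at least case_yes carriers fall among case subjects).

    Hypergeometric null with all margins fixed (one-sided Fisher exact
    test for enrichment in case subjects).
    """
    carriers = case_yes + ctrl_yes
    total = case_total + ctrl_total
    upper = min(carriers, case_total)
    tail = sum(
        comb(case_total, k) * comb(ctrl_total, carriers - k)
        for k in range(case_yes, upper + 1)
    )
    return tail / comb(total, carriers)


# CNVs >1 Mb: 10 of 430 case subjects vs. 6 of 379 control subjects.
print(round(odds_ratio(10, 430, 6, 379), 2))
# CNVs >2 Mb: 5 of 430 case subjects vs. 0 of 379 control subjects.
print(odds_ratio(5, 430, 0, 379), round(fisher_one_sided(5, 430, 0, 379), 3))
```

For CNVs >1 Mb (10 vs. 6 carriers) the sample OR is about 1.48, close to the tabulated 1.49; for CNVs >2 Mb (5 vs. 0 carriers) the OR is infinite and the one-sided P value is about 0.042.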
Multiple large and rare CNVs disrupt obesity candidate genes.
We next examined CNVs >1 Mb and found several genes that are a priori candidates for obesity. Two of the strongest candidates are UCP1 and IL15, which are located within the same 2.1-Mb deletion on chromosome 4q31 (Fig. 1). The case subject carrying this CNV has moderate obesity (BMI 46.2 kg/m2). Numerous studies relate UCP1 to obesity in animal models (17), and associations have been reported in humans (18). We validated this CNV using a CNV-typing platform, the Affymetrix Cytogenetic arrays (supplementary Fig. 4). Since parental DNA was also available, we assessed both parents and found that the CNV is inherited from the father.
Another large CNV on chromosome 4q22.1 contains two potential candidate genes (NAP1L5 and SNCA) and is present in a subject with moderate obesity (BMI 49.0 kg/m2) (Fig. 2). NAP1L5 is an imprinted gene, which is of interest because of associations of body weight and obesity with genomic imprinting (19). Differences in paternal and maternal copies of this gene have been related to body weight at birth and in adulthood in mice (20). We validated this CNV by the Affymetrix Cytogenetic platform (supplementary Fig. 4) and also found that the CNV is inherited from the father. SNCA, another gene within this CNV, has been reported to have interactive effects on response to a high-fat diet in dietary obesity (21), although SNCA duplication is a well-known risk factor for Parkinson's disease.
Several other candidate genes, such as CTSC, NOX4, DLG2, ME3, and MIPEP, are also found within the collection of rare CNVs in case subjects (Table 3). We acknowledge that this list is relatively small and that none of the CNVs occurs twice in case subjects; as a result, we detected the collective association with obesity but cannot identify specific CNVs/genes that are more penetrant than others.
Finally, we performed an exploratory examination to determine whether some CNVs are unique to the extremely obese case subjects. We chose a BMI threshold of 70 kg/m2, which doubles the minimum entry criterion for case subjects. However, compared with case subjects with moderate obesity, the extremely obese case subjects do not appear to have larger CNVs or more well-characterized candidate genes.
Region (NCBI 36) | Number of SNPs | Length (bp) | Type | Phenotype | BMI (kg/m2) | Gene |
---|---|---|---|---|---|---|
chr2:106245033–107807545 | 279 | 1,562,513 | Del | Case | 52.09 | LOC729121, PLGLA, RGPD3, ST6GAL2 |
chr2:137328699–138602350 | 273 | 1,273,652 | Dup | Case | 64.37 | HNMT, THSD7B |
chr4:89822108–93149947 | 616 | 3,327,840 | Del | Case | 49.03 | FAM13A, FAM13AOS, FAM190A, GPRIN3, HERC3, MMRN1, NAP1L5, SNCA, TIGD2, TMSL3 |
chr4:141598764–143656669 | 403 | 2,057,906 | Del | Case | 46.16 | ELMOD2, IL15, INPP4B, RNF150, TBC1D9, UCP1, ZNF330 |
chr10:41756307–42943818 | 138 | 1,187,512 | Dup | Case | 40.59 | BMS1, LOC441666, LOC84856, RET, ZNF33B, ZNF37B |
chr11:84695124–86095201 | 315 | 1,400,078 | Dup | Case | 38.33 | C11orf73, CCDC81, CCDC83, CCDC89, CREBZF, DLG2, EED, ME3, PICALM, SYTL2, TMEM126A, TMEM126B |
chr11:86463458–91574130 | 909 | 5,110,673 | Dup | Case | 38.33 | CHORDC1, CTSC, FOLH1B, GRM5, LOC729384, NAALAD2, NOX4, RAB38, TMEM135, TRIM49, TRIM53, TRIM64, TRIM64B, TRIM77, TYR, UBTFL1 |
chr13:22153141–24201255 | 649 | 2,048,115 | Dup | Case | 36.94 | ATP12A, C1QTNF9, C1QTNF9B, LOC374491, MIPEP, MIR2276, PARP4, PCOTH, SACS, SGCG, SPATA13, TNFRSF19 |
chr16:15032942–16197033 | 201 | 1,164,092 | Del | Case | 45.37 | ABCC1, ABCC6, C16orf45, C16orf63, KIAA0430, MIR484, MPV17L, MYH11, NDE1, NTAN1, PDXDC1, RRN3 |
chr16:80739605–82222770 | 822 | 1,483,166 | Del | Case | 43.26 | CDH13, MPHOSPH6 |
chr18:4663080–6830148 | 525 | 2,167,069 | Del | Case | 45.12 | ARHGAP28, C18orf18, EPB41L3, L3MBTL4, LOC339290, LOC642597, TMEM200C, ZFP161 |
chr2:146325342–147328577 | 106 | 1,003,236 | Dup | Control | 22.48 | PABPCP2 |
chr5:103809862–104873156 | 173 | 1,063,295 | Dup | Control | 21.76 | RAB9P1 |
chr5:103816450–104873156 | 172 | 1,056,707 | Dup | Control | 21.16 | RAB9P1 |
chr7:67306086–68350057 | 204 | 1,043,972 | Del | Control | 20.13 | (not gene disrupting) |
chr16:15032942–16197033 | 201 | 1,164,092 | Del | Control | 20.61 | ABCC1, ABCC6, C16orf45, C16orf63, KIAA0430, MIR484, MPV17L, MYH11, NDE1, NTAN1, PDXDC1, RRN3 |
chr17:14063278–15411904 | 461 | 1,348,627 | Del | Control | 20.12 | CDRT15, CDRT4, FAM18B2, HS3ST3B1, MGC12916, PMP22, TEKT3 |
Underlining indicates DNA from cell line. All others are from blood.
Examination of previously reported obesity-associated CNVs.
An association between BMI and a chromosome 10q11 CNV was recently reported in a Chinese cohort (12). We observed three case subjects carrying this CNV (BMI 36, 41, and 43 kg/m2, respectively), but it is not present in control subjects. Two genes in this region are GPRIN2 and PPYR1, which are worthy of follow-up studies in larger sample sets. Additionally, a highly penetrant deletion on 16p11.2 was recently reported to be associated with obesity (7,11). In our data, one obese subject (BMI 44.9 kg/m2) carries this deletion and one control subject (BMI 19.1 kg/m2) carries the reciprocal duplication. Therefore, our data are consistent with the possibility that the 16p11.2 deletion is associated with obesity.
DISCUSSION
In the current study, we assayed a sample collection of obese case subjects and never-overweight control subjects and found strong support that large and rare CNVs contribute to obesity. Collectively, the OR for large CNVs observed in our study is higher than that of common SNPs identified in GWASs (for example, the OR for FTO is 1.3 [3] and for MC4R in severe childhood obesity is 1.3 [4]), suggesting that rare CNVs may represent more penetrant risk factors for obesity.
One interesting implication of our study relates to the hypothesized genetic architecture of obesity. Although it is well known that obesity results from multiple genetic risk factors as well as environmental factors, it is not clear which and how many genetic risk factors are involved. Recent GWASs identified a few obesity genes, but they collectively explain only a minor fraction of interindividual differences in obesity (5). Therefore, even though more common susceptibility variants may be identified by increasing sample size, they are very unlikely to account for a significant proportion of genetic risk. On the other hand, our study suggests that rare variants with much higher ORs may also contribute to risk of obesity. Given the rare nature of the CNVs, we could not discern which of these large CNVs are truly causal for obesity, so some less penetrant or noncausal large CNVs will dilute the effect sizes. Therefore, the observed effect sizes for large CNVs may underestimate the true effect size of causal CNVs for obesity.
Another interesting implication is how quantitative genetics relates to disease phenotypes. How distinct alleles, including modest-effect alleles and major-effect alleles, may interact to shape disease presentation is not well studied. For obesity, although FTO is consistently the strongest gene in many association studies, it has never been implicated in studies of monogenic forms of obesity. Similarly, although MC4R has been implicated in monogenic forms of obesity, analyses of common variants were highly inconsistent until large-scale GWASs were conducted (4). Therefore, it is likely that rare alleles work together with common alleles to shape the onset of obesity in human populations and that some genes with rare causal alleles may never show up in studies of common variants.
In conclusion, we have identified large, yet rare, CNVs representing major risk factors for obesity. Some of these large CNVs encompass known obesity genes or potential candidate genes for follow-up studies. Our results further suggest that studies of monogenic forms of complex disorders, studies of common variants in GWASs, and studies of CNVs represent three complementary approaches to research the genetic basis of complex diseases.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
ACKNOWLEDGMENTS
This work was supported in part by National Institutes of Health Grants R01DK44073, R01DK56210, and R01DK076023 (to R.A.P.) and a Scientist Development Grant (0630188N) from the American Heart Association (to W.D.L.). Genome-wide genotyping was funded in part by an Institutional Development Award to the Center for Applied Genomics (to H.H.) from the Children's Hospital of Philadelphia.
No potential conflicts of interest relevant to this article were reported.
K.W. researched data and wrote the manuscript. W.-D.L. researched data and edited the manuscript. J.T.G., S.F.A.G., and H.H. generated genotype data and contributed to the discussion. R.A.P. designed the study, collected samples, and edited the manuscript.
We thank all the case and control subjects who donated blood samples for genetic research purposes.