Our previous gene expression microarray studies identified a number of genes differentially expressed in patients with type 1 diabetes (T1D) and islet autoantibody-positive subjects. This study was designed to validate these gene expression changes in T1D patients and to identify gene expression changes in diabetes complications.
We performed high-throughput real-time RT-PCR to validate gene expression changes in peripheral blood mononuclear cells (PBMCs) from a large sample set of 928 T1D patients and 922 control subjects.
Of the 18 genes analyzed here, eight genes (S100A8, S100A9, MNDA, SELL, TGFB1, PSMB3, CD74, and IL12A) had higher expression and three genes (GNLY, PSMA4, and SMAD7) had lower expression in T1D patients compared with control subjects, indicating that genes involved in inflammation, immune regulation, and antigen processing and presentation are significantly altered in PBMCs from T1D patients. Furthermore, one adhesion molecule (SELL) and three inflammatory genes mainly expressed by myeloid cells (S100A8, S100A9, and MNDA) were significantly higher in T1D patients with complications (odds ratio [OR] 1.3–2.6, adjusted P value = 0.005–10−8), especially those patients with neuropathy (OR 4.8–7.9, adjusted P value <0.005).
These findings suggest that inflammatory mediators secreted mainly by myeloid cells are implicated in T1D and its complications.
Gene expression profiling has been widely applied to the identification of differentially expressed genes in human diseases, including type 1 diabetes (T1D). Understanding the patterns of expressed genes is expected to provide insight into disease pathogenic mechanisms, therapy evaluation, or biomarkers for risk prediction. Although much effort has been devoted toward discoveries with respect to gene expression profiling in human T1D in the last decade (1–5), previous studies had serious limitations. Microarray-based gene expression profiling is a powerful discovery platform, but the results must be validated by an alternative technique such as real-time RT-PCR. Unfortunately, few of the previous microarray studies on T1D have been followed by a validation study. Furthermore, most previous gene expression studies had small sample sizes (<100 subjects in each group) that are not adequate for the human population given the expectation of large expression variations among individual subjects. Finally, the selection of appropriate reference genes for normalization of quantitative real-time PCR has a major impact on data quality. Most of the previous studies have used only a single reference gene for normalization. Ideally, gene transcription studies using real-time PCR should begin with the selection of an appropriate set of reference genes to obtain more reliable results (6–8).
We have previously carried out extensive microarray analysis and identified >100 genes with significantly differential expression between T1D patients and control subjects. Most of these genes have important immunological functions and were found to be upregulated in autoantibody-positive subjects, suggesting their potential use as predictive markers and involvement in T1D development (2). In this study, real-time RT-PCR was performed to validate a subset of the differentially expressed genes in a large sample set of 928 T1D patients and 922 control subjects. In addition to the verification of the gene expression associated with T1D, we also identified genes with significant expression changes in T1D patients with diabetes complications.
RESEARCH DESIGN AND METHODS
Human subjects and samples
This study was approved by the institutional review board (IRB) of the Georgia Regents University, and informed consent was obtained from every subject or his/her legally authorized representative. Diagnosis of T1D was made using the criteria of the American Diabetes Association. Tempus RNA tubes were used to collect peripheral blood for RNA extraction. The subjects in this study are Caucasian and were recruited from Georgia, U.S., mainly in the Atlanta and Augusta areas. The sex and age distributions, age of onset for the patients, presence or absence of a first-degree relative with the disease, duration of diabetes, complications of diabetes, as well as genetic risk information for subjects have been summarized in Supplementary Table 1.
High-throughput real-time RT-PCR
Peripheral blood (2.5 mL) was immediately preserved in Tempus RNA tubes. After 2 h at room temperature, the tubes were frozen at −80°C and total RNA was extracted within a few weeks. Extracted RNA was stored at −80°C until use. An aliquot of total RNA for each sample was arrayed in 96-well plates and then converted to cDNA using a High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems). The cDNA products were diluted, and an aliquot of cDNA equivalent to 10 ng of total RNA was used for quantitative real-time PCR performed using ready-to-use TaqMan gene expression assays from Applied Biosystems.
A total of 1,850 cDNA samples were arrayed in five 384-well plates, each plate containing 185 T1D and 185 autoantibody-negative (AbN) samples in addition to a serial dilution of a common cDNA sample for the establishment of a standard curve. A total of 24 assays, including 18 target genes, 3 “negative control” genes, and 3 reference or housekeeping genes, were analyzed by real-time PCR using TaqMan assay (Applied Biosystems) (Supplementary Table 2). Real-time PCR was performed in 384-well plates with the ABI 7900HT Fast Real-Time PCR System. Standard thermal cycling conditions (10 min at 95°C, 40 cycles for 15 s at 95°C, and 1 min at 60°C) were used for all genes. Cycle threshold (CT) values and quantity (Q) values, calculated from CT values based on a standard curve, for each test gene were obtained for each sample using the SDS2.3 software.
Statistical analyses
All statistical analyses were performed using the R language and environment for statistical computing (R version 2.12.1; R Foundation for Statistical Computing, www.r-project.org). There was a small proportion (<5%) of cDNA samples that failed to amplify for each gene, and those data were removed before statistical analyses. Plate-plate variations were normalized by using a standard curve on each 384-well plate. After selection and validation, the combination of GAPDH, B2M, and ABL genes was used as reference genes for normalization.
The pairwise correlation between individual gene expression levels was computed using the Pearson correlation coefficient. Linear regression of gene expression level with age as a covariate on data stratified by sex and disease status was completed to assess the effect of age on the expression level of each studied gene. Since there was a significant effect of age and sex on the expression levels of many genes, case-control matching was performed with respect to age and sex using the “matching” R package (9). To estimate the relative risk of diabetes at different gene expression levels, we performed conditional logistic regression on matched paired data. The odds ratios (ORs) and 95% CIs were computed for each gene. To investigate the association between gene expression levels and the diabetes complications, the samples from the T1D patients with a particular complication were compared with the samples from the T1D patients without any complication. Matching was performed with respect to age, sex, and duration of diabetes. Conditional logistic regression was performed to estimate the relative risk of a particular complication at different expression levels. For analyses using gene expression as a continuous variable, the data were log transformed and scaled to unit SD. For analyses using gene expression level as categorical variables, the values of “low” or “high” were assigned using the middle value (median) in control subjects as a cutoff point.
RESULTS
Gene expression changes associated with diabetes
Fourteen genes were selected for validation studies based on our previous study (2), and these genes showed higher expression in both T1D and autoantibody-positive groups as compared with the AbN control group. Four other genes (TGFB1, NFKB1, Foxp3, and IL12A) were selected based on their potential relevance to T1D. Three genes (TSC22D3, RPL17, and EEF1B2) that were not significant on our microarray dataset were included as negative controls. All genes were analyzed in a large sample set that included 928 T1D patients and 922 AbN control subjects using TaqMan assays. Box plots of the expression levels in different subject groups are shown in Fig. 1. Before examining the differences between AbN and T1D groups, we determined whether sex and sampling age of the subjects were potential confounding factors. We performed regression of gene expression levels with age as a covariate. Data for each gene were divided into four subgroups based on sex and disease phenotype. There were significant correlations with age in both phenotypic groups and for both sexes for several genes (data not shown). In order to accurately assess the gene expression associated with T1D, we performed conditional logistic regression analysis on age- and sex-matched pairs. The ORs per SD increment of gene expression levels were first computed using expression data as continuous variable (Table 1). These analyses suggest that eight genes (TGFB1, SELL, PSMB3, CD74, S100A8, S100A9, and IL12A) were significantly upregulated and three genes (GNLY, PSMA4, and SMAD7) were significantly downregulated in T1D patients. Conditional logistic regression analysis was also performed using expression values as categorical variable after assigning each subject into low- or high-expression subgroups using the mean expression levels observed in AbN control subjects. The results are highly consistent with the analysis using expression levels as continuous data (Table 1). We also compared gene expression differences between juvenile-onset (n = 692) and adult-onset T1D patients (n = 232), but no significant difference was found (data not shown). The three negative control genes were analyzed using the same approaches, and no significant differences were found for these genes (data not shown).
Box plots for gene expression levels (data were log transformed) in different subject groups including AbN, T1D without any complication (T1D-NoComp), T1D with at least one complication (AnyComp), and T1D with one specific complication (neuropathy, HTN, retinopathy, nephropathy, or CAD). The box lines delineate the 25th, 50th, and 75th percentile, respectively.
Box plots for gene expression levels (data were log transformed) in different subject groups including AbN, T1D without any complication (T1D-NoComp), T1D with at least one complication (AnyComp), and T1D with one specific complication (neuropathy, HTN, retinopathy, nephropathy, or CAD). The box lines delineate the 25th, 50th, and 75th percentile, respectively.
The T1D group is a heterogeneous set of subjects as some have developed complications such as neuropathy, retinopathy, nephropathy, coronary artery disease (CAD), and hypertension (HTN). As the expression levels for some genes may be associated with diabetes complications, it is important to delineate the gene expression differences caused by hyperglycemia or the specific types of complications. Among the 928 T1D patients in our dataset, 238 patients had at least one diabetes complication. To exclude expression differences caused by diabetes complications, we compared expression levels between 922 AbN control subjects and 407 T1D patients without any complications (Table 1). In the first analysis, each T1D patient without complication is matched with an AbN control for age and sex; the expression data were analyzed as continuous variables using logistic regression. The results indicate that the seven of eight genes significantly upregulated in the entire dataset (except for S100A9) were also significantly higher in T1D patients without complications compared with control subjects. Additionally, the three genes shown to be downregulated in the entire T1D cohort were consistently downregulated in the T1D subset without complications (Table 1). Conditional logistic regression analysis using expression values as categorical variable also revealed similar findings (Table 1).
Gene expression changes associated with diabetes complications
We next examined gene expression changes associated with complications seen in T1D patients. In this dataset, there were 119 patients with hypertension, 59 patients with neuropathy, 70 patients with retinopathy, 35 patients with CAD, and 26 patients with nephropathy. Before examining the gene expression changes in different diabetes complications, we determined whether T1D duration is a potential confounding factor since T1D duration is significantly longer for all complication groups (Supplementary Table 3). We performed regression analysis of gene expression level with T1D duration as covariate (Supplementary Table 4). There were significant correlations with T1D duration in both sexes for four genes (S100A9, S100A8, PSMA4, and IL12A) and significant correlation in only the male group for four genes (RNF31, FOXP3, MFNG, and NFKB1).
Since T1D duration and sex are confounding factors for most genes, age, T1D duration, and sex matching was then performed using a multivariate and propensity score–matching software (9). Each T1D patient with a specific complication was paired with one or two of the closest T1D patient(s) without any complications. The comparisons of expression levels between complication and noncomplication groups are shown as a box plot for each complication (Fig. 2) for the top four genes. Conditional logistic regression analysis was performed to first assess changes associated with diabetes complication using all 238 patients with at least one complication. Four genes (S100A8, S100A9, SELL, and MNDA) were significantly higher in the T1D patients with any one kind of complication (any-complication group) compared with the no-complication group, whereas four other genes (SMAD7, PSMA4, MFNG, and GNLY) were significantly lower in T1D with complication than T1D without complication (Table 2). To exclude the possibility that the gene expression changes result from the treatment for secondary complications, we further examined gene expression differences between patients that were on treatment with specific drugs and those without the treatment. As the numbers of patients on individual drugs were very small, we grouped the treatments into four major categories: hypertension, blood cholesterol, hypothyroidism, and psychological disorders. Our results suggest that the expression differences observed in this study were not due to treatment for diabetes complications (data not shown).
Box plots for gene expression levels (data were log transformed) in T1D patients without any complication and T1D patients with different complication groups: at least one of complications (AnyComp), neuropathy, HTN, retinopathy, nephropathy, and CAD. The box lines delineate the 25th, 50th, and 75th percentile, respectively. Data are matched for sex and age between the two examined groups. Comp, complications; N, no for complications; Y, yes for complications.
Box plots for gene expression levels (data were log transformed) in T1D patients without any complication and T1D patients with different complication groups: at least one of complications (AnyComp), neuropathy, HTN, retinopathy, nephropathy, and CAD. The box lines delineate the 25th, 50th, and 75th percentile, respectively. Data are matched for sex and age between the two examined groups. Comp, complications; N, no for complications; Y, yes for complications.
We further examined these eight genes in five patient subsets with specific complications (neuropathy, hypertension, retinopathy, nephropathy, and CAD). Patients with neuropathy showed significant changes for all eight genes, especially with the four upregulated genes. For example, subjects with higher S100A8 expression are 7.9 times more likely to have neuropathy. T1D patients with neuropathy are also much more likely to have higher gene expression levels for S100A9, SELL, and MNDA (OR 4.8, 4.9, and 5.2, respectively). Patients with hypertension also showed significant changes for seven of the eight genes, including all four upregulated genes (S100A8, S100A9, SELL, and MNDA) (Table 2). Although statistical significance was not reached, the four upregulated genes were also higher in patients with other examined complications, including CAD, retinopathy, and nephropathy, suggesting that these genes are probably implicated in all diabetes complications.
Finally, we examined pairwise correlation of gene expression levels for all 18 genes, and the results are graphically presented in Fig. 3. Clustering of the genes based on the expression correlation values revealed a main cluster of proinflammatory genes (S100A8, S100A9, MNDA, and SELL) significantly upregulated in T1D and its complications. TGFB1, NFKB1, and RNF10 are also loosely grouped within this cluster, and these genes, especially TGFB1, are increased in T1D compared with control subjects.
Pairwise correlations between gene expression levels for 18 studied genes in different subject groups: AbN, T1D without any complication (T1D-noComp), T1D with at least one complication (AnyComp), and T1D with neuropathy. The squares indicate the high correlation between S100A8, S100A9, MNDA, and SELL.
Pairwise correlations between gene expression levels for 18 studied genes in different subject groups: AbN, T1D without any complication (T1D-noComp), T1D with at least one complication (AnyComp), and T1D with neuropathy. The squares indicate the high correlation between S100A8, S100A9, MNDA, and SELL.
CONCLUSIONS
This is to date the largest-scale study on gene expression profiles in human T1D patients. Several strategies contributed to the success of this study. The large sample set of close to 2,000 individuals was especially informative for the gene expression profiles in control subjects and T1D patients with and without diabetes complications. Since expression data were generated on multiple plates, normalization through the use of a standard curve for each gene in each 384-well plate was critical to ensure consistency across plates. Furthermore, the use of multiple validated reference genes, rather than a single gene, improved the normalization of the RNA concentration across all samples.
This well-designed and adequately powered study allowed us to confirm gene expression differences between T1D and control subjects initially suggested by microarray experiments. We also demonstrated that gene expression is significantly different between AbN control subjects and T1D patients without any diabetes complications. The genes with a higher expression in T1D are implicated in immune function, such as inflammation (S100A8, S100A9, MNDA, and IL12-A), immune regulation/promotion (TGFB1 and SELL), and antigen processing and presentation (CD74 and PSMB3). Our findings strongly support the notion that chronic inflammation contributes to T1D development (10–12). Upregulation of genes involved in antigen processing and presentation may occur in response to the activation and proliferation of autoreactive pathogenic T cells in T1D patients.
Two previous studies with small sample sizes (13,14) reported a decreased TGFB1 level in T1D patients; however, this study with a very large sample size revealed a significantly higher expression of TGFB1 in T1D patients. TGFB1 acts as a double-edged sword in immune regulation. It induces immune tolerance by acting as a potent immune suppressor through the inhibition of proliferation, differentiation, activation, and effector function of immune cells (15–17). Parodoxically, it also promotes immune reaction by being a potent chemoattractant for neutrophils and promoting inflammation. In addition, it can induce differentiation into the anti-inflammatory regulatory T cells (18–20) or the proinflammatory Th17 cells (21) depending on the context. Although the protective effects of TGFB1 have been documented in various autoimmune diseases, including T1D, the increased TGFB1 gene expression in T1D shown in the current study suggests that TGFB1 may mainly play a pathogenic role in T1D as a proinflammatory mediator. Interestingly, the expression of an inhibitor of TGFB1 signaling, SMAD7, showed a decreased level of expression in T1D as compared with healthy control subjects. SMAD7 encodes a nuclear protein that binds the E3 ubiquitin ligase SMURF2 and inhibits TGFB1 signaling in many ways (22). Overexpression of SMAD7 has been shown to antagonize TGFB1-mediated fibrosis, carcinogenesis, and inflammation (23). More interestingly, it has been reported that TGFB1 and SMAD7 play a major role in diabetic nephropathy (24). In this study, SMAD7 is indeed reduced in T1D patients with complications compared with patients without any complications.
The most noteworthy finding of this study is the identification of a set of genes associated with diabetes complications. Relative to the comparison between T1D patients and control subjects, the sample sizes for the study on specific complications are smaller and the results should be viewed with more caution. However, several strategies used in this study ensure that the findings are most likely robust. First, we started by comparing T1D patients without any complication (n = 407) and T1D patients with any complication (n = 238). Only those genes showing significant differences in this relatively large dataset were further analyzed for specific complications. Second, many of the genes are altered in multiple complications, further strengthening evidence for the findings. For example, the most prominent genes are two inflammatory genes, S100A8 and S100A9, both being strongly associated with complications, especially neuropathy (OR 7.9 and 4.8) and hypertension (2.7 and 2.2). These two genes may also be increased in retinopathy (OR 1.9 and 1.4) and nephropathy (3.6 and 1.3), even though statistical significance was not reached due to small sample size. Both macrovascular and microvascular complications can occur in T1D patients. CAD is a major macrovascular complication, and neuropathy, nephropathy, and retinopathy are major microvascular complications (25). Although the pathogenesis of diabetic angiopathy is incompletely understood, patients with one complication often present with a second one, suggesting the existence of common risk factors and pathogenic mechanisms. Endothelial dysfunction, a common change among T1D patients, represents a major link between vascular complications (26). S100A8 and S100A9 may play important roles in different pathways leading to endothelial dysfunction initiated by inflammation and/or hyperglycemia, two hallmarks of T1D patients (Fig. 4). S100A8/A9, also referred to as MRP8/14, are two calcium-binding proteins primarily expressed in cells of myeloid origin, particularly in monocytes and neutrophils. As a marker of monocyte and neutrophil activation, the expression and secretion of the S100A8/A9 heterodimer could be induced by inflammatory cytokines such as IL1β (27). Other inflammatory cytokines such as IL6 and TNF-α also had positive associations with S100A8/A9 and diabetic vascular complications (28). In one pathway, S100A8/A9 stimulation can cause a rapid increase of CD11b expression on the monocyte surface, accounting for the increase of trans-endothelial migratory activity of monocytes and hence vascular endothelium changes (29). In another pathway, S100A8/9 protein can bind specifically to and induce a specific inflammatory response in human vascular endothelial cells by increasing the transcription of proinflammatory chemokines and adhesion molecules (30). More interestingly, the proinflammatory endothelial response to S100A8/A9 can be increased by advanced glycation end products (AGEs) since S100A8/A9 can work as ligands of RAGE (receptor for AGE) and have effects on the RAGE-NFKB–mediated induction of proinflammatory gene expression (31).
Studied genes involved in pathways leading to endothelial dysfunction.
Another myeloid cell–related gene, MNDA, was found as having highly correlated expression with S100A8/S100A9 and was also increased in neuropathy and hypertension (OR 5.2 and 2.5). The myeloid cell nuclear differentiation antigen (MNDA) is detected only in the nuclei of cells of the granulocyte-monocyte lineage and plays a role in the cell-specific response to interferon (32). The function of MNDA in T1D and diabetes complications remains elusive.
Another interesting gene that is upregulated in both T1D and diabetes complications is SELL. L-selectin (SELL) is a cell surface adhesion molecule that plays a critical role in facilitating leukocyte migration from blood vessels to secondary lymphoid organ and sites of local inflammation. Although there are controversial reports about the role of L-selectin in the development of autoimmune diabetes in NOD mice (33–35), a positive correlation between the serum level of soluble L-selectin and T1D, diabetic retinopathy, atherosclerosis, and arterial hypertension was reported in human studies (36–38). The percentage of T cells expressing L-selectin was significantly increased in T1D patients and in T1D patients with arterial hypertension as well. Consistent with these findings, we provided convincing evidence that L-selectin expression in peripheral blood mononuclear cells is higher in T1D patients (OR 1.4), especially those patients with diabetic neuropathy (4.9) and hypertension (1.6).
The examination of a pairwise correlation of gene expression revealed coordinated upregulation of multiple proinflammatory genes in myeloid cells. S100A8, S100A9, and MNDA are likely examples of the cluster of myeloid cell-related genes associated with T1D and diabetes complications. It will therefore be important to examine the cellular functions and pathogenic roles of myeloid cells in the development of T1D and its complications in future studies.
Acknowledgments
This work was supported by grants from the National Institutes of Health (4R33-HD-050196, DR33-DK-069878, and 2RO1-HD-37800) and the Juvenile Diabetes Research Foundation (JDRF 1-2004-661) to J.-X.S.
No potential conflicts of interest relevant to this article were reported.
Y.J. contributed to study design, most of the data generation, and wrote the manuscript. A.S. performed most of the data analysis. C.C. contributed to data generation. D.H., D.G.R., B.B., S.W.A., J.C.R., R.D.S., and L.S. contributed to sample and data collection. X.W. contributed to data analysis. J.-X.S. contributed to study design, data interpretation, and writing of the manuscript. All authors reviewed and edited the manuscript. J.-X.S. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.