Colorectal cancer (CRC) is the third most common cancer and a leading cause of cancer-related deaths worldwide.[1] Although cancer therapy has been improved in recent years, the mortality of CRC keeps increasing in China.[2,3] Current clinical treatment strategy mainly depends on the tumor-node-metastasis classification system,[4] however, the individual response of the same tumor stage can be heterogeneous after receiving similar treatments, suggesting the importance of identifying new clinical biomarkers to accurately predict prognosis of CRC patients and guide CRC personalized treatment.
Up to date, increasing numbers of studies have reported that single nucleotide polymorphisms (SNPs) were associated with the progression of various kinds of tumors, including CRC.[5–8] Some SNPs were reported to be associated with drug disposition, pharmacokinetic effects, and host immunity, and subsequently affect the drug response and survival outcomes.[9–12] However, the identified prognosis-related genetic variants were limited and the specific biological mechanisms underlying how SNPs influence tumor behavior and eventual clinical outcomes were not well elucidated. The expression quantitative trait locus (eQTL) mapping that associated genetic variants with gene expression might serve as an effective intermediate tool to fill the gap from genotypes to human phenotypes.[13,14]
In this study, we confirmed the relationship between CRC survival and eQTLs by conducting a two-stage population study in Chinese population and further experimentally validated the function of CRC survival-associated SNP. These results might, to some extent, help to understand the genetic impact on cancer courses and guide the clinical therapy for CRC patients.
Methods Genome-wide eQTL analysis in CRCWe downloaded the germline genotype, copy number variation (CNV), CpG methylation, and mRNA expression data of colon adenocarcinoma and rectum adenocarcinoma from The Cancer Genome Atlas (TCGA) data portal (https://tcga-data.nci.nih.gov/tcga/). Samples with all four kinds of data available were retained.
Previous studies showed that factors affecting global gene expression might reduce the eQTL-identifying power. To minimize the influence of the population difference, we first conducted the principal component analysis (PCA) using the EIGENSOFT6.0 software[15,16] (https://github.com/DReichLab/EIG) for the TCGA samples with a reference panel of 210 unrelated individual genotypes from the Haplotypemap (https://ftp.ncbi.nlm.nih.gov/hapmap/). According to the results of PCA analysis, 254 samples from northwestern Europe were used for the subsequent analysis [Supplementary Figure 1, https://links.lww.com/CM9/B715]. To remove the effects of somatic CNV and DNA methylation levels, we applied the modified cis-eQTL analysis, a two-stage multivariate linear regression model, considering SNPs within one million bases (Mb) around the genes. The method has been described in our previous study.[17] SNPs with false discovery rates (FDR) <0.05 calculated by the Benjamini–Hochberg method were defined as significant eQTLs. To narrow down the candidate SNPs, we restricted the correlations between SNPs and the target genes to Pearson | r| >0.3. For the replication of eQTL analysis results, we used the eQTL data of colon transverse and sigmoid tissues from Genotype-Tissue Expression (GTEx) database (https://www.gtexportal.org/home/).
Identification of eQTL genes associated with CRC survivalTo identify survival-associated eQTL genes in CRC, we first downloaded gene expression data and survival information from two public datasets including TCGA (N = 595) and GSE39582 (N = 562, https://www.ncbi.nlm.nih.gov/geoprofiles), respectively. For genes with multiple probes in GSE39582, only probes with the highest median value were retained. Patients were stratified into two groups based on the median of gene expression, and the overall survival (OS) time described in months was considered as the survival outcome. Kaplan–Meier analysis was used to identify genes associated with CRC prognosis. Genes with log rank P <0.05 and consistent effect direction in both datasets were retained.
Identification of potential functional SNPsThe functional prediction and verification were performed for all SNPs in linkage disequilibrium ([LD], r2 >0.2) with the eQTLs of those survival-related genes by using the Haploreg 4.1 (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php)[18] and CistromeDB database (http://cistrome.org/db)[19]. SNPs were supposed to exert the function of promoter or enhancer and we considered the weight of promoter or enhancer histone marks to be higher than deoxyribonuclease (DNase) marks. We have generated the functional score for each SNP in LD with the eQTLs (the detailed standards and the score results were shown in Supplementary Table 1, https://links.lww.com/CM9/B714). SNPs with the highest total score were selected as the candidates. The eQTL analysis results of SNPs from other study groups were also shown in Haploreg 4.1 and were considered as the reference when the total scores were the same. Histone modification enrichment signal intensity of candidate SNPs was further verified by the chromatin immunoprecipitation followed by sequencing (ChIP-seq) data in the CistromeDB database. Moreover, for SNPs residing in the 3′-untranslated region (3′-UTR) of their eQTL target genes, we first used the miRNASNP database (http://bioinfo.life.hust.edu.cn/miRNASNP/)[20] to predict the potential binding microRNAs. Then, target genes of the microRNAs with expression >0.1 in colon or rectal tissues were predicted using the TargetScan database (https://www.targetscan.org/vert_72/)[21] and we compared the target genes with our eQTL results as well. The 3D Genome Browser database (http://promoter.bx.psu.edu/hi-c/index.html)[22] was used to verify the remote regulation of candidate loci.
Study populationIn this study, 907 newly diagnosed CRC samples were collected from hospitals in Wuhan from 2004 to 2010. Written informed consent was obtained from each subject and the study was conducted with the approval of the participating hospital (No. NCT00454519). All patients were histologically diagnosed with CRC with no previous history of malignant tumors and did not receive radiotherapy or chemotherapy before sample collection. Patients with two or more malignancies were excluded. Follow-up data collection was under the guidance of clinicians. Patient information on pathological diagnostics, clinic characteristics, and disease stages was collected. Patients diagnosed within one year were followed up every three months, and the other patients were followed up every six months. The observation end time for the cohort was up to 2022. We also screened participants in the UK Biobank cohort for validation. CRC cases were defined as subjects with primary invasive CRC using the International Classification of Diseases, 10th revision (ICD-10). The follow-up time of cancer survival was calculated from cancer diagnosis to death or the last follow-up (March 31, 2021). We determined whether an individual died of specific cancer by considering the ICD-10 codes listed as the primary cause of death, resulting in a total of 3609 CRC cases included.
SNP genotyping for collected CRC samplesThe peripheral blood sample was collected for genomic DNA isolation and we used the Sequenom MassArray platform (Sequenom Inc., San Diego, CA, USA) for genotyping according to the manufacturer's instructions.
Imputation and quality control for genotype dataFor the loci that were associated with CRC prognosis and had no genotyping information in TCGA, the genotypes were imputed by IMPUTE2 software (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html). The reference population for genotype imputation was the European population in the 1000 Genomes Project Phase 3 (https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/). Imputation quality was limited to r2 ≥0.4.
Cell linesThe HCT116 and SW480 cell lines were obtained from the China Center for Type Culture Collection (Wuhan, China) in September 2021. These cell lines were authenticated by short tandem repeat profiling (Applied Biosystems, New York, USA) and tested for the absence of mycoplasma contamination. All cell lines were cultured in Dulbecco's modified Eagle's medium (DMEM, Gibco, New York, USA) supplemented with 10% fetal bovine serum (Gibco) and 1% antibiotics (100 U/mL penicillin and 0.1 mg/mL streptomycin, Gibco) at 37℃ in a humidified atmosphere of 5% CO2.
Dual-luciferase reporter assayThe 1130 bp DNA sequence containing both wild-type and mutation-type of rs71630754 was synthesized and inserted into the PGL3-promoter (Promega, Madison, WI, USA) at both forward and reverse directions. Constructed plasmids and reference plasmids pRL-SV40 were co-transfected into HCT116 and SW480 cell lines by using lipofectamine 3000 (Invitrogen, Waltham, MA, USA). After 36 h incubation, luciferase activity was detected by the dual-luciferase assay system (Promega). All experiments were repeated three times independently.
Electrophoretic mobility shift assay (EMSA)Nuclear extracts were prepared by using Nuclear Protein Extraction Kit (Beyotime, Shanghai, China). Two sets of double-strand oligonucleotides with or without biotin-label centered on the rs71630754 were synthesized by TaKaRa Bio (Kyoto, Japan). EMSA/Gel-Shift Kit (Beyotime) was used to detect the DNA binding ability. Nuclear extracts were incubated for 20 min with unlabeled probes at a gradient concentration. For the super-shift assay, we first used JASPAR (https://jaspar.genereg.net/) to predict the possible binding transcription factors of the rs71630754. And gradient concentration of transcription factor 3 (TCF3) antibody (ab228699, Abcam, Cambridge, MA, USA) was incubated together with the nuclear extract and labeled probes for 20 min. After that we added the labeled oligonucleotide probes at room temperature, those reaction mixtures were loaded on an 8% polyacrylamide electrophoresis gel. The result was detected by SuperSignal West Femto Trial Kit (Thermo, Rockford, IL, USA).
Estimation of immune cell fractionDeconvolution algorithms consider gene expression profiles of a heterogeneous sample as the convolution of the gene expression levels of the different cells and estimated the unknown cell fractions leveraging on a signature matrix describing the cell-type-specific expression profiles. Combining the CIBERSORT[23] algorithm and signature matrix of 22 mature human hematopoietic populations, we quantified the relative percent of 22 immune cell subsets based on RNA sequencing data from both TCGA and GEO data, meanwhile, a P value for the global deconvolution of each sample was obtained. We excluded samples with a P value ≥0.05 as recommended by the CIBERSORT developer. Moreover, we applied Tumor Immune Estimation Resource (TIMER)[24] to estimate the fraction of immune cells using RNA sequencing data from TCGA CRC patients. Then, the association between literature-reviewed immune cell type-specific marker genes[25] and endoplasmic reticulum aminopeptidase 1 (ERAP1) was further calculated by Pearson correlation analysis. The marker genes were listed in Supplementary Table 2, https://links.lww.com/CM9/B714.
Statistical analysisLog rank test was applied in examining the differences between age and gender groups, as well as tumor stage. To measure the effects of SNPs on the OS of CRC patients, Cox proportional hazard model was used to calculate the hazard ratio (HR) and 95% confidence interval (CI) under an additive genetic model, with the adjustment of age, gender and tumor stage. Statistical analyses were performed on R software (3.3.1;The R Foundation for Statistical Computing, Vienna, Austria) using the "Survival" package. Pearson correlation analysis was used with GraphPad Prism 7.04 (GraphPad Software, Inc., California, USA) to calculate the association between eQTL genes and its predicted transcription factors, as well as estimated immune cell fractions. P values <0.05 were considered significant.
Results Expression of six genes regulated by seven potential functional SNPs were associated with CRC prognosisTo identify the eQTLs associated with CRC prognosis, we first did the genome-wide systematical eQTL analysis using TCGA data. There were 3069 SNP-gene pairs with FDR-P <0.05, including 361 eQTL target genes closely correlated with eQTL SNPs (| r| >0.3, Supplementary Table 3, https://links.lww.com/CM9/B714). Then genes associated with the OS of CRC patients were maintained and the corresponding SNPs were considered for the functional prediction. Kaplan–Meier analysis found that 230 genes [Supplementary Table 4, https://links.lww.com/CM9/B714] were in concordant correlation with CRC survival outcome in both TCGA and GEO datasets (P <0.05). Among the 230 genes, six genes (leucyl-tRNA synthetase 2 [LARS2], proline and serine rich coiled-coil 1 [PSRC1], transcription factor B1 [TFB1M], tetratricopeptide repeat domain 12 [TTC12], ERAP1, and ring finger protein 157 [RNF157]) were the eQTL target genes. The results showed that lower gene expression of these genes indicated shorter OS of the CRC patients [Figure 1, Supplementary Figure 2, https://links.lww.com/CM9/B715].
Figure 1:Genes associated with CRC prognosis. Kaplan–Meier survival curves for OS in the TCGA dataset (A) and GEO dataset (B). The expression of genes was divided into low and high groups according to median expression. CI: Confidence interval; CRC: Colorectal cancer; ERAP1: Endoplasmic reticulum aminopeptidase 1; GEO: Gene Expression Omnibus; HR: Hazard ratio; LARS2: Leucyl-tRNA synthetase 2; OS: Overall survival; PSRC1: Proline and serine rich coiled-coil 1; RNF157: Ring finger protein 157; TCGA: The Cancer Genome Atlas; TFB1M: Transcription factor B1; TTC12: Tetratricopeptide repeat domain 12.
The expression of the six survival-related genes was associated with 46 eQTLs according to our eQTL analysis. To identify the most potential functional SNPs, we considered all the SNPs in LD (r2 >0.2, n = 784) with those eQTLs. Functional prediction suggested that seven SNPs (minor allele frequency ≥0.05, rs12088168, rs13074123, rs34750, rs1334688, rs7107223, rs74254475, rs71630754) had the highest potential to be functional, among which two independent SNPs (rs34750 and rs71630754) reside in ERAP1 gene [Supplementary Table 1, https://links.lww.com/CM9/B714]. As for SNPs in 3′-UTR regions [Supplementary Table 5, https://links.lww.com/CM9/B714], none of the predicted target genes in TargetScan were in concordance with the eQTL analysis results. Therefore, we only considered the seven SNPs as the candidate variants for the Chinese population study. The functional validation results of these SNPs were shown in Supplementary Figure 3, https://links.lww.com/CM9/B715.
rs71630754 was associated with CRC prognosisPopulation analyses were performed to confirm the relationship between SNPs and patients' outcomes. In total, 907 Chinese CRC patients with prognosis information were collected and the characteristics of the patients were shown in Supplementary Table 6, https://links.lww.com/CM9/B714. Among the patients, 383 died of CRC and the median survival time was not statistically significantly different between male and female patients. Patients with elder age (hazard ratio [HR]: 1.80, 95% confidence interval [CI]: 1.47–2.21, P <0.001) and later cancer stages (HR: 2.67, 95% CI: 2.20–3.31, P <0.001) had worse outcome. Of the seven candidate SNPs, the Cox-regression results showed that the rs71630754 was significantly associated with CRC survival (additive model, HR: 1.43, 95% CI: 1.08–1.88, P = 0.012), and the survival time of patients with A allele was less than patients with G allele [Table 1 and Supplementary Table 7, https://links.lww.com/CM9/B714]. The association between the rs71630754 and CRC patients' survival was further validated in the UK biobank cohort (additive model, HR: 1.13, 95% CI: 1.01–1.27, P = 0.028, Supplementary Table 8, https://links.lww.com/CM9/B714). Then, we performed the stratification analysis to determine the association between the rs71630754 and CRC survival. A significant association between the SNP and worse outcome of CRC was observed for subgroup patients of female, stage I/II, and stage III/IV [Supplementary Figure 4, https://links.lww.com/CM9/B715].
Table 1 - Association between the rs71630754 and OS in CRC patients (N = 907). Genotypes n (%) Median survival time (months) HR (95% CI) P values rs71630754 GG 811 (89.42) 105 1.00 (reference) – GA 92 (10.14) 103 1.41 (1.04–1.92) 0.029 AA 4 (0.44) 48 1.47 (0.83–2.60) 0.182 Additive model – – 1.43 (1.08–1.88) 0.012 Dominant model – – 1.46 (1.07–1.95) 0.016HR and P values were calculated using Cox proportional hazards regression with adjustment for age, gender, and stage of the disease. CI: Confidence interval; CRC: Colorectal cancer; HR: Hazard ratio; OS: Overall survival; –: Not available.
Since the functional prediction from the Haploreg and CistromeDB database showed that the rs71630754 resided in the enhancer region, we further confirmed the results by the high-throughput chromosome conformation capture (Hi-C) data in the 3D Genome Browser database, and the data showed the possible interaction between the rs71630754 and ERAP1 promoter [Figure 2]. ChIP-Seq data from several studies had also validated the enhancer function of the rs71630754, which resided in the histone marker of acetylated histone 3 lysine 27 (H3K27ac) or histone 3 lysine 4 monomethylation (H3K4me1). Besides, GTEx data also showed that individuals carrying the rs71630754-A allele had lower expression of ERAP1 compared with the rs71630754-G allele carriers in colon tissues, which was consistent with eQTL result in TCGA dataset [Figure 3A]. In addition, we found out that patients with stage III/IV had significantly lower ERAP1 gene expression than the early-stage patients in TCGA dataset, and the same trend was also shown in GSE39582 [Supplementary Figure 5, https://links.lww.com/CM9/B715], which further confirmed the association between ERAP1 expression and worse survival outcome.
Figure 2:Enhancer functional prediction of the rs71630754. The Hi-C data were downloaded from the 3D Genome Browser database. The ChIP-Seq data for multiple CRC cell lines came from multiple study groups. CAST: Calpastatin; chr: Chromosome; Chip-seq: Chromatin immunoprecipitation sequence; CRC: Colorectal cancer; DHSs: Deoxyribonuclease I (DNase I) hypersensitive sites; DNase: Deoxyribonuclease; ERAP: Endoplasmic reticulum aminopeptidase; H3K27ac: H3K27 acetylation; H3K4me1: H3 lysine 4 monomethylation; Hi-C: High-throughput chromosome conformation capture.
Figure 3:The rs71630754 was associated with the expression of ERAP1. (A) The eQTL results of the rs71630754 in both the GTEx and TCGA databases showed that gene expression of ERAP1 was significantly decreased in rs71630754 mutant allele carriers. (B) Luciferase experiment assay showed that the A allele rs71630754 results in lower ERAP1 expression in both HCT116 and SW480 cell lines with the mutant allele reside in downstream of the reporter genes. ERAP1: Endoplasmic reticulum aminopeptidase 1; eQTLs: Expression quantitative trait loci; GG, AG, AA: The three genotypes of rs71630754; GTEx: Genotype-Tissue Expression; Mut: Mutation; ns: No significance; Ref: Reference; TCGA: The Cancer Genome Atlas; *P <0.0001.
We then conducted several functional assays to evaluate the function of this variant. Luciferase reporter assay was firstly performed to confirm the regulation of the rs71630754 on gene expression and it showed that the rs71630754-A allele resulted in lower ERAP1 expression in both HCT116 and SW480 cell lines when inserted into the downstream of the reporter genes [Figure 3B], which mimicked the real position of SNPs and ERAP1 promoter region. Given that allele-specific activity of genetic variants in regulatory regions might function through influencing the different binding affinity of transcription factors (TFs), we conducted an EMSA experiment in both HCT116 and SW480 cell lines and the results showed that the mutant allele had a stronger binding affinity to nuclear protein than the G allele. With the increasing amount of unlabeled rs71630754 mutant probes, the binding band of labeled probe-nuclear protein was decreasing [Figure 4A]. To further identify the specific TF being affected, we used the JASPAR database to make the prediction and the data showed that the rs71630754 was consistent with the binding sequence of TCF3 [Figure 4B]. TCF3 had been increasingly reported as a transcriptional repressor, which could interfere with the formation of transcription initiation complexes and regulate target gene expression.[26,27] We then found that TCF3 was reversely correlated with ERAP1 expression using the gene expression data from the TCGA tumor sample for correlation analysis [Figure 4C]. In addition, the super-shift EMSA experiment in two CRC cell lines confirmed the prediction results, with increasing antibody of TCF3, the labeled probe-nuclear protein complex was significantly decreased [Figure 4D]. There was no super-shift band of antibody–protein–DNA complex in our super-shift EMSA experiment, which might due to that the antibody blocked the DNA–protein complex formation or it might because that the molecular mass of the antibody–protein–DNA complex was relatively high and it was unable to shift to be shown in the band within the experiment time. Thus, we confirmed that the rs71630754-A allele could increase the binding of transcription factor TCF3 and subsequently decrease the expression of ERAP1.
Figure 4:The rs71630754 A allele increased the binding of TCF3 and decreased the expression of ERAP1. (A) EMSA of both the HCT116 and SW480 cell lines showed that the mutant allele has a stronger binding ability of the nuclear protein than the wild-type allele. With the increasing amount of non-marked rs71630754 mutant probe, the binding band of nuclear protein was decreasing. (B) Transcription factor prediction of the JASPAR database showed that rs71630754 was consistent with the binding sequence of TCF3. (C) Correlation analysis showed that TCF3 was reversely correlated with ERAP1 gene expression. (D) Super-shift EMSA experiment in two CRC cell lines showed that with increasing antibody of TCF3, the DNA–protein complex was significantly decreasing. Alt: Alteration; CRC: Colorectal cancer; EMSA: Electrophoretic mobility shift assay; ERAP1: Endoplasmic reticulum aminopeptidase 1; Ref: Reference; RPKM: Reads per kilobase per million mapped reads; TCF3: Transcription factor 3.
ERAP1 was associated with the function of CD8+ T cellsSince literature reviews showed that ERAP1 was associated with the immune process,[28–34] we explored the association between ERAP1 and immune cell infiltration. We first calculated the infiltration of 22 kinds of immune cells in tumor tissues by using the CIBERSORT algorithm. Then, the associations between ERAP1 and the fraction of each immune cell type as well as the immune cell marker genes were calculated. The correlation analysis identified that ERAP1 was correlated with the relative fraction of CD8+ T cells (TCGA: R = 0.155, P = 0.0417; GSE39582: R = 0.092, P = 0.0497), which function as tumor kill cells. We further confirmed the association between ERAP1 and CD8+ T cells by using TIMER in TCGA CRC patients (R = 0.327, P <0.001, Supplementary Figure 6, https://links.lww.com/CM9/B715). And the relationship between ERAP1 and CD8+ T cells was confirmed by the results that seven marker genes of CD8+ T cells, including CD8 subunit alpha (CD8A), natural killer cell granule protein 7 (NKG7), T cell immunoreceptor with Ig and ITIM domains (TIGIT), CD2 molecule (CD2), CD3 delta subunit of T-cell receptor complex (CD3D), CD3 epsilon subunit of T-cell receptor complex (CD3E), and CD3 gamma subunit of T-cell receptor complex (CD3G) were significantly correlated with ERAP1, which were consistent in both TCGA and GSE39582 dataset [Figure 5A, 5B].
Figure 5:Association analysis of ERAP1 and immune cell fraction and cell marker genes. The association was calculated in both TCGA (A) and GEO (B) datasets. Expression of all the genes was log-transformed. CD2: CD2 molecule; CD3D: CD3 delta subunit of T-cell receptor complex; CD3E: CD3 epsilon subunit of T-cell receptor complex; CD3G: CD3 gamma subunit of T-cell receptor complex; CD8A: CD8 subunit alpha; ERAP1: Endoplasmic reticulum aminopeptidase 1; GEO: Gene Expression Omnibus; NKG7: Natural killer cell granule protein 7; TCGA: The Cancer Genome Atlas; TIGIT: T cell immunoreceptor with Ig and ITIM domains.
DiscussionIn this study, we systematically identified the association between eQTL and CRC prognosis, and identified that the rs71630754-A allele was significantly associated with worse survival results of CRC patients. Functional experiments confirmed that the rs71630754 could influence the expression of ERAP1 by affecting binding of transcription factor TCF3. Lower expression of ERAP1 might contribute to the poor prognosis of CRC patients by influencing the immune infiltration of CD8+ T cells and the tumor-killing process.
Based on clinical data of CRC patients and ERAP1 gene expression data in TCGA, we found that the expression level of ERAP1 was significantly lower in patients with later clinical stage, suggesting a certain correlation between ERAP1 expression and the progression of CRC. Literature research showed that ERAP1 was involved in the antigen processing procedure[28] and played an important role in inflammation, autoimmune diseases, tumors, and other diseases.[29–32] The main function of this gene is to shear 9–16 amino acid antigenic peptides so that the peptides can be recognized and transported by tapasin (TAP), and then bind with major histocompatibility complex type I (MHC-I) molecules and be present to the cell surface for the recognition of natural killer (NK) cells or specific CD8+ T cells. Wang et al[33] found that when there was a lack of P53 in HCT116 cell line, the expression of the ERAP1 gene was down-regulated and its binding with MHC-I was reduced, affecting tumor immune monitoring. In the current research, we also confirmed these results. Association analysis revealed that ERAP1 was associated with a relative fraction of CD8+ T cells as well as seven marker genes of CD8+ T cells, suggesting that patients with low ERAP1 expression might be deficient in immune protection. In addition, genome-wide association study research from Yang et al[34] also identified that genetic variants in ERAP1 were associated with the risk of lung cancer. Therefore, further studies on ERAP1 would contribute to a deeper understanding of the nature of the disease and be beneficial to guide clinical treatment.
Additionally, in the correlation analysis between gene expression and prognosis of CRC, we also found that the low expression of the other five genes was associated with poor prognosis of patients including TFB1M, TTC12, RNF157, PSRC1, and LARS2. Although the SNPs that regulated the expression of these genes were not shown to be significantly associated with the prognosis of CRC in our study, they might be regulated in other ways, and the mechanism of the relationship between these genes and the progression of CRC needs further study. These genes were also reported to be associated with tumor progression[35–37] or exert important function,[38–40] for example, reducing TFB1M has been shown to lead to mitochondrial dysfunction, which might result in impaired insulin secretion and was even closely associated with diabetes, while type 2 diabetes was significantly associated with the risk of CRC.[41] In this study, individuals with low TFB1M expression had a worse prognosis, and whether it was also related to mitochondrial function was worth further discussing.
In conclusion, this study screened out genes related to CRC prognosis by using the public database and identified the functional genetic variations regulating these genes in the Chinese population. The rs71630754, which could influence an immune-related gene ERAP1, was significantly related to the prognosis of CRC. These results could help to further understand the development process of CRC and the genetic loci found in this study could provide a theoretical basis for the prognosis prediction model in CRC.
AcknowledgmentsThe authors gratefully acknowledge the members of the Miao lab for their suggestions and contributions to this work.
FundingThis study was supported by grants from the National Natural Science Foundation of China (Nos. 82273713 and 82103929), the Young Elite Scientists Sponsorship Program by CAST (No. 2022QNRC001), the Fundamental Research Funds for the Central Universities (No. WHU:2042022kf1205), the Knowledge Innovation Program of Wuhan (No. whkxjsj011), the Distinguished Young Scholars of China (No. 81925032), the Key Program of National Natural Science Foundation of China (No. 82130098), the Hubei Provincial Natural Science Foundation of China (No. 2019CFA009), the Youth Program of National Natural Science Foundation of China (No. 82003547) and the Fundamental Research Funds for the Central Universities (No. WHU: 2042022kf1031).
Conflicts of interestNone.
References 1. Cao W, Chen HD, Yu YW, Li N, Chen WQ. Changing profiles of cancer burden worldwide and in China: a secondary analysis of the global cancer statistics 2020. Chin Med J 2021; 134: 783–791. doi: 10.1097/CM9.0000000000001474. 2. Xia C, Dong X, Li H, Cao M, Sun D, He S, Yang F, Yan X, Zhang S, Li N, Chen W. Cancer statistics in China and United States, 2022: profiles, trends, and determinants. Chin Med J 2022; 135: 584–590. doi: 10.1097/CM9.0000000000002108. 3. Li N, Lu B, Luo C, Cai J, Lu M, Zhang Y, et al. Incidence, mortality, survival, risk factor and screening of colorectal cancer: A comparison among China, Europe, and northern America. Cancer Lett 2021; 522: 255–268. doi: 10.1016/j.canlet.2021.09.034. 4. Hemminki K, Santi I, Weires M, Thomsen H, Sundquist J, Bermejo JL. Tumor location and patient characteristics of colon and rectal adenocarcinomas in relation to survival and TNM classes. BMC Cancer 2010; 10: 688. doi: 10.1186/1471-2407-10-688. 5. Crea F, Fornaro L, Paolicchi E, Masi G, Frumento P, Loupakis F, et al. An EZH2 polymorphism is associated with clinical outcome in metastatic colorectal cancer patients. Ann Oncol 2012; 23: 1207–1213. doi: 10.1093/annonc/mdr387. 6. Chen J, Luo X, Xie G, Chen K, Jiang H, Pan F, et al. Functional analysis of SNPs in the ERCC5 promoter in advanced colorectal cancer patients treated with oxaliplatin-based chemotherapy. Medicine (Baltimore) 2016; 95: e3652. doi: 10.1097/MD.0000000000003652. 7. Dong G, He X, Chen Y, Cao H, Wang J, Liu X, et al. Genetic variations in genes of metabolic enzymes predict postoperational prognosis of patients with colorectal cancer. Mol Cancer 2015; 14: 171. doi: 10.1186/s12943-015-0442-x. 8. Lin M, Eng C, Hawk ET, Huang M, Lin J, Gu J, et al. Identification of polymorphisms in ultraconserved elements associated with clinical outcomes in locally advanced colorectal adenocarcinoma. Cancer 2012; 118: 6188–6198. doi: 10.1002/cncr.27653. 9. Kogan D, Grabner A, Yanucil C, Faul C, Ulaganathan VK. STAT3-enhancing germline mutations contribute to tumor-extrinsic immune evasion. J Clin Invest 2018; 128: 1867–1872. doi: 10.1172/JCI96708. 10. Huang RS, Ratain MJ. Pharmacogenetics and pharmacogenomics of anticancer agents. CA Cancer J Clin 2009; 59: 42–55. doi: 10.3322/caac.20002. 11. Whitfield ML, George LK, Grant GD, Perou CM. Common markers of proliferation. Nat Rev Cancer 2006; 6: 99–106. doi: 10.1038/nrc1802. 12. Pulendran B, Davis MM. The science and medicine of human immunology. Science 2020; 369: eaay4014. doi: 10.1126/science.aay4014. 13. Albert FW, Kruglyak L. The role of regulatory variation in complex traits and disease. Nat Rev Genet 2015; 16: 197–212. doi: 10.1038/nrg3891. 14. Rockman MV, Kruglyak L. Genetics of global gene expression. Nat Rev Genet 2006; 7: 862–872. doi: 10.1038/nrg1964. 15. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet 2006; 2: e190. doi: 10.1371/journal.pgen.0020190. 16. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006; 38: 904–909. doi: 10.1038/ng1847. 17. Zou D, Lou J, Ke J, Mei S, Li J, Gong Y, et al. Integrative expression quantitative trait locus-based analysis of colorectal cancer identified a functional polymorphism regulating SLC22A5 expression. Eur J Cancer 2018; 93: 1–9. doi: 10.1016/j.ejca.2018.01.065. 18. Ward LD, Kellis M. HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease. Nucleic Acids Res 2016; 44: D877–D881. doi: 10.1093/nar/gkv1340. 19. Mei S, Qin Q, Wu Q, Sun H, Zheng R, Zang C, et al. Cistrome Data Browser: a data portal for ChIP-Seq and chromatin accessibility data in human and mouse. Nucleic Acids Res 2017; 45: D658–D662. doi: 10.1093/nar/gkw983. 20. Liu CJ, Fu X, Xia M, Zhang Q, Gu Z, Guo AY. miRNASNP-v3: a comprehensive database for SNPs and disease-related variations in miRNAs and miRNA targets. Nucleic Acids Res 2021; 49: D1276–D1281. doi: 10.1093/nar/gkaa783.
Comments (0)