The role of the SOX2 gene in cervical cancer: focus on ferroptosis and construction of a predictive model

Tumor stemness assay

We obtained RNA-sequencing expression data and clinical information for a cohort of 253 patients diagnosed with cervical cancer from the TCGA dataset, accessible through the portal at https://portal.gdc.com. Employing the OCLR algorithm, we computed the mRNAsi index as formulated by Malta et al. This index is grounded in the mRNA expression profiles, encompassing a total of 11,774 unique genes. Following the same analytical approach, we employed Spearman correlation on the RNA expression data. By subtracting the lowest value and subsequently dividing the outcome by the maximum value, we normalized the resulting dryness index within the [0, 1] range (Lian et al. 2019; Malta et al. 2018).

Human protein atlas (HPA) database

In this study, an investigation into the expression of SOX2 and its prognostic relevance in cervical squamous cell carcinoma and adenocarcinoma was conducted using the Human Protein Atlas (HPA) database. The HPA database is a valuable resource for exploring protein expression patterns and their associations with clinical outcomes (Kaminker and Timoshenko 2021).

SOX2 expression data: expression data for in cervical squamous cell carcinoma and adenocarcinoma tissues were retrieved from the HPA database. This included information on SOX2 expression levels, subcellular localization, and staining intensities.

Prognostic information: prognostic data related to cervical squamous cell carcinoma and adenocarcinoma, such as overall survival (OS) outcomes, were obtained from the HPA database.

The retrieved SOX2 expression data were analyzed to assess the differential expression of SOX2 in cervical squamous cell carcinoma and adenocarcinoma tissues compared to normal tissues. This analysis aimed to determine whether SOX2 expression exhibits significant alterations in these cancer types (Basha et al. 2018).

Prognosis analysis

We retrieved RNA-sequencing expression data as well as the relevant clinical data for individuals diagnosed with cervical cancer from the TCGA dataset, accessible via the URL (https://portal.gdc.com). The comparison of survival disparities among these cohorts was conducted utilizing the log-rank test. Moreover, to assess the prognostic efficacy of SOX2 mRNA, we employed the timeROC analysis (version 0.4) to measure and compare its predictive precision (Zhang et al. 2020; Lin et al. 2020).

Data on SOX2 gene expression and patient survival were retrieved from established databases (mention specific databases like PanCanSurvPlot (https://smuonco.shinyapps.io/PanCanSurvPlot/) and Kaplan–Meier plotter (https://kmplot.com/analysis/) relevant to CSCC). The datasets included information on patient demographics, clinical characteristics, treatment modalities, and follow-up outcomes (Lin et al.2022).

Survival analysis was conducted to determine the relationship between SOX2 expression levels and the survival outcomes of CSCC patients. The following metrics were evaluated: Overall Survival (OS), Progression-Free Survival (PFS), Disease-Specific Survival (DSS), Disease-Free Survival (DFS). Describe the statistical methods used to analyze the data. This might include Kaplan–Meier survival curves, log-rank tests for comparing survival distributions, and Cox proportional hazards regression to adjust for potential confounders and assess the independent effect of SOX2 expression on survival outcomes (Zhang et al. 2021).

Pan cancers analysis

We utilized the “exploration” module of the Tumor Immune Estimation Resource (TIMER, version 2.0) available at http://timer.cistrome.org/ to visually present the contrasting gene expression patterns of SOX2 across distinct tumor tissues and their respective normal counterparts (Han et al. 2023).

Differential genes expression and KEGG/GO analysis

We acquired RNA-sequencing expression profiles and corresponding clinical data for a cohort of 253 patients diagnosed with cervical cancer from the TCGA dataset, accessible via https://portal.gdc.com. Employing the limma package within the R software, we conducted an in-depth analysis of differentially expressed mRNAs. Our criteria for differential expression were set as “Adjusted P < 0.05 and Log2(Fold Change) > 1.5 or Log2(Fold Change) < − 1.5 (Han et al. 2023).”

To delve into the functional implications of potential targets, we conducted a comprehensive functional enrichment analysis. Gene Ontology (GO) emerged as a widely-utilized resource for annotating genes based on molecular function (MF), biological pathways (BP), and cellular components (CC). Furthermore, we harnessed the Kyoto Encyclopedia of Genes and Genomes (KEGG) Enrichment Analysis, a valuable tool for comprehending gene functions and pertinent high-level genome functional insights (Yu et al. 2012).

For a more profound comprehension of mRNA's involvement in carcinogenesis, we harnessed the ClusterProfiler package (version: 3.18.0) within the R environment. This enabled us to dissect the GO functions of potential targets and to enrich the KEGG pathways. To illustrate our findings effectively, we employed the ggplot2 package in R to craft box plots, while heatmap visualization was facilitated using the pheatmap package within the R software (Yu et al. 2012).

Iron death assay

We procured RNA-sequencing expression profiles along with corresponding clinical data for a cohort of 253 individuals afflicted with cervical cancer. These valuable datasets were retrieved from the TCGA dataset, accessible via the URL https://portal.gdc.com. In order to explore the context of ferroptosis, we referred to the work by Ze-Xian Liu et al. titled “Systematic analysis of the aberrances and functional implications of ferroptosis in cancer,” to identify genes associated with ferroptosis.

Venn analysis and KEGG/GO enrichment

Venn diagram analysis: we executed Venn analysis to pinpoint shared genes between the sets of differentially expressed genes and iron death-related proteins (Yu et al. 2012).

Pathway and functional enrichment: the next step encompassed conducting an enrichment analysis on the intersecting gene pool. This was done to unearth KEGG pathways and GO terms that are intertwined with both SOX2 and iron death. To carry out this analysis, we utilized Metascape, a comprehensive resource accessible at (Wei et al. 2022).

Immune checkpoint analysis

We obtained RNA-sequencing expression data and the relevant clinical information for cases of cervical cancer from the TCGA dataset, accessible through the link https://portal.gdc.com. To ensure the robustness of immune score evaluations, we employed a tool called immuneeconv. This R software package amalgamates six cutting-edge algorithms, namely TIMER, xCell, MCP-counter, CIBERSORT, EPIC, and quanTIseq. Each of these algorithms has undergone rigorous benchmarking and possesses distinctive strengths (Yi et al. 2020; Ravi et al. 2018).

Relation analysis

We obtained RNA-sequencing expression profiles along with relevant clinical data for cervical cancer cases from the TCGA dataset, accessible at https://portal.gdc.com. Employing the R software GSVA package, we conducted an analysis using the ‘ssgsea’ method as the chosen parameter. To examine the relationship between genes and pathway scores, we further evaluated the Spearman correlation (Hanzelmann et al. 2013; Wei et al. 2020).

Gene set enrichment analysis (GSEA)

GSEA was conducted on CSCC gene expression data retrieved from The Cancer Genome Atlas (TCGA). Data preprocessing included normalization and cleaning. Differential expression analysis segregated samples into distinct groups, utilizing tools like DESeq2 or edgeR. GSEA, executed via GSEA software, involved gene sets from MSigDB or custom CSCC-specific sets, with parameters meticulously defined. Analysis outcomes included Normalized Enrichment Score (NES), False Discovery Rate (FDR), and P-values, identifying significant gene sets (FDR < 0.25, P-value < 0.05). Biological implications of these sets were examined, with potential validation through independent datasets or experimental methods, offering insights into CSCC's molecular underpinnings (Kar et al. 2017).

Sensitivity in drug

We forecasted the anticipated chemotherapeutic response for each individual sample utilizing data from the most expansive accessible pharmacogenomics repository, the Genomics of Drug Sensitivity in Cancer (GDSC), accessible at https://www.cancerrxgene.org/. This prediction procedure was executed via the R package “pRRophetic”. By employing ridge regression, we approximated the half-maximal inhibitory concentration (IC50) for the samples. All parameters were maintained at their default settings (Jiang et al. 2021).

To counteract potential batch effects, we integrated the application of the "combat" technique, while for the various tissue types, we accounted for their influences. Furthermore, in handling duplicate gene expression records, we summarized them using the mean value (Geeleher et al. 2014).

Prognostic analysis of ferrozois-associated genes

We obtained RNA-sequencing expression profiles and corresponding clinical data for cervical cancer patients from the TCGA dataset (https://portal.gdc.com). The data was processed by converting count data to TPM (Transcripts Per Million) and normalizing it using log2(TPM + 1). We retained samples with associated clinical information, resulting in cervical cancer samples for subsequent analysis.

To assess survival differences among groups, we used the Log-rank test. The predictive accuracy of ferrozois-associated genes and risk score was evaluated using time ROC analysis (v 0.4).

For feature selection, we employed the Least Absolute Shrinkage and Selection Operator (LASSO) regression algorithm with tenfold cross-validation, utilizing the R package glmnet (Hanzelmann et al. 2013).

We constructed a prognostic model using Multivariate Cox Regression analysis with the R package survival. The model was optimized using a multi-factor Cox regression followed by stepwise iteration.

Kaplan–Meier curves, P-values, and hazard ratios (HR) with 95% confidence intervals were generated using Log-rank tests and univariate Cox proportional hazards regression (Wei et al. 2020).

Construction of a predictive model

We retrieved RNA-sequencing expression profiles alongside corresponding clinical data for cervical cancers from the TCGA dataset, available at https://portal.gdc.com. Our analysis comprised univariate and multivariate Cox regression procedures, aimed at pinpointing the pertinent factors for constructing a reliable nomogram. To visually present the significance of each variable, encompassing P values, hazard ratios (HR), and 95% confidence intervals (CI), we employed the 'forestplot' R package.

Subsequently, leveraging the outcomes of the multivariate Cox proportional hazards analysis, we crafted a predictive nomogram. This nomogram served as a graphical tool that synthesized the various factors, which can be employed to gauge the risk of X-year overall recurrence for an individual patient. The ‘rms’ R package facilitated the integration of these risk factors into the nomogram, with each factor assigned points corresponding to its impact (Liu et al. 2020; Jeong et al. 2020).

Prognostic model evaluation

Tumor data and clinical details for cervical squamous cell carcinoma were retrieved from the TCGA database (https://portal.gdc.com), specifically focusing on datasets in STAR format. Following acquisition, TPM-formatted data were extracted and underwent a normalization process using the transformation log2(TPM + 1). Only those samples with comprehensive RNAseq and clinical data were selected for further investigative steps. The refined set of cervical squamous cell carcinoma samples was utilized in downstream analyses. The performance of various prognostic models was evaluated by examining their Decision Curve Analysis (DCA) curves, utilizing the ggDCA package within the R software framework (Yang 2022).

All the above analysis methods and R package were implemented by R foundation for statistical computing (2020) version 4.0.3.

Comments (0)

No login
gif