First, gene expression data and clinical information were obtained from the Pan-Cancer Atlas of TCGA [14], which contains 10,071 samples divided into 33 cancer types. Second, the expression data of nuclear receptor genes from the 10,071 samples were grouped by cancer type, and hierarchical clustering was applied within each cancer type using a clustering distance threshold of 60. Lastly, we identified cancer types whose nuclear receptor expression correlated with mutations, subtypes, and prognosis (Fig. 1A).
Fig. 1 Flowchart for clustering based on nuclear receptor expression and overview of nuclear receptor expression patterns. A Data on nuclear receptor expression from 10,071 samples were analyzed using hierarchical clustering of each cancer type via the Euclidean distance metric and Ward’s method. The clusters were classified based on a clustering distance of 60. Accordingly, 21 cancer types were classified into two or more clusters. Based on the clustering according to 48 nuclear receptors, we identified cancer types whose nuclear receptor expression correlated with mutations, cancer subtypes, and prognosis. B A heatmap was generated based on the nuclear receptor expression of all samples in TCGA. The UMAP dimensionality reduction method was employed to examine C similarities among nuclear receptor expression patterns and D similarities among samples. Nuclear receptors are color-coded according to their superfamily, whereas cancers are classified according to cancer type
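The clustering step described in the caption can be sketched as follows. This is a minimal illustration, assuming SciPy is used and that the expression matrix has samples in rows and the 48 nuclear receptor genes in columns; the software the authors used and the scaling of the expression values are not stated in this section, so the function name `cluster_samples` and the synthetic data are hypothetical. Note that the meaning of a threshold of 60 depends on how the expression values were normalized.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_samples(expr, threshold=60.0):
    """Ward/Euclidean hierarchical clustering cut at a fixed linkage distance.

    expr: array of shape (n_samples, n_genes), e.g. 48 nuclear receptor genes.
    Returns one integer cluster label per sample (labels start at 1).
    """
    tree = linkage(expr, method="ward", metric="euclidean")
    # Samples joined below the threshold distance share a cluster label.
    return fcluster(tree, t=threshold, criterion="distance")


# Synthetic stand-in for one cancer type (not TCGA data):
# two groups of 20 samples x 48 genes with clearly different expression.
rng = np.random.default_rng(0)
low = rng.normal(loc=0.0, scale=1.0, size=(20, 48))
high = rng.normal(loc=10.0, scale=1.0, size=(20, 48))
labels = cluster_samples(np.vstack([low, high]))
n_clusters = len(set(labels))
```

A cancer type would count as "classified into two or more clusters" when `n_clusters` is at least 2 at the chosen threshold.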
RNA-seq data of nuclear receptor genes from the 10,071 samples were analyzed using hierarchical clustering (Fig. 1B) and visualized as Uniform Manifold Approximation and Projection (UMAP) plots (Fig. 1C, D). The expression pattern of each nuclear receptor across all samples appeared to be distinct (Fig. 1C). However, a characteristic nuclear receptor expression pattern was observed for each organ, such as the brain (brain lower grade glioma and glioblastoma multiforme), thyroid gland (thyroid carcinoma), testis (testicular germ cell tumor), prostate (prostate adenocarcinoma), and kidneys (kidney renal clear cell carcinoma and kidney renal papillary cell carcinoma). In addition, several clusters grouped organs with similar function and development, such as digestive organs (colorectal adenocarcinoma, esophageal adenocarcinoma, pancreatic adenocarcinoma, and gastric adenocarcinoma), the liver (cholangiocarcinoma and liver hepatocellular carcinoma), and the adrenal cortex (adrenocortical carcinoma, pheochromocytoma, and paraganglioma). Finally, the largest cluster comprised squamous cell carcinomas (cervical squamous cell carcinoma, esophageal squamous cell carcinoma, head and neck squamous cell carcinoma, and lung squamous cell carcinoma) and also included the basal-like subtype of breast invasive carcinoma, bladder urothelial carcinoma, and lung adenocarcinoma (Fig. 1D).
3.2 Validation of the clustering strategy using breast invasive carcinoma
To confirm whether the molecular subtypes of breast invasive carcinoma, which are conventionally classified based on ER and PGR expression [6], were appropriately recovered, three key aspects, namely genetic mutation, cancer subtypes, and prognosis, were investigated. First, hierarchical clustering was performed, which resulted in five clusters, tentatively called Clusters 1, 2, 3, 4, and 5 from left to right. Clusters 1–3 showed high ESR1 and PGR expression, whereas Clusters 4 and 5 showed low ESR1 and PGR expression. Cluster 4 showed higher AR expression than did Cluster 5 (Fig. 2A). Clusters 1–3, which exhibited high ESR1 expression, demonstrated a lower frequency of mutations in tumor protein p53 (TP53) than did Clusters 4 and 5. Moreover, Cluster 4 displayed a higher frequency of mutations in erb-b2 receptor tyrosine kinase 2 (ERBB2) than did Cluster 5 (Fig. 2A). Subtype information assigned by TCGA indicated that Clusters 1–3 mainly comprised Luminal A and B subtypes, Cluster 4 mainly comprised HER2-positive subtypes, and Cluster 5 mainly comprised basal-like subtypes (Fig. 2B). Overall survival analysis showed that Clusters 1 and 2, which mainly comprised luminal subtypes, exhibited a better prognosis (Fig. 2C). The classification based on the 48 nuclear receptors in breast invasive carcinoma was consistent with that based on ESR1 and PGR expression, suggesting that the HER2-positive and basal-like subtypes could also be distinguished based on AR expression.
Fig. 2 Classification of breast invasive carcinoma samples according to nuclear receptor expression and the association between nuclear receptor expression and genetic mutations, subtypes, and prognosis. Hierarchical clustering was performed based on nuclear receptor expression of 1,082 breast invasive carcinoma samples using the Manhattan distance metric and Ward’s method. A A heatmap that illustrated the expression of nuclear receptors was created, with color labels demonstrating the mutation pattern in breast invasive carcinoma samples. B The proportion of subtypes within each cluster is presented. C Kaplan–Meier curves for each cluster are presented. The log-rank test was used to calculate p values
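For readers unfamiliar with the Kaplan–Meier curves mentioned in the caption, the estimator can be sketched in a few lines. This is an illustrative implementation under simple assumptions (one follow-up time and one event flag per sample); the function name `kaplan_meier` and the toy data are hypothetical and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time per sample; events: 1 = death observed, 0 = censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            # Multiply by the fraction of at-risk samples surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data (not from TCGA): five samples, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

In a per-cluster analysis, the function would be applied to each cluster's samples separately, giving one curve per cluster.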
3.3 Relationship between nuclear receptor expression and genetic mutation patterns
We then determined the relationship between nuclear receptor expression and genetic mutation profiles. In head and neck squamous cell carcinoma, Cluster 2, which demonstrated high nuclear receptor subfamily 5 group A member 1 (NR5A1) expression (Fig. 3A), exhibited a higher frequency of truncating mutations in nuclear receptor binding SET domain protein 1 (NSD1) than did the other clusters (p < 10⁻¹⁰; Fig. 3C). In lung adenocarcinoma, Cluster 1, which demonstrated high nuclear receptor subfamily 0 group B member 1 (NR0B1) expression, exhibited a higher frequency of missense and truncating mutations in Kelch-like ECH associated protein 1 (KEAP1) than did the other clusters (p < 10⁻¹⁰). In addition, Cluster 3, which demonstrated high nuclear receptor subfamily 0 group B member 2 (NR0B2) expression, exhibited a lower frequency of missense, truncating, and splicing mutations in TP53 than did the other clusters (p = 4.32 × 10⁻¹⁰; Fig. 3B, D). In lung squamous cell carcinoma, Cluster 1, which demonstrated high NR0B1 expression, exhibited a higher frequency of missense mutations in KEAP1 and NFE2-like bZIP transcription factor 2 (NFE2L2) than did the other clusters (p < 10⁻¹⁰ and p = 3.06 × 10⁻¹⁰, respectively; Figures S1A, C). The NF-E2-related factor 2 (NRF2) protein, encoded by NFE2L2, is ubiquitinated by the CUL3–KEAP1 ubiquitin E3 ligase complex and subsequently degraded in proteasomes [27]. Therefore, Cluster 1 in both lung adenocarcinoma and lung squamous cell carcinoma had mutated genes in the same pathway and elevated expression of the same nuclear receptor [28]. In thyroid carcinoma, Cluster 1, which demonstrated low retinoid X receptor gamma (RXRG) expression, exhibited a lower frequency of mutations in B-Raf proto-oncogene, serine/threonine kinase (BRAF) than did the other clusters (p < 10⁻¹⁰; Figures S1B, D).
Fig. 3 Distinctive mutation patterns associated with nuclear receptor expression. Hierarchical clustering was performed based on nuclear receptor expression of each cancer type. A heatmap illustrating the expression of nuclear receptors, with color labels demonstrating the mutation pattern in A head and neck squamous cell carcinoma and B lung adenocarcinoma. The frequency of mutations among clusters is illustrated in bar graphs based on the percentage of samples containing mutations per cluster. The graphs show the proportion of mutations in C head and neck squamous cell carcinoma and D lung adenocarcinoma
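The per-cluster mutation frequency shown in the bar graphs (percentage of samples containing mutations per cluster) amounts to a simple grouped count, sketched below. The function name `mutation_frequency` and the toy inputs are hypothetical. The statistical test behind the reported p values is not stated in this section, so none is reproduced here.

```python
from collections import Counter


def mutation_frequency(cluster_labels, mutated):
    """Percentage of mutated samples in each cluster.

    cluster_labels: cluster ID per sample.
    mutated: True if the sample carries a mutation in the gene of interest.
    """
    totals = Counter(cluster_labels)
    hits = Counter(c for c, m in zip(cluster_labels, mutated) if m)
    return {c: 100.0 * hits[c] / totals[c] for c in sorted(totals)}


# Toy data (not from TCGA): 4 samples in Cluster 1 and 4 in Cluster 2.
labels = [1, 1, 1, 1, 2, 2, 2, 2]
mutated = [False, False, False, True, True, True, True, False]
freq = mutation_frequency(labels, mutated)  # {1: 25.0, 2: 75.0}
```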
3.4 Relationship between nuclear receptor expression and histological type
Next, we determined the relationship between nuclear receptor expression and classical molecular subtypes. In cervical squamous cell carcinoma, Cluster 1, which demonstrated high hepatocyte nuclear factor 4 alpha (HNF4A) expression, included few squamous cell carcinoma samples (p < 10⁻¹⁰; Fig. 4A–C). In brain lower grade glioma, isocitrate dehydrogenase (IDH) wild-type was predominant in Cluster 1, which showed high nuclear receptor subfamily 2 group E member 1 (NR2E1) expression and low NR0B1 expression (p < 10⁻¹⁰; Fig. 4D–F). In head and neck squamous cell carcinoma, Cluster 2 demonstrated high NR5A1 expression and was mainly categorized as C32.9 based on the International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10), indicating that it included laryngeal cancer (p < 10⁻¹⁰; Figures S2A–C). In sarcoma, Cluster 3, which demonstrated high ESR1, PGR, and AR expression, contained leiomyosarcoma (p < 10⁻¹⁰; Figures S2D–F). In testicular germ cell tumor, Cluster 2, which showed low hepatocyte nuclear factor 4 gamma (HNF4G) expression and high nuclear receptor subfamily 6 group A member 1 (NR6A1) expression, represented seminoma (p < 10⁻¹⁰; Figures S2G–I).
Fig. 4 Distinct subtypes associated with nuclear receptor expression. Hierarchical clustering was performed based on the nuclear receptor expression of each cancer type. Dendrograms are presented for each cancer type, with color labels indicating characteristically different subtypes within each cancer type. A Cervical squamous cell carcinoma is indicated based on tumor type. B The bar graph shows the proportion of subtypes within each cluster in cervical squamous cell carcinoma. C The violin plots illustrate the cluster-specific expression levels of nuclear receptors in cervical squamous cell carcinoma. D Brain lower grade glioma is indicated based on subtype. E The bar graph shows the proportion of subtypes within each cluster in brain lower grade glioma. F The violin plots illustrate the cluster-specific expression levels of nuclear receptors in brain lower grade glioma
3.5 Relationship between nuclear receptor expression and prognosis
As subtyping based on nuclear receptor expression is associated with prognosis in breast cancer, we sought to determine the relationship between nuclear receptor expression and prognosis across cancer types. Cancer types with a p value of < 0.05 were defined as those with a significant difference in prognosis. In brain lower grade glioma, Cluster 1 with high NR2E1 expression exhibited a relatively poor prognosis (Fig. 5A, E). In lung adenocarcinoma, Cluster 1 with high NR0B1 expression exhibited a relatively poor prognosis, whereas Cluster 3 with high NR0B2 expression exhibited a relatively good prognosis (Fig. 5B, F). In skin cutaneous melanoma, Cluster 4, which displayed low NR0B1 expression and high RXRG and RAR-related orphan receptor C (RORC) expression, exhibited a relatively good prognosis (Fig. 5C, G). In uterine corpus endometrial carcinoma, Cluster 1 with high AR, ESR1, and PGR expression had a relatively good prognosis (Fig. 5D, H). In kidney renal clear cell carcinoma, Cluster 2 with low HNF4A and NR0B2 expression displayed a relatively poor prognosis (Figures S3A, E). In kidney renal papillary cell carcinoma, Cluster 1 with low HNF4A expression displayed a relatively poor prognosis (Figures S3B, F). In liver hepatocellular carcinoma, Cluster 1 with high expression of AR, nuclear receptor subfamily 1 group I member 2 (NR1I2), and nuclear receptor subfamily 1 group I member 3 (NR1I3) had a relatively good prognosis (Figures S3C, G). In stomach adenocarcinoma, Cluster 1, which lacked HNF4A and HNF4G expression, displayed a relatively poor prognosis (Figures S3D, H). Overall, our survival analyses revealed significant differences in overall survival among clusters in nine cancer types, including breast invasive carcinoma (Fig. 2).
Fig. 5 Relationship between nuclear receptor expression-defined clusters and prognosis. A survival analysis was conducted for each cancer type. Significant differences in overall survival were observed among clusters in nine cancer types, including breast cancer. Four of these cancer types are presented here. Kaplan–Meier curves are displayed for A brain lower grade glioma, B lung adenocarcinoma, C skin cutaneous melanoma, and D uterine corpus endometrial carcinoma. The violin plots illustrate the cluster-specific expression levels of nuclear receptors, showing notable differences in expression among clusters for E brain lower grade glioma, F lung adenocarcinoma, G skin cutaneous melanoma, and H uterine corpus endometrial carcinoma
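The significance criterion (p < 0.05 on survival analysis) can be illustrated with a log-rank test. The sketch below is a simplified two-group version written in plain Python; the study compares several clusters per cancer type, which requires the multi-group form of the test, so this is an illustration of the principle rather than the authors' procedure. The function name `logrank_two_groups` and the toy data are hypothetical.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value).

    events: 1 = death observed, 0 = censored.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in samples if e == 1}):
        n = sum(1 for ti, _, _ in samples if ti >= t)  # at risk, both groups
        n_a = sum(1 for ti, _, g in samples if ti >= t and g == 0)
        d = sum(1 for ti, e, _ in samples if ti == t and e == 1)  # deaths at t
        d_a = sum(1 for ti, e, g in samples
                  if ti == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0  # no information to separate the groups
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 df
    return chi2, p_value


# Toy data (not from TCGA): group A dies early, group B dies late.
chi2, p = logrank_two_groups([1, 2, 3, 4, 5], [1] * 5,
                             [6, 7, 8, 9, 10], [1] * 5)
significant = p < 0.05
```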
3.6 Verification of significance of nuclear receptor-based classification
To evaluate the usefulness of the nuclear receptor-based classification, a random sampling test was performed using the TCGA dataset. First, 48 genes were randomly selected from the 20,511 genes in the TCGA dataset. We then performed hierarchical clustering using the 48 randomly selected genes and counted the number of cancer types segregated into multiple clusters. We also counted the number of cancer types for which significant differences in overall survival were observed. This trial was repeated 100,000 times. Across the 100,000 trials, an average of 16.0 cancer types were classified into multiple clusters (Fig. 6A), and an average of 5.41 cancer types demonstrated significant differences in overall survival (Fig. 6B). In contrast, the classification based on nuclear receptor expression divided 21 cancer types into multiple clusters, with significant differences in overall survival observed in 9 cancer types. Thus, compared with the classification using 48 randomly selected genes, the nuclear receptor-based classification was superior in both analyses (21 vs. an average of 16.0 cancer types with multiple clusters; 9 vs. an average of 5.41 cancer types with significant survival differences).
Fig. 6 Comparison between nuclear receptor gene sets and random gene sets. A total of 48 genes were randomly selected from the 20,511 genes whose gene expression information was registered in the TCGA dataset for hierarchical clustering. A A bar graph with the vertical axis showing the number of trials and the horizontal axis showing the number of cancers that could be classified into two or more clusters in one trial. B A bar graph with the vertical axis showing the number of trials and the horizontal axis showing the number of cancers with a p value of < 0.05 on survival analysis per trial
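The random sampling test can be sketched as a loop that draws 48 gene indices, scores the resulting gene set, and collects the scores across trials. In this illustration the scoring step is a hypothetical callback (`evaluate`), standing in for the per-cancer clustering or survival analysis, and a placeholder scorer is used so that the code runs without TCGA data. The empirical p value shown at the end is an assumed way of summarizing the comparison; the text itself reports only the averages over trials.

```python
import random


def random_gene_set_trials(evaluate, n_genes=20511, set_size=48,
                           n_trials=1000, seed=0):
    """Score `n_trials` random gene sets with a caller-supplied function.

    evaluate(gene_indices) is a hypothetical callback returning, e.g., the
    number of cancer types split into two or more clusters for that gene set.
    """
    rng = random.Random(seed)
    return [evaluate(rng.sample(range(n_genes), set_size))
            for _ in range(n_trials)]


def empirical_p_value(null_scores, observed):
    """Share of random trials scoring at least as high as the observed value.

    Adding 1 to numerator and denominator avoids reporting a p value of zero.
    """
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)


def toy_evaluate(gene_indices):
    """Placeholder scorer (not a real analysis): count even gene indices."""
    return sum(1 for g in gene_indices if g % 2 == 0)


scores = random_gene_set_trials(toy_evaluate, n_trials=200)
mean_score = sum(scores) / len(scores)
p_value = empirical_p_value(scores, observed=48)
```

With a real scorer, `observed` would be the value obtained for the 48 nuclear receptor genes (21 or 9 in the text), and `n_trials` would be raised to 100,000 to match the description.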