A liquid biopsy signature of circulating extracellular vesicles-derived RNAs predicts response to first line chemotherapy in patients with metastatic colorectal cancer

Establishment and validation of the EV-derived RNAs signature for predicting ORR of first-line chemotherapy in mCRC

Totally 190 patients were finally included in the analysis and among which, 142 participants were recruited from Fudan University Shanghai Cancer Center (FUSCC), and 48 patients were recruited from Zhongshan hospital, Fudan University and Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine. Among the 142 patients treated at FUSCC, we randomly divided the patients treated with single chemotherapy (n = 120) into the training set and internal validation set proportionally (2:1), and those with target therapy are put into the internal validation set, so training cohort (n = 80) and internal validation cohort (n = 62). Patients treated in Zhongshan hospital and Xinhua hospital were included in external validation set, the samples in which were collected prospectively. The clinicopathological characteristics of the patients were shown in Supplement Table 1, indicating that the three cohort were balanced well in terms of gender, age, tumor site, pathological type, degree of differentiation, and the proportion of mutations in RAS or BRAF.

The mCRC patients were treated with FOLFOX (oxaliplatin 85 mg/m2, Leucovorin 400 mg/m2, and fluorouracil 400 mg/m2 intravenously on day 1, followed by a 46 h continuous infusion of fluorouracil 2400 mg/m2; repeated every two weeks) or XEOLX (oxaliplatin 130 mg/m2 intravenously on day 1; capecitabine 1000 mg / m2 p.o. bid from day 1 to day 14; repeated every three weeks) chemotherapy with or without anti-VEGF or anti-EGFR antibodies as first line treatment. The radiologic evaluations of tumor responses were performed every 2–3 cycles during systematic chemotherapy and every 2 months during the maintenance therapy. The tumor response was assessed according to RECIST 1.1 criteria. The follow-up in our study was aimed at PFS (defined as the time from treatment to first progression or death due to various causes) and OS (defined as the time from treatment to death).

We extracted the RNAs of isolated EVs from 190 patients and performed the RNA sequence referring to the methodological part of the previous study published by the same laboratory personnel [12]. About 58132 genes were finally detected in each sample, including mRNAs, non-coding RNAs and so on. We randomly selected eight genes and examined their expression in five randomized patients by real-time quantitative reverse transcription polymerase chain reaction (RT-PCR), and the results were in approximately 95% agreement with the sequencing data. We divided the patients in the training set into two subgroups according to the best of response (BOR) during the first-line chemotherapy, which were identified as response (complete response, CR and partial response, PR) group and non-response (stable disease, SD and progressive disease, PD) group. None of the clinicopathologic factors were significantly related to ORR, as shown in Supplement Table 2, indicating that the clinicopathologic factors could not predict ORR. Then we got 340 differentially expressed genes between the two groups. Figure 1A was the heat map of 340 different genes between the two groups in training set (p < 0.05). We then applied the random forest algorithm using R package ‘varSelRF’ (https://doi.org/10.1186/1471-2105-8-328) and the least absolute shrinkage and selection operator (LASSO) method for feature ranking to find and shrink the significant variables associated with BOR, and as shown in Fig. 1B-D, the green color represented the variables we may be probably interested. Finally, 22 markers (GOLPH3L, RPL21P1, ANKRD20A17P, CCL2, EPS8L1, CUBN, PODXL, ALG9, TLR5, CLK4, PRICKLE2, HMGB1P24, ME3, RPL13P12, MIB2, SLC5A6, NUDT19, ELMOD3, PGAM5, ERICH6-AS1, BACE1, MYL12BP1) were selected to establish a lasso-logistic model using R package ‘glmnet’ (https://doi.org/10.18637/jss.v033.i01) for predicting the response of first-line chemotherapy in mCRC patients. The 22-gene signature distinguished patients with respondence from patients with non-respondence, with an AUC of 0·986 (95%CI: 0·968 to 1·000) in the training set (Fig. 2A). Then we applied the signature to the internal validation set and the AUC was 0·821 (95%CI: 0·711 to 0·931) (Fig. 2B). To further confirm the prediction ability of the 22-gene signature on first-line chemotherapy in mCRC patients, we applied the signature to a prospective external validation set, and the AUC was 0·820 (95%CI: 0·702 to 0·937) (Fig. 2C). Supplement Table 3 demonstrates the percentage of ORR in patients with different risk score in varieties of cohorts. The ability of the 22-gene signature to predict the response to first-line chemotherapy in the training set was visually exhibited in the Fig. 2D and I, and patients with lower risk-score according to the model would more likely obtain a BOR of CR or PR (p < 0·05). The signature was applied in the internal validation set and external validation set and got similar results (Fig. 2E, J, F, K). Meanwhile, the signature was applied in patients treated with chemotherapy combined with anti-VEGF target therapy among the internal and external validation cohorts, and we found that the 22-gene signature could also well distinguish patients with different BOR (p < 0·05), as shown in Fig. 2H. Likewise, the signature was applied in patients treated with chemotherapy combined with anti-EGFR target therapy among the two validation cohorts, and we found that compared with patients with high-risk score, the ORR in the patients with low-risk score was higher but the difference between them was not statistically significant, which may be related to the fact that anti-EGFR antibodies can modify or reverse chemotherapy resistance (Supplement Table 3, Fig. 2G).

Fig. 1figure 1

Construction of the 22-gene signature. A Heatmap of 340 genes which were differentially expressed between different BOR (patients with CR + PR versus PD + SD) in the training cohort. Each column represents an individual sample, and each row represents a gene. P < 0.05. *The TPM values centered and scaled in the row direction by R package ‘pheatmap’. B The ranking and selection of BOR associated variates using the random forest algorithm. C, D The most robust predictive 22 genes were identified using LASSO-logistic algorithm

Fig. 2figure 2

Predictive ability of the 22-gene signature in different cohort. The ROC analyses illustrated the predictive ability of this 22-gene model to identify responders in the training set (A), internal validation set (B), and external validation set (C); The scores of enrolled patients in different cohorts which were calculated based on the 22-gene signature: the training cohort (D), the internal validation cohort (E), the external validation cohort (F), the cohort combined chemotherapy with anti-EGFR (G), the cohort combined chemotherapy with anti-VEGF (H). Patients with lower risk score would more likely obtain tumor remission in training group (I), internal validation group (J) and external validation group (K)

The 22-gene signature was identified as a predictor of PFS and OS in mCRC patients treated with first-line chemotherapy

We identified the prognostic value of the 22-gene signature both in training cohort and validation cohorts using R package ‘survival’(https://CRAN.R-project.org/package=survival). The patients of the training and validation cohorts were both divided into two subgroups, the low-risk score subgroup and the high-risk score subgroup, according to the cutoff value obtained from the Receiver operating characteristic (ROC) curve. Compared with patients with high-risk score, patients with low-risk score had longer OS and PFS in training cohort (p = 0·00016 and p < 0·0001 respectively), as shown in Supplement Fig. 1A-B, which was the same as the results in the whole validation cohort (Supplement Fig. 1C-D, p = 0·02 and p = 0·047 respectively).

GSEA analysis for the gene in the 22-gene signature

We observed the expression of the genes in the 22-gene model in the TCGA database, and the differentially expressed genes in the cancer versus the adjacent normal tissues were subjected to signaling pathway analysis through GSEA software (https://doi.org/10.1073/pnas.0506580102). Genes would be selected with adjustment P < 0·05 and a change greater than 1·5-fold after paired DEseq2 analysis (https://doi.org/10.1186/s13059-014-0550-8). The following genes of the 22-gene model met the requirements: CCL2 (high in adjacent normal tissues), PODXL (high in cancer), TLR5 (high in adjacent normal tissues), PRICKLE2 (high in adjacent normal tissues), SLC5A6 (high in cancer), PGAM5 (high in cancer). Fifty key HALLMARK pathways were analyzed, the upregulated pathways in patients with a higher gene expression were shown in Supplement Fig. 2. Down-regulations of CCL2 enriched the MYC associated signaling pathway, and up-regulation of PGAM5 or down-regulation of PODXL enriched the oxidative phosphorylation pathway. Besides, higher PRICKLE2 expression could enrich the inflammatory pathway and lower TLR5 expression always enriched the DNA repair pathway. The specific mechanism deserves further in-depth study, and the relevant molecules we screened in this study may also provide new targets for the exploration of the mechanism associated with chemo-resistance in order to further overcome drug resistance.

Establishment of an EV-derived 7-gene signature to predict the response of oxaliplatin contained first-line chemotherapy and irinotecan contained second-line chemotherapy in mCRC

To construct a model through which we can predict the response of both oxaliplatin contained chemotherapy and irinotecan contained chemotherapy, we selected patients with second-line treatment data on the use of irinotecan-containing regimens. The ORR was applied to assess the response of first-line chemotherapy while in the assessment of second-line chemotherapy, we selected the disease control rate (DCR) because of low ORR in second-line treatment. We screened the molecules related with good response to first-line oxaliplatin contained regimens and poor response to second-line irinotecan contained-regimens, and vice versa. We then applied the LASSO method for feature ranking (Supplement Fig. 3A-B) and finally established a 7-gene signature which containing CACNG6, PCSK5, CCL2, TBC1D24, NUBPL, PIK3R4, AL513550.1. The 7-gene signature predicted well in the response of oxaliplatin contained chemotherapy, as shown in Supplement Fig. 3E, with an AUC of 0·835 (95%CI: 0·737 to 0·932); moreover, the prediction of irinotecan-contained chemotherapy was exhibited in Supplement Fig. 3G, with an AUC of 0·793 (95%CI: 0·654 to 0·931). Each patient could obtain a risk score according to the 7-gene signature, and patients with lower risk score would more likely obtained tumor remission (CR or PR) in the first-line oxaliplatin contained chemotherapy, as shown in Supplement Fig. 3C,F, while in the second-line irinotecan contained chemotherapy, patients with lower risk score tend to have greater possibility to suffer PD, which was shown in Supplement Fig. 3D,H. Benefit from this, we may apply this model to the selection of first-line treatment options, helping us to choose the best one more accurately. The limitation is that the number of cases in this model is small and the irinotecan containing regimen is used as the second line treatment, so future validation is worthwhile in a larger sample and in patients treated with irinotecan containing regimens as first-line chemotherapy.

Comments (0)

No login
gif