Personalized prediction of immunotherapy response in lung cancer patients using advanced radiomics and deep learning

Patient cohort and image data

The Institutional Review Board of Taipei Veterans General Hospital (TVGH) approved this retrospective study (IRB No.: 2021-09-009BCF) and waived the requirement for informed consent. This study collected data from 212 LC patients who received IO treatment between 2016 and 2020 and underwent monthly follow-up assessments. The inclusion criteria were as follows: (1) histological diagnosis of LC based on surgical specimens or tissue biopsies; (2) staging of LC according to the American Joint Committee on Cancer (AJCC) guidelines [20] based on imaging and pathological information; (3) no other neoplastic diseases; (4) complete clinical information, such as smoking status, Eastern Cooperative Oncology Group performance status (ECOG PS) score, administered IO drugs, and EGFR mutation status; and (5) adequate quality of contrast-enhanced chest CT data. Finally, 206 LC patients were included in the subsequent analyses (Fig. 1). Disease progression was defined as tumor enlargement, new metastasis, or patient death based on the monthly follow-up assessments (including vital signs, blood tests, and CT scans when needed), aligning with the primary objective of our model to forecast responses to IO. Discontinuation of follow-up was documented as data censoring.

Fig. 1 Patient flowchart for this study

Among the 206 included patients, pre-treatment CT images were acquired on scanners from five major manufacturers: GE, Hitachi, Siemens, Toshiba, and Philips (detailed information in Supplementary Table 1). We did not restrict acquisition to specific CT scanner manufacturers or models, so that the developed predictive model could tolerate the variability introduced by different machines. The CT scan coverage was similar across patients, ranging from the thoracic inlet to the upper abdomen. CT images were reconstructed with a slice thickness between 1.0 and 5.0 mm, and pixel spacing ranged from 0.5 to 1.2 mm. Each slice had a matrix size of 512 × 512 pixels with 16-bit gray-scale resolution. The peak tube voltage was 120 kVp, and the tube current was determined by automatic dose modulation.

To delineate the tumors for the radiomics analysis, experienced radiologists and certified pulmonologists collaborated to assess image quality and delineate the region of interest (ROI) on the CT scans. Two window settings were applied during ROI delineation. The soft tissue window (width: 350 HU, level: 50 HU) aided in distinguishing tumors, collapsed lung, and fluid components such as pleural and pericardial effusions, whereas the lung window (width: 1500 HU, level: -600 HU) was used to determine tumor borders accurately. In cases with multiple tumors, only the primary tumor was selected as the ROI.
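For readers implementing similar preprocessing, the minimal sketch below shows how a window width/level pair maps CT Hounsfield units to a display range. The function name and NumPy-based implementation are illustrative assumptions, not part of the study's software.

```python
import numpy as np

def apply_ct_window(hu_image: np.ndarray, width: float, level: float) -> np.ndarray:
    """Clip a CT image (in Hounsfield units) to a window and rescale to [0, 1].

    Window settings used in this study:
      soft tissue: width=350, level=50
      lung:        width=1500, level=-600
    """
    low, high = level - width / 2.0, level + width / 2.0
    windowed = np.clip(hu_image, low, high)
    return (windowed - low) / (high - low)

# Illustrative usage on a synthetic "CT slice" in HU
slice_hu = np.random.randint(-1000, 400, size=(512, 512)).astype(np.float32)
soft_tissue_view = apply_ct_window(slice_hu, width=350, level=50)
lung_view = apply_ct_window(slice_hu, width=1500, level=-600)
```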

Intratumoral radiomics

We applied several preprocessing steps to the CT images, including resampling to an isotropic voxel size of 1 × 1 × 1 mm³, followed by intensity standardization using a Z-score transformation based on the mean and standard deviation of the image data. Additionally, low-pass (L) and high-pass (H) wavelet filters were applied along the three axes of the CT images, resulting in eight decomposed image sets (named LLL, LLH, LHL, LHH, HLL, HLH, HHL, and HHH). Intratumoral radiomics analysis involved the calculation of histogram, geometry, and texture features, including the gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), and local binary pattern (LBP), from the eight wavelet-filtered images and the original CT images. The feature extraction process followed the Image Biomarker Standardization Initiative (IBSI) guidelines [21]. In total, 593 intratumoral radiomic features were generated for each ROI (Fig. 2a). The image preprocessing, ROI delineation, and intratumoral radiomics analysis were performed using the Multimodal Radiomics Platform, a MATLAB-based graphical user interface (available online: http://cflu.lab.nycu.edu.tw/MRP_MLinglioma.html, accessed on 9 May 2024) [22, 23]. Supplementary Table S2 provides the formulas used for the intratumoral radiomics analysis.
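As a rough illustration of the preprocessing and wavelet decomposition described above (not the Multimodal Radiomics Platform itself), the following Python sketch resamples a CT volume to isotropic voxels, applies Z-score normalization, and produces the eight low-/high-pass decompositions with PyWavelets. The 'haar' wavelet and the helper name are assumptions made for the example.

```python
import numpy as np
import pywt                      # PyWavelets
from scipy.ndimage import zoom

def preprocess_and_decompose(volume: np.ndarray, spacing_mm: tuple):
    """Resample to 1 x 1 x 1 mm voxels, Z-score normalize, and wavelet-decompose.

    Returns the normalized volume plus eight decomposed volumes keyed
    'LLL' ... 'HHH' (low-/high-pass along the three axes).
    """
    # Resample to isotropic 1 mm voxels (zoom factor = original spacing / 1 mm)
    iso = zoom(volume, zoom=np.asarray(spacing_mm) / 1.0, order=1)
    # Z-score intensity standardization
    iso = (iso - iso.mean()) / iso.std()
    # Single-level 3D discrete wavelet transform ('a' = low-pass, 'd' = high-pass)
    coeffs = pywt.dwtn(iso, wavelet="haar")   # wavelet choice is an assumption
    rename = {"a": "L", "d": "H"}
    decomposed = {"".join(rename[c] for c in key): arr for key, arr in coeffs.items()}
    return iso, decomposed

# Illustrative usage with a synthetic volume (z, y, x) and 1.25 x 0.7 x 0.7 mm spacing
vol = np.random.rand(64, 128, 128).astype(np.float32)
iso_vol, wavelet_images = preprocess_and_decompose(vol, spacing_mm=(1.25, 0.7, 0.7))
print(sorted(wavelet_images.keys()))  # ['HHH', 'HHL', ..., 'LLL']
```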

Fig. 2 Feature processing workflow for (a) intratumoral radiomics, (b) peritumoral-vasculature radiomics, and (c) clinical features

Peritumoral-vasculature radiomics

After adjustment of the image resolution to isotropic 1 × 1 × 1 mm³ voxels, a U-Net model pre-trained on 231 routine CT cases (acquired from different scanner manufacturers) was employed to automatically segment the bilateral lung areas on the CT images [24]. Radiomic features of the tumor-associated vasculature were then extracted through a stepwise process. First, a multi-scale 3D vessel enhancement filter was applied to enhance vessel-like structures [25]. Next, Otsu's thresholding method was applied to the vessel-enhanced image to extract voxels associated with the vasculature, and morphological operations were used to eliminate noise and non-vessel artifacts. Finally, a cubic region extending 2.5 cm beyond the tumor volume along the three axes was extracted. A fast-marching algorithm was employed on the segmented vasculature to identify vessel centerlines and partition the vessel network into distinct branches [26]. Ninety-one vasculature radiomic features were computed, comprising 61 quantitative vessel morphology features (e.g., curvature and torsion) and 30 features describing the organization of the peritumoral vasculature (Fig. 2b). The calculation of the vasculature radiomic features is provided in Supplementary Table S3. The proportion of outliers (values lying more than three standard deviations from the mean) was computed for each normalized radiomic feature, and features with outlier proportions exceeding 20% of the entire dataset were removed from consideration. The peritumoral-vasculature radiomics analysis was also performed using the Multimodal Radiomics Platform (available online: http://cflu.lab.nycu.edu.tw/MRP_MLinglioma.html, accessed on 9 May 2024) with the QuanTAV toolbox (available online: https://github.com/ccipd/QuanTAV, accessed on 9 May 2024).
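The vessel-extraction steps above can be approximated with off-the-shelf tools, as in the hedged sketch below. It substitutes scikit-image's Frangi filter for the multi-scale vessel enhancement filter of [25] and omits the U-Net lung mask and the fast-marching centerline step, so it illustrates the pipeline rather than reproducing the QuanTAV implementation.

```python
import numpy as np
from skimage.filters import frangi, threshold_otsu
from skimage.morphology import remove_small_objects, binary_opening, ball

def segment_peritumoral_vessels(ct_iso: np.ndarray, tumor_mask: np.ndarray,
                                margin_mm: int = 25) -> np.ndarray:
    """Rough peritumoral vessel mask from an isotropic (1 mm) CT volume.

    ct_iso     : CT volume resampled to 1 x 1 x 1 mm voxels
    tumor_mask : boolean mask of the primary tumor (same shape)
    margin_mm  : peritumoral extension (2.5 cm in the study)
    """
    # 1) Multi-scale vessel enhancement (Frangi stands in for the filter of [25])
    vesselness = frangi(ct_iso, sigmas=range(1, 5), black_ridges=False)

    # 2) Otsu threshold on the vesselness map
    vessel_mask = vesselness > threshold_otsu(vesselness)

    # 3) Morphological cleanup of noise and non-vessel artifacts
    vessel_mask = binary_opening(vessel_mask, ball(1))
    vessel_mask = remove_small_objects(vessel_mask, min_size=30)

    # 4) Keep only a cubic region extending `margin_mm` voxels beyond the tumor bounds
    zz, yy, xx = np.nonzero(tumor_mask)
    lo = [max(c.min() - margin_mm, 0) for c in (zz, yy, xx)]
    hi = [min(c.max() + margin_mm + 1, s) for c, s in zip((zz, yy, xx), ct_iso.shape)]
    region = np.zeros_like(vessel_mask)
    region[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = True
    return vessel_mask & region
```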

Prediction models and statistics

In this study, the patients were first divided into training (55% of cases), validation (15%), and testing (remaining 30%) sets using simple random sampling. Figure 3 shows the workflow used to construct the prediction models for the IO response. To identify key radiomic and clinical features for predicting IO outcomes, a two-step feature selection was applied to the training dataset. In the first step, univariate Cox proportional hazards regression for radiomic features and the chi-squared test for clinical features were used to quickly eliminate a large number of non-informative features [27, 28], with only those having p < 0.05 advancing to the second step. The second step applied a sequential forward selection (SFS) [29] algorithm to account for feature interactions and eliminate highly correlated features.
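A minimal Python sketch of this two-step selection is given below, assuming a feature table with hypothetical column names `pfs_months` and `progressed`. It uses lifelines for the univariate Cox screening, SciPy for the chi-squared test, and a simple greedy forward selection that maximizes the validation C-index; the study itself used the SFS algorithm of [29], so this is an approximation.

```python
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index
from scipy.stats import chi2_contingency

def univariate_cox_pvalue(df: pd.DataFrame, feature: str) -> float:
    """p-value of a single radiomic feature in a univariate Cox model."""
    cph = CoxPHFitter()
    cph.fit(df[[feature, "pfs_months", "progressed"]],
            duration_col="pfs_months", event_col="progressed")
    return float(cph.summary.loc[feature, "p"])

def chi2_pvalue(df: pd.DataFrame, feature: str) -> float:
    """p-value of a categorical clinical feature vs. progression status."""
    table = pd.crosstab(df[feature], df["progressed"])
    return float(chi2_contingency(table)[1])

def forward_select(train: pd.DataFrame, valid: pd.DataFrame, candidates: list) -> list:
    """Greedy forward selection maximizing the validation C-index of a Cox model."""
    selected, best_c = [], 0.0
    improved = True
    while improved and candidates:
        improved = False
        for f in list(candidates):
            cols = selected + [f]
            cph = CoxPHFitter(penalizer=0.1)
            cph.fit(train[cols + ["pfs_months", "progressed"]],
                    duration_col="pfs_months", event_col="progressed")
            risk = cph.predict_partial_hazard(valid[cols])
            c = concordance_index(valid["pfs_months"], -risk, valid["progressed"])
            if c > best_c:
                best_c, best_f, improved = c, f, True
        if improved:
            selected.append(best_f)
            candidates.remove(best_f)
    return selected
```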

Fig. 3 The workflow of the proposed prediction models for immunotherapy response

DeepSurv, a deep learning model built on the Cox proportional hazards (CPH) framework, was employed to predict PFS after IO treatment. The model uses a multilayer perceptron to embed nonlinear relationships into the hazard function, improving survival prediction [30, 31]. In this study, the selected radiomic and clinical features served as inputs to the DeepSurv models. The model output, an estimated hazard function, was further used to derive personalized PFS curves for each patient. The architecture and parameter configuration of the proposed DeepSurv model are depicted in Supplementary Figure S1.
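A minimal PyTorch sketch of the DeepSurv idea (an MLP risk score trained with the negative Cox partial log-likelihood) is shown below. The layer sizes, loss implementation, and toy data are illustrative and do not reproduce the configuration in Supplementary Figure S1 or the R package used in this study.

```python
import torch
import torch.nn as nn

class DeepSurvNet(nn.Module):
    """MLP that maps a feature vector to a scalar log-risk score."""
    def __init__(self, n_features: int, hidden: int = 32, dropout: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def cox_partial_log_likelihood_loss(log_risk, time, event):
    """Negative Cox partial log-likelihood (assumes no heavily tied event times)."""
    order = torch.argsort(time, descending=True)      # sort so the risk set is a prefix
    log_risk, event = log_risk[order], event[order]
    log_cumsum = torch.logcumsumexp(log_risk, dim=0)  # log sum of exp over each risk set
    return -torch.sum((log_risk - log_cumsum) * event) / event.sum().clamp(min=1)

# Illustrative training step on random data (features, PFS in months, event indicator)
x = torch.randn(64, 12); t = torch.rand(64) * 24; e = (torch.rand(64) > 0.3).float()
model = DeepSurvNet(n_features=12)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = cox_partial_log_likelihood_loss(model(x), t, e)
loss.backward(); opt.step()
```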

Three DeepSurv models were constructed, incorporating different sets of features: (1) clinical features only (e.g., age, sex, hematology, molecular status, metastatic information, and histopathology); (2) clinical and intratumoral radiomic features; and (3) clinical features, intratumoral radiomics, and peritumoral-vasculature radiomics. We further leveraged Shapley Additive Explanations (SHAP) to assess the contribution of each individual feature to outcome prediction in the DeepSurv models. This process involved sorting and visualizing the importance of each feature in shaping the model's outputs [32]. In this study, the hyper-parameters were determined through a grid search [33]. The selection of hyper-parameters was based primarily on the prediction performance, i.e., the concordance index (C-index), in the validation set. Only when multiple models (with different combinations of hyper-parameters) achieved the same C-index did we select the one with the shortest training time. Supplementary Table S4 presents the hyper-parameters selected by the grid search.
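As an illustration of how SHAP can be attached to a risk model, the sketch below applies the model-agnostic KernelExplainer to the log-risk output of the hypothetical `DeepSurvNet` and data tensors from the previous example; the actual study used the SHAP workflow of [32] on its R-based models, so this is only a stand-in.

```python
import numpy as np
import shap
import torch

# Wrap the (hypothetical) DeepSurvNet from the sketch above as a NumPy function
def risk_fn(features: np.ndarray) -> np.ndarray:
    with torch.no_grad():
        return model(torch.as_tensor(features, dtype=torch.float32)).numpy()

# Model-agnostic SHAP values for the log-risk output
background = x[:20].numpy()                      # small background sample
explainer = shap.KernelExplainer(risk_fn, background)
shap_values = explainer.shap_values(x[20:40].numpy())

# Rank features by mean absolute SHAP value (higher = larger contribution)
feature_importance = np.abs(shap_values).mean(axis=0)
print(np.argsort(feature_importance)[::-1])      # feature indices, most influential first
```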

Patients were stratified into high-risk and low-risk groups based on the PFS probability predicted by the DeepSurv model at the median PFS time point (3.3 months). Patients with a predicted PFS probability higher than 50% at 3.3 months were classified as low risk, while those with a predicted PFS probability of 50% or lower were classified as high risk. A log-rank test was employed to evaluate the statistical differences between the Kaplan-Meier (KM) survival curves of the high-risk and low-risk groups. The DeepSurv models were also used to predict progression status at four follow-up time points (i.e., 1, 2, 3, and 6 months after the IO treatment). The performance of the DeepSurv models, including time-dependent receiver operating characteristic (ROC) curves, area under the ROC curve (AUC), C-index, sensitivity, and specificity, was evaluated on the testing dataset (remaining 30% of patients). To calculate the C-index, all possible pairs of subjects are considered; an evaluable pair is concordant if the subject with the shorter observed survival time also has the shorter predicted survival time. The C-index is the ratio of concordant pairs to the total number of evaluable pairs [34]:

$$\text{C-index} = \frac{\text{number of concordant pairs}}{\text{number of evaluable pairs}}$$
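The evaluation steps above can be reproduced with standard survival-analysis tooling. The sketch below uses lifelines to compute the C-index, stratify patients at the 3.3-month PFS probability, and run the log-rank test; all arrays are toy placeholders, not study data.

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test
from lifelines.utils import concordance_index

# Toy test-set arrays: observed PFS (months), progression events (1 = progressed),
# model risk scores (higher = higher hazard), and predicted PFS probability at 3.3 months
pfs = np.array([1.2, 2.8, 3.5, 6.0, 9.4, 12.1])
event = np.array([1, 1, 1, 0, 1, 0])
risk_score = np.array([2.1, 1.7, 0.9, 0.4, 0.6, 0.1])
p_pfs_33 = np.array([0.31, 0.42, 0.55, 0.71, 0.66, 0.88])

# C-index: lower risk (higher predicted survival) should pair with longer observed PFS
c_index = concordance_index(pfs, -risk_score, event)

# Risk stratification at the median PFS time point (3.3 months)
high_risk = p_pfs_33 <= 0.5

# Log-rank test between the KM curves of the two groups
result = logrank_test(pfs[high_risk], pfs[~high_risk],
                      event_observed_A=event[high_risk],
                      event_observed_B=event[~high_risk])
print(f"C-index={c_index:.2f}, log-rank p={result.p_value:.3f}")

# Kaplan-Meier estimate for one group (plotting omitted)
km = KaplanMeierFitter().fit(pfs[high_risk], event[high_risk], label="high risk")
```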

To compare performance among the three DeepSurv models, we applied bootstrap resampling 100 times, followed by paired t-tests on the AUC values. Feature selection and training of the DeepSurv models were conducted using the DeepSurv implementation in the R survivalmodels package (available online: https://rdrr.io/cran/survivalmodels/src/R/deepsurv.R, accessed on 6 May 2023).
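For consistency with the other sketches, the bootstrap comparison is illustrated below in Python rather than R: two models' AUCs are computed on the same bootstrap resamples of a toy test set and compared with a paired t-test. Labels, scores, and the 60-patient size are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def paired_bootstrap_aucs(y_true, scores_a, scores_b, n_boot=100):
    """AUCs of two models on the same bootstrap resamples of the test set."""
    aucs_a, aucs_b = [], []
    n = len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)            # resample with replacement
        if len(np.unique(y_true[idx])) < 2:         # skip resamples with one class
            continue
        aucs_a.append(roc_auc_score(y_true[idx], scores_a[idx]))
        aucs_b.append(roc_auc_score(y_true[idx], scores_b[idx]))
    return np.asarray(aucs_a), np.asarray(aucs_b)

# Hypothetical progression labels and risk scores from two competing models
y = rng.integers(0, 2, size=60)
scores_a = y * 0.6 + rng.normal(0, 0.4, size=60)    # e.g., clinical + radiomics (toy)
scores_b = y * 0.3 + rng.normal(0, 0.5, size=60)    # e.g., clinical only (toy)

auc_a, auc_b = paired_bootstrap_aucs(y, scores_a, scores_b)
t_stat, p_value = ttest_rel(auc_a, auc_b)           # paired t-test over resamples
print(f"mean AUC A={auc_a.mean():.2f}, B={auc_b.mean():.2f}, p={p_value:.3f}")
```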
