In this paper, we proposed an automatic workflow for GBM survival prediction based on four pre-operative MR images. We proposed the VGG-Seg and trained it on 105 glioma patients to automatically generate GBM contours from the four MR images. The trained VGG-Seg was then applied to 163 GBM patients to generate autosegmented tumor contours for survival analysis. For these patients, we extracted handcrafted and DL-based radiomic features from the MR images within the autosegmented contours. Two Cox regression models were trained on the extracted features to construct the handcrafted and DL-based signatures for survival prediction.
The handcrafted signature achieved a C-index of 0.64, while the DL-based signature achieved a C-index of 0.67. The DL-based signature also achieved numerically higher AUCs than the handcrafted signature when evaluated at an OS of 300 days and 450 days. Additionally, the DL-based signature, unlike the handcrafted signature, separated patients into prognostically distinct groups using either the X-tile-generated threshold or the median threshold. Shboul et al. did not report a C-index but reported an accuracy of 0.52 in classifying GBM patients into three survival outcome groups [12]. However, DL-based radiomic features were not investigated in that study. It is also difficult to know whether significant patient stratification was achieved for the testing GBM patients in that study, since log-rank tests were not conducted.
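To make the C-index values above concrete, the following is a minimal sketch of one common definition, Harrell's concordance index, written in plain Python. It is not the implementation used in this study: the function name, the toy data, and the simplified handling of tied survival times (such pairs are skipped) are all assumptions made for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times  : observed survival or censoring times (e.g. OS in days)
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk scores; a higher score means shorter expected survival

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter time than patient j. Pairs with tied times are skipped,
    which is a simplification assumed here.
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events and risks must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk died earlier: concordant
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half

    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical toy cohort of four patients (not data from this study).
toy_times = [100, 200, 300, 400]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.6, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
```

In this toy cohort, four of the five comparable pairs are ranked correctly, giving a C-index of 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.64 and 0.67 reported above should be read.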
The VGG-Seg achieved accurate automatic GBM segmentation, with a mean Dice coefficient of 0.86 for the 163 GBM patients. A study showed that the mean Dice coefficient between the whole tumor contours drawn by two experts based on multi-modal MR images was 0.86 [27]. Recently, many studies have proposed novel 3D CNN architectures for improving glioma segmentation accuracy [28–30]. The goal of this study is not to benchmark the best segmentation model but to develop an automatic workflow that can achieve accurate GBM survival prediction. Other automatic segmentation methods can be integrated into the proposed workflow but were not explored within the scope of this study. Potential future work includes selecting the best segmentation model and investigating whether more accurate autosegmented contours may result in a better survival prediction model.
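The Dice coefficient used above to assess segmentation accuracy measures the overlap between two contours A and B as 2|A ∩ B| / (|A| + |B|). The snippet below is a minimal sketch in plain Python, assuming the masks are given as flattened binary sequences of equal length; the function name, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions for illustration and not part of the study's pipeline.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat sequences of 0/1.

    Returns 2 * |A and B| / (|A| + |B|). When both masks are empty the
    score is defined as 1.0 here (an assumed convention).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")

    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]

    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)  # number of foreground voxels in each mask, summed

    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical flattened masks: an autosegmented contour and a manual contour.
auto_mask = [0, 1, 1, 1, 0, 0]
manual_mask = [0, 0, 1, 1, 1, 0]
toy_dice = dice_coefficient(auto_mask, manual_mask)
```

Here each mask has three foreground voxels and two of them overlap, so the score is 4/6, roughly 0.67. For a 3D MR volume, the same function could be applied after flattening the voxel arrays into one sequence per mask.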
We included 75 LGG patients when training the VGG-Seg because we found that the VGG-Seg trained with both 75 LGG patients and 30 GBM patients performed better than the VGG-Seg trained with the 30 GBM patients alone. This is expected, as LGG and GBM have a similar appearance in MR images. The VGG-Seg could also generate three tumor subregion labels. However, its accuracy in segmenting these subregions was low, with mean Dice coefficients below 0.75 for the tumor subregions. Hence, we decided to use the whole tumor contours for feature extraction.
Our study has several limitations. First, the number of patients is limited, so we investigated only the transfer learning method for survival prediction. A CNN trained from scratch for survival prediction could learn useful features directly from MR images. However, such a network is prone to overfitting and would require more patient data to achieve robust performance. Other methods, such as training an autoencoder for feature extraction, would also be valuable to explore. Second, MR images alone may not carry enough prognostic information to support more accurate models. Future work could include genomic features and investigate whether combining genomic and radiomic features improves prediction performance. Third, we did not consider the treatment status of patients due to data scarcity. Integrating treatment status may help achieve better prediction performance and is worthy of investigation in the future.
