Differentiation between cerebral alveolar echinococcosis and brain metastases with radiomics combined machine learning approach

Existing literature on CAE has primarily consisted of case reports [30], and comprehensive studies are lacking. In this research, we conducted the most extensive systematic study of CAE to date, including 30 cases, which is a substantial sample size. Treatment strategies differ for CAE and BM. CAE is typically treated with radiation therapy, long-term antiparasitic medication, and surgical resection. Conversely, BM are frequently treated using a multidisciplinary strategy that, depending on the origin of the primary tumor, may include radiation therapy, systemic chemotherapy, targeted therapies, or surgical resection. BM are commonly associated with advanced stages of cancer and typically carry a poor prognosis. In contrast, CAE progresses slowly and chronically, and its prognosis can be greatly improved by prompt diagnosis and treatment. Accurate diagnosis is crucial because it can help avoid unnecessary interventions, particularly for CAE, where performing a biopsy poses a risk of parasite spillage and dissemination within the brain. However, CAE and BM share similar symptoms and imaging presentations. Both can cause neurological symptoms such as seizures, headaches, focal neurological deficits, and mental disorders, and both can manifest as multiple solid enhancing masses with surrounding edema on imaging examinations, posing challenges for physicians and radiologists in achieving accurate diagnoses [23,24,25]. Therefore, it is of great importance to accurately differentiate CAE from BM.

In this study, we aimed to develop a precise and reproducible classifier to differentiate between patients with BM and those with CAE using a wide range of radiomics features and machine learning methods. Specifically, we built five different machine learning models to distinguish between CAE and BM based on conventional contrast-enhanced T1-weighted images (T1WI). Among the models, the KNN classifier demonstrated the highest performance, with an AUC of 0.97, precision of 0.70, accuracy of 0.86, sensitivity of 1.0, and specificity of 0.78. The logistic regression algorithm displayed the lowest performance, with an AUC of 0.87, precision of 0.55, accuracy of 0.71, sensitivity of 0.86, and specificity of 0.64.
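The five reported metrics can all be derived from predicted scores and true labels. The following is a minimal sketch of those definitions, not the evaluation code used in this study; the function name `binary_metrics`, the 0.5 threshold, and the toy data are assumptions made for illustration.

```python
import numpy as np

def binary_metrics(y_true, y_score, threshold=0.5):
    """Return AUC, precision, accuracy, sensitivity and specificity.

    y_true holds 0/1 labels (1 = positive class), y_score holds predicted
    scores. The 0.5 threshold is an illustrative assumption.
    """
    y_true = np.asarray(y_true, dtype=int)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    # AUC as the probability that a random positive case scores higher
    # than a random negative case (ties count as one half).
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))

    nan = float("nan")
    return {
        "auc": auc,
        "precision": tp / (tp + fp) if tp + fp else nan,
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }

# Toy data invented for illustration only.
metrics = binary_metrics([1, 1, 0, 0, 0], [0.9, 0.6, 0.7, 0.2, 0.1])
```

In this toy example every positive case is detected (sensitivity 1.0) while one negative case is misclassified, so precision falls to 2/3. This is the same qualitative pattern as a classifier with perfect sensitivity but lower precision.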

Radiomics aims to extract high-throughput quantitative image features from radiographic images and train a prediction model [31]. Since its first introduction by Philippe Lambin in 2012, radiomics has demonstrated considerable promise in developing models that can distinguish different types of tumors based on the numerous image features extracted from MRI that represent tumor heterogeneity [13, 15, 32, 33]. Radiomics combined with machine learning has been widely studied in recent years. In our research, 9 valuable features were selected: 2 features based on log-sigma transformed images, 3 first-order features, 2 GLSZM features, 1 GLDM feature and 1 wavelet-HLL feature.

As a representation of local image structures at multiple scales, log-sigma (Laplacian of Gaussian) transformed features enable the analysis and description of complex structures, edges, and textures. The log-sigma transformation convolves the image with Gaussian-based filters at a sequence of standard deviation (sigma) values to enhance edges, boundaries, and other important image properties. In our research, 2 of the selected features were log-sigma features [34].
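The multi-scale filtering described above can be sketched as follows. This assumes that "log-sigma" denotes a Laplacian of Gaussian filter applied at several sigma values; the function name `log_filtered` and the sigma values are hypothetical and are not taken from this study.

```python
import numpy as np
from scipy import ndimage

def log_filtered(image, sigmas=(1.0, 2.0, 3.0)):
    """Apply a Laplacian of Gaussian filter at several scales.

    Returns a dict mapping each sigma to the filtered image. The sigma
    values are illustrative assumptions, given in pixel units.
    """
    image = np.asarray(image, dtype=float)
    return {s: ndimage.gaussian_laplace(image, sigma=s) for s in sigmas}

# Toy image: one bright pixel on a dark background.
toy = np.zeros((31, 31))
toy[15, 15] = 1.0
responses = log_filtered(toy)
```

Small sigma values respond to fine detail and large values to coarser structures, so features computed on each filtered image describe texture at a different scale.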

First-order features describe basic statistical or histogram-based characteristics of the intensity distribution, such as the mean, median, standard deviation, range, skewness, kurtosis, and other statistical moments. Mattonen SA et al. [35] investigated whether CT-based texture analysis could predict tumor recurrence early and distinguish it from radiation-induced lung injury. Their results showed that first-order features (energy and entropy) achieved AUCs of 0.79–0.81 using a linear classifier. On two-fold cross-validation, first-order texture reached 73% accuracy, which is similar to our findings.
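As an illustration of the statistics listed above, the sketch below computes several first-order features from the intensities inside a region of interest. The function name `first_order_features` and the bin count are assumptions, and the formulas follow common textbook definitions rather than any specific toolkit.

```python
import numpy as np

def first_order_features(values, n_bins=32):
    """Compute basic first-order statistics of ROI intensities.

    n_bins controls the histogram used for entropy and is an
    illustrative assumption.
    """
    x = np.asarray(values, dtype=float).ravel()
    mean = x.mean()
    std = x.std()
    centered = x - mean

    # Standardised third and fourth moments (kurtosis is not excess kurtosis).
    skewness = np.mean(centered**3) / std**3 if std > 0 else 0.0
    kurtosis = np.mean(centered**4) / std**4 if std > 0 else 0.0

    # Shannon entropy of the intensity histogram, in bits.
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    entropy = -np.sum(p * np.log2(p))

    return {
        "mean": float(mean),
        "median": float(np.median(x)),
        "std": float(std),
        "range": float(x.max() - x.min()),
        "skewness": float(skewness),
        "kurtosis": float(kurtosis),
        "energy": float(np.sum(x**2)),
        "entropy": float(entropy),
    }

features = first_order_features([1, 2, 3, 4, 5], n_bins=5)
```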

Gray-level size-zone matrix (GLSZM) features characterize the spatial relationship and distribution of gray-level intensity patterns. Long H et al. [36] conducted MRI-based radiomics research to investigate whether peritumoral edema heterogeneity could predict glioblastoma recurrence. Their results showed that two GLSZM features (small area emphasis and low gray level emphasis) were among the valuable features that could predict glioblastoma recurrence, which is in line with our study.
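To make the GLSZM idea concrete, the sketch below builds the matrix for a small 2D image and computes the two features named above. It assumes an image already discretised to gray levels 1..n_levels, 8-connectivity, and the commonly used definitions of small area emphasis and low gray level zone emphasis; the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def glszm(image, n_levels):
    """Gray-level size-zone matrix of a 2D image with levels 1..n_levels.

    P[i - 1, s - 1] counts the connected zones of gray level i and size s.
    """
    image = np.asarray(image)
    structure = np.ones((3, 3), dtype=int)  # 8-connectivity (assumption)
    P = np.zeros((n_levels, image.size), dtype=int)
    for level in range(1, n_levels + 1):
        labeled, n_zones = ndimage.label(image == level, structure=structure)
        if n_zones == 0:
            continue
        sizes = np.bincount(labeled.ravel())[1:]  # drop the background label 0
        for s in sizes:
            P[level - 1, s - 1] += 1
    return P

def sae_and_lglze(P):
    """Small area emphasis and low gray level zone emphasis."""
    n_zones = P.sum()
    i = np.arange(1, P.shape[0] + 1)[:, None]  # gray levels
    j = np.arange(1, P.shape[1] + 1)[None, :]  # zone sizes
    sae = np.sum(P / j**2) / n_zones
    lglze = np.sum(P / i**2) / n_zones
    return float(sae), float(lglze)

toy = np.array([[1, 1, 2],
                [1, 2, 2],
                [3, 3, 1]])
P = glszm(toy, n_levels=3)
sae, lglze = sae_and_lglze(P)
```

The toy image contains four zones: two of gray level 1 (sizes 3 and 1), one of gray level 2 (size 3) and one of gray level 3 (size 2). Small area emphasis grows when small zones dominate, which corresponds to a finer texture.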

The gray-level dependence matrix (GLDM) counts the number of patterns made up of connected voxels with comparable intensities. Higher values of the GLDM dependence variance indicate more diverse patterns in an image. Peng S et al. [37] used multi-phase contrast-enhanced MRI to predict neoadjuvant therapy response in breast cancer, and their results showed that GLDM features in phases 1, 3 and 4 were valuable predictors, which is similar to our findings.
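A minimal 2D sketch of a gray-level dependence matrix and its dependence variance is given below. It assumes that a neighbour is "dependent" on the centre pixel when their gray levels differ by at most `alpha`, that the neighbourhood is the 8 surrounding pixels, and that columns index the number of dependent neighbours from 0 to 8. Toolkits may use a different indexing convention, and the function names are hypothetical.

```python
import numpy as np

def gldm(image, n_levels, alpha=0):
    """Gray-level dependence matrix of a 2D image with levels 1..n_levels.

    P[i - 1, k] counts the pixels of gray level i that have k dependent
    neighbours among their 8 surrounding pixels.
    """
    image = np.asarray(image, dtype=int)
    rows, cols = image.shape
    # Pad with a sentinel value that is never dependent on a real gray level.
    padded = np.pad(image, 1, mode="constant", constant_values=-10**6)
    dependents = np.zeros_like(image)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            dependents += np.abs(neighbour - image) <= alpha
    P = np.zeros((n_levels, 9), dtype=int)
    np.add.at(P, (image - 1, dependents), 1)
    return P

def dependence_variance(P):
    """Variance of the number of dependent neighbours over all pixels."""
    p = P / P.sum()
    k = np.arange(P.shape[1])[None, :]
    mu = np.sum(p * k)
    return float(np.sum(p * (k - mu) ** 2))

toy = np.array([[1, 1, 2],
                [1, 2, 2],
                [3, 3, 1]])
P_dep = gldm(toy, n_levels=3)
dv = dependence_variance(P_dep)
```

Because the variance does not change when every count is shifted by a constant, the choice between counting from 0 or from 1 does not affect the dependence variance.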

Wavelet decomposition divides image data using a series of wavelet functions that transition from higher-frequency wavelets to lower-frequency ones. The high-pass filter captures the finer detail represented by the higher-frequency wavelet function, while the low-pass filter captures the remaining information, which can be further decomposed using lower-frequency wavelet functions. Many researchers have reported the importance of wavelet-HLL features in radiomics studies, and one wavelet-HLL feature proved valuable in our study.
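The high-pass and low-pass filtering described above can be sketched with a one-level Haar transform of a 3D volume. The Haar wavelet, the decimated (downsampling) form, and the mapping of the letters in "HLL" to array axes 0, 1 and 2 are all assumptions made for illustration; the wavelet family used in this study is not stated here, and the function names are hypothetical.

```python
import numpy as np

def haar_step(volume, axis, band):
    """One-level Haar filter along one axis, downsampling by two.

    band is "L" (low-pass, scaled pairwise sum) or "H" (high-pass,
    scaled pairwise difference). The axis length must be even.
    """
    v = np.moveaxis(volume, axis, 0)
    even, odd = v[0::2], v[1::2]
    if band == "L":
        out = (even + odd) / np.sqrt(2)
    else:
        out = (even - odd) / np.sqrt(2)
    return np.moveaxis(out, 0, axis)

def wavelet_subband(volume, bands="HLL"):
    """Apply one filter per axis, e.g. "HLL" = high, low, low."""
    out = np.asarray(volume, dtype=float)
    for axis, band in enumerate(bands):
        out = haar_step(out, axis, band)
    return out

# Toy volume that alternates 0, 1, 0, 1 along axis 0 only.
toy = np.zeros((4, 4, 4))
toy[1::2, :, :] = 1.0
hll = wavelet_subband(toy, "HLL")
lhl = wavelet_subband(toy, "LHL")
```

In this toy volume all variation lies along axis 0, so the HLL sub-band is non-zero while LHL, which applies the high-pass filter along axis 1, is zero everywhere. Radiomics features are then computed on each sub-band in the same way as on the original image.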

The use of radiomics-based machine learning for the diagnosis of CAE and brain metastases has several potential advantages over traditional methods. First, it can provide more accurate and reliable results because it extracts more detailed information from medical images than traditional methods do. Additionally, it can detect subtle differences between CAE and BM that may not be visible to the naked eye. Cerebral alveolar echinococcosis is a rare parasitic disease, but it remains a severe public health issue in many parts of the world. We believe that radiomics-based machine learning, which has proved to be a powerful approach in other fields [38,39,40,41], is a novel tool for investigating this disease.

Because CAE is rare and data are limited, we used nested cross-validation, which is particularly helpful when the dataset is small and the model has numerous hyperparameters to tune [42]. Nested cross-validation benefits generalization for several reasons. First, it gives a less biased estimate of the model’s performance, which helps reduce the risk of overfitting. The procedure splits the data in two nested loops: the inner loop is used for hyperparameter tuning, so that the model is tuned for generalization rather than fitted too closely to the training data, while the outer loop provides an objective assessment of how well the tuned model performs on unseen data. Second, cross-validation reduces the dependence of the performance estimate on any particular train-test split. By repeating the process with different splits of the data, the variability of the performance estimate can be assessed, which shows how consistently the model performs on unseen data. Third, nested cross-validation makes model selection more reliable. It allows different models or hyperparameter combinations to be compared objectively and the best-performing one to be chosen. This helps identify models that not only fit the training data but also generalize well to new, unseen data [28, 43].
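The two-loop structure can be sketched with scikit-learn as shown below. This is an illustrative setup, not the configuration used in this study: the synthetic dataset, the fold counts, the hyperparameter grid, and the choice of a scaled KNN pipeline are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a small radiomics feature table (assumption).
X, y = make_classification(n_samples=90, n_features=9, n_informative=5,
                           random_state=0)

# Inner loop tunes hyperparameters, outer loop estimates performance.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {"kneighborsclassifier__n_neighbors": [3, 5, 7]}

# GridSearchCV is refitted from scratch inside every outer training fold,
# so the outer test folds never influence the hyperparameter choice.
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="roc_auc")
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")

mean_auc = float(np.mean(outer_scores))
std_auc = float(np.std(outer_scores))
```

Placing the scaler inside the pipeline keeps preprocessing within each training fold, and the spread of `outer_scores` gives a sense of how much the estimate depends on the particular split.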

For the selection of biomarkers in high-dimensional data, the least absolute shrinkage and selection operator (LASSO), a variable selection and shrinkage estimation method, has been widely used [44]. By adding a penalty function, it builds a more refined model in which some coefficients are shrunk and others are set exactly to zero. In this way, feature screening (dimension reduction) is performed and over-fitting is reduced during model training. In our study, LR and KNN showed the best performances in the training and testing sets, which is similar to previous studies [45]. These properties allowed the LASSO regression model and the LR and KNN classifiers to work well together in the radiomics analysis. Additionally, the LASSO algorithm selected radiomics features from a variety of filters and feature classes, which suggests that multiple feature categories may provide complementary information for differentiating between CAE and BM. Even though the biological basis of these radiomics features is not yet known, we hypothesize that they may capture fine characteristics of the microstructure and the lesion’s immediate surroundings.
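The way the LASSO penalty sets coefficients to zero can be illustrated on synthetic data. The sketch below is not the feature selection pipeline of this study: the data, the continuous target, and the penalty strength `alpha=0.2` are assumptions chosen only to show the mechanism.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_features = 60, 30

# Synthetic features: only the first three columns carry signal.
X = rng.normal(size=(n_samples, n_features))
y = (2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2]
     + rng.normal(scale=0.1, size=n_samples))

# Standardise so that the penalty treats all features on the same scale.
X_std = StandardScaler().fit_transform(X)

# alpha is the penalty strength; in practice it is usually chosen by
# cross-validation rather than fixed by hand.
model = Lasso(alpha=0.2).fit(X_std, y)

selected = np.flatnonzero(model.coef_)          # indices of retained features
n_dropped = int(np.sum(model.coef_ == 0.0))     # coefficients set exactly to zero
```

Features whose coefficients are exactly zero are dropped, and the remaining ones can be passed to a downstream classifier such as LR or KNN.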

Finally, the combination of radiomics and machine learning has the potential to revolutionize the way we diagnose and differentiate between cerebral alveolar echinococcosis and brain metastases. Radiomics is a branch of medical image analysis that uses advanced algorithms to extract quantitative features from medical images. These features can then be used to build predictive models that accurately differentiate between CAE and brain metastases.
