The aim of this research is to determine whether nodules in CT lung scans are benign or malignant by examining the effectiveness of geometrical and 2D radiomic features, FS techniques and ML algorithms. The study framework comprises several phases, viz. dataset collection, feature extraction, feature selection, classification and assessment of the various classifiers.
Dataset
Any diagnostic system relies substantially on its database. This study uses CT images from the LIDC, which contains 1018 CT scans of LC patients. The LIDC database provides both CT images and ground-truth reports from four experienced radiologists. McNitt-Gray et al. [22,23,24] give a detailed description of the presence of malignancy in nodules ≤ 3 cm and of the radiologists' remarks. The number of slices per CT scan used in this investigation ranged from 110 to 388. A total of 1207 CT scan slices were considered, comprising 883 malignant and 324 benign. Figure 1 shows sample CT images from the LIDC dataset that contain malignant ROIs.
Fig. 1
LIDC dataset sample images with malignant ROIs
Feature extraction
Features were extracted from the aforementioned dataset. Statistical approaches are used to extract the radiomic features (geometrical, texture and wavelet). A synopsis of these features follows.
Geometrical features
Disparate geometrical attributes play a significant part in the classification procedure. These attributes are imperative since they have a direct bearing on the diagnosis and prognosis of cancer [25]. Seven attributes, viz. Minor Axis Length, Major Axis Length, Mean Intensity, Max Intensity, Min Intensity, Area and Perimeter, are computed. Table 1 presents a list of these features.
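For illustration, these seven attributes could be computed from a segmented nodule ROI as in the following sketch, assuming scikit-image region properties (the array names ct_slice and nodule_mask are placeholders, not part of the original pipeline):

import numpy as np
from skimage.measure import label, regionprops


def geometrical_features(ct_slice: np.ndarray, nodule_mask: np.ndarray) -> dict:
    # nodule_mask is a binary segmentation of the nodule on ct_slice.
    props = regionprops(label(nodule_mask), intensity_image=ct_slice)[0]
    return {
        "area": props.area,
        "perimeter": props.perimeter,
        "major_axis_length": props.major_axis_length,
        "minor_axis_length": props.minor_axis_length,
        "mean_intensity": props.mean_intensity,
        "max_intensity": props.max_intensity,
        "min_intensity": props.min_intensity,
    }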
Table 1 List of geometrical and texture features extracted using GLCM, GLDM and GLRLM [26]
Haralick's texture features
Texture analysis facilitates the interpretation of tissue heterogeneity, a trait thought to affect how well cancer diagnosis succeeds [27]. It does this by capturing the spatial distribution of intensities [28]. Texture features are investigated using three techniques: GLCM, GLRLM and the Gray Level Difference Method (GLDM) [6, 11, 12, 26, 29]. Second- and higher-order statistics are extracted based on the inter-pixel distance 'd' and angle 'θ'. Twenty-two texture features are calculated with GLCM: Auto-correlation (ACOR), Correlation2 (COR2), Correlation1 (COR1), Dissimilarity (DS), Cluster Prominence (CP), Energy (ENR), Entropy (ENT), Cluster Shade (CS), Maximum Probability (MP), Homogeneity1 (HMG1), Homogeneity2 (HMG2), Sum Average (SA), Contrast (CON), Information Measure of Correlation2 (IMC2), Information Measure of Correlation1 (IMC1), Inverse Difference Moment (IDM), Difference Variance (DV), Sum of Squares: Variance (SOS), Sum Variance (SV), Difference Entropy (DENT), Sum Entropy (SENT) and Inverse Difference Moment Normalised (IDMN). Refer to Table 1.
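For concreteness, a minimal sketch of the GLCM computation (assuming scikit-image, not the authors' exact implementation) for one distance d and angle θ is given below; entropy is computed directly because graycoprops does not provide it. The input roi is assumed to be an 8-bit grayscale nodule patch.

import numpy as np
from skimage.feature import graycomatrix, graycoprops


def glcm_features(roi: np.ndarray, d: int = 1, theta: float = 0.0) -> dict:
    # Normalized, symmetric co-occurrence matrix for the chosen (d, theta).
    glcm = graycomatrix(roi, distances=[d], angles=[theta],
                        levels=256, symmetric=True, normed=True)
    feats = {name: graycoprops(glcm, name)[0, 0]
             for name in ("contrast", "dissimilarity", "homogeneity",
                          "energy", "correlation", "ASM")}
    # Entropy computed directly from the co-occurrence probabilities.
    p = glcm[:, :, 0, 0]
    feats["entropy"] = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return feats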
GLDM calculates 5 features: Entropy (ENT), Contrast (CON), Mean (M), Inverse Difference Moment (IDM) and Angular Second Moment (ASM). Refer to Table 1.
In addition, 11 features are computed using GLRLM: Short Run Low Gray-Level Emphasis (SGLGE), Short Run Emphasis (SRE), Short Run High Gray-Level Emphasis (SRHGE), Gray Level Non-uniformity (GLN), Run Length Non-uniformity (RLN), Run Percentage (RP), Low Gray-Level Run Emphasis (LGRE), High Gray-Level Run Emphasis (HGRE), Long Run Emphasis (LRE), Long Run Low Gray-Level Emphasis (LRLGE) and Long Run High Gray-Level Emphasis (LRHGE). See Table 1.
WPT texture features
A 2-level Wavelet Packet Transform (WPT) [30, 31] is used to create multi-scale representations of the original image. Wavelet-based features offer several benefits, including an exceptional ability to capture fine details while remaining resilient to noise and variability in imaging conditions. Although many well-performing wavelets are available, the choice of wavelet is application dependent. The orthogonal, compactly supported wavelets developed by Daubechies [32] were the main subject of this work; these wavelets can have a significant impact on how well texture analysis and classification perform, as the filter improves the quality of the descriptors [2, 33]. To implement the WPT, the Daubechies wavelets db1, db2 and db3 were utilized. The 2-level WPT produces 16 multi-scale sub-band images from a single input image, and these images are used to compute the attributes listed above (Section "Haralick's texture features"). The resulting feature classes are therefore denoted WPT-GLCM, WPT-GLDM and WPT-GLRLM. Table 2 lists the feature classes and the total number of features extracted for each class; the extracted attributes themselves are listed in Table 1.
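A minimal sketch of the 2-level decomposition, assuming the PyWavelets library (not necessarily the tooling used by the authors), is shown below; each of the 16 level-2 sub-band images would then be passed to the texture feature computation described above.

import numpy as np
import pywt


def wpt_subbands(roi: np.ndarray, wavelet: str = "db2") -> list:
    # Full 2-level wavelet packet decomposition of the ROI.
    wp = pywt.WaveletPacket2D(data=roi, wavelet=wavelet,
                              mode="symmetric", maxlevel=2)
    # 'natural' ordering returns the 4**2 = 16 sub-band nodes at level 2.
    return [node.data for node in wp.get_level(2, order="natural")]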
Table 2 List of features per class
Feature selection
FS, or feature reduction, aims to retain only the most beneficial attributes and eliminate noisy, irrelevant and redundant ones in order to benefit ML models [34]. FS is a crucial step in preparing for ML model training. It helps avoid over-fitting, which may boost the accuracy of model predictions and their capacity to generalize, while also aiding in the construction of a strong radiomic signature [20]. The routinely used FS methods fall into three methodological classes: filter, wrapper and embedded methods. In the present work, two embedded FS (EFS) methods were employed, viz. the Boosted Classification Ensemble Tree (BOCET) and the Bagged Classification Ensemble Tree (BACET).
EFS is a powerful technique that strikes a balance between filters and wrappers. By selecting features that surface during the learning process and aligning them with the classifier's evaluation criterion, it substantially reduces computational expense compared to wrappers. EFS integrates FS directly into model training and helps reduce the time needed to re-evaluate feature subsets [35]. To attain optimal classification accuracy, the classifier adjusts its internal parameters and determines the weights or priorities assigned to each feature during training. Examples of EFS techniques include decision-tree-based algorithms such as gradient boosting, random forests and decision trees. Another effective EFS approach is FS using regularization models such as LASSO. When used with linear classifiers such as logistic regression and SVM, regularization algorithms typically work by penalizing the coefficients of features that do not substantially enhance model performance [36]. Both the regularization-based and tree-based techniques yield a ranked list of features. A tree ensemble is a bagging algorithm that combines a fixed number of decision trees. Tree-based methods naturally rank features by how much they improve node purity, i.e., reduce impurity (Gini impurity), across all trees: nodes near the top of the trees yield the greatest decrease in impurity, whereas nodes near the bottom yield the least. Thus, by pruning trees below a specific node, a subset of the most essential attributes can be produced.
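As an illustrative sketch of tree-based embedded selection (using scikit-learn with placeholder data rather than the BOCET/BACET implementation used in this work), impurity-based importances can be thresholded as follows:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=200, n_features=50, random_state=0)  # placeholder data

# Fit the forest and keep features whose impurity-based importance
# exceeds the median importance (an assumed threshold).
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold="median",
).fit(X, y)
X_selected = selector.transform(X)
importances = selector.estimator_.feature_importances_  # importance of every feature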
Two types of ensemble learning methods, bagging and boosting, are used for EFS. Both BOCET [37] and BACET [38] dynamically detect and prioritize pertinent features while constructing their models. Illustrations of BACET and BOCET are given in Fig. 2(a) and (b), and their algorithms are presented in Algorithm 1 and Algorithm 2.
Fig. 2

Algorithm 1 Pseudocode for the boosting algorithm AdaBoost

Algorithm 2 Pseudocode for the bagging algorithm
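For reference, a compact library-level analogue of the two pseudocodes is sketched below; it assumes scikit-learn with placeholder data and illustrative hyperparameters, and is not the BOCET/BACET code used in this study.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=30, random_state=0)  # placeholder data

# Boosting: sequentially reweights samples misclassified by earlier stumps.
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100, random_state=0).fit(X, y)

# Bagging: trains one tree per bootstrap sample and aggregates by majority vote.
bagged = BaggingClassifier(DecisionTreeClassifier(),
                           n_estimators=100, random_state=0).fit(X, y)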
Categorization and performance evaluation
Classification
Once discriminative radiomic features have been obtained using the EFS algorithms, the lung nodules can be classified into two classes (benign and malignant) using a number of state-of-the-art ML classifiers, including SVM [39,40,41], decision trees (DT) [42], Ensemble Trees [43, 44] and Ensemble Subspace classifiers [45, 46].
SVM: SVM's relative simplicity and adaptability of implementation have made it a powerful and effective ML strategy in the field of biomedical image analysis [40]. SVM can handle data that are not linearly separable by using a kernel function to transform the input attributes into a higher-dimensional space. The kernel functions utilized include the sigmoid, polynomial, linear and Radial Basis Function (RBF, also known as Gaussian) kernels.
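A minimal sketch of an RBF-kernel SVM on standardized features, assuming scikit-learn and placeholder data and hyperparameters:

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=30, random_state=0)  # placeholder data

# Standardize features, then fit an SVM with a Gaussian (RBF) kernel.
svm_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm_rbf.fit(X, y)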
DT: The DT algorithm uses a tree structure for classification. The dataset, at the root node, is recursively split into child nodes. Every internal node represents a feature, branches represent decision rules, and leaf nodes represent the classification outcome. This strategy can be applied to both categorical and numerical data.
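A brief sketch (scikit-learn, placeholder data) that prints the learned feature tests at internal nodes and the class decisions at the leaves:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=10, random_state=0)  # placeholder data
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))  # internal nodes test features; leaves give the predicted class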
Ensemble Tree: An ensemble tree is a supervised ML technique that comprises an ensemble of independently trained DTs, which serve as base learners and might not perform well when used singly. A new, stronger model is created by aggregating the base learners, and this model is often more accurate than any of them individually. Bagging, boosting and RUSBoost are the three types of ensemble tree ML methods used. Bagging creates an assortment of bootstrapped datasets (bags) from the original training dataset. Each bag contains N observations picked at random from the original dataset with replacement, so a bag consists of roughly 63% distinct samples, with the remainder being duplicates [47]. A DT is then trained on each bag, and the outcomes are aggregated via majority voting. Boosting, another ensemble modeling technique, attempts to construct a robust classifier from a sequence of weak classifiers: a first model is constructed from the training data, then a second is constructed in an effort to rectify the errors of the first, and this procedure is iterated until the maximum number of models is reached or the full training set is predicted accurately. RUSBoost is an ML approach used to enhance the performance of models trained on skewed data. It applies random undersampling (RUS), which randomly removes examples from the majority class until a desirable class distribution is reached.
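Because the slice counts here are imbalanced (883 malignant versus 324 benign), a hedged sketch of RUSBoost is worth showing; it assumes the imbalanced-learn package and synthetic placeholder data, and is not necessarily the implementation used by the authors.

from imblearn.ensemble import RUSBoostClassifier
from sklearn.datasets import make_classification

# Placeholder imbalanced data with roughly a 73/27 class split,
# mimicking the 883 vs. 324 slice counts.
X, y = make_classification(n_samples=1207, n_features=30,
                           weights=[0.27, 0.73], random_state=0)

# RUSBoost: random undersampling of the majority class at each boosting
# round; the default base learner is a shallow decision tree.
rus_boost = RUSBoostClassifier(n_estimators=100, random_state=0).fit(X, y)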
Ensemble Random Subspace: The random subspace (RS) ensemble classifier applies a random subset of features to each member of a combined set of base classifiers (KNN and Discriminant). The classifier randomly selects a portion of the features from the original dataset and uses them to train a number of weak classifiers; the predicted outputs of these weak classifiers are then combined using a majority-voting rule to obtain the final class labels. The non-parametric K-Nearest Neighbor (KNN) algorithm is an instance-based classification method: the input comprises the K training examples closest in the feature space, and the output is the class membership established by a majority vote of those neighbors. If K = 1, the class is that of the single closest neighbor [48]. Linear Discriminant Analysis (LDA), a supervised learning method, utilizes class labels, making it well suited to class separation. It employs both within-class and between-class scatter matrices. For two classes, LDA creates a hyperplane and projects the data onto it to increase the separation between them; this hyperplane is formed by maximizing the difference between the means of the two categories while minimizing the variance within each category. LDA is commonly used in medical applications due to its high accuracy [49].
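An illustrative sketch of a random-subspace ensemble, assuming scikit-learn: a BaggingClassifier with bootstrapping disabled and a random fraction of features per KNN base learner. The data, the feature fraction and K are assumed placeholder values.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=30, random_state=0)  # placeholder data

subspace_knn = BaggingClassifier(
    KNeighborsClassifier(n_neighbors=5),
    n_estimators=50,
    bootstrap=False,   # use all samples for every learner ...
    max_features=0.5,  # ... but only a random half of the features
    random_state=0,
).fit(X, y)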
Performance evaluation
Actual and predicted outcomes from the prediction models for LC classification are displayed in a confusion matrix. The confusion matrix for LC classification with two outcomes is given in Table 3. True positives (tp) are malignant nodules correctly classified as LC; false positives (fp) are benign nodules diagnosed as LC; false negatives (fn) are malignant nodules erroneously classified as benign; and true negatives (tn) are benign nodules classified correctly. The performance evaluation criteria employed in this study are the Area Under the Curve (AUC), Accuracy, Precision, Sensitivity/Recall, Specificity and F1-score.
Table 3 Confusion matrix for LC prediction
The Area Under the Curve (AUC) summarizes how well the model separates the two classes across all decision thresholds.
Accuracy refers to the model's capacity to correctly predict outcomes, as either malignant or benign, relative to the total number of outcomes.
$$Accuracy=\frac{tp+tn}{tp+tn+fp+fn}$$
(1)
Precision refers to the quality of the model's positive predictions, i.e., the fraction of nodules predicted as malignant that are truly malignant.
$$Precision=\frac{tp}{tp+fp}$$
(2)
Sensitivity/Recall measures the fraction of true positives (tp) that are correctly identified. In the medical field, sensitivity is prioritized over precision because the goal is to identify all true positive cases [50].
$$Sensitivity=\frac{tp}{tp+fn}$$
(3)
Specificity refers to the ability of the model to predict true negatives (tn).
$$Specificity=\frac{tn}{tn+fp}$$
(4)
F1-score combines precision and recall into a single measure in a balanced manner.
$$F1\text{-}score=2\times \frac{Precision\times Recall}{Precision+Recall}$$
(5)
For all these metrics, a value closer to one signifies an excellent classification result and vice-versa.
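A small sketch computing the listed metrics from a binary confusion matrix, assuming scikit-learn; y_true, y_pred and y_score are placeholder arrays, not study data.

import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0])             # 1 = malignant, 0 = benign
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])             # hard class predictions
y_score = np.array([0.9, 0.4, 0.2, 0.8, 0.3, 0.7, 0.6, 0.1])  # classifier scores for AUC

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
sensitivity = tp / (tp + fn)                             # recall
specificity = tn / (tn + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)
auc = roc_auc_score(y_true, y_score)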
Figure 3 depicts the framework for the proposed methodology.
Fig. 3
Proposed framework to classify lung nodule as malignant or benign