Artificial Intelligence-Enhanced Breast MRI: Applications in Breast Cancer Primary Treatment Response Assessment and Prediction

Primary systemic therapy (PST) is the treatment of choice in patients with locally advanced breast cancer. Numerous trials have demonstrated that PST with neoadjuvant chemotherapy (NAC) is as effective as adjuvant chemotherapy in patients with locally advanced or large resectable breast cancer.1 Primary systemic therapy aims to downstage the tumor, thereby minimizing surgical extent, potentially avoiding mastectomy or axillary lymph node dissection, and sparing patients the long-term morbidity associated with possible lymphedema.2–6 In recent years, PST has also been considered for patients presenting with smaller tumors with a clear indication for chemotherapy at diagnosis, such as patients with triple-negative breast cancer (TNBC) or HER2-positive breast cancer.2,7

Various treatment strategies, such as chemotherapy, hormone therapy, and targeted therapy, can be used to treat breast cancer in the neoadjuvant setting.8 For instance, American Society of Clinical Oncology guidelines recommend that patients with clinically node-positive and/or at least T1c TNBC be offered an anthracycline- and taxane-containing regimen with or without carboplatin. Although ongoing clinical trials evaluating immune checkpoint inhibitors, for example, pembrolizumab and atezolizumab, in patients with early-stage TNBC have shown promising preliminary results,9,10 there is currently insufficient evidence to recommend routinely adding immune checkpoint inhibitors to NAC for these patients. Meanwhile, postmenopausal patients with hormone receptor (HR)–positive, HER2-negative disease may be offered hormone therapy with an aromatase inhibitor. Patients with node-positive or high-risk node-negative, HER2-positive disease, on the other hand, should be offered an anthracycline- and taxane-based regimen or a non–anthracycline-based regimen in combination with trastuzumab.

Previous studies have shown that disease-free and overall survival improve significantly in patients who achieve pathologic complete response (pCR) to PST with NAC, particularly those with aggressive breast cancer subtypes.11 Of note, although definitions of pCR vary across trials, the 3 most commonly used definitions are ypT0 ypN0 (ie, absence of invasive and in situ cancer in the breast and axillary nodes), ypT0/is ypN0 (ie, absence of invasive cancer in the breast and axillary nodes, irrespective of the presence of ductal carcinoma in situ [DCIS]), and ypT0/is (ie, absence of invasive cancer in the breast, irrespective of DCIS or nodal involvement).11 In addition, improved survival outcomes have been demonstrated with systemic treatment escalation in patients with aggressive breast cancer subtypes who did not achieve pCR after PST.12,13 Consequently, response to PST has emerged as an important additional factor alongside tumor stage and biology in decision-making regarding adjuvant systemic treatment and postoperative radiotherapy.14,15
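The 3 pCR definitions listed above differ only in which residual findings they tolerate, so they can be written as simple Boolean rules. The sketch below is a simplified illustration that assumes hypothetical yes/no pathology flags (`invasive_in_breast`, `dcis_in_breast`, `nodal_disease`); it ignores staging nuances such as isolated tumor cells in lymph nodes and is not a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class PathologyFindings:
    """Simplified post-PST surgical pathology findings (hypothetical fields)."""
    invasive_in_breast: bool  # residual invasive carcinoma in the breast
    dcis_in_breast: bool      # residual ductal carcinoma in situ in the breast
    nodal_disease: bool       # residual cancer in the axillary lymph nodes


def pcr_status(p: PathologyFindings) -> dict:
    """Evaluate the 3 commonly used pCR definitions for one patient."""
    return {
        # ypT0 ypN0: no invasive or in situ cancer in the breast, negative nodes
        "ypT0 ypN0": not (p.invasive_in_breast or p.dcis_in_breast or p.nodal_disease),
        # ypT0/is ypN0: no invasive cancer in breast or nodes; residual DCIS allowed
        "ypT0/is ypN0": not (p.invasive_in_breast or p.nodal_disease),
        # ypT0/is: no invasive cancer in the breast; DCIS and nodal status ignored
        "ypT0/is": not p.invasive_in_breast,
    }


# Residual DCIS only, nodes negative: pCR under 2 of the 3 definitions
print(pcr_status(PathologyFindings(invasive_in_breast=False, dcis_in_breast=True, nodal_disease=False)))
```

The example makes the practical point of the text explicit: the same patient can count as a responder in one trial and a nonresponder in another, depending on which definition is used.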

In this context, accurate prediction of response to PST using imaging is crucial because it could not only improve prognostication but also allow the de-escalation or omission of potentially toxic treatment, the accelerated implementation of new targeted therapies, and the avoidance of surgical delays in selected patients.16 This review presents an overview of the current applications of artificial intelligence (AI) to magnetic resonance imaging (MRI) for predicting and assessing response to PST and discusses the challenges and limitations of their clinical implementation.

Response Assessment and Prediction of Response to PST With Imaging

Although conventional imaging techniques (ie, mammography, digital breast tomosynthesis, and breast ultrasound) can be used to evaluate the response to PST, there is consensus that MRI is the method of choice and should be performed during PST (halfway) and after the conclusion of PST (at conclusion and/or before surgery).15,17–20 There is broad evidence that MRI has superior performance to conventional imaging techniques for preoperative locoregional staging and response assessment to PST. Magnetic resonance imaging detects additional disease sites in the ipsilateral breast in approximately 20% of patients and in the contralateral breast in approximately 5.5% of patients, leading to treatment plan changes in one third of patients.21

Response assessment to PST with MRI is typically performed via qualitative radiologist assessment of dynamic contrast-enhanced (DCE) MRI using tumor size and volume,22 and the timing of response (early vs late in the course of treatment) is influenced by breast cancer subtype as well as PST strategy.23 Of note, in HER2-negative tumors treated with taxane regimens,24,25 MRI underestimates residual disease more often than in HER2-negative tumors treated with FEC (5-fluorouracil, epirubicin, and cyclophosphamide). This underestimation of residual disease is most likely caused by the antiangiogenic effect of taxanes, which limits tumor contrast enhancement, and by differences in tumor shrinkage patterns. Whereas patients treated with FEC predominantly demonstrate concentric shrinkage on MRI, corresponding to single nodular residual lesions on surgical histopathology, patients treated with the taxane docetaxel more often display fragmented patterns on MRI, corresponding to numerous scattered microscopic tumor nests on surgical histopathology. Tumor subtype also influences response to PST, with higher pCR rates in patients with HER2-positive breast cancer and TNBC.26 In this context, additional means to reliably determine response to PST earlier in the course of treatment are warranted.

Improvements can be achieved with multiparametric breast MRI, which provides morphological, functional, and metabolic information from different MRI parameters (DCE-MRI, T2-weighted imaging, and diffusion-weighted imaging [DWI], with 3D proton magnetic resonance spectroscopic imaging as an optional parameter).27 Multiparametric MRI allows simultaneous qualitative and quantitative assessment, and several studies have demonstrated that it increases diagnostic accuracy over DCE-MRI alone, with DWI being the most valuable supplemental sequence in this context.28–31 For the early prediction of response to PST, there is evidence that qualitative morphological and routinely used functional MRI features of breast cancer before PST can predict pCR. Tsunoda-Shimizu et al32 demonstrated that breast cancers presenting as well-defined round/oval or lobulated masses have low rates of pCR. Other studies33,34 found that multicentric cancers are less likely to achieve pCR than multifocal or unifocal cancers. Tumor growth pattern also influences response to PST: Tsukada et al35 found that a growth pattern in which tumors grew parallel to Cooper's ligaments was associated with a higher likelihood of achieving pCR after PST. In addition, quantitative measures such as the apparent diffusion coefficient (ADC) derived from DWI are valuable as early predictors of response to PST.30
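Under the commonly used mono-exponential model, the DWI signal decays as S_b = S_0 · exp(−b · ADC), so ADC can be estimated from a b = 0 image and one higher b-value as ADC = ln(S_0 / S_b) / b. The sketch below assumes that model with only 2 b-values (clinical software may fit more), and the signal values are hypothetical; it also shows the relative change in ADC between time points, the kind of quantity examined when ADC is used as an early response marker.

```python
import math


def adc_two_point(s0: float, sb: float, b: float) -> float:
    """ADC from a b=0 signal and one b>0 signal, assuming mono-exponential decay.

    Returns ADC in mm^2/s when b is given in s/mm^2.
    """
    if s0 <= 0 or sb <= 0 or b <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / sb) / b


def percent_change(before: float, after: float) -> float:
    """Relative change (%) of a quantitative parameter between 2 time points."""
    return 100.0 * (after - before) / before


# Hypothetical mean tumor signals at b = 0 and b = 800 s/mm^2, before and during PST
adc_pre = adc_two_point(1000.0, 400.0, 800.0)  # ~1.15e-3 mm^2/s
adc_mid = adc_two_point(1000.0, 300.0, 800.0)  # ~1.50e-3 mm^2/s
print(f"ADC pre {adc_pre:.2e}, mid {adc_mid:.2e}, change {percent_change(adc_pre, adc_mid):+.1f}%")
```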

The abovementioned studies mainly used univariate and multivariate regression models to establish the value of MRI for assessing and predicting response. More recently, the focus has shifted toward AI-enhanced predictive modeling, using a variety of classical machine learning (ML) techniques coupled with radiomics, as well as deep learning (DL) techniques.36–38

The Basic Concept of AI-Enhanced Image Analysis

Artificial intelligence–enhanced image analysis can be broken down into 2 approaches that differ in how the imaging information is transformed into mineable data39,40 (Fig. 1). Both approaches share the underlying principle that imaging features encoding simple patterns, as well as many higher-order patterns not discernible to the naked eye, can be extracted from biomedical images and linked with variables of interest (eg, patient characteristics, clinical outcomes, and omics data) to improve decision support.41,42

FIGURE 1:

Schematic description of workflow of artificial intelligence–enhanced imaging biomarker development using machine learning and deep learning approaches. For deep learning, the image segmentation step is not necessarily required.

Handcrafted radiomics analysis coupled with ML extracts imaging features, which are then used to identify a phenotypical fingerprint or “radiomics signature.” In contrast, DL uses a complex network inspired by the human brain architecture to devise its features.43,44 The typical radiomics and ML workflow starts with the input of the image of interest. Images can be derived with any 2D or 3D modality. As a next step, pixel intensities are normalized within a standardized range. Then, image segmentation by means of a region of interest (ROI) is performed.
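Intensity normalization is needed because MRI signal intensities have no absolute scale and vary between scanners and examinations. Two common schemes, z-score normalization and min–max rescaling, are sketched below on a 2D image stored as nested lists. Which scheme a given study uses, and whether the statistics are computed over the whole image or only inside a mask, is an assumption to check per study.

```python
from statistics import mean, pstdev


def zscore_normalize(image):
    """Z-score normalization: shift and scale so that pixels have mean 0 and SD 1."""
    flat = [v for row in image for v in row]
    mu, sigma = mean(flat), pstdev(flat)
    if sigma == 0:  # constant image: nothing to scale
        return [[0.0 for _ in row] for row in image]
    return [[(v - mu) / sigma for v in row] for row in image]


def minmax_normalize(image, lo=0.0, hi=1.0):
    """Min-max normalization: rescale intensities linearly into [lo, hi]."""
    flat = [v for row in image for v in row]
    vmin, vmax = min(flat), max(flat)
    if vmax == vmin:
        return [[lo for _ in row] for row in image]
    scale = (hi - lo) / (vmax - vmin)
    return [[lo + (v - vmin) * scale for v in row] for row in image]


image = [[100, 200], [300, 400]]  # hypothetical raw intensities
print(minmax_normalize(image))
print(zscore_normalize(image))
```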

Image segmentation can be performed either fully manually or semiautomatically. Fully manual segmentation can be performed on a single slice or on multiple slices; it is labor-intensive and limited by substantial interreader variability.45 Semiautomatic tumor segmentation, on the other hand, is faster and uses an active contour method to select the ROI. In the semiautomatic segmentation workflow, the boundaries of the ROI are first adjusted manually in the x, y, and z planes. Thresholding is also adjusted manually to exclude unwanted tissues within the active contour. Then, one or more spherical seeds are placed manually in the ROI to initialize the annotation, and the seeds merge into a single contour. In the final stage of the workflow, the automatic segmentation can be adjusted manually.46 Several methods of breast lesion segmentation have been described in the literature. Lucas-Quesada et al47 used a 2D similarity map method, which samples a representative signal enhancement curve inside the tumor, computes the similarity of each voxel's enhancement curve to this reference, and obtains the segmentation by thresholding the resulting similarity map at a user-defined value. Gilhuijs et al48 used an enhanced image method, which assigns the variance of the signal enhancement curve values to each voxel; segmentation results from computing a threshold that separates background and tumor voxels within a bounding sphere. Chen et al49 used a fuzzy clustering method in which lesion enhancement within a user-defined box-shaped ROI is normalized to the precontrast image and the normalized time curves are clustered using fuzzy c-means. The binarized lesion membership map is then postprocessed by connected component labeling, object selection, and hole-filling on the selected object. Castellani et al50 used a 3-step segmentation method: signal feature extraction from time-intensity curves, voxel segmentation by mean-shift clustering, and support vector machine (SVM) classification of voxels, with the SVM trained on the labels obtained in the clustering phase. Other methods include a volume of interest–based approach, whereby a volume of interest is drawn around a tumor lesion and then thresholded.51
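The fuzzy clustering approach described for Chen et al49 can be sketched as follows: each voxel's time-intensity curve is normalized to its precontrast signal, the normalized curves are clustered into 2 groups with fuzzy c-means, and the membership map of the more strongly enhancing cluster is binarized. This is a minimal illustration, not the authors' implementation: the fuzzifier m = 2, the 0.5 membership threshold, the hypothetical voxel curves, and the choice of the "lesion" cluster by highest mean enhancement are assumptions, and the connected component labeling and hole-filling steps are omitted.

```python
import random


def relative_enhancement(curve):
    """Normalize a voxel time-intensity curve to its precontrast signal: (S_t - S_0) / S_0."""
    s0 = curve[0]
    return [(s - s0) / s0 for s in curve[1:]]


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns cluster centers and memberships u[i][k]."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    u = []
    for _ in range(n):  # random initial memberships, each row summing to 1
        row = [rng.random() for _ in range(c)]
        u.append([x / sum(row) for x in row])
    for _ in range(iters):
        centers = []
        for k in range(c):  # centers = membership^m weighted means of the points
            w = [u[i][k] ** m for i in range(n)]
            centers.append([sum(w[i] * points[i][j] for i in range(n)) / sum(w) for j in range(d)])
        new_u = []
        for p in points:  # memberships from relative distances to all centers
            dist = [max(sum((p[j] - ck[j]) ** 2 for j in range(d)) ** 0.5, 1e-12) for ck in centers]
            new_u.append([1.0 / sum((dist[k] / dist[l]) ** (2.0 / (m - 1.0)) for l in range(c))
                          for k in range(c)])
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def segment_by_fcm(curves, threshold=0.5):
    """Binarize membership in the most strongly enhancing cluster (postprocessing omitted)."""
    feats = [relative_enhancement(cv) for cv in curves]
    centers, u = fuzzy_c_means(feats, c=2)
    lesion = max(range(2), key=lambda k: sum(centers[k]))
    return [ui[lesion] > threshold for ui in u]


# Hypothetical voxel curves: precontrast signal followed by 3 postcontrast time points
curves = [
    [100, 110, 115, 118], [100, 108, 112, 116], [100, 112, 118, 120],  # parenchyma
    [100, 250, 240, 220], [100, 260, 245, 230], [100, 240, 235, 215],  # lesion
]
print(segment_by_fcm(curves))
```

Fuzzy rather than hard clustering keeps a graded membership per voxel, which is what allows the final binarization threshold to be chosen explicitly.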

Following image segmentation, imaging features are extracted from the diagnostic images; these comprise handcrafted features (shape, texture, kinetics, etc), first-order features (histogram-based features related to the distribution of pixel intensities), and higher-order features (co-occurrence matrices, run length matrices, size zone matrices, neighborhood gray-level dependence matrices, Minkowski functionals, local binary patterns, and wavelet analysis, which describe how pixels are positioned in relation to each other). Radiomics analysis then proceeds as follows. Features not relevant to the proposed task are eliminated from the extracted set (ie, feature selection and reduction), and the remaining relevant features constitute the so-called radiomic signature. Statistical or ML classifiers are then coupled with the radiomic signature to answer the question of interest, such as classifying patients according to a predicted outcome. In supervised ML, paired radiomic signatures and known outcomes are used to train the machine to recognize patterns in the data that predict those outcomes. Machine learning methods used for feature selection and model building include, but are not limited to, logistic regression, random forests or decision trees, and SVMs. Finally, external validation is ideally performed, whereby model performance is tested on an external data set to detect overfitting. Overfitting occurs when a model learns spurious correlations that do not generalize to other data sets. In lieu of external validation, cross-validation can be used, whereby the available data are repeatedly divided into different subsets (ie, training and validation sets).
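To make these feature classes concrete, the sketch below computes a few first-order features from ROI intensities and a few texture features from a gray-level co-occurrence matrix (GLCM). It assumes a 2D ROI already quantized to a small number of integer gray levels and a single pixel offset; radiomics packages compute many more features, over several offsets and directions, with their own (sometimes differing) formula conventions.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def first_order_features(values):
    """Histogram-based (first-order) features of the intensities inside an ROI."""
    n, mu, sd = len(values), mean(values), pstdev(values)
    probs = [count / n for count in Counter(values).values()]
    return {
        "mean": mu,
        "std": sd,
        "skewness": 0.0 if sd == 0 else sum((v - mu) ** 3 for v in values) / (n * sd ** 3),
        "entropy": -sum(p * math.log2(p) for p in probs),  # in bits
    }


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset (dx, dy).

    `image` must already be quantized to integer gray levels 0..levels-1.
    """
    mat = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                mat[i][j] += 1  # count the pair in both directions (symmetric GLCM)
                mat[j][i] += 1
    total = sum(map(sum, mat))
    return [[v / total for v in row] for row in mat] if total else mat


def glcm_features(p):
    """A few Haralick-style texture features from a normalized GLCM."""
    idx = [(i, j) for i in range(len(p)) for j in range(len(p))]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in idx),
        "asm": sum(p[i][j] ** 2 for i, j in idx),  # angular second moment
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in idx),
    }


roi = [[0, 1], [1, 0]]  # hypothetical ROI already quantized to 2 gray levels
print(first_order_features([v for row in roi for v in row]))
print(glcm_features(glcm(roi, levels=2)))
```

In a full pipeline, vectors of such features per patient would form the input to the feature selection and classification steps described above.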

With advances in hardware and software, DL is increasingly used in biomedical imaging studies, including breast imaging. In contrast to traditional ML techniques that rely on handcrafted features to perform the required task, such as lesion detection, classification, or response prediction, DL uses neural networks, loosely inspired by the architecture of the human brain, that allow the machine to learn patterns on its own, without any predefined characteristics or handcrafted features.44,52 To date, most DL algorithms in medical image analysis rely on convolutional neural networks (CNNs), which comprise multiple processing layers whose millions of parameters, the so-called weights and biases, are optimized to extract hierarchical patterns. The majority of DL models use a supervised learning approach in which training is performed on a multitude of labeled examples, with labels that can be assigned at different levels (examination, breast, pixel). During training, a general-purpose learning procedure, typically a variant of stochastic gradient descent, is used to optimize an objective, such as a classification loss, which in turn optimizes the features the network learns. Large quantities of labeled images are fed into a CNN. In the first layer, the machine learns small, simple features (eg, the orientation of edges). In subsequent layers, it learns particular combinations of these features, and in deeper layers, complex arrangements of earlier patterns. In the final layers, the machine uses the learned imaging features or representations to produce the desired output. After training is completed, model performance is ideally first validated with a held-out data set (ie, an internal data set not used during training) and then with an external data set from a different institution (ie, external validation).
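The layer operations described above can be illustrated with a hand-set kernel. The sketch below applies one 3 × 3 convolution (implemented, as in most CNN frameworks, as cross-correlation), a ReLU nonlinearity, and 2 × 2 max pooling to a toy image. The vertical-edge kernel is a hypothetical stand-in for the kind of edge-orientation filter a first layer often learns; in a real CNN, the kernel weights are learned by gradient descent rather than set by hand.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNN convolutional layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Rectified linear unit: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete windows at the border are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


# Hypothetical first-layer kernel responding to dark-to-bright vertical edges;
# in a trained CNN, such weights are learned rather than hand-set.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]  # dark left half, bright right half
fmap = relu(conv2d(image, vertical_edge))
pooled = max_pool(fmap)
print(fmap[0], pooled)
```

Stacking many such layers, each with many learned kernels, is what lets deeper layers respond to combinations of the simple patterns detected earlier.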

Because DL approaches are data driven, their performance improves with larger training data sets. To achieve optimal performance, much larger training data sets and greater computing power are necessary than for handcrafted radiomics analysis coupled with ML. This limitation can be partly mitigated by transfer learning techniques, which substantially reduce the data set size required for CNN training.52
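Transfer learning typically reuses the feature-extracting layers of a network pretrained on a large data set, freezes them, and trains only a small task-specific head on the smaller target data set. The sketch below mimics that split in a hypothetical toy setting: `frozen_backbone` is a fixed function standing in for a pretrained CNN (it is not a real pretrained model), and only a logistic regression head is trained with stochastic gradient descent.

```python
import math
import random


def frozen_backbone(image):
    """Hypothetical stand-in for a pretrained, frozen CNN feature extractor.

    Returns 2 fixed features: mean intensity and mean absolute horizontal gradient.
    Nothing in this function is updated during training.
    """
    flat = [v for row in image for v in row]
    grads = [abs(row[j + 1] - row[j]) for row in image for j in range(len(row) - 1)]
    return [sum(flat) / len(flat), sum(grads) / len(grads)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, lr=0.5, epochs=200, seed=0):
    """Train only a logistic regression head on frozen features with per-sample SGD."""
    rng = random.Random(seed)
    w, b = [0.0] * len(features[0]), 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = features[i]
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - labels[i]  # dLoss/dz for log-loss
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy target task: "smooth" (label 0) vs "textured" (label 1) patches
smooth = [[[v] * 4 for _ in range(4)] for v in (0.2, 0.5, 0.8)]
textured = [[[0.0, a, 0.0, a] for _ in range(4)] for a in (0.6, 0.8, 1.0)]
images, labels = smooth + textured, [0, 0, 0, 1, 1, 1]
feats = [frozen_backbone(im) for im in images]  # computed once; backbone stays frozen
w, b = train_head(feats, labels)
print([round(predict(w, b, f), 2) for f in feats])
```

Because only the head's few parameters are fitted, far fewer labeled examples are needed than for training every layer from scratch, which is the data-size benefit the text refers to.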

AI-Enhanced MRI in Response Assessment and Prediction of Response to PST

Artificial intelligence–enhanced MRI is actively being explored to improve the performance of breast MRI, already the most sensitive imaging test, not only in assessing response to PST but also in predicting it, with promising initial results53 and ongoing developments.

Radiomics and ML

In this section, we discuss and summarize the existing literature on “classical” (ie, non-DL) ML methods for AI-enhanced MRI in breast cancer treatment response assessment. Table 1 provides a detailed overview of the ML approaches used to date, along with the MRI protocols, patient populations, and outcomes reported in the relevant studies. Studies in Table 1 are listed chronologically to illustrate how ML approaches have evolved over time.

TABLE 1 - Overview of Machine Learning (Non–Deep Learning) Methods for Predicting Treatment Response

| Author | Year | Method | Patient Population and Cohort Size | MRI Modality/Protocol | Results |
|---|---|---|---|---|---|
| Mani et al54 | 2011 | Gaussian Naive Bayes, logistic regression, Bayesian logistic regression, CART36, random forest, SVM, Ripper (rule learner), with and without feature selection | Stage II/III breast cancer patients receiving NAC (n = 20) | 3 T DWI and DCE-MRI before and after first cycle of NAC | Highest diagnostic accuracy: 0.9; highest AUC: 0.96 (Bayesian logistic regression using both imaging and clinical variables) |
| Mani et al55 | 2013 | Radiomics analysis using Bayesian logistic regression and feature selection | 28 breast cancer patients undergoing NAC | DCE-MRI and DWI data acquired before and after 1 cycle of NAC | AUC of 0.86, accuracy of 86%, sensitivity of 88%, specificity of 82% for predicting response to NAC |
| Aghaei et al56 | 2015 | Quantitative kinetic imaging features with ANN-based classifier and maximum score–based fusion process | 68 breast cancer patients undergoing NAC | Pretreatment MRI with precontrast and first postcontrast scans | AUC = 0.85 ± 0.05 using 5 low-redundancy features; ANN-based classifier more accurate with AUC = 0.96 ± 0.03 |
| O'Flynn et al57 | 2016 | Statistical analyses of several imaging parameters (such as enhancement factor, tumor volume, …) | Women with biopsy-proven breast cancer (n = 32) | 3 T DCE-MRI sequences with contrast agent, T2-weighted, DW, and ISW sequences | A decrease in enhancement fraction (EF) (−41% ± 38%) and tumor volume (−80% ± 25%) after 2 cycles of NAC were significant predictors of pCR (AUC: EF = 0.76, tumor volume = 0.77) |
| Wu et al58 | 2016 | Principal component analysis, k-means clustering, and Haralick texture features based on GLCM | Stage II/III breast cancer patients (n = 35) | 3 T DCE-MRI before and after the first cycle of NAC | AUC of 0.79 for predicting pCR |
| Aghaei et al59 | 2016 | Attribution-selected classifier using ANNs with a wrapper subset evaluator | 151 cancer patients before NAC | Pretreatment DCE-MRI | AUC of 0.83 ± 0.04 for predicting complete response vs partial response |
| Braman et al60 | 2017 | Radiomic textural analysis of intratumoral and peritumoral regions followed by several machine learning classifiers | Patients who received NAC for breast cancer (n = 117) | Pretreatment T1-weighted contrast-enhanced DCE-MRI | AUC of 0.74 using diagonal linear discriminant analysis (DLDA) classifier; AUC of 0.83 for HR+, HER2− (DLDA); AUC of 0.93 for TN/HER2+ (naive Bayes) |
| Fan et al61 | 2017 | Evolutionary algorithm (EA)–based method | Breast cancer patients undergoing NAC (n = 57) | Pretreatment DCE-MRI | Leave-one-out cross-validation (LOOCV) AUC of 0.91 for the main cohort and 0.71 for the validation cohort |
| Giannini et al62 | 2017 | Bayesian classifiers on 3D texture features | Breast cancer patients undergoing NAC (n = 44) | Pretreatment DCE-MRI | Specificity of 72% and sensitivity of 67% for predicting pCR |
| Tahmassebi et al63 | 2019 | Machine learning with mpMRI for predicting treatment response, using 8 different classifier techniques | 38 women with breast cancer | 3 T mpMRI with DCE, DWI, and T2-weighted imaging before/after 2 cycles of NAC | AUC = 0.86 for predicting residual cancer burden (RCB), AUC = 0.83 for predicting recurrence-free survival (RFS), AUC = 0.92 for predicting disease-specific survival (DSS) |
| Cain et al64 | 2019 | Multivariate logistic regression classifier and SVM classifier with stepwise multilinear regression-based feature selection | 288 breast cancer patients treated with PST (chemotherapy and/or endocrine or anti-HER2neu therapies) | Pretreatment DCE-MRI | Models were prognostic of pCR in the TN/HER2+ patient subgroup (P < 0.002), but not across the entire cohort (P = 0.01) |
| Braman et al65 | 2019 | 99 texture descriptors (Laws, Gabor, GLCM, CoLlAGe) from peritumoral and intratumoral regions on MRI used for statistical analyses | 209 breast cancer patients, including 117 who received MRI prior to NAC and 42 with HER2+ breast cancer | Pretreatment DCE-MRI | Max AUC of 0.85 for identifying HER2+ tumors (peritumoral region); AUC of 0.89 for identifying HER2+ subtype (combined peritumoral and intratumoral features); AUC of 0.80 and 0.69 for predicting pCR in 2 validation cohorts |
| Liu et al66 | 2019 | Radiomics analysis with SVM | Multicenter breast cancer patients undergoing NAC (n = 414) | Pretreatment T2-weighted imaging, DWI, and contrast-enhanced T1-weighted imaging, along with clinical information | AUC of 0.86 for predicting pCR in the primary cohort (AUC around 0.70 for validation) |
| Machireddy et al67 | 2019 | SVM with multiresolution fractal analysis (wavelet analysis and fractal dimension) | Breast cancer patients undergoing NAC (n = 55) | DCE-MRI before and after the first NAC cycle | AUC for training set: 0.91; AUC for testing set: 0.78. The addition of multiresolution features was statistically significant. |
| Drukker et al68 | 2019 | Radiomics with bootstrap resampling and linear discriminant analysis | Node-positive breast cancer patients undergoing NAC (n = 158) | Pretreatment DCE-MRI at 1.5 T and 3.0 T | AUC up to 0.82 for predicting pCR; AUC up to 0.72 for predicting post-NAC lymph node status |
| Sutton et al69 | 2020 | Radiomic feature extraction and random forest machine learning classifier with and without molecular subtype | Women with breast cancer treated with NAC (n = 273) with pre- and post-NAC MRI and a post-NAC surgical pathology report assessing response | 1.5 T or 3.0 T MRI, axial T1-weighted fat-suppressed images pre- and post-contrast before and after NAC | Model 1 (radiomics only): AUC 0.83 (95% CI, 0.71–0.94); model 2 (radiomics and molecular subtype): AUC 0.78 (95% CI, 0.62–0.94) |
| Bitencourt et al70 | 2020 | Radiomic features coupled with coarse decision trees | 311 HER2-overexpressing breast cancer patients receiving NAC | Pretreatment contrast-enhanced MRI | Sensitivity 99.3%, specificity 81.3%, diagnostic accuracy 97.4% for HER2 heterogeneity prediction; sensitivity 86.5%, specificity 80.0%, diagnostic accuracy 83.9% for pCR prediction (test set) |
| Bian et al71 | 2020 | Multivariate logistic regression for radiomic signatures based on T2-weighted, DWI, DCE imaging, and their combination | Breast cancer patients undergoing NAC (n = 152) | Pretreatment mpMRI: T2-weighted imaging, DWI, DCE imaging | Combined radiomic signature and nomogram model: AUC 0.91 (training), 0.93 (validation) for predicting pCR |
| Xiong et al72 | 2020 | Radiomic signature construction using multivariable logistic regression | Breast cancer patients undergoing NAC (n = 125: 63 primary, 62 validation) | Pretreatment multiparametric MRI | Combined prediction model: AUC 0.94 (95% CI, 0.85–1) in the validation cohort for predicting the grade 1–2 group |
| Zhou et al73 | 2020 | Wavelet-transformed radiomic texture features classified using random forest | Locally advanced breast cancer patients (n = 55) | Pretreatment contrast-enhanced MRI | Model using only wavelet features (AUC = 0.89 ± 0.03) outperformed other models (that included standard nonwavelet features) |
| Chen et al74 | 2020 | Maximum relevance minimum redundancy (mRMR) algorithm for feature selection, least absolute shrinkage and selection operator (LASSO) algorithm for dimensionality reduction, and machine learning classifiers (SVM, Bayes, k-Nearest Neighbor, Random Forest, Decision Tree) combined by multivariable logistic regression | Breast cancer patients undergoing NAC (n = 158) | Pretreatment T2-weighted, DWI, and DCE-MRI | AUC of 0.88, specificity 82.19%, sensitivity 83.57% for the test set |
| Nemeth et al75 | 2021 | SVM with quadratic kernel, random forest, multilayer perceptron, SVM with linear kernel | Early triple-negative breast cancer patients (n = 75) | Pretreatment T1-weighted, T2-weighted, DWI, and DCE-MRIs | SVM with quadratic kernel had the best performance (mean AUC = 0.83, sensitivity = 0.85, specificity = 0.75) in the test set |
| Umutlu et al76 | 2022 | Radiomics analysis using elastic net and SVM | Newly diagnosed, therapy-naive breast cancer patients (n = 73) | Simultaneous 18F-FDG PET/MRI with DCE-T1-weighted imaging, DWI, and 18F-FDG PET | AUC of 0.80 for predicting pCR in the entire cohort; AUC of 0.94 in the HR+/HER2− subgroup; AUC of 0.92 in the TN/HER2+ subgroup (both MR and MR + PET) |
| Syed et al77 | 2023 | Extreme Gradient Boosting (XGBoost) on GLCM features and clinical data | Invasive breast cancer patients undergoing NAC (n = 117) | DWI, DCE-MRI, and tumor ADC values at 3 treatment time points; patient demographics; and tumor data | AUC of 0.95 (95% CI, 0.91–0.99; P < 0.001) using all MRI and all non-MRI data at all time points |
| Liu et al78 | 2023 | Logistic regression, decision tree, SVM, random forest, Bayes Gaussian, k-Nearest Neighbor | Breast cancer patients receiving NAC (n = 420) | Pretreatment MRI (morphologic and kinetic features); clinicopathologic features | The highest AUC for predicting tumor regression patterns was obtained by logistic regression (0.68) in the internal validation set |

ADC, apparent diffusion coefficient; ANN, artificial neural network; AUC, area under the curve; CI, confidence interval; DCE, dynamic contrast-enhanced; DWI, diffusion-weighted imaging; GLCM, gray-level co-occurrence matrix; HR, hormone receptor; mpMRI, multiparametric magnetic resonance imaging; NAC, neoadjuvant chemotherapy; pCR, pathologic complete response; PET, positron emission tomography; PST, primary systemic therapy; SVM, support vector machine; TN, triple-negative.

Of note, many studies in the literature have used multiple MRI modalities (ie, multiparametric MRI) as input. In such studies, the usage of so-called coregistration is vital: this refers to the process of aligning and combining multiple images or sequences, such as T1-weighted imaging, T2-weighted imaging, and DWI, to generate a comprehensive and accurate representation of the tumor (and the associated segmentation).
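In its simplest form, coregistration searches for the transform that best aligns a moving image with a fixed reference. The sketch below is a deliberately minimal stand-in that assumes rigid integer translations only and mean squared intensity difference as the similarity measure; multiparametric pipelines typically rely on dedicated registration toolkits with rigid, affine, or deformable transforms and, across sequences with different contrast, on measures such as mutual information. The image and lesion values are hypothetical.

```python
def shift_image(img, dy, dx, fill=0.0):
    """Translate a 2D image by (dy, dx) pixels, filling uncovered pixels with `fill`."""
    rows, cols = len(img), len(img[0])
    return [[img[r - dy][c - dx] if 0 <= r - dy < rows and 0 <= c - dx < cols else fill
             for c in range(cols)] for r in range(rows)]


def mse_overlap(fixed, moving, dy, dx):
    """Mean squared difference between `fixed` and `moving` shifted by (dy, dx), over their overlap."""
    rows, cols = len(fixed), len(fixed[0])
    diffs = [(fixed[r][c] - moving[r - dy][c - dx]) ** 2
             for r in range(rows) for c in range(cols)
             if 0 <= r - dy < rows and 0 <= c - dx < cols]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def register_translation(fixed, moving, max_shift=3):
    """Exhaustively search integer shifts in [-max_shift, max_shift]; return the best (dy, dx)."""
    shifts = [(dy, dx) for dy in range(-max_shift, max_shift + 1)
              for dx in range(-max_shift, max_shift + 1)]
    return min(shifts, key=lambda s: mse_overlap(fixed, moving, *s))


# Hypothetical 10 x 10 reference slice containing a small asymmetric lesion
fixed = [[0.0] * 10 for _ in range(10)]
for (r, c), v in {(3, 3): 5.0, (3, 4): 3.0, (4, 3): 2.0, (5, 5): 4.0}.items():
    fixed[r][c] = v
moving = shift_image(fixed, -1, 2)     # misaligned copy standing in for a second sequence
dy, dx = register_translation(fixed, moving)
aligned = shift_image(moving, dy, dx)  # resample the moving image into the fixed frame
print((dy, dx))
```

Once the transform is found, the same shift can be applied to a segmentation drawn on one sequence so that the ROI lines up with the tumor on the others.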

Several studies combined pretreatment and posttreatment multiparametric MRI in their analysis. For example, in 2011, Mani et al54 explored several ML classifiers, such as Gaussian Naive Bayes, logistic regression, Bayesian logistic regression, CART36, random forest, SVM, and Ripper (rule learner), to predict treatment outcomes in stage II/III breast cancer patients receiving NAC (n = 20). The study used 3 T DWI and DCE-MRI before and after the first cycle of NAC. The highest diagnostic accuracy was 0.91, and the highest area under the curve (AUC) was 0.96. In a further study in 2013, Mani et al55 applied formal radiomics analysis, rather than qualitative and quantitative features, to DCE-MRI and DWI data acquired before and after 1 cycle of NAC. Using Bayesian logistic regression and feature selection in a cohort of 28 breast cancer patients undergoing NAC, they achieved an AUC of 0.86, an accuracy of 86%, a sensitivity of 88%, and a specificity of 82% for predicting response to NAC. Several years later, in 2019, Tahmassebi et al63 conducted a feasibility study exploring ML with multiparametric MRI for predicting treatment response and long-term survival, using 8 distinct classifier techniques. The study used qualitative and quantitative features that can be routinely extracted from 3 T multiparametric MRI with DCE, DWI, and T2-weighted imaging before and after 2 NAC cycles, achieving an AUC of 0.86 for predicting residual cancer burden, an AUC of 0.83 for predicting recurrence-free survival, and an AUC of 0.92 for predicting disease-specific survival (Fig. 2).
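The performance measures reported throughout this section follow standard definitions. The sketch below computes them for a hypothetical set of labels (1 = pCR) and predicted probabilities, using the rank-based (Mann–Whitney) formulation of the AUC, which equals the area under the empirical ROC curve. The 0.5 decision threshold is an assumption; studies may choose thresholds differently.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = responder, eg, pCR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def auc_mann_whitney(y_true, scores):
    """AUC = probability that a random positive case scores higher than a random negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of pCR for 7 patients
y = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1]
print(auc_mann_whitney(y, scores))                                   # 11 of 12 pairs ranked correctly
print(confusion_metrics(y, [1 if s >= 0.5 else 0 for s in scores]))  # threshold 0.5
```

Unlike accuracy, sensitivity, and specificity, the AUC does not depend on a chosen threshold, which is one reason it is the most commonly reported summary measure in these studies.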

FIGURE 2: Receiver operating characteristic (ROC) curves of the multiparametric magnetic resonance imaging model using the XGBoost classifier with 4-fold cross-validation for predicting (A) RCB class and (B) RFS, and with 3-fold cross-validation for predicting (C) DSS. The solid orange lines represent the average ROC curves, the lighter lines depict the ROC curve for each fold, and the gray-shaded areas indicate the confidence interval for the predictions of the multiparametric magnetic resonance imaging model. Reprinted with permission from Tahmassebi et al.63
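The k-fold cross-validation mentioned in the Figure 2 caption partitions the data into k folds, trains on k − 1 of them, and tests on the held-out fold, rotating through all folds. The sketch below shows one common variant, stratified splitting, which keeps the proportion of each outcome class similar across folds. It is an illustration under that assumption, not the exact splitting procedure of Tahmassebi et al, and model training itself is left out.

```python
import random


def stratified_kfold(labels, k=4, seed=0):
    """Assign sample indices to k folds, dealing each class out round-robin after shuffling."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members:
            folds[position % k].append(i)
            position += 1
    return folds


def cv_splits(labels, k=4, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_kfold(labels, k, seed)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


labels = [1] * 4 + [0] * 8  # hypothetical: 4 responders, 8 nonresponders
for train, test in cv_splits(labels, k=4):
    print(len(train), len(test), sum(labels[i] for i in test))
```

Stratification matters for small cohorts such as those in Table 1: with purely random folds, a fold may contain few or no responders, making its ROC curve unstable or undefined.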

A more recent trio of studies, by Xiong et al,72 Nemeth et al,75 and Liu et al,78 also used multiparametric MRI but analyzed pretreatment examinations only. In 2020, Xiong et al72 developed a radiomic signature using multivariable logistic regression in breast cancer patients undergoing NAC (n = 125: 63 primary, 62 validation), using pretreatment multiparametric MRI. The combined prediction model yielded an AUC of 0.94 (95% CI, 0.85–1) for predicting the grade 1–2 group in the validation cohort. In 2021, Nemeth et al75 compared multiple ML classifiers, including SVMs with quadratic kernels, random forests, multilayer perceptrons, and SVMs with linear kernels, in patients with early-stage TNBC (n = 75), using pretreatment T1-weighted, T2-weighted, DWI, and DCE-MRI. The best-performing classifier was the SVM with a quadratic kernel, with a mean AUC of 0.83, sensitivity of 85%, and specificity of 75% in the test set. Lastly, in 2023, Liu et al78 compared logistic regression, decision trees, and various other ML techniques for predicting the tumor regression pattern in 420 patients undergoing NAC, using pretreatment MRI morphologic and kinetic features along with clinical variables; the logistic regression model performed best, with an AUC of 0.68.

Other studies combining pretreatment and posttreatment MRI include O'Flynn et al,57 Wu et al,58 Machireddy et al,67 Sutton et al,69 and Syed et al77 (using only gray-level co-occurrence matrix [GLCM] features). Studies using pretreatment MRI only include Braman et al,65,79 Aghaei et al,56,59 Fan et al,61 Giannini et al,62 Cain et al,64 Liu et al,66 Drukker et al,68 Bitencourt et al,70 Bian et al,71 Chen et al,74 and Umutlu et al.76 The ML techniques used varied and included random forests, SVMs, diagonal linear discriminant analysis, elastic nets, and others. In addition, a variety of commonly used MRI modalities were used, resulting in a broad range of predictive accuracy across methodologies and modalities. Notably, these studies also often include clinical and hormonal variables in the final predictive models.

Two studies used wavelet analysis. Machireddy et al67 used an SVM with multiresolution fractal analysis (wavelet analysis and fractal dimension) to predict treatment outcomes in breast cancer patients undergoing NAC (n = 55), using DCE-MRI acquired before and after the first NAC cycle. The study reported an AUC of 0.91 in the training set and an AUC of 0.78 in the testing set, and the contribution of the multiresolution features was statistically significant. A similar approach was used by Zhou et al,73 who applied wavelet transforms to derive radiomic texture features for predicting treatment response in a cohort of 55 patients with locally advanced breast cancer. They obtained an AUC of 0.89 using only pretreatment contrast-enhanced MRI.
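A single level of the 2D Haar wavelet transform shows what wavelet-transformed features operate on: the image is split into a coarse approximation sub-band and 3 detail sub-bands capturing horizontal, vertical, and diagonal intensity changes, and radiomic features can then be computed on each sub-band. The sketch below uses the simple averaging-and-differencing form (scaled by 1/4 rather than the orthonormal 1/2) on a hypothetical image; sub-band naming conventions differ between libraries, and the exact wavelet settings of the cited studies are not assumed here.

```python
def haar2d(image):
    """One level of a 2D Haar transform on an image with even dimensions.

    Returns a dict of sub-bands, each half the size of the input in both directions.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, rows, 2):
        row_bands = {name: [] for name in bands}
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]          # top-left, top-right
            d, e = image[r + 1][c], image[r + 1][c + 1]  # bottom-left, bottom-right
            row_bands["LL"].append((a + b + d + e) / 4)  # local average (approximation)
            row_bands["LH"].append((a + b - d - e) / 4)  # top minus bottom: horizontal edges
            row_bands["HL"].append((a - b + d - e) / 4)  # left minus right: vertical edges
            row_bands["HH"].append((a - b - d + e) / 4)  # diagonal detail
        for name in bands:
            bands[name].append(row_bands[name])
    return bands


def band_energy(band):
    """Mean squared coefficient: a simple first-order feature of one sub-band."""
    flat = [v for row in band for v in row]
    return sum(v * v for v in flat) / len(flat)


image = [[0, 4, 0, 4],
         [0, 4, 0, 4],
         [2, 2, 2, 2],
         [2, 2, 2, 2]]  # hypothetical: vertical stripes on top, flat region below
sub = haar2d(image)
print({name: band_energy(band) for name, band in sub.items()})
```

Applying the transform again to the LL band yields coarser scales, which is the sense in which such analyses are called multiresolution.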

Representing a slight step toward DL, Aghaei et al56 explored whether quantitative kinetic image feature analysis could provide a new clinical marker for predicting response to NAC, using a data set of DCE images from 68 patients with cancer. The researchers developed an ML scheme that computed 39 kinetic image features from tumor and background parenchymal enhancement regions, and the artificial neural network–based classifier significantly improved the AUC to 0.96 ± 0.03. In a subsequent study,59 the same group again used quantitative kinetic image feature analysis, this time on breast MR images from 151 patients with cancer. The proposed model computed 10 kinetic image features, and the attribution-selected classifier achieved an AUC of 0.83 ± 0.04, significantly higher than that of the individual features. Both studies demonstrated the potential of quantitative kinetic image features from breast MR images acquired before NAC to predict tumor response to NAC.
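Kinetic features of this kind can be derived from just a precontrast and a first postcontrast acquisition. The sketch below computes voxel-wise percentage enhancement in a tumor ROI and a background parenchymal enhancement (BPE) ROI, plus a few summary features relating the two. The specific features and intensity values are hypothetical illustrations, not the 39 (or 10) features of the cited studies.

```python
from statistics import mean


def percent_enhancement(pre, post):
    """Voxel-wise percentage enhancement 100 * (S1 - S0) / S0 for paired pre/post intensities."""
    return [100.0 * (s1 - s0) / s0 for s0, s1 in zip(pre, post)]


def kinetic_features(tumor_pre, tumor_post, bpe_pre, bpe_post):
    """A few hypothetical kinetic features from a tumor ROI and a background parenchyma ROI."""
    pe_tumor = percent_enhancement(tumor_pre, tumor_post)
    pe_bpe = percent_enhancement(bpe_pre, bpe_post)
    tumor_mean, bpe_mean = mean(pe_tumor), mean(pe_bpe)
    return {
        "tumor_mean_pe": tumor_mean,
        "tumor_max_pe": max(pe_tumor),
        "bpe_mean_pe": bpe_mean,
        # difference and ratio relate tumor enhancement to background parenchymal enhancement
        "tumor_minus_bpe": tumor_mean - bpe_mean,
        "tumor_to_bpe_ratio": tumor_mean / bpe_mean if bpe_mean else float("inf"),
    }


# Hypothetical ROI intensities (precontrast, first postcontrast)
features = kinetic_features(tumor_pre=[100, 120], tumor_post=[250, 300],
                            bpe_pre=[100, 100], bpe_post=[120, 130])
print(features)
```

A feature vector like this, computed per patient, would be the input to the ANN-based or other classifiers described in the paragraph above.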

The studies discussed here highlight the potential of combining multiple MRI modalities with classical ML techniques to predict treatment outcomes in breast cancer patients undergoing NAC. Using different imaging modalities and analytical methods, researchers have achieved promising results. However, the limitations and challenges of these approaches must be acknowledged. The variability of MRI protocols, ML techniques, and patient populations across studies makes it difficult to compare results and to generalize them beyond the data on which the models were trained. Moreover, the small sample sizes of many studies may limit the robustness and generalizability of their findings.

Deep Learning

In this section, we will transition from discussing classical ML methods to exploring the literature on DL methods for AI-enhanced MRI in breast cancer treatment response assessment. Table 2 provides an overview of various DL approaches thus far, along with the MRI protocols, patient populations, and results reported in the relevant studies in the literature. The existing literature in Table 2 is ordered chronologically, as in Table 1.

TABLE 2 - Overview of Deep Learning Methods for Predicting Treatment Response

| Author | Year | Method | Patient Population and Cohort Size | MRI Modality/Protocol | Results |
|---|---|---|---|---|---|
| Huynh et al80 | 2017 | CNN with transfer learning (VGGNet) | Breast cancer patients (n = 64) | Pretreatment and posttreatment DCE-MRI with precontrast and 2 postcontrast images | Best AUC: 0.85 (SD = 0.03) using only precontrast; other AUCs: 0.71 (SD = 0.03) to 0.82 (SD = 0.03) |
| Ravichandran et al81 | 2018 | CNN | Breast cancer patients undergoing NAC (n = 166) | Pretreatment DCE-MRI | AUC of 0.77 for predicting pCR; accuracy of 82% in the testing set; inclusion of HER2 status improves prediction (AUC of 0.85, accuracy of 85%) |
| Ha et al82 | 2019 | CNN | 141 breast cancer patients | Pretreatment MRI | 88% overall accuracy in 3-class prediction of response to NAC |
| El Adoui et al83 | 2019 | Two-branch CNN | Local breast cancer patients (n = 42) | DCE-MRI before and after the first cycle of NAC | Accuracy of 92.72%, AUC of 0.96 |
| Braman et al79 | 2020 | Multi-input CNN with DCE-MRI for HER2+ patients | HER2+ breast cancer patients (n = 157) | Pretreatment DCE-MRI | AUC of 0.85 (95% CI, 0.67–1.0, P = 0.0008) for predicting pCR in an external testing set of 28 patients; AUC of 0.77 (95% CI, 0.58–0.97, P = 0.006) for predicting pCR in a multicenter trial data set of 29 patients |
| Choi et al84 | 2020 | PET/MRI image deep learning model using CNN | Women with advanced breast cancer (n = 56) | PET/CT and MRI before and after first NAC cycle | AUC of 0.81 in all patients, AUC of 0.88 for HER2-negative subtype |
| Liu et al85 | 2020 | CNN | I-SPY TRIAL breast MRI database, 131 patients | Pretreatment and posttreatment DCE-MRIs | Diagnostic accuracy of 72.5% for discriminating between pCR vs non-pCR, with sensitivity 65.5%, specificity of 78.9%, and AUC of 0.72 |
| Qu et al86 |
