Background: Early screening and treatment of esophageal cancer (EC) is particularly important for the survival and prognosis of patients. However, early EC is difficult to diagnose by a routine endoscopic examination. Therefore, convolutional neural network (CNN)-based artificial intelligence (AI) has become a very promising method in the diagnosis of early EC using endoscopic images. The aim of this study was to evaluate the diagnostic performance of CNN-based AI for detecting early EC based on endoscopic images.
Methods: A comprehensive search was performed to identify relevant English articles concerning CNN-based AI in the diagnosis of early EC based on endoscopic images (from the date of database establishment to April 2022). The pooled sensitivity (SEN), pooled specificity (SPE), positive likelihood ratio (LR+), negative likelihood ratio (LR−), diagnostic odds ratio (DOR) with 95% confidence interval (CI), summary receiver operating characteristic (SROC) curve, and area under the curve (AUC) for the accuracy of CNN-based AI in the diagnosis of early EC based on endoscopic images were calculated. We used the I2 test to assess heterogeneity and investigated the source of heterogeneity by performing meta-regression analysis. Publication bias was assessed using Deeks' funnel plot asymmetry test.
Results: Seven studies met the eligibility criteria. The pooled SEN and SPE were 0.90 (95% CI: 0.82–0.94) and 0.91 (95% CI: 0.79–0.96), respectively. The LR+ was 9.8 (95% CI: 3.8–24.8) and the LR− was 0.11 (95% CI: 0.06–0.21), revealing that CNN-based AI exhibited an excellent ability to confirm or exclude early EC on endoscopic images. Additionally, SROC curves showed that the AUC of the CNN-based AI in the diagnosis of early EC based on endoscopic images was 0.95 (95% CI: 0.93–0.97), demonstrating that CNN-based AI has good diagnostic value for early EC based on endoscopic images.
Conclusions: Based on our meta-analysis, CNN-based AI is an excellent diagnostic tool with high sensitivity, specificity, and AUC in the diagnosis of early EC based on endoscopic images.
Keywords: Artificial Intelligence, convolutional neural network, early esophageal cancer, endoscopic, meta-analysis
Esophageal cancer (EC) is the eighth most common malignant tumor in the world and the sixth in terms of mortality.[1],[2],[3] It is characterized by high invasiveness and lymph node metastasis.[4],[5] The early signs and symptoms of EC are often latent and nonspecific. Ninety percent of EC patients have already reached the middle or advanced stage by the time they are diagnosed, which leads to poor prognosis and high costs. The 5-year survival rate does not exceed 20%.[6],[7],[8] Therefore, early screening and treatment of EC is particularly important for the survival and prognosis of patients and has long been a focus of clinical research. Early detection of EC by endoscopy is a widely adopted strategy to prevent cancer-related morbidity and mortality.[9],[10],[11] White-light endoscopy (WLE) and narrow-band imaging (NBI) are the most common techniques for detecting EC.[12],[13] However, endoscopic features of these early lesions are subtle and easily missed with conventional endoscopy. With the development of endoscopic technology, high-definition WLE with or without chromoendoscopy, NBI with or without magnification, confocal laser endomicroscopy, and endocytoscopic imaging systems have improved the detection rate of lesions.[14],[15],[16] However, the diagnosis rate is still low, and detection of these subtle changes in endoscopic images relies on the expertise of endoscopists and is inevitably affected by differences in their experience. The emergence of convolutional neural network (CNN)-based artificial intelligence (AI) offers hope of solving this problem. CNN is a type of deep learning technology that can be used to build a computer-aided diagnosis model. It usually consists of an input layer, multiple convolutional layers, a pooling layer, a fully connected layer, a normalization layer, and an output layer.
It directly extracts the most representative features from a large amount of given image data and performs automatic learning and classification recognition, and makes intelligent decisions accordingly, with high recognition accuracy.[17],[18],[19],[20] Therefore, it is applied to the classification, detection, and segmentation of medical images. In recent years, the application of CNN in vision system and medical image analysis has attracted more and more attention. In the field of digestive endoscopy image analysis, CNN is mainly used to assist in detecting colon polyps, judging the degree of differentiation of gastric and colon polyps, and identifying early gastrointestinal tumors and gastric mucosal Helicobacter pylori infection.[21],[22],[23],[24] In recent years, the development and research on the clinical application of CNN has gradually expanded into the field of esophageal diseases. Previously published studies on CNN-based AI for the diagnosis of early EC on endoscopic images show promising yet divergent results, and the methodology varies considerably among studies, particularly in endoscopic approach and CNN structure.[25],[26],[27],[28],[29],[30],[31] The results are, therefore, difficult to compare between studies. In this context, we suggest that the diagnostic accuracy of CNN-based AI for early EC on endoscopic images should be fully explored through a meta-analysis of existing studies.
Materials and Methods
This meta-analysis was performed in accordance with the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy[32] and the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines.[33]
Search strategy
A comprehensive search of PubMed, EMBASE, the Cochrane Library, Web of Science, and Wiley Online Library was performed to identify relevant English articles (from the date of database establishment to April 2022). The search principle followed the population, intervention, comparison, outcome, and study design (PICOS) principle (P: “early esophageal cancer,” I: “convolutional neural network,” S: “diagnostic test”).[34]
A combination of subject words and free words was used for the search. The included references were searched twice to reduce the chance of missing relevant articles. Please refer to Appendix 1 for the complete search strategy used for PubMed.
Selection criteria
All studies concerning the CNN-based AI for diagnosis of early EC on endoscopic images, with pathology report results as the gold standard, were considered eligible for inclusion. Furthermore, studies from which a 2 × 2 table could be constructed for true-positive, false-positive, true-negative, and false-negative values were included. Studies were excluded if they lacked an explicitly stated reference standard or had insufficient data to calculate the study outcomes. Animal experiments, case reports, meta-analyses, and reviews were excluded from this study.
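The 2 × 2 table required by the selection criteria can be illustrated with a minimal Python sketch. The class name `TwoByTwo` and the counts below are hypothetical and are not taken from any included study; the sketch only shows how sensitivity, specificity, and accuracy follow from the four cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one diagnostic accuracy study (pathology as reference)."""
    tp: int  # CNN positive, pathology positive
    fp: int  # CNN positive, pathology negative
    fn: int  # CNN negative, pathology positive
    tn: int  # CNN negative, pathology negative

    def sensitivity(self) -> float:
        # Proportion of pathology-confirmed cases flagged by the CNN
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of pathology-negative cases correctly cleared
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


# Hypothetical counts, chosen only for illustration
example = TwoByTwo(tp=45, fp=8, fn=5, tn=92)
print(round(example.sensitivity(), 3))
print(round(example.specificity(), 3))
print(round(example.accuracy(), 3))
```

A study that reports only summary percentages, without enough information to recover these four cells, cannot contribute to the pooled estimates, which is why such studies were excluded.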
Study selection and data extraction
Two investigators independently reviewed the titles and abstracts and selected the studies that satisfied the inclusion criteria for a full-text assessment. The data were independently extracted by two investigators with an agreement kappa value of 96.9%. Differences were resolved by mutual agreement or, if an agreement could not be reached, by discussion with a third reviewer. The extracted data included the following: first author, year, country, research center, imaging modality, architecture of CNN, training set (number of patients, number of images), test set (number of patients, number of images), and results (sensitivity, specificity, and accuracy). The corresponding authors of the studies were contacted when additional information was needed. If no response was received after sending a reminder, the study was excluded.
Quality of the studies
The risk of bias in individual studies was assessed in accordance with the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) checklist.[35] Each article was independently evaluated by the reviewers using these criteria, and disagreements were resolved by discussion.
Statistical analysis
Stata software (version 14; Stata Corporation, College Station, TX, USA) was used to draw the plots and perform some calculations. The pooled sensitivity (SEN), pooled specificity (SPE), positive likelihood ratio (LR+), negative likelihood ratio (LR−), diagnostic odds ratio (DOR) with 95% confidence interval (CI), summary receiver operating characteristic (SROC) curve, and area under the curve (AUC) for the accuracy of CNN-based AI for early EC were calculated. Pooling was conducted using a bivariate generalized linear mixed model.[36] Heterogeneity was assessed using Cochran's Q (Chi-square) test and the I2 statistic; I2 >50% was considered to indicate substantial heterogeneity.[37] If heterogeneity was detected among the studies, its potential source was investigated by performing subgroup analyses and a meta-regression analysis. Deeks' funnel plot asymmetry test was used to investigate publication bias in all included studies, and publication bias was considered significant if P < 0.01.
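The relationship between sensitivity, specificity, the likelihood ratios, and the DOR can be sketched in a few lines of Python. This is a plain point-estimate calculation, not the bivariate mixed model fitted in Stata; the function name `likelihood_ratios` is hypothetical, and the inputs are the rounded pooled estimates reported in this article.

```python
def likelihood_ratios(sen: float, spe: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, DOR) from a sensitivity and a specificity."""
    if not (0.0 < sen < 1.0 and 0.0 < spe < 1.0):
        # A specificity or sensitivity of exactly 0 or 1 makes a ratio
        # undefined; a continuity correction on the raw counts would be needed.
        raise ValueError("sen and spe must lie strictly between 0 and 1")
    lr_pos = sen / (1.0 - spe)   # how much a positive result raises the odds
    lr_neg = (1.0 - sen) / spe   # how much a negative result lowers the odds
    dor = lr_pos / lr_neg        # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# Rounded pooled estimates from this meta-analysis (SEN 0.90, SPE 0.91)
lr_pos, lr_neg, dor = likelihood_ratios(0.90, 0.91)
print(round(lr_pos, 1), round(lr_neg, 2), round(dor, 1))
```

Plugging in the rounded pooled values gives an LR+ of about 10 and an LR− of about 0.11, close to the reported 9.8 and 0.11. The small difference is presumably due to rounding of the inputs and to the model-based estimation used in the actual analysis.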
Results
Literature search results
We identified 390 articles by searching the databases, of which 123 (duplicate articles), 209 (after reviewing the titles and abstracts), and 51 (after a full-text review) studies were excluded, leaving seven studies for inclusion in our analysis [Figure 1].
Characteristics of the included studies
The study characteristics are presented in [Table 1]. The included articles were published recently, with the first study published in 2017, which reflects the fact that CNN-based AI is a relatively novel concept in the field of early EC. All studies[25],[26],[27],[28],[29],[30],[31] had training sets and validation sets. The number of patients and images in the training set ranged from 17 to 804 and from 129 to 8660, respectively. The number of patients and images in the validation set ranged from 17 to 155 and from 26 to 1437, respectively. Methodology varied considerably among studies, particularly in endoscopic approaches and CNN structure. None of the studies[25],[26],[27],[28],[29],[30],[31] indicated whether the data were obtained from consecutive or random patients or whether the study was retrospective. All studies adopted blinding methods, and all studies used pathological analyses as their gold standard. As noted in [Table 1], the sensitivity (range, 0.80–0.97) and specificity (range, 0.73–1.00) varied among the studies.
Quality of the studies
The risk of bias and applicability concerns for the studies included are shown in [Table 2]. The information is composed of 14 items divided into four parts: patient selection, index test, reference standard, flow and timing. None of the studies fulfilled all items, but all studies fulfilled at least 10 items. The high-risk items were mainly reflected in the patient selection part. The remaining parts were associated with a low risk of bias.
Table 2: Risk of bias and applicability concerns of the included studies
Main results
In our study, the pooled sensitivity and specificity of CNN-based AI based on endoscopic images for predicting early EC were 0.90 (95% CI: 0.82–0.94) and 0.91 (95% CI: 0.79–0.96), respectively [Figure 2]. SROC curves showed that the AUC was 0.95 [Figure 3]. As noted in [Supplementary Figure 1[Additional file 1]], CNN-based AI based on endoscopic images had a high LR+ (9.8) and a low LR− (0.11), revealing that CNN-based AI exhibited an excellent ability to confirm or exclude early EC on endoscopic images.
Figure 2: Forest plots of the sensitivity and specificity of CNN-based AI based on endoscopic images for the diagnosis of early EC. The dots correspond to the individual studies included in this analysis, and the line on both sides of each dot represents the 95% CI. The narrower the line is, the greater the precision of the study and the greater its weight. The diamond corresponds to the pooled result. The intermediate vertical line represents the line of no effect. The Q statistic (Chi-square value), degrees of freedom (df), P value, and I2 statistic (inconsistency [I2]) correspond to the heterogeneity test results. The Q test was used to assess heterogeneity, while the I2 statistic was used to measure the size of heterogeneity. Heterogeneity was considered present when P was less than 0.01. If I2 was <25%, no heterogeneity was noted. If the value of I2 was between 25% and 50%, the degree of heterogeneity was considered to be small. If the value of I2 was between 50% and 75%, heterogeneity was noted. If I2 was >75%, large heterogeneity was noted. AI = artificial intelligence, CI = confidence interval, CNN = convolutional neural network, EC = esophageal cancer, FN = false negatives, FP = false positives, TN = true negatives, TP = true positives
Figure 3: Hierarchical summary of SROC plots of CNN-based AI for the diagnosis of early EC. The ellipse represents 95% CI for this estimate. Numbers correspond to the sensitivity and specificity of individual studies included in this analysis. AI = artificial intelligence, CI = confidence interval, CNN = convolutional neural network, EC = esophageal cancer
Publication bias
We assessed publication bias by performing Deeks' regression test of asymmetry (t = −1.04; P = 0.35) [Supplementary Figure 2[Additional file 2]]. Deeks' funnel plots for CNN-based AI indicated no publication bias (P > 0.01).
Heterogeneity and meta-regression analyses
Substantial heterogeneity was detected among the studies (I2 = 96.00%, 95% CI: 93.00–99.00). We performed subgroup analyses and meta-regression analysis to identify the source of heterogeneity [Table 3]. Research center (single-center vs. multicenter), imaging modality (WLE/NBI vs. advanced endoscopic), training set (number of images: ≥1000 vs. <1000), and test set (number of images: ≥1000 vs. <1000) were used as covariates in the meta-regression analysis. The meta-regression analysis identified no source of heterogeneity among the covariates we selected.
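For readers unfamiliar with the I2 statistic, the following Python sketch shows the standard relationship between Cochran's Q, its degrees of freedom, and I2, together with the interpretation bands stated in the Figure 2 legend. The function names and the Q and df values are hypothetical; the Q statistic of this meta-analysis is not reported in the text, so the example does not reproduce the 96% figure.

```python
def i_squared(q: float, df: int) -> float:
    """I2 (%) = 100 * (Q - df) / Q, truncated at 0 when Q <= df."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


def grade_heterogeneity(i2: float) -> str:
    """Interpretation bands as described in the Figure 2 legend."""
    if i2 < 25.0:
        return "none"
    if i2 <= 50.0:
        return "small"
    if i2 <= 75.0:
        return "present"
    return "large"


# Hypothetical values for illustration only
i2_example = i_squared(q=40.0, df=6)
print(i2_example, grade_heterogeneity(i2_example))
```

Because I2 expresses the share of total variation that exceeds what chance alone would produce, a value above 75% indicates that most of the observed spread between studies reflects real differences rather than sampling error.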
Table 3: Univariate and multivariate meta-regression analyses for identifying covariates to explain heterogeneity among studies on CNN-based AI for the diagnosis of early EC
Discussion
Early EC refers to cancer tissue confined to the mucosa and submucosa, regardless of lymph node metastasis.[38] Endoscopic resection is the main treatment method, the prognosis is good, and the survival rate can reach about 90%.[39] However, endoscopic features of these early lesions are subtle and easily missed with conventional endoscopy. In addition, the discovery of early EC relies on the expertise of endoscopists and is inevitably affected by differences in their experience. Therefore, the requirement for more efficient methods of detection and characterization of early EC has led to intensive research in the field of AI. CNN is a branch of machine learning in AI that appears to be better than other AI techniques in image recognition and classification.[40] It has powerful data processing capabilities and is suitable for medical image recognition and complex clinical data analysis. When combined with digestive endoscopic imaging technology, it can be trained on a large number of endoscopic images and learn the relationship between endoscopic images and disease diagnosis. By imitating human cognition in this way, it can help doctors make fast and accurate diagnoses. In this meta-analysis, we investigated the diagnostic value of CNN-based AI based on endoscopic images for early EC. Our meta-analysis showed that CNN-based AI has good diagnostic value for early EC based on endoscopic images with high sensitivity and specificity. In addition, the high AUC value also increases our confidence in the prediction of early EC, which could improve the prognosis of patients, reduce the death rate of EC, and reduce the workload of doctors.
Another finding of this analysis was that CNN-based AI can not only qualitatively diagnose early EC on endoscopic images, but also quantitatively diagnose it. CNN-based AI can assist in judging the depth of early EC lesions on endoscopic images. Nakagawa et al.[28] developed a CNN-based AI system that can be used to distinguish epithelium–submucosal microinvasive (EP-SM1) and submucosal deep invasive (SM2/3) lesions on endoscopic images. The diagnostic performance was similar to that of 16 experienced endoscopists. Tokai et al.[26] used 1751 esophageal squamous cell carcinomas with different invasion depths (EP-SM1, SM2) to create a diagnostic model through CNN. Subsequently, CNN-based AI and 13 endoscopists simultaneously examined 291 test images to evaluate diagnostic efficiency. The results showed that the diagnostic performance of CNN exceeded that of 12 experienced endoscopists. A well-known disadvantage of CNN is its black-box nature. To interpret the CNN results visually, seven studies included in this meta-analysis used a single-shot multibox detector, deepLab V.3+, explicit class activation maps, U-Net, and other methods. These techniques could mark the lesion sites with rectangular boxes or outline lesion boundaries with curves to help endoscopists clarify CNN-labeled lesion sites, thus potentially improving the detection rate of early EC with positive biopsy sampling rates. Our meta-analysis does not facilitate the development of clear recommendations regarding the CNN method. In high-dimensional spaces, no method generally outperforms others. Currently, the potential utility of CNN in clinical practice appears promising, despite the use of different approaches in the included studies. Our meta-analysis showed that CNN-based AI has good diagnostic value for early EC based on high-quality endoscopic images.
Substantial heterogeneity was detected in the included studies. According to meta-regression analysis, no source of heterogeneity was identified among the variables we selected. The source of heterogeneity may be related to differences in endoscopic approach and CNN structure in the included studies. However, these variables were not included as covariates in the regression analysis because the grouping conditions were not satisfactory.
We assessed the quality of the included studies using the following four components: patient selection, index test, reference standard, and flow and timing, among which the high-risk items were mainly reflected in the patient selection component. The potential explanation for this finding is that the inclusion criteria of diagnostic trials are often based on case–control trials rather than randomized controlled trials, and patients included in the study reported only the time period and did not indicate whether they were consecutive cases. The CNN method and the gold standard method were performed without knowing the results of the other test, and pathological analyses were used as the gold standard in the included studies. Therefore, the selection bias was small and the results were reliable, indicating that these factors were associated with a low risk of bias. Additionally, Deeks' funnel plots for CNN-based endoscopy revealed no publication bias among the studies.
Our meta-analysis has several limitations. Firstly, most studies included in the meta-analysis employed a retrospective design and, therefore, were subject to selection bias and prone to data loss. Secondly, substantial heterogeneity was detected in the included studies. The source of heterogeneity may be related to differences in endoscopic approach and CNN structure in the included studies. Finally, most studies have relatively small sample sizes and are single-center studies. Therefore, further prospective studies using a larger and more balanced population from multiple centers are required. In summary, CNN is a relatively new concept and the variety of presented approaches is worth investigating. All weak points should be addressed in future trials comprising larger patient cohorts from multicenter, multivendor studies.
In conclusion, our meta-analysis showed that CNN-based AI has good diagnostic value for early EC based on endoscopic images with high sensitivity, specificity, and AUC. More importantly, our study validated the feasibility of using CNN-based AI to conclusively identify a disease prone to missed diagnosis and misdiagnosis. This finding is very important for determining the diagnosis, treatment, and prognosis of early EC on endoscopic images.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
Appendix 1
PubMed search strategy
#1 (“Esophageal Neoplasms”[Mesh]) OR (((((((((((((((((Esophageal Neoplasm[Title/Abstract]) OR (Neoplasm, Esophageal[Title/Abstract])) OR (Esophagus Neoplasm[Title/Abstract])) OR (Esophagus Neoplasms[Title/Abstract])) OR (Neoplasm, Esophagus[Title/Abstract])) OR (Neoplasms, Esophagus[Title/Abstract])) OR (Neoplasms, Esophageal[Title/Abstract])) OR (Cancer of Esophagus[Title/Abstract])) OR (Cancer of the Esophagus[Title/Abstract])) OR (Esophagus Cancer[Title/Abstract])) OR (Cancer, Esophagus[Title/Abstract])) OR (Cancers, Esophagus[Title/Abstract])) OR (Esophagus Cancers[Title/Abstract])) OR (Esophagus Cancers[Title/Abstract])) OR (Cancer, Esophageal[Title/Abstract])) OR (Cancers, Esophageal[Title/Abstract])) OR (Esophageal Cancers[Title/Abstract])).
#2 (((((((((((((convolutional neural network) OR (convolutional network)) OR (neural network)) OR (“Deep Learning”[Mesh])) OR (artificial intelligence)) OR (machine learning)) OR (computer-aided)) OR (computer aided)) OR (hierarchical learning)) OR (computational intelligence)) OR (machine intelligence)) OR (computer reasoning)) OR (classification algorithm)) OR (feed-forward neural network).
#3 (“sensitivity and specificity”[MeSH] OR predict*[text] OR diagnos*[text] OR accura*[text]).
#1 AND #2 AND #3.
Number of articles: 264.
References
Correspondence Address:
Dr. Lu Tian
Department of Radiology, Children's Hospital of Chongqing Medical University, National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Chongqing, 400014
China
DOI: 10.4103/sjg.sjg_178_22