Artificial intelligence for colorectal polyp sizing: clinical promise and methodological challenges


DOI: 10.1055/a-2695-1978

Accurate determination of colorectal polyp size remains one of the most critical yet technically elusive metrics in endoscopy. In the current issue of Endoscopy, Antonelli et al. present an important and timely advance for colonoscopy practice: the first clinical, real-life evaluation of a novel artificial intelligence (AI)-based polyp sizing model [1]. This study deserves recognition not only for its novelty but also for its clinical relevance. Polyp size is a critical diagnostic variable that governs surveillance intervals, guides resection strategies, and indicates the probability that a lesion harbors advanced neoplasia. The authors demonstrate that AI-assisted polyp sizing solutions can now be tested prospectively during real-world procedures. Antonelli et al. move the field forward by showing that comprehensive computational support in everyday practice – combining detection, histology prediction, and now polyp sizing – is becoming a reality.

The key difficulty in developing AI for polyp sizing lies in the definition of “ground truth” [2]. Visual size estimation, although ubiquitous, has repeatedly been shown to be inaccurate and systematically biased [3] [4]. Non-calibrated instruments, such as biopsy forceps or open snares, have been used as reference, yet these tools themselves lack precision and cannot be regarded as definitive standards for training or validation of AI sizing systems [2] [3] [5] [6]. Histopathology has traditionally served as a reference standard, but its limitations are now well recognized: formalin fixation induces tissue shrinkage of approximately 20%, while orientation and sectioning artifacts may further distort measurements [7] [8]. Consequently, studies based on non-calibrated tools or histopathological size carry an intrinsic risk of error that complicates downstream AI training and validation. The reliance on flawed surrogates underscores the need for more robust ground truth approaches.
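The effect of fixation shrinkage on a size label can be illustrated with simple arithmetic. The following Python sketch assumes a uniform 20% linear shrinkage, taken from the approximate figure cited above; the function names and the example sizes are hypothetical and do not come from any study protocol.

```python
def fixed_size_mm(fresh_mm: float, shrinkage: float = 0.20) -> float:
    """Size after fixation, assuming uniform linear shrinkage (an assumption)."""
    return fresh_mm * (1.0 - shrinkage)


def estimated_fresh_size_mm(fixed_mm: float, shrinkage: float = 0.20) -> float:
    """Invert the same assumed shrinkage to estimate the pre-fixation size."""
    return fixed_mm / (1.0 - shrinkage)


THRESHOLD_MM = 10.0  # 10 mm decision point used in surveillance guidelines

for fresh in (8.0, 10.0, 12.0):  # hypothetical fresh-specimen sizes
    fixed = fixed_size_mm(fresh)
    # True when the fresh size meets the threshold but the fixed size does not
    drops_below = fresh >= THRESHOLD_MM > fixed
    print(f"fresh {fresh:.1f} mm -> fixed {fixed:.1f} mm, "
          f"drops below threshold: {drops_below}")
```

Under this assumption, any lesion measuring between 10 mm and 12.5 mm in the fresh state would be recorded below 10 mm after fixation, which is the kind of label error that a model trained on histopathological size would inherit.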


Several groups have since reported convolutional neural network (CNN)-based models for automated polyp sizing, highlighting both the potential and the pitfalls of this approach. While promising accuracy has been achieved under experimental conditions, attention has also been drawn to the underappreciated technical complexities of endoscopic imaging. Lens characteristics – especially the distortion that wide-angle (fisheye) optics introduce across the field of view – alter the apparent dimensions of polyps. Without proper ground truth and explicit accounting for optical distortions, AI systems risk embedding systematic bias into their predictions. In practice, this means that models trained on unreliable ground truth are prone to perpetuating measurement biases. For platform-agnostic AI vendors, new types of endoscopes or lens settings may additionally reveal that the models do not generalize well. Thus, the challenge is not only to train robust CNNs but also to ensure that the labels and imaging conditions underpinning them reflect true, reproducible size.
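How lens distortion changes apparent size can be sketched with an idealized model. The Python example below assumes an equidistant fisheye projection (image radius = focal length × field angle) and a flat target perpendicular to the optical axis; both are simplifying assumptions, the function and parameter names are hypothetical, and the optics of a real endoscope would have to be characterized by calibration.

```python
import math


def apparent_length_px(offset_mm: float, length_mm: float,
                       distance_mm: float, focal_px: float) -> float:
    """Image length (pixels) of a radial segment under r = f * theta.

    offset_mm   : distance of the segment centre from the optical axis
    length_mm   : true physical length of the segment
    distance_mm : distance from the lens to the plane containing the segment
    focal_px    : focal length in pixels (hypothetical value)
    """
    near = offset_mm - length_mm / 2.0
    far = offset_mm + length_mm / 2.0
    # Equidistant model: image radius is proportional to the field angle
    r_near = focal_px * math.atan2(near, distance_mm)
    r_far = focal_px * math.atan2(far, distance_mm)
    return r_far - r_near


# Same hypothetical 10 mm object at 20 mm distance: centred vs. off-axis
center = apparent_length_px(0.0, 10.0, 20.0, 500.0)
edge = apparent_length_px(20.0, 10.0, 20.0, 500.0)
print(f"centre: {center:.1f} px, periphery: {edge:.1f} px, "
      f"ratio: {edge / center:.2f}")
```

In this idealized setting the same object spans fewer pixels toward the periphery of the image than at its centre, so a model that maps pixel extent to millimeters without accounting for position and lens type would carry a systematic bias.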

Most validation studies, including the present work, aim primarily to provide clinically relevant categorical binning (e.g. ≤5 mm, 6–9 mm, ≥10 mm). This reflects clinical practice, where thresholds – particularly 10 mm – serve as critical decision points in surveillance guidelines. However, future AI models should ultimately provide continuous, millimeter-accurate measurements rather than categorical binning. This task demands even greater rigor in ground truth data acquisition and AI development efforts. Training datasets will need sufficiently large numbers of polyps distributed across the full size spectrum, with enrichment around clinically decisive thresholds (e.g. 10 mm), to ensure that models ultimately perform as intended in the real-world setting.
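The difference between categorical and continuous evaluation can be made concrete with a small Python sketch. All sizes below are invented for illustration, the function names are hypothetical, and the handling of non-integer sizes (values below 6 mm fall into the lowest bin) is an assumption, since the categories in the text are stated for whole millimeters.

```python
def size_category(size_mm: float) -> str:
    """Map a size to the three clinical bins named in the text."""
    if size_mm < 6.0:
        return "≤5 mm"
    if size_mm < 10.0:
        return "6–9 mm"
    return "≥10 mm"


def category_agreement(reference, predicted) -> float:
    """Fraction of polyps assigned to the same bin as the reference."""
    matches = sum(size_category(r) == size_category(p)
                  for r, p in zip(reference, predicted))
    return matches / len(reference)


def mean_absolute_error(reference, predicted) -> float:
    """Average absolute difference in millimeters."""
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


# Hypothetical reference sizes and two hypothetical sets of predictions
reference = [3.0, 7.0, 9.5, 10.5, 14.0]
model_a = [5.0, 8.5, 8.0, 12.0, 16.0]   # large errors, bins unchanged
model_b = [3.0, 7.0, 10.1, 9.9, 14.0]   # small errors, two flips at 10 mm

for name, pred in (("A", model_a), ("B", model_b)):
    print(f"model {name}: agreement {category_agreement(reference, pred):.2f}, "
          f"MAE {mean_absolute_error(reference, pred):.2f} mm")
```

In this toy example the predictions with the larger millimeter errors agree perfectly on category, while the more precise predictions disagree on two lesions close to 10 mm. This is one reason why test sets enriched around the decisive thresholds, and reporting of both kinds of metric, are informative.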

Calibrated methods are indispensable for advancing the field. Laser-based endoscopic sizing systems represent one objective solution, though their adoption is limited by cost and hardware constraints [2] [4] [5] [6]. Post-resection measurement of fresh specimens using calipers or on-site microscopy provides another viable pathway, mitigating shrinkage artifacts and enabling reference sizing for AI training and validation. Such approaches are emerging as the most credible candidates for establishing reliable ground truth datasets. Importantly, the optimal ground truth method should allow for reproducibility checks and be confirmed by validation studies.

Other recent studies, such as the work by Sudarevic et al. using waterjet-assisted AI systems, illustrate creative attempts to integrate adjunctive sizing markers directly into the AI measurement workflow [9]. However, such approaches still depend on non-calibrated adjunct instruments, whereas the ultimate goal remains AI models capable of inferring polyp size without assistance from calibration instruments. Morphological cues such as pit pattern, vascular structures, and surface texture may provide the information that models need to achieve this without additional tools in the field of view, as shown in the current publication. Yet for AI systems to achieve broad clinical adoption, they must be benchmarked against ground truth derived from calibrated, verifiable standards.

Antonelli et al. should be commended for taking the field forward by bringing into clinical testing an AI-based sizing system that does not rely on additional tools in the field of view. Their work demonstrates feasibility, underscores the practical challenges of implementation, and highlights the value of real-world validation. At the same time, it points to the need for consensus across the field on how ground truth for polyp sizing is defined, measured, and validated. Without a reliable ground truth, the promise of AI-based sizing will be undermined by uncertainties in the very labels it seeks to approximate. Built and validated on reliable ground truth, however, AI will deliver the needed integrative solutions that go beyond detection and histology prediction to include accurate AI-based sizing for colorectal polyps.

Publication History

Article published online:
02 October 2025

© 2025. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
