Data-Driven Metrics of Operative Performance: The Technology-Enhanced Surgical Training (TEST) Delphi Consensus Study

Journal of Surgical Education, Volume 82, Issue 11, November 2025, 103701

Highlights

Technology-enhanced assessments offer greater objectivity than traditional methods.

Consensus on essential performance metrics in surgical training was achieved.

Included statements reflect technical and non-technical skills, and outcomes.

Suggested applications include benchmarking performance against peers.

Metric validation and development of implementation strategies are priorities.

Objectives

Technology-enhanced assessment of operative performance offers greater objectivity and reliability than traditional methods. However, consensus on the most critical metrics for informing the future of surgical training is lacking, and guidance on optimal application of these analytics to enhance feedback is needed. This study aimed to establish consensus on advanced operative performance metrics in surgical training and identify applications to enhance feedback.

Design

A data-driven, 3-round Delphi method in which experts iteratively rated performance metrics derived from the surgical literature on a 7-point Likert scale. Statements reaching the a priori consensus threshold [round 1: > 75% rating ≥ 6; rounds 2 and 3: > 50% rating ≥ 6] were included in the final consensus statement.
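The threshold rule above can be illustrated with a minimal Python sketch. It is not the study's analysis code: the function names and the example ratings are hypothetical, and the only elements taken from the abstract are the 7-point scale, the rating cutoff of 6, and the strict "more than 75%" (round 1) and "more than 50%" (rounds 2 and 3) shares.

```python
from typing import Sequence


def consensus_share(ratings: Sequence[int], cutoff: int = 6) -> float:
    """Return the fraction of panel ratings at or above `cutoff`.

    Ratings are assumed to be integers on a 7-point Likert scale (1-7).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 7 for r in ratings):
        raise ValueError("ratings must lie on the 7-point scale (1-7)")
    return sum(r >= cutoff for r in ratings) / len(ratings)


def meets_threshold(ratings: Sequence[int], round_number: int) -> bool:
    """Apply the a priori rule: > 75% in round 1, > 50% in rounds 2 and 3."""
    if round_number not in (1, 2, 3):
        raise ValueError("the Delphi process described has rounds 1 to 3")
    required = 0.75 if round_number == 1 else 0.50
    # The comparison is strict, so a share of exactly 50% does not pass.
    return consensus_share(ratings) > required


# Hypothetical ratings for one statement (invented for illustration only).
example = [7, 6, 6, 5, 4, 6, 7, 3]
share = consensus_share(example)            # 5 of 8 ratings are 6 or 7
passes_round_1 = meets_threshold(example, 1)
passes_round_2 = meets_threshold(example, 2)
```

With these invented ratings, five of eight panelists rate the statement 6 or 7, so it would fall short of the round 1 threshold but pass the threshold used in rounds 2 and 3.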

Setting

Electronic survey, using the research electronic data capture (REDCap) system.

Participants

A pan-specialty, international panel of 57 surgical trainers, trainees and researchers.

Results

Twenty-two statements met the consensus threshold, of which 10 represented individual, training-specific metrics. Technical metrics were: dissection in the correct tissue plane (58.1% consensus); economy of motion (58.1%); and technical errors (53.5%). Non-technical metrics were: situation awareness (60.5%); communication (55.8%); decision-making (51.2%); and cognitive load (51.2%). Outcome metrics included: a safety score (60.5%); duration to react to adverse events (60.5%); and a global performance score (53.5%). Suggested applications to surgical training included: comparing individual metrics over time (65.1%); benchmarking against the average performance of trainees with similar experience (55.8%); and guiding formative assessments (51.2%).

Conclusions

This Delphi study established international consensus on advanced operative performance metrics, providing a foundation for improved feedback in surgical training. Demonstrating validity of existing metrics and developing novel ones that were highly ranked are the critical next steps to integrate advanced technologies into surgical training curricula, enhancing trainee development and patient safety.

Keywords

artificial intelligence

feedback

patient care

operative performance

performance metrics

surgical training

technology

© 2025 The Author(s). Published by Elsevier Inc. on behalf of Association of Program Directors in Surgery.
