Does Extended Reality Simulation Improve Surgical/Procedural Learning and Patient Outcomes When Compared With Standard Training Methods?: A Systematic Review

INTRODUCTION

One of the objectives of procedural and surgical education is to train healthcare practitioners to perform interventional procedures safely and efficiently.1 Traditional teaching methods have used curriculum-based approaches, model-based simulations, and mentorship programs to advance trainees to expert levels, aiming to enhance patient safety and optimize outcomes. However, these models may lack fidelity and specificity, and mentorship programs require extensive time, training, and effort from both faculty and students. There is an urgent need for effective and efficient surgical skills training. Medical and surgical errors remain major healthcare expenditures and contribute to patient morbidity and mortality worldwide.2

The digital transformation3 of extended reality (XR) is ushering in a paradigm shift in health care, seeking to optimize patient care and address the escalating educational requirements of healthcare professionals for skill proficiency. Extended reality encompasses technologies such as virtual reality (VR),4 augmented reality (AR),4 and mixed reality (MR).5 Advocates of XR suggest that it offers advantages to trainees by enhancing their learning environment through heightened sensory components in education, training, and patient care. Current applications of XR include creating immersive surgical environments with head-mounted devices, guiding surgeons on precise device placement and surgical approaches in the operating room through stereotactic navigation with augmented overlays, and developing interactive and immersive simulation modules incorporating XR specifically tailored to trainees.

A significant body of literature, including multiple systematic reviews,6–12 describes technological advancements in the use of XR for surgical or procedural training. These publications coincide with the decreasing costs of XR devices13 and their increasing implementation.14–22 Many XR studies focus on a subset of XR, a particular specialty, or reactions and knowledge. However, the overall impact of XR training on behavior and patient outcomes compared with standard training methods is not fully understood. Healthcare educators are exploring whether XR modalities can replace procedural training or supplement standard teaching methods. In addition, studies involving XR training are highly heterogeneous in their populations and clinical settings, making it challenging to compile accurate evidence to support when XR should be used.

Therefore, our objective in this comprehensive systematic review was to investigate the question: how does the use of XR technology in any healthcare discipline for procedural and surgical training compare with standard teaching methods for educational learning? This analysis is critical for making informed decisions regarding the implementation of these technologies in training programs worldwide. We hypothesize that the use of XR technologies is at least equivalent to standard methods for technical-based training and patient outcomes.

METHODS

A working group established by the Society for Simulation in Healthcare in 2021 conducted a systematic review of the current literature examining the impact of XR, encompassing VR, AR, and MR, on surgical training. No ethical review board approval was necessary because no human subjects were involved.

Systematic Review Process

We adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses23 guidelines to conduct and report the findings of this systematic review. We used the Population, Intervention, Comparison, Outcome (PICO) method24 to formulate the following research question: Among health professions learners, what is the comparative effectiveness of XR simulations versus standard training methods in enhancing learning and improving patient outcomes?

Search Criteria and Eligibility

We conducted a search on October 8, 2021, in the MEDLINE and Embase databases, retrieving titles and abstracts without any date limitations. We examined the reference lists of review articles and identified four additional records.25–28 Appendix A (see Text, Supplemental Digital Content 1, https://links.lww.com/SIH/B11, which shows the search criteria) summarizes the search criteria used.

To be eligible, studies had to compare XR with standard methods for surgical or procedural training in any healthcare discipline. We considered systematic reviews on XR usage and any other study that compared XR use with standard training methods, including randomized controlled trials (RCTs), comparative studies, pilot studies, feasibility studies, and proof-of-concept studies. Studies with measurable outcomes29 related to satisfaction, confidence, knowledge, behavior, patient outcomes, mechanical outcomes, or transferability of surgical skills were included. Case studies, editorials, conference papers, and abstracts without full texts were excluded. Non-English studies and studies that solely reported satisfaction or feasibility outcomes were also excluded.

Screening Process

A systematic review software, Covidence (Veritas Health Innovation, Melbourne, Australia, available at http://www.covidence.org), was used to remove duplicates and facilitate the screening of titles, abstracts, and full texts. Each of the 3639 titles and abstracts was independently reviewed by at least two reviewers for eligibility. A separate investigator then resolved any discrepancies during the screening process to finalize the article selection for full-text review (n = 136). During our full-text review, we found that XR technologies used in medicine before 2016 generally did not align with the current definition of XR, particularly in the case of VR (eg, screen-based systems often being identified as VR). The term “virtual reality” has been in use for decades, and its definition has evolved, encompassing a spectrum from screen-based systems to immersive VR. For VR, this review focused on immersive VR (iVR) that uses stereoscopic imaging (eg, head-mounted displays, robot simulators) to create a virtual environment encompassing the entire visual field30; thus, we excluded screen-based systems, such as those commonly used in VR laparoscopic trainers, because we wanted to specifically assess the comparisons between real environments and immersive environments. Given the emergence of commercially available XR devices and iVR technologies around 2015, our team concluded that medical studies using XR technology before 2016 generally did not involve XR technology comparable with that of 2016 and beyond. Thus, to limit representation bias, we excluded studies published before 2016, including immersive robotic simulation studies.

One distinction that we clarify in surgical simulation with novel devices (robotic, VR, MR, AR) is that the overall goal is to train toward the surgical procedure, not toward the device itself. As an example, training in robotic VR surgery encompasses several learning and technical hurdles. First, the trainee must learn how to operate the robot, that is, manipulating the arms in multiple dimensions (anterior/posterior, medial/lateral, rotational, etc) and using the different instruments (camera, graspers, cutters). This adds an extra dimension to learning the surgery, as the robot introduces a layer of complexity that is different from standard open surgery. Second, the trainee must learn how to manipulate tissue planes using the robotic arms for the specific surgery. Third, the trainee must learn how to perform the surgery efficiently and safely.

Data Extraction, Levels of Evidence, and Critical Appraisal

After identifying the eligible full-text articles (n = 32), we categorized them based on their study designs: RCTs, comparative studies, and systematic reviews. To facilitate data extraction, we used a standardized data extraction sheet in Appendix B (see Text, Supplemental Digital Content 2, https://links.lww.com/SIH/B12, which shows the data extraction sheet) and assigned the articles levels of evidence based on the Kirkpatrick model.29 We also used the Critical Appraisal Skills Program (CASP)31 tool to assess study quality because the team was familiar with this tool and its bias domains captured the variability of study designs.

Data Analysis and Synthesis

The findings of each study were summarized in tabular format, and themes for outcomes were identified to enable comparisons across different study types during the final synthesis. The findings were categorized into measurable outcomes such as knowledge, technical scores, and patient outcomes. Cost analyses and adverse effects were also reported. Because of the heterogeneity among the included studies, a meta-analysis was not feasible. We used the Joanna Briggs Institute umbrella review methodology for systematic reviews32 to summarize the data presented in the included systematic reviews.

RESULTS

The workflow used to identify eligible studies is depicted in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses diagram in Figure 1. Table 1 provides study characteristics, levels of evidence, and critical appraisal (risk of bias) of the 32 included studies. Among these studies, there were 18 RCTs, seven comparative studies, and seven systematic review articles. The studies overall encompassed eight medical fields, with orthopedics and surgery accounting for most of the studies. The outcomes reported in most studies included Kirkpatrick levels of evidence I–III [reactions (56%), knowledge (50%), and behavior (100%), respectively], with only some studies reporting level IV outcomes [patient (28%)]. Overall, the risk of bias across the studies was determined to be low. Appendix C shows CASP results for RCT studies (see Text, Supplemental Digital Content 3, https://links.lww.com/SIH/B13, which shows CASP results for RCT studies). Appendix D shows CASP results for comparative studies (see Text, Supplemental Digital Content 4, https://links.lww.com/SIH/B14, which shows CASP results for comparative studies). Appendix E shows CASP results for systematic reviews (see Text, Supplemental Digital Content 5, https://links.lww.com/SIH/B15, which shows CASP results for systematic reviews). All studies measured at least one objective outcome, including knowledge, time-to-task completion, technical scores, repetitive training, and patient outcomes. Subject matter, control and training tasks, participants, location of studies, XR platforms used, and study outcomes are summarized in Tables 1 to 3.

FIGURE 1:

Preferred Reporting Items for Systematic Reviews and Meta-Analyses diagram.

TABLE 1 - Study Characteristics, Levels of Evidence, and Critical Appraisal (Risk of Bias)

| Author | Year | Country | Study Type | Topic | N (Participants/Studies) | Participant Type* | Simulation Device | Kirkpatrick Levels of Evidence | Risk of Bias From CASP Checklist |
|---|---|---|---|---|---|---|---|---|---|
| Blumstein et al26 | 2020 | US | RCT | Orthopedics | 20 | S | VR | III | Low |
| Brown et al33 | 2017 | US | RCT | Surgery | 26 | T | VR | III, IV | Medium |
| Hooper et al27 | 2019 | US | RCT | Orthopedics | 14 | T | VR | I, II, III, IV | Low |
| Logishetty et al34 | 2019 | UK | RCT | Orthopedics | 24 | T | VR | II, III, IV | Low |
| Logishetty et al35 | 2018 | UK | RCT | Orthopedics | 24 | S | AR | I, II, III | Low |
| Lohre et al36 | 2020 | Canada | RCT | Orthopedics | 26 | T, E | VR | I, II, III | Low |
| Lohre et al37 | 2020 | Canada | RCT | Orthopedics | 18 | T | VR | I, II, III, IV | Low |
| Mladenovic38 | 2019 | Serbia | RCT | Dentistry | 41 | S | VR | II, III | Medium |
| Mok et al39 | 2021 | China | RCT | Orthopedics | 121 | S | VR | III | Low |
| Nair et al40 | 2021 | India | RCT | Ophthalmology | 19 | T | VR | I, III | Low |
| Orland et al41 | 2020 | US | RCT | Orthopedics | 16 | S | VR | I, II, III | Low |
| Rai et al42 | 2017 | Canada | RCT | Ophthalmology | 28 | T | AR | I, III | Low |
| Roehr et al43 | 2021 | US | RCT | Medical education | 25 | S | VR | I, II, III | Low |
| Satava et al44 | 2020 | US (+ Greece, Italy, UK) | RCT | Surgery | 99 | T | VR | II, III | Low |
| Valdis et al45 | 2016 | Canada | RCT | Cardiac surgery | 40 | T | VR | III, IV | Low |
| Van Gestel et al46 | 2021 | Belgium | RCT | Neurosurgery | 16 | S | AR | I, II, III | Low |
| Xin et al47 | 2020 | China | RCT | Orthopedics | 24 | T | VR | III | Low |
| Xin et al28 | 2019 | China | RCT | Neurosurgery | 16 | T | VR | III | Low |
| Al Janabi et al48 | 2020 | UK | Comparative | Surgery | 72 | S, T, E | MR | I, III | Medium |
| Andersen et al25 | 2016 | US | Comparative | Surgery | 20 | S | AR | III | Low |
| Cowan et al49 | 2021 | US | Comparative | Urology | 17 | T, N, E | VR | III | Low |
| Llena et al50 | 2017 | Spain | Comparative | Dentistry | 41 | S | AR | I, II, III | Low |
| Raison et al51 | 2020 | UK, Italy | Comparative | Surgery | 43 | N | VR | I, III | Low |
| Rojas-Muñoz et al52 | 2019 | US | Comparative | Surgery | 20 | S | AR | II, III | Low |
| Wolf et al53 | 2021 | Switzerland | Comparative | Surgery | 21 | S | AR | I, III | Low |
| Barteit et al6 | 2021 | Germany | Systematic review | Medical education | 956/27 | S, T, N, E | VR, AR, MR | I, II, III | Low |
| Kovoor et al7 | 2021 | Australia | Systematic review | Surgery | 779/24 | S, T, N, E | AR | III, IV | Low |
| Laverdière et al8 | 2019 | Canada | Systematic review | Orthopedics | >289/41 | S, T, N, E | AR | I, II, III, IV | Low |
| Mao et al9 | 2021 | Canada | Systematic review + meta-analysis | Surgery | 307/17 | S, T, N, E | VR | I, II, III | Low |
| Ong et al10 | 2021 | Singapore | Systematic review | Ophthalmology | Unknown/87 | S, T, N, E | VR, AR, MR | I, III, IV | Low |
| Polce et al11 | 2020 | US | Systematic review + meta-analysis | Orthopedics | 494/24 | S, T, N, E | VR | III | Low |
| Williams et al12 | 2020 | UK | Systematic review | Surgery | 774/18 | S, T, N, E | AR | I, II, III, IV | Low |

*Participant type: S = students (eg, medical, nursing students) not including residents; T = trainees (eg, residents); N = attending-novice to XR; E = attending-expert to XR.

UK, United Kingdom; US, United States.


TABLE 2 - Study Outcomes for RCTs and Comparative Studies

| Study | RCT/Comparative | Control Task | Intervention Task | Assessment Task | Outcome Assessment | Outcome |
|---|---|---|---|---|---|---|
| Blumstein et al26 | RCT | Learners given 20 min to read printed handout with steps for tibia nailing | VR simulation for tibia nailing (20 min) | Dry-lab simulation using animal sawbone and surgical instruments used during the actual procedure, immediately and 2 wk post | Global assessment scale by blinded surgeon | Higher Global Assessment Scale (P*) immediately and 2 wk post, more steps completed (P*), improved knowledge of instruments (P†) |
| Brown et al33 | RCT | Learners trained using standard robotic surgery simulator | VR robotic surgery simulator | Skills test on VR robotic surgery device | Overall improvement and percent improvement (using pretest and posttest scores from MScore software) | VR robotic device was comparable to robotic device |
| Hooper et al27 | RCT | Learners given a book chapter and two articles | Trained with VR twice | THA on cadavers | “Ottawa surgical competency operating room evaluation” score and posttest | Improved overall THA score (P†), significant technical performance (P†), no difference in test scores |
| Logishetty et al34 | RCT | Learners used online THA manual and annotated prerecorded THA videos | Weekly (for 6 wk) training program in VR simulation lab | Cadaver THA | Primary: grading scale; secondary: step completion by task-specific checklist, error in orientation, time | Improved grading scale (P*), improved step completion (P*), fewer errors (P*), and faster (P‡) |
| Logishetty et al35 | RCT | Learners given personalized training from surgeon | AR (live holographic orientation feedback) | THA orientation on model | Errors in acetabular cup orientation after each training session | Fewer errors (P*) in 1st assessment, but no difference by 4th assessment |
| Lohre et al36 | RCT | Learners given technical journal article to read | Glenoid exposure on VR | Glenoid exposure on cadavers | OSATS, lab metric, verbal answers, time to completion | Faster (P†) with superior instrument handling (P†); equivalent knowledge scores; greater realism and teaching (P†) |
| Lohre et al37 | RCT | Learners watched instructional video | VR case-based module | RSA on cadaver | OSATS scoring by blinded reviewer, PrecisionOS score in VR group | Faster (P*), better overall OSATS (P*), higher verbal question scores (P†); Precision score correlated with OSATS score |
| Mladenovic et al38 | RCT | Learners given articles on theoretical learning | Control + VR 2 h/wk for 4 wk | Nerve block on each other | Self-assessment of knowledge/skills, objective measures, heart rate, anesthesia rate, performance time | Better localization of puncture site and controlled hand movements (P < 0.05), faster time to complete (P < 0.05); no difference in anesthesia rate and heart rate |
| Mok et al39 | RCT | Learners given 8-h lecture + 6-h practical class over 2 wk + 1-h PowerPoint simulation | Control instruction + 1 h of VR simulation time | Synthetic model tendon assessment | Global rating score (blinded) | Higher Global Rating Score (P*) |
| Nair et al40 | RCT | Learners given conventional institution-specific curriculum | VR eye tunnel dissection | 20 live tunnel constructions in small-incision cataract surgery | Total number of prespecified errors | VR fewer errors (P†) |
| Orland et al41 | RCT | Learners provided with technique guide | Three VR training sessions 3–4 d apart | Dry-lab simulation using animal sawbone and surgical instruments used during the actual procedure | Observer rating (blinded) | Higher completion rate (P†), fewer incorrect steps (P†) |
| Rai et al42 | RCT | Learners provided with traditional teaching | AR Eye Simulator | Eye incision on the simulator | Total raw score, total time elapsed, and performance | AR better in total score (P†) and performance (function of time) (P‡) |
| Roehr et al43 | RCT | Learners given student-led instruction using task trainers | 45 min to practice the lumbar puncture on VR | LP on a task trainer | Critical action procedural checklist; time to complete | Both traditional training and VR improved performance (P*), time to complete (P‡), and knowledge and confidence (P*) |
| Satava et al44 | RCT | Learners given conventional institution-specific curriculum | VR robotic simulator | Robotic surgical simulator using an avian model | Task errors and duration on 5 basic robotic tasks, cognitive test scores, GEARS ratings, and robot familiarity checklist scores | Trainees performed tasks faster and with fewer errors on the VR robotic simulator than controls (P†) |
| Valdis et al45 | RCT | No additional training | (1) Wet lab, (2) dry lab, (3) VR lab; all groups trained to a level of proficiency determined by two experts | Robotic ITA harvest and mitral annuloplasty tasks on a porcine model | Time-based scores on successful completion of the assessments; mean GEARS score (compared against experts) | Wet lab and VR met expert-level proficiency; mean training time shortest for the dry lab and longest for VR (1.6 h vs 9.3 h, P*) |
| Van Gestel et al46 | RCT | Learners were taught free-hand technique for EVD; both groups received standardized training given by a certified neurosurgeon | AR with an anatomical overlay and tailored guidance for EVD through inside-out infrared tracking | EVD placements on a custom-made phantom head | EVD placement accuracy (mean target error in mm) and its clinical quality based on mKS | AR guidance gave higher placement accuracy (P‡) and procedural quality (P†) |
| Xin et al47 | RCT | Learners observed a task trainer demonstration by a senior doctor and watched training video | VR surgical simulator training | Thoracic and lumbar pedicle screw placement on patients under an expert surgeon's supervision | Number of successful and failed nailing operations; postoperative CT scan used to validate whether the screws were acceptable | VR better in success rate and accuracy rate of screw placement (P†) |
| Xin et al28 | RCT | Learners given video and task trainer with a 50-min demonstration 3 times | Same video + VR surgical simulator for 50 min 3 times | Cadaver pedicle screws on a task trainer | Pedicle violations, accuracy and success of screw placement | VR significant (P†) for grade I (high-accuracy) screws, success rate, and less time |
| Al Janabi et al48 | Comparative | Learners provided with training model | Ureteroscopy four times with 15-min practice sessions using MR | Ureteroscopy on a training model | Procedural completion time, performance evaluation, satisfaction | Procedural completion time (P‡) and performance evaluation (P*) improved with MR; participants preferred MR training |
| Andersen et al25 | Comparative | Learners completed readings and faculty demonstration | Port placement and abdominal incision using AR | Port placement and abdominal incision on a task trainer | Placement error, number of focus shifts, completion time | Placement error was less with AR (P*); focus shifts were fewer with AR (P*) |
