One of the objectives of procedural and surgical education is to train healthcare practitioners to perform interventional procedures safely and efficiently.1 Traditional teaching methods have used curriculum-based approaches, model-based simulations, and mentorship programs to advance trainees to expert levels, aiming to enhance patient safety and optimize outcomes. However, these models may lack fidelity and specificity, and mentorship programs require extensive time, training, and effort from both faculty and students. There is an urgent need for effective and efficient surgical skills training: medical and surgical errors remain a major source of healthcare expenditure and contribute to patient morbidity and mortality worldwide.2
The digital transformation3 of extended reality (XR) is ushering in a paradigm shift in health care, seeking to optimize patient care and address the escalating educational requirements of healthcare professionals for skill proficiency. Extended reality encompasses technologies such as virtual reality (VR),4 augmented reality (AR),4 and mixed reality (MR).5 Advocates of XR suggest that it offers advantages to trainees by enhancing their learning environment through heightened sensory components in education, training, and patient care. Current applications of XR include creating immersive surgical environments with head-mounted devices, guiding surgeons on precise device placement and surgical approaches in the operating room through stereotactic navigation with augmented overlays, and developing interactive and immersive simulation modules incorporating XR specifically tailored to trainees.
A significant body of literature describing the technological advancements exists in the use of XR for surgical or procedural training, including multiple systematic reviews.6–12 These publications coincide with the decreasing costs of XR devices13 and their increasing implementation.14–22 Many XR studies focus on a subset of XR, a particular specialty, or reactions and knowledge. However, the overall impact of XR training on behavior and patient outcomes compared with standard training methods is not fully understood. Healthcare educators are exploring whether XR modalities can replace procedural training or be used to supplement standard teaching methods. In addition, studies involving XR training are highly heterogeneous across populations and clinical settings, making it challenging to compile accurate evidence to support when it should be used.
Therefore, our objective in this comprehensive systematic review was to investigate the question: how does the use of XR technology in any healthcare discipline for procedural and surgical training compare with standard teaching methods for educational learning? This analysis is critical for making informed decisions regarding the implementation of these technologies in training programs worldwide. We hypothesize that the use of XR technologies is at least equivalent to standard methods for technical-based training and patient outcomes.
METHODS

A working group established by the Society for Simulation in Healthcare in 2021 conducted a systematic review of the current literature examining the impact of XR, encompassing VR, AR, and MR, on surgical training. No ethical review board approval was necessary, as no human subjects were involved.
Systematic Review Process

We adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses23 guidelines to conduct and report the findings of this systematic review. We used the Population, Intervention, Comparison, Outcome (PICO) method24 to formulate the following research question: Among health professions learners, what is the comparative effectiveness of XR simulations versus standard training methods in enhancing learning and improving patient outcomes?
Search Criteria and Eligibility

We conducted a search on October 8, 2021, in the MEDLINE and Embase databases, retrieving titles and abstracts without any date limitations. We examined the reference lists of review articles and identified four additional records.25–28 Appendix A (see Text, Supplemental Digital Content 1, https://links.lww.com/SIH/B11, which shows the search criteria) summarizes the search criteria used.
For inclusion in the study, eligible studies had to compare XR with standard methods for surgical or procedural training for any healthcare discipline. We considered systematic reviews on XR usage and any other study that compared XR use with standard training methods, including randomized controlled trials (RCTs), comparative studies, pilot studies, feasibility studies, and proof-of-concept studies. Studies with measurable outcomes29 related to satisfaction, confidence, knowledge, behavior, patient outcomes, mechanical outcomes, or transferability of surgical skills were included. Case studies, editorials, conference papers, and abstracts without full texts were excluded. Non-English studies or studies that solely reported satisfaction or feasibility outcomes were also excluded.
Screening Process

A systematic review software, Covidence (Veritas Health Innovation, Melbourne, Australia, available at http://www.covidence.org), was used to remove duplicates and facilitate the screening of titles, abstracts, and full texts. Each of the 3639 titles and abstracts was independently reviewed by at least two reviewers for eligibility. A separate investigator then resolved any discrepancies during the screening process to finalize the article selection for full-text review (n = 136). During our full-text review, we found that XR technologies used in medicine before 2016 generally did not align with the current definition of XR, particularly in the case of VR (eg, screen-based systems often being identified as VR). The term "virtual reality" has been in use for decades, and its definition has evolved, encompassing a spectrum from screen-based systems to immersive VR. For VR, this review focused on immersive VR (iVR) that uses stereoscopic imaging (eg, head-mounted displays, robot simulators) to create a virtual environment encompassing the entire visual field30; thus, we excluded screen-based systems, such as those commonly used in VR laparoscopic trainers, because we wanted to specifically assess the comparisons between real environments and immersive environments. Given the emergence of commercially available XR devices and iVR technologies around 2015, our team concluded that medical studies using XR technology before 2016 generally did not involve XR technology comparable with that of 2016 and beyond. Thus, to limit representation bias, we did not include studies published before 2016, including any immersive robotic simulation studies.
One distinction that we clarify in surgical simulation with novel devices (robotic, VR, MR, AR) is that the overall goal is to train toward the surgical procedure—and not toward the device itself. As an example, training in robotic VR surgery encompasses several learning and technical hurdles. First, the trainee must learn how to operate the robot, that is, manipulating the arms in multiple dimensions (anterior/posterior, medial/lateral, rotational, etc) and using the different instruments (camera, graspers, cutters). This adds an extra dimension to learning the surgery, as the use of the robot adds a new layer of complexity that is different from standard open surgery. Second, the trainee must learn how to manipulate tissue planes using the robotic arms for the specific surgery. Third, the trainee must learn the concepts of how to perform the surgery efficiently and safely.
Data Extraction, Levels of Evidence, and Critical Appraisal

After identifying the eligible full-text articles (n = 32), we categorized them based on their study designs: RCTs, comparative studies, and systematic reviews. To facilitate data extraction, we used a standardized data extraction sheet in Appendix B (see Text, Supplemental Digital Content 2, https://links.lww.com/SIH/B12, which shows the data extraction sheet) and assigned levels of evidence based on the Kirkpatrick model.29 We also used the Critical Appraisal Skills Program (CASP)31 tool to assess study quality, because this tool is familiar to the team and its bias domains captured the variability of study designs.
Data Analysis and Synthesis

The findings of each study were summarized in tabular format, and themes for outcomes were identified to enable comparisons across different study types during the final synthesis. The findings were categorized into measurable outcomes such as knowledge, technical scores, and patient outcomes. Cost analyses and adverse effects were also reported. Because of the heterogeneity among the included studies, a meta-analysis was not feasible. We used the Joanna Briggs Institute umbrella review approach for systematic reviews32 to summarize the data presented in the included systematic reviews.
RESULTS

The workflow used to identify eligible studies is depicted in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses diagram in Figure 1. Table 1 provides study characteristics, levels of evidence, and critical appraisal (risk of bias) of the 32 included studies. Among these studies, there were 18 RCTs, seven comparative studies, and seven systematic review articles. The studies overall encompassed eight medical fields, with orthopedics and surgery accounting for most of the studies. The outcomes reported in most studies included Kirkpatrick levels of evidence I–III [reactions (56%), knowledge (50%), and behavior (100%), respectively], with only some studies reporting level IV outcomes [patient (28%)]. Overall, the risk of bias across the studies was determined to be low. Appendix C shows CASP results for RCT studies (see Text, Supplemental Digital Content 3, https://links.lww.com/SIH/B13, which shows CASP results for RCT studies). Appendix D shows CASP results for comparative studies (see Text, Supplemental Digital Content 4, https://links.lww.com/SIH/B14, which shows CASP results for comparative studies). Appendix E shows CASP results for systematic reviews (see Text, Supplemental Digital Content 5, https://links.lww.com/SIH/B15, which shows CASP results for systematic reviews). All studies measured at least one objective outcome, which included knowledge, time-to-task completion, technical scores, repetitive training, and patient outcomes. Subject matter, control and training tasks, participants, location of studies, XR platforms used, and study outcomes are summarized in Tables 1 to 3.
FIGURE 1: Preferred Reporting Items for Systematic Reviews and Meta-Analyses diagram.
TABLE 1 - Study Characteristics, Levels of Evidence, and Critical Appraisal (Risk of Bias)
[Table columns: Author, Year, Country, Study Type, Topic, N (Participants/Studies), Participant Type*]
*Participant type: S = students (eg, medical, nursing students), not including residents; T = trainees (eg, residents); N = attending, novice to XR; E = attending, expert to XR.
UK, United Kingdom; US, United States.