Emerging digital technologies promise to shape the future health care industry [,]. According to our previous review [], most researchers had a positive impression of digital health interventions (DHIs). The number of DHIs is proliferating [-], which is affecting the way patients receive their health care services compared with face-to-face health care services and ultimately influencing the patient journey and overall patient experience (PEx) [,]. Good PEx is a key intent of patient-centered care [] and a core measure of care quality in digital health (DH) [,]. Digital technologies have the potential to enhance or provide comparable PEx compared with some face-to-face health care services [,-]. However, the uptake of digital technologies in health care is not as rapid as it has been in many other industries [], and their potential in health care remains unfulfilled []. According to a report by the World Health Organization (WHO) on the classification of DHIs, the health system is not responding adequately to the need for improved PEx [].
Despite the growing number of DHIs, evaluations that are timely, cost-effective, and robust have not kept pace with this growth [,,]. PExs in the wide range of DHIs are mixed [,]. Few published DHIs have resulted in high download numbers and active users []; most are released with minimal or no evaluation and require patients to assess the quality for themselves and take responsibility for any consequences []. Low-quality DH may disrupt user experience (UX) [], resulting in low acceptance, and some may even be harmful []. In addition, a DHI may be popular with patients but not valued by clinicians []. To generate evidence and promote the appropriate integration and use of digital technologies in health care, an overview of how to evaluate PEx or UX in varied DHIs is needed [,].
Evaluating the Digital PEx
In this study, we used the definition of digital PEx from our previous review []: “the sum of all interactions affected by a patient’s behavioral determinants, framed by digital technologies, and shaped by organizational culture, that influence patient perceptions across the continuum of care channeling digital health.” This incorporates influencing factors of digital PEx [] and the existing definitions of DHIs [,], PEx [], and UX []. Compared with general PEx and UX, it highlights patient perceptions that are affected by technical, behavioral, and organizational determinants when interacting with a DHI. DHI has become an umbrella term that often encompasses broad concepts and technologies [], such as DH applications, ecosystems, and platforms []. In this study, we followed the WHO’s definition of DHIs [], that is, the use of digital, mobile, and wireless technologies to support the achievement of health objectives. It refers to the use of information and communication technologies for health care, encompassing both mobile health and eHealth [,]. Compared with evaluating DHIs, PEx, and UX, little is known about evaluating digital PEx. However, combining the definition of digital PEx with the extensively explored measurement of PEx, UX, and DHIs can improve the understanding of digital PEx and enable the development of approaches for measuring it. Therefore, the evaluations of PEx, UX, and DHIs will be used as a starting point in this study to clarify when to measure, what to measure, and how to measure digital PEx.
When to Measure
First, the timing of measuring and evaluating digital PEx is an important consideration and must align with the contextual situation, such as evaluation objectives and stakeholders, to ensure practicality and purposefulness [,]. According to the European Union [] and the Department of Health of The King’s Fund [], an evaluation can be scheduled during the design phase or during or after the implementation phase. Similarly, the WHO [] introduced 3 DHI evaluation stages: efficacy, effectiveness, and implementation. The evaluation of efficacy is carried out under highly controlled conditions, the evaluation of effectiveness is carried out in a real-world context, and the evaluation of implementation occurs after efficacy and effectiveness have been established. Furthermore, an evaluation can be performed before, during, or after the evaluated intervention in both research and nonresearch settings []. However, decision-making on when to collect PEx data can be more complicated. As argued in earlier studies [,], immediate feedback has the benefit of gaining real-time insights, but patients may be too unwell, stressed, or distracted to provide detailed opinions. In contrast, when the feedback is related to medical outcomes or quality of life, it often requires a lengthy period after the intervention to observe any changes. However, responses gathered long after a care episode may be inferior because of recall bias.
What to Measure
Second, a decision is needed on what to measure to assess digital PEx. The frequently mentioned UX evaluation concepts, such as usability, functionality, and reliability, from studies [-] investigating UX can be applied to evaluate the intervention outputs to anticipate digital PEx at a service level. Moreover, existing constructs and frameworks for understanding or evaluating PEx [-], such as emotional support, relieving fear and anxiety, patients as active participants in care, and continuity of care and relationships, can be adapted to evaluate digital PEx by understanding patient outcomes at an individual level. In addition, the National Quality Forum [] proposed a set of measurable concepts to be used to evaluate PEx in telehealth, for example, patients’ increased confidence in, understanding of, and compliance with their care plan; reduction in diagnostic errors and avoidance of adverse outcomes; and decrease in waiting times and eliminated travel. Some of these concepts can be used to understand digital PEx at an organizational level by assessing the impact on the health care system.
How to Measure
The third consideration is how to choose evaluation approaches appropriate for evaluating the digital PEx [], starting from widely used theories, study designs, methods, and tools for evaluating DHIs and the related PEx or UX. Guidance for DH innovators is evolving rapidly [], such as the National Institute for Health and Care Excellence Evidence Standards Framework for Digital Health Technologies []. The strength of the evidence in the evaluation of DHIs often depends on the study design []. However, the high bar for evidence in health care usually requires a longer time for evidence generation, such as prospective randomized controlled trials (RCTs) and observational studies, which often conflicts with the fast-innovation reality of the technology industry [,]. In addition, many traditional approaches, such as qualitative and quantitative methods, can be used to collect experience-related data to evaluate the DHIs [,]. Qualitative methods such as focus groups, interviews, and observations are often used to obtain an in-depth understanding of PEx [] in the early intervention development stages []. Surveys using structured questionnaires, such as patient satisfaction ratings [,], patient-reported experience measures (PREMs) [,], and patient-reported outcome measures (PROMs) [,,], are often used to examine patterns and trends from a large sample. Hodgson [] believed that strong evidence results from UX data that are valid and reliable, such as formative and summative usability tests, and stated that behavioral data are strong, but opinion data are weak.
Objectives
This study aims to systematically identify (1) evaluation timing considerations (ie, when to measure), (2) evaluation indicators (ie, what to measure), and (3) evaluation approaches (ie, how to measure) with regard to digital PEx. The overall aim of this study is to generate an evaluation guide for further improving digital PEx evaluation research and practice.
This study consists of 2 phases. In phase 1, we followed the same study search and selection process as our previous research [] but focused on a different data extraction and analysis process to achieve our objectives in this study. In the previous study [], we identified the influencing factors and design considerations of digital PEx, provided a definition, constructed a design and evaluation framework, and generated 9 design guidelines to help DH designers and developers improve digital PEx. To highlight the connections between “design” and “evaluation” works in the development of DH and provide readers with a clear road map, we included some evaluation-related information in the previous paper as well. However, it was limited and described at a very abstract level. In this study, detailed information on the evaluation was provided, including evaluation timing considerations, evaluation indicators, and evaluation approaches, and we aimed to generate an evaluation guide for improving the measurement of digital PEx. Given that this is an evolving area, after we finished phase 1, we conducted an updated literature search as a subsequent investigation to determine whether an update of a review was needed in this study.
Phase 1: The Original Review
Study Search and Selection
Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [], we conducted an umbrella systematic review [] on literature reviews related to PEx and UX in DH. The term DH was first introduced in 2000 by Frank []. Therefore, Scopus, PubMed, and Web of Science databases were used for searching related articles that were published between January 1, 2000, and December 16, 2020. Furthermore, Google Scholar was used to search for additional studies that were identified during the review process through the snowballing method. The computer search resulted in 173 articles, of which 58 (33.5%) were duplicates. After removing the duplicates, the titles and abstracts of a small random sample (23/115, 20%) were reviewed by 2 independent raters to assess the interrater reliability by using the Fleiss-Cohen coefficient, which resulted in k1=0.88 (SE 0.07; 95% CI 0.74-1.03). This was followed by a group discussion to reach an agreement on the selection criteria. Subsequently, the remaining titles and abstracts (92/115, 80%) were reviewed by TW individually. After screening the titles and abstracts, half of the articles (58/115, 50.4%) remained for the full-text review. Meanwhile, 4 additional articles were identified through snowballing and were included in the full-text screening. Another small random sample (12/62, 19%) was reviewed by the 2 raters to screen the full texts. After achieving interrater reliability of k2=0.80 (SE 0.13; 95% CI 0.54-1.05) and reaching a consensus on the inclusion criteria through another group discussion, TW reviewed the full texts of the remaining papers (50/62, 80%). Google Sheets was used for performing the screening process and assessments. Finally, as shown in Figure 1, a total of 45 articles were included for data extraction. A detailed search strategy, selection criteria, and screening process can be found in our previously published study [].
[-] presents the included and excluded articles.
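The agreement statistics reported for the two screening rounds can be illustrated with a minimal sketch. It assumes an unweighted Cohen's kappa on binary include/exclude decisions and a normal-approximation (Wald) interval; the exact coefficient variant and SE formula used in the review are not stated here, and the ratings below are hypothetical, not the review's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def wald_interval(estimate, se, z=1.96):
    """Normal-approximation confidence interval: estimate +/- z * SE."""
    return estimate - z * se, estimate + z * se


# Hypothetical screening decisions for 23 abstracts (illustration only).
rater_1 = ["include"] * 11 + ["exclude"] * 12
rater_2 = ["include"] * 10 + ["exclude"] * 13
kappa = cohen_kappa(rater_1, rater_2)

# Interval recomputed from the rounded values reported for full-text screening.
low, high = wald_interval(0.80, 0.13)
```

Because the published kappa and SE values are rounded, intervals recomputed this way only approximate the reported ones. The normal approximation can also yield an upper bound above 1, although kappa itself cannot exceed 1.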
Figure 1. Study flow diagram. ICT: information and communication technology.
Data Extraction and Thematic Analysis
We used ATLAS.ti (Scientific Software Development GmbH; version 9.0.7) for data extraction. Data were extracted for the three predefined objectives: (1) evaluation timing considerations, (2) evaluation indicators, and (3) evaluation approaches of the digital PEx. In addition, we collected data related to evaluation objectives among the included studies. Data analysis followed the 6-phase thematic analysis method proposed by Braun and Clarke [,]: familiarization, coding, generating themes, reviewing themes, defining and naming themes, and writing up. First, we became familiar with the 45 articles included in the study. Second, after a thorough review, TW started iteratively coding the data related to the predefined objectives based on existing frameworks, including the Performance of Routine Information System Management framework [], monitoring and evaluation guide [], measures of PEx in hospitals [], and an overview of research methodology []. This resulted in 25 initial codes. After no additional new codes were identified, TW proposed a coding scheme to summarize the recurring points throughout the data. Then, GG, RG, and MM reviewed and discussed the coding scheme until they reached an agreement. Third, TW followed the coding scheme to code the data more precisely and completely and searched for themes among the generated codes. Fourth, TW, GG, RG, and MM reviewed and discussed these codes and themes to address any uncertainties. Fifth, the definitions and names of the generated themes were adjusted through team discussions. Finally, the analytical themes related to the evaluation timing, indicators, and approaches were produced and reported. Both deductive and inductive approaches [] were used to identify and generate themes. Four researchers were involved in the review process.
We first highlighted the evaluation timing considerations in terms of intervention maturity stages, the timing of evaluation, and the timing of data collection, which were adopted from the descriptions of the WHO and European Union (Table 1) [,].
We then determined the evaluation indicators and classified them into 3 categories (Table 2). Intervention outputs are the direct products or deliverables of process activities and refer to the different stages of evaluation that correspond to the various stages of maturity of the DHI. Patient outcomes describe the intermediate changes in patients, including patients’ emotions, perceptions, capabilities, behaviors, and health conditions as determined by DHIs in terms of influencing factors and interaction processes. Health care system impact is the medium- to long-term, large-scale financial (intended and unintended) effects produced by a DHI.
Finally, we summarized evaluation approaches in terms of study designs, data collection methods and instruments, and data analysis approaches (Table 3). According to the WHO [], study designs are intended to assist in decision-making on evidence generation and clarify the scope of evaluation activities. Data collection and analysis are designed through an iterative process that involves strategies for collecting and analyzing data and a series of specifically designed tools [].
Table 1. Initial codes of evaluation timing considerations of the digital patient experience.
Categories and initial codes | Description
Intervention maturity stages [,,]
DHI: digital health intervention.
Table 2. Initial codes of evaluation indicators of the digital patient experience.
Categories and initial codes | Description
Intervention outputs [,-,]
DHI: digital health intervention.
DH: digital health.
Table 3. Initial codes of evaluation approaches of the digital patient experience.
Categories and initial codes | Description
Study designs []
The decision to undertake an update of a review requires several considerations. Review authors should consider whether an update of a review is necessary and when it would be most appropriate []. In light of the “decision framework to assess systematic reviews for updating, with standard terms to report such decisions” [], we consider that research on PEx in DH remains important and evolves rapidly. To check whether newly published articles would bring significant changes to our initial findings, we conducted a rapid scoping search for articles published after our last search. We reran the search strategy specified earlier, with date limits set to the period following the most recent search (from December 16, 2020, to August 18, 2023). After removing duplicates (73/367, 19.8%), we collected 294 articles in total. Following the same screening process and selection criteria, we finally identified 102 new eligible articles. The excluded articles were either not a literature review with systematic search (74/294, 25.2%), not about DH (87/294, 29.6%), not about PEx (26/294, 8.8%), our own parallel publications (2/294, 0.7%), or not accessible in full text (3/294, 1%). The eligible and ineligible articles in this phase are available in . We found that the outcomes in the new studies were largely consistent with the existing data. For example, these articles aimed to investigate what factors influence the feasibility, efficacy, effectiveness, design, and implementation of DH; examine how patients expect, perceive, and experience the DHIs; or compare the DHIs with conventional face-to-face health care services. The research objectives of these new eligible articles are available in .
We considered that their findings were unlikely to meaningfully impact our findings on when to measure, what to measure, and how to measure digital PEx. As suggested by Cumpston and Chandler [], review authors should decide whether and when to update the review based on their expertise and individual assessment of the subject matter. We decided to use these new articles as supplementary materials ( and ) but did not integrate them into the synthesis of this review.
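The screening counts reported for the updated search can be cross-checked with a short tally. This is only an illustrative sketch: the function name and dictionary keys are hypothetical, and the figures are the ones stated in the text above.

```python
def screening_summary(identified, duplicates, exclusions):
    """Tally a screening flow: records screened, excluded, and left eligible."""
    screened = identified - duplicates
    excluded = sum(exclusions.values())
    # Share of each exclusion reason among screened records, as a percentage.
    shares = {
        reason: round(100 * count / screened, 1)
        for reason, count in exclusions.items()
    }
    return {
        "screened": screened,
        "excluded": excluded,
        "eligible": screened - excluded,
        "exclusion_shares_pct": shares,
    }


# Counts as reported for the updated search (December 2020 to August 2023).
updated_search = screening_summary(
    identified=367,
    duplicates=73,
    exclusions={
        "not a literature review with systematic search": 74,
        "not about digital health": 87,
        "not about patient experience": 26,
        "authors' own parallel publications": 2,
        "full text not accessible": 3,
    },
)
```

With these inputs, the exclusion reasons account for every screened record that was not retained, which matches the 102 eligible articles reported.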
This paper is a part of a larger study, and we have presented results related to study characteristics in a previous publication []. [-] provides detailed information regarding the characteristics of the included reviews, including research questions or aims, review types, analysis methods, number of included studies, target populations, health issues, and DHIs reported in each review. In this study, to achieve our research objectives, we identified reviews that reported different intervention maturity stages, timing of the evaluation, and timing of data collection. In addition, we identified a set of evaluation indicators of digital PEx and classified them into 3 predefined categories (ie, intervention outputs, patient outcomes, and health care system impact), which in turn included 9 themes and 22 subthemes. Furthermore, we highlighted evaluation approaches in terms of evaluation theories, study designs, data collection methods and instruments, and data analysis methods. We found that it was valuable to compare the evaluation objectives of the included studies. Therefore, we captured 5 typical evaluation objectives and the stakeholders involved, which clarified why and for whom DH evaluators carried out the evaluation tasks. The detailed findings are presented in the Evaluation Objectives section.
Evaluation Objectives
Our review findings highlighted 5 typical evaluation objectives.
The first objective was to broaden the general understanding of the digital PEx and guide evaluation research and practice (11/45, 24%) [-]. For instance, 1 review [] aimed to identify implications for future evaluation research and practice on mental health smartphone interventions by investigating UX evaluation approaches.
The second was to improve the design, development, and implementation of the DHI in terms of a better digital PEx (15/45, 33%) [-,-]. As demonstrated in an included review [], the evaluation of DHIs is critical to assess progress, identify problems, and facilitate changes to improve health service delivery and achieve the desired outcomes.
The third was to achieve evidence-based clinical use and increase DHIs’ adoption and uptake (14/45, 31%) [,,,-,,,,-].
The fourth was to drive ongoing investment (3/45, 7%) [,,]; without compelling economic supporting evidence, the proliferation of DHIs will not occur. Therefore, ensuring the sustained clinical use, successful implementation, and adoption of and continued investment in DHIs require more evaluative information. This helps ensure that resources are not wasted on ineffective interventions [].
The fifth was to inform health policy and practice (3/45, 7%) [,,]. As 2 included articles stated [,], ongoing evaluation and monitoring of DHIs is critical to inform health policy and practice. In addition, depending on the evaluation objective, the evaluation activities serve different stakeholder groups, including program investigators, evaluators, and researchers; designers, developers, and implementers; end users, patients, and health care providers (HCPs); clients and investors; and governments and policymakers.
Evaluation Timing Considerations
Among the included studies, evaluations were carried out at various stages of the intervention to fulfill the 5 evaluation objectives. Our findings showed that most reviews reported feasibility, efficacy, and pilot studies (32/45, 71%) [,,,-,,-], followed by effectiveness studies (20/45, 44%) [,,,,,,,,,,,,-,,-] and implementation studies (20/45, 44%) [,,,,,,,,,,,-,,,,,,]. Notably, some reviews included >1 type of study. Our findings also showed that evaluation can take place directly pre- or postintervention [,,,,-,-,,,,,,,,,,], at baseline or after a short- or long-term follow-up [,,,,,-,,,,,,,,,,,], during intervention use [,], through continued monitoring [,], and even at dropout []. One study [] suggested providing a period of technical training and conducting a baseline test to reduce the evaluation bias caused by individual technology familiarity and novelty. As demonstrated by another study [], pre- and postintervention assessments using clinical trials can measure intervention effectiveness (eg, patients’ blood glucose levels). In terms of the timing of data collection, 1 included study [] suggested that evaluations directly after the intervention are appropriate so that the users retain fresh memories of the experience. To assess whether intervention outcomes are sustained over a longer period, longitudinal evaluations and long-term follow-up evaluations were recommended in 2 studies [,].
Evaluation Indicators
Overview
Evaluation indicators relate to the goal to which the research project or commercial program intends to contribute. Indicators are defined as “a quantitative or qualitative factor or variable that provides a simple and reliable means to measure achievement, to reflect the changes connected to an intervention, or to help assess the performance of a development actor” []. On the basis of our initial codes, we grouped the evaluation indicators into 3 main categories: intervention outputs, patient outcomes, and health care system impact. Each category contains several themes and subthemes (Tables 4-6) and is discussed in detail in the following 3 sections: Intervention Outputs, Patient Outcomes, and Health Care System Impact.
Table 4. Themes, subthemes, and evaluation indicators of the intervention outputs of the digital patient experience.
Themes and subthemes | Studies (n=45), n (%) | Evaluation indicators | References
Functionality (n=36, 80%)
HCP: health care provider.
Table 5. Themes, subthemes, and evaluation indicators of patient outcomes of the digital patient experience.
Themes and subthemes | Studies (n=45), n (%) | Evaluation indicators | References
Emotional outcomes (n=32, 71%)
DHI: digital health intervention.
HCP: health care provider.
Table 6. Themes, subthemes, and evaluation indicators of health care system impact of the digital patient experience.
Themes and subthemes | Studies (n=45), n (%) | Evaluation indicators | References
Economic outcomes (n=16, 36%)
DHI: digital health intervention.
Intervention Outputs
Intervention outputs are partially determined by the intervention inputs and processes (ie, influencing factors and design considerations, such as personalized design) []. We identified 3 themes and 8 subthemes within this category (). The first theme, functionality, refers to the assessment of whether the DHIs work as intended. The subthemes included (1) the consistency of intended value (eg, the ability of the DHIs to collect the amount of accurate clinical metrics in real time [,,,]), (2) the quality of content and information (eg, tailored content [,,,,,,,]), (3) the appropriateness of intervention features (eg, the degree of system setup [,]), and (4) the use of intervention theories (eg, the presence of an underlying theoretical basis [,,,,,,,,]). The second theme, usability, refers to whether the DH system is used as intended []. Both technology quality attributes (eg, ease of use [-,,,,,,,,,]) and interaction design (eg, intuitive interface design [,,]) can be used for usability evaluations. The third theme, care quality, refers to effective, safe, people-centered, timely, accessible, equitable, integrated, and efficient care services []. Examples include the assessment of convenient care accessibility (eg, care that fits into daily routines [,,,,,,,]) and the credibility of DHIs’ owners [,].
Patient Outcomes
Studies used a variety of quantitative and qualitative factors and variables to measure and describe patient outcomes (), referring to 5 themes (emotional outcomes, perceptual outcomes, capability outcomes, behavioral outcomes, and clinical outcomes) and 12 subthemes. Emotional outcomes relate to patients’ positive or negative feelings that result from the use or anticipated use of DHIs. For example, a high level of patient satisfaction [,,,-,,,-,,,-,-] is a typical positive feeling. Increased concern about data privacy and security [,,,,,,,] is a frequently mentioned negative feeling. Perceptual outcomes are the informed states of mind or nonemotional feelings the patients achieve before, during, or after using the DHIs [], including patients’ initial attitudes toward the DHIs (eg, internal motivation [,,,,,,]); patient-to-provider relationships, for example, those that are enhanced by perceived improved accessibility to HCPs [,,,,,,,,] versus those that are interfered with by perceived loss of face-to-face contacts [,,,,,,]; perceived empowerment (eg, increased confidence in managing their health conditions [,,,,,]) and burden (eg, increased perception of restriction [,-,,,,]); and overall acceptance of the DHIs (eg, willingness to use [,,,]). Capability outcomes refer to the improvement in patients’ self-management autonomy, health knowledge, and clinical awareness. DHIs may be effective at improving patients’ independence, self-management autonomy, problem-solving, and decision-making skills [,,,,,,-,,,,]; increasing their health literacy, knowledge, or understanding of their health conditions or care plans [,,,,,,,,]; and raising their clinical awareness so that they are more certain of when it is necessary to seek medical attention [,,,,].
Behavioral outcomes include activities that the patients adopt owing to DHIs [], including adherence to the intervention (eg, dropout rates [,,,,,,]), self-management behaviors (eg, physical and diet activities [,,,,,,]), and patient-to-provider communication (eg, increased interactions between patients and HCPs [,,,,,,,,,,]). Clinical outcomes are related to individual health conditions and the main intentions of the DHIs. For example, a reduction in anxiety, depression, and stress [,-,,,,,,,,,] and increased symptom control [,,,,,-,] can help to measure the individual health conditions.
Health Care System Impact
Health care system impact contains 1 theme and 2 subthemes. Economic outcomes refer to cost-effectiveness and health care service use. In terms of cost-effectiveness, for example, studies report lower out-of-pocket expenses for patients because of reduced care and travel costs [,,,,,,,,] and greater time efficiency owing to shorter waiting, travel, and consultation time [,,,,,,]. Furthermore, indicators related to health care service use, such as the reduced number of hospital [,,,,] and emergency department visits [,], can be used to assess savings regarding health care services.
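The indicator taxonomy described in the three sections above can be captured in a small data structure, which also makes it easy to confirm that the per-category counts add up to the 9 themes and 22 subthemes reported. This is a minimal sketch: the class and function names are hypothetical, and only the theme names and subtheme counts stated in the text are encoded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IndicatorCategory:
    """One category of evaluation indicators with its themes and subtheme count."""
    name: str
    themes: Tuple[str, ...]
    n_subthemes: int


CATEGORIES = (
    IndicatorCategory(
        "intervention outputs",
        ("functionality", "usability", "care quality"),
        8,
    ),
    IndicatorCategory(
        "patient outcomes",
        (
            "emotional outcomes",
            "perceptual outcomes",
            "capability outcomes",
            "behavioral outcomes",
            "clinical outcomes",
        ),
        12,
    ),
    IndicatorCategory("health care system impact", ("economic outcomes",), 2),
)


def totals(categories):
    """Return (number of themes, number of subthemes) across all categories."""
    n_themes = sum(len(c.themes) for c in categories)
    n_subthemes = sum(c.n_subthemes for c in categories)
    return n_themes, n_subthemes


def category_of(theme: str) -> Optional[str]:
    """Look up which category a theme belongs to, or None if it is unknown."""
    for category in CATEGORIES:
        if theme in category.themes:
            return category.name
    return None
```

For example, `category_of("usability")` places the theme under intervention outputs, which is the level at which an evaluator would then choose concrete indicators such as ease of use.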
Evaluation Approaches
Overview of the Approaches
In addition to evaluation timing considerations and indicators, strategies and specifically designed tools for collecting and analyzing data are required to set up the evaluation plan. Various evaluation approaches were identified based on our initial codes; these are depicted in 3 aspects (-): study designs, data collection methods and instruments, and data analysis approaches. Furthermore, we collected data related to evaluation theories that were used to guide the study designs, data collection, and analysis.
Table 7. Study designs for evaluating the digital patient experience.
Study designs | Studies, n (%) | References
Mode of inquiry (n=36, 80%)
Our findings showed that in some cases, theories are used to guide the evaluation process. An included review [] mapped various DHI evaluation frameworks and models into conceptual, results, and logical frameworks as well as theory of change. Among the included reviews, the National Quality Forum [,], UX model [], American Psychiatric Association A