Depression is a pervasive mental health disorder affecting approximately 4 million adolescents in the United States, and this number is expected to grow annually [-]. Depression can be lifelong, often emerging in adolescence and increasing the risk of poor physical health, social and relational problems, academic and employment difficulties, and reduced overall well-being [-]. Despite a growing need for effective treatment options, adolescents access [,] and adhere to [,] mental health treatment at alarmingly low rates owing to a number of factors. Stigma and a general difficulty in identifying symptoms of a mental illness reduce adolescents’ awareness of the need for treatment and their willingness to seek it [,]. When they do seek help, adolescents face months-long waitlists []. Other external forces drive the unmet clinical needs of adolescents experiencing depression, including costs [,], a widening divide in the accessibility of treatment [,], and a supply of mental health professionals that is outstripped by demand [,]. Barriers in access to mental health treatment and depression incidence rates in adolescents have been exacerbated by the COVID-19 pandemic, necessitating action and new solutions [-].
Digital health programs, accessible through smartphones and computers, offer potential avenues for delivering treatment options for mental health conditions, such as depression [-]. However, it is essential to acknowledge their limitations, which include the absence of evidence-based protocols and clinical validation [,,]. Digital therapeutics (DTx), which are evidence-based software programs designed to prevent, treat, or manage health conditions, serve to address these limitations []. Smartphone-based DTx are particularly well suited for adolescents given their technological literacy, ubiquitous cell phone use, and privacy preferences [,,,]. DTx are also easily scalable, can serve as an adjunct to standard care, improve treatment fidelity [-], and are cost-effective []. These technologies have been shown to be preliminarily effective in treating adolescents with a variety of mental health concerns including attention-deficit/hyperactivity disorder [], sleep disorders, eating disorders, and anxiety []. Ultimately, DTx fill critical gaps in non-DTx digital treatments and can help patients receive effective, data-driven treatment with the added benefit of home access at any time [,,].
Cognitive behavioral therapy (CBT) is a widely recognized and recommended treatment for depression and is considered the standard of care for adolescents [-]. Behavioral activation (BA), an important component of CBT [-], involves engaging in value-based behaviors that are rewarding or elicit a sense of mastery to help reduce depression symptoms using a combination of motivational strategies, reward seeking, and natural reinforcers, alongside reducing maladaptive and avoidant behaviors. BA has been shown to be an effective stand-alone treatment for adolescent depression [-]. Digital CBT applications have demonstrated promise in treating depression [,,,], and a BA-focused digital therapeutic may represent an effective and accessible treatment option for adolescents.
Low user engagement and insufficient clinical validation (eg, randomized controlled trials [RCTs]) may limit the widespread adoption of DTx [,,,]. To our knowledge, there are no DTx for depression tailored specifically for adolescents, including BA-based DTx, and limited evidence exists regarding the efficacy of a smartphone-based DTx for adolescent depression [-].
Objectives
This study evaluates the efficacy of Spark, a 5-week self-guided mobile app intervention based on validated BA treatment protocols for adolescents with depression [,]. Adolescents aged 13 to 21 years were recruited nationwide into an internet-based RCT comparing Spark with an active control. Our main goal was to assess the preliminary efficacy of Spark in treating moderate to severe depression symptoms as an adjunct to usual care by measuring group differences in (1) changes in depression symptoms over time, including whether each group had a minimal clinically important difference (MCID) in depression symptoms; (2) the proportion of participants who achieved remission; and (3) the proportion of participants who demonstrated treatment response. As this RCT was conducted between November 2020 and September 2021, another goal was to provide mental health resources to adolescents experiencing symptoms of depression during the COVID-19 pandemic.
This open-label, partial crossover RCT compared Spark with an app with psychoeducational content (control app) in adolescents aged 13 to 21 years with self-identified depression symptoms. The study was conducted in 2 phases: an evaluation of feasibility (phase 1) and a preliminary evaluation of efficacy (phase 2). The results from phase 1 (Spark version 2.0) have been reported elsewhere []. Here, we focus on the results from phase 2, which combined data collected using Spark version 2.1 and version 2.2 (refer to the Spark section for more information on the versions). Updates to the Spark app to version 2.2 were deployed approximately halfway through the study; version 2.1 was used between November 2020 and March 2021, and version 2.2 was used between June 2021 and September 2021. The study design, randomization, control app, and outcome measures were the same across both versions. There was an a priori plan to combine the data from the 2 Spark versions for analysis.
Ethical Considerations
The study was approved by the Institutional Review Board of the Western Copernicus Group (20201686) under a nonsignificant risk investigational device exemption and overseen by an independent data and safety monitoring board. The trial was also registered on ClinicalTrials.gov (NCT04524598).
Participants
Participants were recruited nationwide via internet-based advertising and word of mouth. The following inclusion criteria were applied in phase 2: (1) aged 13 to 21 years, (2) self-reported symptoms of depression, (3) residing in the United States for the duration of the 5-week study, (4) under the care of a US-based primary care or licensed mental health care provider and willing and able to provide the name and contact information of the provider during the consent appointment, (5) English fluency and literacy of adolescents and consenting legal guardians if aged <18 years, (6) access to a smartphone (iPhone 5s or later or running Android 4.4 KitKat or later) and regular internet access, (7) willingness to provide informed e-consent or assent and have a legal guardian willing to provide informed e-consent if aged <18 years, and (8) stable for at least 2 months on any treatment (including medication or psychotherapy) for a mental health disorder.
Participants were excluded if they self-reported any of the following: (1) lifetime suicide attempt, active self-harm, or active suicidal ideation with intent; (2) diagnosed by a clinician with bipolar disorder, substance use disorder, or any psychotic disorder including schizophrenia; (3) incapable of understanding or completing study procedures and digital intervention as determined by the participant, patient or legal guardian, health care provider, or clinical research team; or (4) previously participated in the user testing or clinical testing of the Spark app.
Participants aged ≥18 years provided their own informed consent. For participants aged <18 years, legal guardians provided consent and the adolescents provided assent.
Procedure
During a virtual consent and enrollment session over video conferencing, participants and their legal guardians (if aged <18 y) were provided with study details, provided electronic informed consent or assent, were assessed for eligibility, and completed web-based baseline questionnaires. Participants downloaded an app on their smartphone and were provided with a safety plan template with instructions [] to complete it on their own as a personal resource. Participants were randomized 1:1 to the Spark or control arm using block randomization with block sizes varying in multiples of 2 from 6 to 12 [].
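The allocation scheme described above can be illustrated with a minimal Python sketch. It assumes permuted blocks whose size is drawn at random from 6, 8, 10, or 12, with equal numbers per arm inside each block; the function name, the seed, and the way block sizes are drawn are hypothetical and are not taken from the study's randomization procedure.

```python
import random


def permuted_block_schedule(n_participants, block_sizes=(6, 8, 10, 12),
                            arms=("Spark", "control"), seed=None):
    """Build a 1:1 allocation list from randomly sized permuted blocks.

    Assumption: each block size is drawn uniformly from `block_sizes`
    (multiples of 2 from 6 to 12); the source does not state how sizes
    were chosen.
    """
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_participants:
        size = rng.choice(block_sizes)
        # Equal numbers of each arm within the block, then shuffle the order.
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    # The final block may be truncated, so small imbalances are possible.
    return schedule[:n_participants]


schedule = permuted_block_schedule(160, seed=2020)
print(schedule.count("Spark"), schedule.count("control"))
```

Varying the block size makes the next assignment harder to predict than a fixed block size would, while keeping the arms close to balanced throughout enrollment.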
All participants had access to their assigned app for the 5-week intervention period. During the intervention period, participants completed an in-app Patient Health Questionnaire-8 (PHQ-8) [], a self-report questionnaire to assess symptoms of depression, and an internally developed symptom check questionnaire weekly, which they could complete anytime over a 7-day period. Legal guardians completed a weekly symptom check about their child on the web. Following the 5-week intervention period, app access was restricted, and participants and legal guardians completed web-based postintervention questionnaires. Participants randomized to the control arm were also offered access to the Spark app after the 5-week intervention period and completed postintervention questionnaires on the web again following this second intervention period (partial crossover data not reported here). Baseline and postintervention measures assessed participant characteristics, concurrent treatments, depression and anxiety symptoms, resilience, app feedback, and impacts of COVID-19 (refer to Table 1 for the schedule and full descriptions) [-].
Table 1. Schedule of assessments and descriptions.
Columns: Assessment name; Baseline; Week 1; Week 2; Week 3; Week 4; Week 5 postintervention; Description.
Baseline questionnaire(a,b): ✓
(a) Completed by adolescent.
(b) Completed by caregiver.
(c) PROMIS: Patient-Reported Outcomes Measurement Information System.
Interventions
Both the Spark and control programs were divided into 5 modules recommended to be completed at a pace of 1 module per week over the 5-week intervention period, but the participants were able to progress at their own pace. Content for a given week was not expected to take >60 minutes to complete. All participants were prompted to complete a weekly PHQ-8 and a clinical concerns questionnaire on the mobile app. Automated app notification reminders to complete these questionnaires were sent. If users had not opened the app in 3 days, an automated app notification encouraging participants to use the app was sent. An automated notification sent 7 days before the end of the intervention period reminded participants that the period was about to end. Crisis resources were available for participants to access anytime in each app.
Spark (Versions 2.1 and 2.2)
The Spark app is a CBT-based program that implements BA [,]. A character called “Limbot” is used as a therapeutic guide to encourage users to complete activities and model examples of activities for users. In the app, participants read text, answered questions, inputted text, and completed interactive activities. Participants were encouraged to schedule activities to be completed outside of the app and reflect on the impact on their mood. Tasks in the mobile app progress in a linear fashion, that is, each task must be completed to progress to the next task.
Version 2.1 of Spark included a 5-level program focused on providing psychoeducational content and delivering the BA model of depression by teaching 2 core skills (mood activity logging and activity scheduling). Version 2.2 was also divided into 5 levels and expanded on the content of version 2.1, adding problem-solving, mindfulness, and relapse prevention content. Version 2.2 also included a reward system and animations for completing certain activities, some design changes to the user interface, and an increased number of in-app notifications. The 5 levels of Spark version 2.1 were as follows:
Level 1 (Start Your Journey): Program introduction and learning about the BA model of depression
Level 2 (Making Choices): Mood tracking and up and down activities
Level 3 (Solving Problems): Learn about activity scheduling and complete 3 activations
Level 4 (Staying Active): Complete 4 activations
Level 5 (Journey’s End): Complete 5 activations
Version 2.2 expanded on version 2.1 with the addition of new features, UI elements, and content. The 5 levels of Spark version 2.2 were as follows:
Level 1: Onboarding and Introduction to BA (no differences)
Level 2: Mood tracking (no differences)
Level 3: Mindfulness and Activity Scheduling (addition of 2 psychoeducational tasks teaching and reinforcing mindfulness skills)
Level 4: Problem-Solving and Activity Scheduling (addition of 2 psychoeducational tasks teaching and reinforcing problem-solving skills)
Level 5: Relapse Prevention and Activity Scheduling (addition of 6 psychoeducational tasks teaching relapse prevention skills)
Psychoeducational Control
The control app contained 5 modules of age-appropriate psychoeducational content related to the neurobiology of depression and did not contain any active CBT or BA components. In this app, participants read the text on screen related to the brain and behavior, the adolescent brain, depression in the brain, neurobiological factors influencing depression, and personality. The same version of the control app was used for the entire study. The 5 modules of the control app were Lesson 1: Understanding Behavior, Lesson 2: Exploring the Brain, Lesson 3: Mastering Messengers, Lesson 4: Riding the Wave, and Lesson 5: People and Personality.
Safety Monitoring
A rigorous safety monitoring and classification procedure was followed. Potential safety-related events (ie, clinical concerns), which included worsening or persistently high depressive symptoms, any imminent risk events, suicidal ideation, hospitalizations, injury, illness, nonsuicidal self-injury, and direct or indirect indications of abuse, were identified in the following ways: (1) any concerning information provided to study staff during the onboarding session or in email, text, or phone correspondence during participation; (2) symptom deterioration based on weekly PHQ-8 scores, defined as (a) a PHQ-8 score that was ≥15 and ≥5 points higher than the baseline score or (b) PHQ-8 scores ≥20 for 2 weeks in a row during the intervention; (3) a weekly internally developed symptom check questionnaire, completed during the 5-week intervention period by all participants and their legal guardians (for those aged <18 y), that asked them to report any negative symptoms or side effects experienced over the past week, to rate how negative each experience was on a scale of 0 (not at all) to 4 (extremely), and to indicate whether they believed each experience was caused by the app; and (4) freeform text entered in the app by participants randomized to the Spark arm, which was reviewed daily by study staff.
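The PHQ-8 symptom deterioration rule in item (2) can be expressed as a short check. The following Python sketch is illustrative only: the function name is hypothetical, and two details are assumptions not stated in the text, namely that a missed weekly assessment (None) breaks a run of consecutive weeks and that the baseline score does not count toward the "2 weeks in a row" criterion.

```python
def flag_symptom_deterioration(baseline, weekly_scores):
    """Return True if weekly PHQ-8 totals meet either deterioration rule.

    baseline: PHQ-8 total at baseline.
    weekly_scores: PHQ-8 totals in week order; None marks a missed week.
    """
    for i, score in enumerate(weekly_scores):
        if score is None:
            continue
        # Rule (a): score >= 15 and at least 5 points above baseline.
        if score >= 15 and score - baseline >= 5:
            return True
        # Rule (b): score >= 20 in two consecutive weekly assessments.
        previous = weekly_scores[i - 1] if i > 0 else None
        if previous is not None and previous >= 20 and score >= 20:
            return True
    return False


print(flag_symptom_deterioration(10, [12, 16]))      # rule (a) is met
print(flag_symptom_deterioration(18, [19, 20, 21]))  # rule (b) is met
```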
Data and incoming correspondence were reviewed daily and any identified potential clinical concerns were recorded by the study staff and verified by the study investigators. All logged clinical concerns were reviewed daily by a study investigator who determined whether the clinical concern required escalation to the study clinician (Dr Raph Rose), an independent licensed clinical psychologist not otherwise associated with the study or study sponsor, for clinical input or follow-up with the participant. The study clinician would then determine whether the participant was safe and eligible to continue with the study based on the information provided or based on contact with the participant or legal guardian. If a clinical concern related to suicidality was endorsed during the onboarding session, a study investigator administered the Ask Suicide-Screening Questions Toolkit [] to the participant to determine the imminent risk level, recommend emergency resources if required, and escalate to the study clinician for follow-up.
Participants were withdrawn from the study if clinical concerns met any of the following criteria: (1) the study clinician determined that the participant was no longer eligible to continue with the study, (2) the clinician could not monitor safety because of not being able to reach the participant or other listed contacts, or (3) the clinician could not monitor safety because the participant did not complete the weekly symptom check questionnaire for 2 consecutive weeks. In any of these circumstances, the participant was informed, withdrawn from the study, and sent a list of mental health resources via email.
Safety Classification
Safety classification was based on the following definitions set forth by the US Food and Drug Administration (FDA) and FDA-recognized consensus standards, ISO 14155:2020 Clinical Investigation of Medical Devices for Human Subjects-Good Clinical Practice [-]. After study completion, a clinician who was not otherwise involved in the study (JF) reviewed all clinical concern data and provided preliminary event classifications. Any events deemed to be potential adverse events (AEs) by this clinician were sent to the study clinician (Dr Rose) for external classification. Dr Rose was blinded to identifying participant information, participant ID, and JF’s ratings to make final determinations. Information was presented to Dr Rose in a manner that concealed the group condition. Nonetheless, it is possible that Dr Rose could discern the group identification for certain classifications based on the content provided by a participant. To reduce the potential for bias in classification, if Dr Rose provided a stricter classification rating than JF, that rating was maintained.
An AE was defined as an “untoward medical occurrence, unintended disease or injury, or untoward clinical signs (including abnormal laboratory findings) in subjects, users or other persons, whether or not related to the investigational medical device [] and whether anticipated or unanticipated.”
An adverse device effect (ADE) was defined as an AE “related to the use of an investigational medical device” []. This includes any AE “resulting from insufficiencies or inadequacies in the instructions for use, the deployment, the implantation, the installation, the operation, or any malfunction of the investigational medical device.” This also includes “any event that is a result of a user error or intentional misuse” []. For this study, ADEs could have occurred in either the Spark or control arms.
A serious AE (SAE) or serious ADE was defined as an AE or ADE that met at least one of the following criteria: resulted in fatality; posed a life-threatening risk or immediate risk of death at the time of occurrence; led to persistent or significant disability or incapacity; required prolonged inpatient hospitalization; or represented an important medical event, as determined by appropriate medical judgment, that could jeopardize the participant’s well-being or where medical or surgical intervention might be necessary to prevent one of the aforementioned outcomes. This did not include planned hospitalization for a preexisting condition [].
Unanticipated ADEs (UADEs), as defined in the FDA regulation 21 Code of Federal Regulations 812.3 [], also referred to as “unanticipated problems,” included any serious adverse effect on health or safety or any life-threatening problem or death caused by, or associated with, a device, if that effect, problem, or death was not previously identified in nature, severity, or degree of incidence in the investigational plan or application; or any other unanticipated serious problem associated with a device that relates to the rights, safety, or welfare of participants.
Outcomes and Statistical Analysis
The a priori statistical analysis plan was restricted to participants with moderate to severe symptoms at baseline (PHQ-8≥10; moderate-to-severe cohort), unless otherwise specified. The α level was set to .05, and false discovery rate (FDR) correction for multiple comparisons was applied to the specified analyses.
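As an illustration of FDR correction across a family of tests, the sketch below implements the Benjamini-Hochberg step-up adjustment in Python. This is an assumption: the text does not name the FDR procedure that was used, and the P values in the example are hypothetical, not study results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical family of 4 P values.
print([round(p, 4) for p in benjamini_hochberg([0.01, 0.04, 0.03, 0.20])])
```

Each adjusted value is then compared with the α level, so a test is declared significant only if its adjusted P value is below .05.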
Depression Symptoms: PHQ-8
The primary outcome was a group difference in the change in depression symptoms from baseline to 5 weeks, as measured by the PHQ-8. A modified intention-to-treat (mITT) approach was used for primary analyses by including all participants randomized within the moderate-to-severe cohort. A per protocol (PP) analysis included data from participants in this cohort who had completed all weekly PHQ-8 questionnaires. Post hoc mITT and PP analyses were also conducted for participants with a baseline PHQ-8≥5 (mild-to-severe cohort) to evaluate the efficacy of Spark in participants with mild to severe symptoms of depression. Statistical analyses were performed using R version 4.1.2 (R Foundation for Statistical Computing) by an independent external statistician (LC) [].
Missing Data Analysis
Little’s test [] was used to determine whether group differences existed in the proportion of missing PHQ-8 data across weeks for both moderate-to-severe and mild-to-severe cohorts. For any significant results, the effects of known factors on missing data including Spark version, treatment group, week, baseline PHQ-8 severity, and age group were evaluated with follow-up χ2 tests to determine whether data could be missing at random [].
Multiple Imputation Procedure
Multiple imputation was implemented on missing data from participants who had completed at least the baseline PHQ-8 assessment. Information on Spark version (2.1 and 2.2), treatment group, baseline PHQ-8 score, age group, week, and individual PHQ-8 item was included to impute 100 data sets with the Pan [] method using the R mice package version 3.14.0 [,,]. Missing PHQ-8 item–level scores and assessment completion days from baseline were imputed.
Statistical Analysis
A linear mixed-effects model (LMM) analysis was implemented on an averaged imputed data set to evaluate the main effects of study arm (Spark and control) and week (0-5), and the study arm×week interaction using the R lme4 package (version 1.1.28 []). Four models were implemented in total, including the mITT and PP analyses for both the moderate-to-severe and mild-to-severe cohorts. In each model, group and week were entered as fixed factors. Spark version and assessment completion days from baseline were included as fixed factors to control for the effects of app version and differences in time between the completion of successive weekly assessments. PHQ-8 item–level scores and participants were included as random factors for the intercept, and the time from the baseline assessment was included as a random factor for the slope of participants. FDR correction was applied to the 4 P values for the study arm×week interaction effects. The effect size of the interaction (Cohen f2) was computed using pseudo R2 (f2=R2/[1−R2]) [,] with the R MuMIn package (version 1.46.0 []). Degrees of freedom were estimated using the Satterthwaite method provided in the R lmerTest package (version 3.1-3) [,]. Follow-up analyses were performed to evaluate the effect of week within each group.
We also evaluated whether there was an average minimal clinically important difference (MCID) in symptom severity within each group, defined as a ≥5 point average decrease in PHQ-8 score between baseline and postintervention [].
To assess the robustness of the study arm×week interaction, a generalized LMM (GLMM) with multiple imputation with the same model specification as the LMM analysis was implemented on item-level PHQ-8 data across the 100 imputed data sets using a 2-level Pan method [], and the resulting 4 P values were FDR adjusted.
Group differences in remission rates, defined as a PHQ-8 score <5 at postintervention [], and treatment response rates, defined as a ≥50% reduction in PHQ-8 score between baseline and postintervention [], were tested using χ2 tests, and the resulting 4 P values were FDR adjusted.
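The three outcome definitions above (group-level MCID, remission, and treatment response) reduce to simple threshold checks. The Python sketch below is illustrative: the function names and example scores are hypothetical, and treatment response is interpreted as a reduction of at least 50%, which is an assumption about how "a 50% reduction" was operationalized.

```python
def is_remitted(post_score):
    """Remission: PHQ-8 total below 5 at postintervention."""
    return post_score < 5


def is_responder(baseline_score, post_score):
    """Treatment response: at least a 50% drop from baseline (assumed >=)."""
    if baseline_score <= 0:
        return False  # a relative reduction is undefined without symptoms
    return (baseline_score - post_score) / baseline_score >= 0.5


def group_meets_mcid(baseline_scores, post_scores, threshold=5):
    """Group-level MCID: mean PHQ-8 decrease of at least `threshold` points."""
    decreases = [b - p for b, p in zip(baseline_scores, post_scores)]
    return sum(decreases) / len(decreases) >= threshold


# Hypothetical scores for 3 participants.
baseline = [14, 18, 12]
post = [4, 9, 10]
print([is_remitted(p) for p in post])
print([is_responder(b, p) for b, p in zip(baseline, post)])
print(group_meets_mcid(baseline, post))
```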
Secondary Clinical Outcomes
Secondary outcomes included anxiety symptoms (Generalized Anxiety Disorder Scale []), legal guardian–rated depression symptoms of their child (Mood and Feelings Questionnaire []), and participant-rated and legal guardian–rated (of their child) global functioning (Patient-Reported Outcomes Measurement Information System []) assessed at baseline and postintervention. Means, SDs, and 95% CIs for the average difference between baseline and postintervention were computed for these outcomes. A questionnaire was developed internally to assess the positive and negative effects of COVID-19, and the proportion of participants in each group that endorsed each item was computed.
Safety
The total numbers of AEs, ADEs, SAEs, and UADEs were computed per group for all randomized participants.
Other Outcomes
Other outcomes measured included program adherence, engagement, and acceptability. Program adherence was assessed as the proportion of participants completing all sessions by postintervention and percent completion per module. Engagement for both study arms was assessed using the User Engagement Scale-short form [], minutes spent in the app per week, and total app sessions. Usability was assessed using the system usability scale []. App acceptability was assessed with internally developed self-reported and legal guardian–reported ratings of the app with a 10-point Likert scale, asking how much the app improved mood or symptoms of depression and how enjoyable it was. Means, SDs, and mean differences between study arms with 95% CIs were calculated for these measures.
A total of 168 adolescents consented to participate in this study between November 2020 and June 2021. The postintervention and follow-up data were collected between January 2021 and September 2021, when planned data collection was completed. Eight participants did not meet the inclusion criteria. A total of 160 participants were randomized (80 in the Spark arm; refer to Figure 1 for the CONSORT [Consolidated Standards of Reporting Trials] diagram and Table 2 for the characteristics). In total, 41 participants received Spark version 2.1 and 39 participants received Spark version 2.2. Data from participants with a baseline PHQ-8 score ≥10 (moderate-to-severe cohort) were included in all planned analyses, including mITT (n=121) and PP (n=86). Post hoc analyses included data from participants with a baseline PHQ-8 score ≥5 (mild-to-severe cohort) for mITT (n=153) and PP (n=109). Three participants skipped a question on the PHQ-8 assessment and were included in the PP analyses. The Spark and control arms did not differ significantly on any of the baseline demographic characteristics (all P>.05).
Figure 1. CONSORT (Consolidated Standards of Reporting Trials) diagram. The diagram excludes the control extension arm. Patients were considered lost to follow-up if they did not complete the postintervention questionnaire, considered to have dropped out if they missed 2 weekly safety checks, and considered to have withdrawn from the study if they asked to be removed. Patients were considered as missing week 5 if they did not complete the week 5 questionnaires but moved on to the control extension phase. mITT: modified intention-to-treat; PHQ-8: Patient Health Questionnaire-8; PP: per protocol.
Table 2. Participant characteristics (N=160).
Weekly means, SDs, and 95% CIs for baseline to postintervention change scores per group, along with remission and treatment response rates per group, are reported in Table 3.
Table 3. Patient Health Questionnaire-8 descriptives.
Columns: Cohort and group; Baseline score, mean (SD); Week 1 score, mean (SD); Week 2 score, mean (SD); Week 3 score, mean (SD); Week 4 score, mean (SD); Postintervention score, mean (SD); Postbaseline score, mean difference (95% CI); Remission, n (%); Treatment response, n (%).
Moderate to severe (n=121)
The LMM revealed a nonsignificant study arm×week interaction (t127.66=−1.911; P=.06; FDR adjusted P=.06; f2=0.0012; power: 0.468-0.494; Figure 2A). There was a nonsignificant effect of study arm (t177.54=0.573; P=.57; f2=0.00034; power: 0.069-0.079) and a significant effect of week (t1977.06=−4.395; P<.001; f2=0.00215; power: 0.985-0.993). There was a significant effect of week in the Spark arm (t1759.08=−3.508; P<.001; change score=−5.08, 95% CI −6.72 to −3.42) and control arm (t561.98=−3.244; P=.001; change score=−3.51, 95% CI −5.09 to −1.93). The GLMM also produced a nonsignificant study arm×week interaction (F1,24186.41=3.223; P=.07; FDR adjusted P=.07).
At the end of the intervention period, the Spark arm showed a significantly higher remission rate than the control arm (Spark, 11/63, 17% and control, 2/58, 3%; χ21=6.183; P=.01; FDR adjusted P=.03). Treatment response rates did not differ significantly between the study arms (Spark, 15/63, 24% and control, 8/58, 14%; χ21=1.968; P=.16; FDR adjusted P=.16).
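The χ2 statistics above can be recomputed from the reported counts. The Python sketch below assumes a Pearson χ2 test on a 2×2 table without continuity correction (the text does not say whether a correction was applied); under that assumption, the reported counts give statistics of approximately 6.183 for remission and 1.968 for treatment response. The function name is hypothetical.

```python
import math


def chi_square_2x2(events_a, total_a, events_b, total_b):
    """Pearson chi-square test (1 df, no continuity correction) for 2 arms."""
    observed = [
        [events_a, total_a - events_a],
        [events_b, total_b - events_b],
    ]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


remission = chi_square_2x2(11, 63, 2, 58)   # Spark vs control
response = chi_square_2x2(15, 63, 8, 58)    # Spark vs control
print(round(remission[0], 3), round(response[0], 3))
```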
Figure 2. Depression severity by week: (A) modified intention-to-treat (mITT) moderate to severe cohort, (B) per protocol (PP) moderate to severe cohort, (C) mITT mild to severe cohort, and (D) PP mild to severe cohort. Note that the graphs depict average Patient Health Questionnaire-8 (PHQ-8) scores at each time point based on observed data.
Per Protocol (n=86)
The LMM revealed a significant study arm×week interaction (t91.59=−2.546; P=.01; FDR adjusted P=.02; f2=0.0107; power: 0.699-0.699; Figure 2B). There was a nonsignificant effect of study arm (t82.43=−0.072; P=.94; f2=.00031; power: 0.041-0.056) and a significant effect of week (t1511=−4.876; P<.00