Testing behaviour change with an artificial intelligence chatbot in a randomized controlled study

Study design

We conducted a pilot randomised controlled study using a single-blinded, between-group design, with two intervention groups and a control group. We recruited a total of 59 participants using random volunteer sampling through community advertisements.

We conducted the experiment on four days in October 2020 (19, 21, 28, and 29 October), basing the study on the Unified Theory of Acceptance and Use of Technology (UTAUT) framework [29]. To evaluate the chatbot as a public health tool, we applied the UTAUT framework to gauge participants' behavioural intentions to use the chatbot after the testing had concluded. We assessed the main UTAUT factors (performance expectancy, effort expectancy, social influence, and facilitating conditions) by asking participants about the likelihood of recommending the chatbot to their friends and by administering the Digital Behaviour Change Intervention (DBCI) Engagement Scale [30], as described below.

We administered the alpha prototype of Cory COVID-Bot with manual switching between conditions, such that the three groups tested the chatbot sequentially (the control group, then the compassion group, then the exponential growth group). Because all chatbot interactions took place on weekdays, with slots available both during and after work hours, the risk of selection bias was limited.

Monash Health granted ethics approval under identifier HREC/69725/MonH-2020-237291(v3).

Participants

The participants resided in Melbourne, in the Australian state of Victoria, and were recruited through social media advertisements. They participated during Victoria's second wave of COVID-19. At that time, it was clear from media reports and overall public sentiment that acceptance of an aggressive suppression strategy in Australia was fragile, as Victorians had experienced one of the strictest lockdowns in the world to date.

We selected participants to represent three target populations: people aged 18–29 years, temporary visa holders, and Vietnamese nationals, all of whom were fluent in English or Vietnamese. We compensated participants with an AUD 50 supermarket voucher upon completion of the exit survey after their interaction with Cory COVID-Bot.

Of the 59 participants, 11 forgot to fill out the pre-test survey and 2 forgot to fill out the post-test survey, although all 59 tested Cory COVID-Bot. Thus, 46 participants completed the entire study.

We randomly allocated participants to one of three groups: exponential growth, compassion, or control. After each group had completed testing, we manually switched the software configuration for the next group.

Assessment procedure

Participants used their own smartphones to test Cory COVID-Bot, and the supervised interaction with the chatbot was conducted over the video-conferencing programme Zoom.

The avatar for this chatbot is a knowledgeable, friendly middle-aged librarian (see Fig. 1). He uses emojis to seem more human-like, which is associated with more effective conversations [3]. We designed the avatar to have an interactive and engaging interface, and to speak in simple English or Vietnamese, given the potentially low literacy of some users.

We elicited participants' attitudes about staying home (as part of the public health orders) in 30 different scenarios adapted from Van Baal et al. [31]. We prompted participants with reasons for going out, such as "Someone wants to go for a walk in the park at 5 pm. It is a popular neighbourhood park with narrow footpaths near their house", and asked "How certain are you that it is alright for them to leave the house?". The scenarios fall into three risk categories (minimal risk, low risk, and high risk). Participants responded on a visual analogue scale ranging from "Completely certain it is not alright" to "Completely certain it is alright", with the middle of the range representing uncertainty about the right course of action (for more details, see [31]).

To elicit participants' perceptions of the importance of testing, we had the librarian ask "How important do you think it is to get tested if you experience symptoms?". They answered using a visual analogue scale ranging from "Not important at all" to "Extremely important". The exponential growth and compassion animations showed, respectively, how exponential growth of cases occurs and the painful separation of a family due to COVID-19; we then related these concepts to the importance of testing for limiting disease transmission. The animations lasted 20 and 51 s, respectively, and relied on images and emotive music, containing no spoken or written words so as to remain accessible to people from different language groups.

After the dialogue about symptoms (details on the procedure below) and, for two of the groups, the animations, we elicited participants' testing intentions as follows: "How likely is it that you would get tested if you had symptoms?". Participants responded with one of three options: "very likely", "I don't know", or "very unlikely".

We assessed participants' acceptance of Cory COVID-Bot by administering the DBCI Engagement Scale [30]. The scale posed questions about participants' experience with a behaviour change intervention in this format: "How much did you experience the following?", with eight response items: "Interest", "Intrigue", "Focus", "Inattention", "Distraction", "Enjoyment", "Pleasure", and "Annoyance". Participants responded on a seven-point scale anchored at "not at all" (coded as 1), "moderately" (coded as 4), and "extremely" (coded as 7). The DBCI Engagement Scale also includes questions about which DBCI components the participant has interacted with and how much time the participant spent with the DBCI. We adjusted these questions slightly to fit the context: "Which elements of Cory do you remember accessing without the experimenter?" and "How much time (in minutes) do you roughly think you spent interacting with Cory without the experimenter?". We also added an item about the extent to which participants experienced learning.
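For illustration, the following is a minimal R sketch of one plausible way to score these items; the column names are hypothetical, and reverse-scoring the negatively worded items is an assumption rather than the scale's documented procedure (see [30] for the actual scoring rules).

# One plausible scoring sketch; column names are hypothetical and
# reverse-scoring the negatively worded items is an assumption (see [30])
items_pos <- c("dbci_interest", "dbci_intrigue", "dbci_focus",
               "dbci_enjoyment", "dbci_pleasure")
items_neg <- c("dbci_inattention", "dbci_distraction", "dbci_annoyance")

# Reverse-score negative items: 8 - x maps the 1..7 range onto 7..1
dat[items_neg] <- lapply(dat[items_neg], function(x) 8 - x)

# Overall engagement as the mean across all eight items
dat$dbci_engagement <- rowMeans(dat[c(items_pos, items_neg)])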

We assessed participants' eligibility for the study through an initial survey that collected their informed consent, demographic data, and availability for a video-conferencing session to test the chatbot. We then scheduled a video-conferencing session by email and asked each person to fill out a second survey (the pre-test survey) one day before their scheduled test, sending a reminder at that time. This survey provided a baseline for their perceived importance of testing in the response to COVID-19 and their attitudes about staying home or going out when faced with public health orders (detailed above).

The chatbot test consisted of two parts: a supervised part, in which an experimenter guided each participant through a sequence of 18 questions to ask Cory COVID-Bot (in their own words; for the questions, see Supplementary Materials), and an unsupervised part, for which the experimenter ended the Zoom meeting and participants could ask Cory COVID-Bot whatever they wanted for 30 min. At the beginning of the test, each user received an SMS link to access Cory COVID-Bot through their device; the link directed them to a conversation with Cory COVID-Bot on Facebook Messenger.

Participants in the two intervention groups were shown an animation near the end of the structured section of the testing sequence, followed by the question about their testing intentions. The experimenter did not prompt participants for a response; answering was optional, and 25 of the 59 participants did so. To qualify for payment, participants filled out one more survey after finishing the unsupervised interaction with Cory COVID-Bot. This survey included the DBCI Engagement Scale [30].

Analysis

We used a cumulative link model with a logit link [32] to assess whether the interventions affected participants' reports of whether they were likely to get tested for COVID-19 if they experienced symptoms. We chose this method because it handles ordinal data with multiple independent variables well. The dependent variable had three levels: "very likely", "I don't know", and "very unlikely". The predictors in the model were group (exponential growth, compassion, or control), with participants' age and sex as control variables.
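As an illustration, a minimal sketch of this model in R follows, using the clm() function from the ordinal package (one common implementation of cumulative link models); the data frame dat and its column names are hypothetical.

library(ordinal)

# Hypothetical data frame `dat` with columns testing_intention, group,
# age, and sex; the response is an ordered factor with three levels
dat$testing_intention <- factor(
  dat$testing_intention,
  levels = c("very unlikely", "I don't know", "very likely"),
  ordered = TRUE
)

# Cumulative link model with a logit link; group plus age and sex
# as control variables
fit_clm <- clm(testing_intention ~ group + age + sex,
               data = dat, link = "logit")
summary(fit_clm)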

We encountered convergence issues with the cumulative link model because the algorithm could not find uniquely determined parameters. As a result, we were unable to conduct pairwise post-hoc tests between groups based on the model, although we were able to analyse likelihood ratios of different models (such as testing whether the inclusion of each variable was important for the model). To conduct pairwise tests, we instead used one-sided Wilcoxon rank sum tests. We converted the factor levels to numerical values, where 1 signifies ‘very unlikely’, 2 signifies ‘I don’t know’, and 3 signifies ‘very likely’. To match the statistical test, we report medians with the Wilcoxon rank sum tests instead of means and standard deviations.
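A sketch of these steps in R, continuing the hypothetical dat from above (the group labels are assumptions), might look as follows.

# Likelihood ratio test for the contribution of `group`, comparing the
# full model against a reduced model without it
fit_reduced <- clm(testing_intention ~ age + sex, data = dat, link = "logit")
anova(fit_reduced, fit_clm)

# Numerical recoding: 1 = very unlikely, 2 = I don't know, 3 = very likely
dat$intention_num <- as.integer(factor(
  dat$testing_intention,
  levels = c("very unlikely", "I don't know", "very likely")
))

# One-sided Wilcoxon rank sum test, e.g. exponential growth vs control
wilcox.test(
  dat$intention_num[dat$group == "exponential growth"],
  dat$intention_num[dat$group == "control"],
  alternative = "greater"
)

# Medians reported alongside the tests
tapply(dat$intention_num, dat$group, median)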

We also analysed whether participants reported higher perceived importance of getting tested after the interaction with Cory COVID-Bot with the manipulations versus without. For this analysis, we used a continuous ordinal regression [33, 34]. As noted, 11 participants forgot to fill out the pre-test survey, which resulted in partially overlapping data; continuous ordinal regressions can handle non-normal, continuous, ordinal data and are robust to such partial overlap. The dependent variable for this model was the difference between participants' perceived importance of getting tested after their interaction with the chatbot and before, recorded on a visual analogue scale. We used condition, age, and sex as independent variables.
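A minimal sketch in R follows, using the ocm() function from the ordinalCont package (one available implementation of continuous ordinal regression); the columns importance_pre and importance_post are hypothetical and assumed to be 0-100 VAS scores.

library(ordinalCont)

# Difference score, rescaled from [-100, 100] into ocm()'s default
# (0, 1) response scale
dat$importance_diff <- (dat$importance_post - dat$importance_pre + 100) / 200

# Nudge exact 0s and 1s into the open interval to avoid boundary issues
eps <- 1e-3
dat$importance_diff <- pmin(pmax(dat$importance_diff, eps), 1 - eps)

fit_imp <- ocm(importance_diff ~ group + age + sex, data = dat)
summary(fit_imp)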

To assess whether participants' uncertainty about the acceptability of leaving their homes under the stay-at-home orders decreased after the interaction with Cory COVID-Bot, we compared the absolute value on the VAS attitude scale before and after the interaction. This procedure removed the valence (alright or not alright) while preserving the degree of certainty. For this analysis, we again used continuous ordinal regression [34]. This model allowed us to estimate the effects of participants' interaction with Cory COVID-Bot for each of the three risk levels in the original article [31]: high risk, low risk, and minimal risk. The independent variables were the same as above, with the addition of the risk level of the considered scenario. We considered p-values significant at a false-discovery-rate-corrected alpha of 0.05. We conducted all analyses in R [35].
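Sketched in R under the same assumptions (a hypothetical long-format data frame dat_long with one row per participant and scenario; VAS attitudes coded 0-100 with 50 as the uncertain midpoint; all column names are assumptions):

# Certainty = distance from the uncertain midpoint; valence is discarded
dat_long$cert_pre  <- abs(dat_long$attitude_pre  - 50)
dat_long$cert_post <- abs(dat_long$attitude_post - 50)

# Change in certainty, rescaled from [-50, 50] into (0, 1) for ocm()
dat_long$cert_change <- (dat_long$cert_post - dat_long$cert_pre + 50) / 100

# Same predictors as before, plus the scenario's risk level
fit_cert <- ocm(cert_change ~ group + age + sex + risk_level,
                data = dat_long)
summary(fit_cert)

# False discovery rate correction across the family of reported p-values
# (placeholder values; substitute the p-values from the fitted models)
p.adjust(c(0.012, 0.034, 0.210), method = "BH")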
