A Gamified Assessment Tool for Antisocial Personality Traits (Antisocial Personality Traits Evidence-Centered Design Gamified): Randomized Controlled Trial


Introduction

Overview

Antisocial personality (ASP) traits, which are characterized by manipulativeness, callousness, and deceitfulness, pose significant threats to organizational trust, team dynamics, and ethical decision-making []. Individuals who exhibit these traits often exploit others for personal gain, disregard their team responsibilities, and engage in risky behaviors that undermine long-term organizational health [,]. Traditional assessment tools, such as the Psychopathy Checklist–Revised and the Personality Inventory for DSM-5–Short Form (PID-5-SF), rely heavily on self-reports or clinical interviews, which are susceptible to social desirability bias—especially in nonclinical settings, in which individuals may consciously or unconsciously underreport problematic behaviors [,]. For example, self-ratings on the PID-5-SF often fail to capture situational impulsivity or deceitfulness, as they lack the ecological validity of real-time behavioral data in high-stakes scenarios [].

Even structured tools such as situational judgment tests (SJTs), which simulate workplace dilemmas, struggle to address the multidimensional nature of the antisocial traits defined by the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5). These tests typically assume item independence and unidimensionality, thereby ignoring the interactive effects among different traits (eg, the possibility of manipulativeness and risk taking co-occurring in decision-making) []. Moreover, these methods often lack interactivity and engagement in practical application, thus failing to engage participants effectively [,,,]; accordingly, they are susceptible to subjective influences and the manipulation of results [,,].

Background

Gamified assessments represent a promising alternative in this context because they embed trait-relevant dilemmas in immersive contexts, in which participants’ choices reflect genuine behavioral tendencies rather than reflective self-evaluations []. However, existing gamified tools have not yet been aligned with clinical diagnostic criteria, thus giving rise to a situation of fragmented validation and limited utility in organizational settings that require rigorous psychometric evidence [-]. We address this gap by using evidence-centered design (ECD), a systematic methodology that translates theoretical constructs into observable behavioral evidence through a process involving 3 iterative stages: capability modeling, evidence modeling, and task design []. In the capability modeling stage, we defined target traits (eg, manipulativeness or deceitfulness) based on DSM-5 diagnostic criteria, thus ensuring alignment with clinical and organizational contexts. In the evidence modeling stage, we identified behavioral indicators by using empirical methods (eg, expert interviews) to bridge abstract traits with observable decisions in simulated scenarios. In the task design stage, we constructed gamified tasks that elicit the target behaviors, particularly using dynamic logic (eg, question‒jump paths) to create immersive, consequence-laden environments in which genuine responses are more likely to emerge.

Although ECD offers significant potential in this context, its application to dark personality traits has not yet been fully explored, especially with respect to operationalizing multidimensional constructs such as antisocial behavior.

The ECD framework is a systematic method for developing assessment tools that constructs evidence arguments in game-based assessments from the early design stages by clarifying the evidence needed and guiding the design and development of the assessment tool accordingly []. ECD assessment has been described as a reasoning process that involves drawing inferences regarding participants’ real-world knowledge and abilities from the limited evidence provided in the test environment. The core of ECD lies in constructing a clear design framework that includes a capability model, an evidence model, and a task model [], thus ensuring that the development of the assessment tool remains centered on explicit assessment objectives and evidence requirements [].

The capability model describes and defines the personality traits to be measured, whereas the evidence model translates the capability model into observable behaviors and performances, including details on how participants’ behaviors in given task contexts reflect the traits included in the capability model []. On the basis of the capability model and the evidence model, rules or models are established with the goal of constructing quantitative relationships between the models, which can range from simple scoring rules to complex logic trees or data-driven mathematical models, including machine learning models []. The task model describes the specific tasks that participants must complete as part of the assessment tool. These tasks should effectively elicit the target behaviors on the part of participants, thus providing useful evidence.

The ECD approach ensures the systematic and scientific development of assessment tools. This approach can enhance the reliability and validity of assessment tools and reduce assessment errors by establishing a clear design framework and defining evidence needs, as noted in Mislevy et al []. Moreover, the ECD approach can respond flexibly to different assessment needs and application scenarios, thus rendering the resulting tools more diverse and broadly applicable. For example, in high-stakes recruitment and selection processes, ECD can facilitate the design of fairer and more effective assessment tools [].

Our gamified assessment framework, which is anchored in self-determination theory [] and flow theory [], operates through 3 mutually reinforcing mechanisms: narrative immersion, responsive feedback, and dynamic flow induction. The narrative mechanism embeds questions in workplace scenarios (eg, “audit visits” or “human resources [HR] dismissal”) so that participants adopt a virtual role separate from their real-world identity; this fulfills the need for autonomy posited by self-determination theory and reduces individuals’ self-awareness of being assessed, thereby mitigating the tendency toward socially desirable responses []. The feedback mechanism provides competence-relevant feedback via immediate consequences (eg, dismissal notices in response to dishonest decisions), prompting participants to internalize the goal of “surviving” the scenario and to align their responses with in-game logic rather than external judgment, thereby mitigating acquiescence bias []. The immersive flow mechanism induces cognitive flow via dynamic path selection (eg, branching storylines based on previous choices); the high cognitive load of navigating these paths depletes the mental resources available for response manipulation and effectively “crowds out” deliberate distortion [].

Objectives

This study aims to fill this gap by developing and validating the Antisocial Personality Traits Evidence-Centered Design Gamified assessment tool (ASP-ECD-G), which integrates the DSM-5 criteria with workplace scenarios to measure 7 core traits based on behavioral data.

This research was conducted in 3 phases: study 1 involved constructing the assessment ontology based on semistructured interviews; study 2 focused on the development of machine learning models aimed at mapping behavioral data to trait scores; and study 3 entailed validating the tool’s resistance to manipulation and user experience based on a 2×2 mixed experimental design.


Methods

Overview

As part of this research, 3 studies were designed to develop the ASP-ECD-G: study 1 involved developing an assessment ontology for ASP as well as constructing the capability model, evidence model, and task model within the framework of ECD; study 2 involved constructing an assessment model that linked the response task model with the capability model based on study 1; and study 3 validated the assessment characteristics of the ASP-ECD-G through a 2×2 mixed experimental design.

Development of the Assessment Construct

Determining the Capability Model

This study used the alternative model for diagnosing ASP disorder from the DSM-5 [] as the capability model and integrated it with antisocial behaviors in organizational settings, thereby identifying the 7 behavioral characteristics of ASP provided in .

Textbox 1. Seven behavioral characteristics of antisocial personality.

Manipulativeness: frequent use of charm, glibness, or flattery to influence or control others for personal gain.

Callousness: a lack of empathy, which often involves disregarding others’ feelings or problems. When the individual causes harm to others, they express no guilt or remorse and may engage in aggressive and abusive behaviors.

Deceitfulness: frequent engagement in fraud against others, misrepresentations of oneself, and embellishments or fabrication of information when it pertains to personal interests.

Hostility: persistent and frequent anger, feelings of anger in response to minor slights and insults, and retaliation with harsh, sarcastic, or vengeful behaviors.

Risk taking: engagement in potentially dangerous activities without fully considering the consequences, thereby often neglecting personal deficiencies and denying the reality of risks.

Impulsivity: rapid responses to immediate stimuli without planning or considering the consequences, and a feeling of difficulty in making and following plans.

Irresponsibility: a tendency to shirk one’s duties, commitments, or agreements and to opt out of responsibilities when personal interests are at risk.

Constructing the Evidence Model

This study involved semistructured interviews with 9 professionals who had >3 years of work experience, including 1 senior manager from the retail industry; 2 midlevel managers from the telecommunications and smart hardware industries; and 6 frontline employees who were recruited from diverse sectors, such as smart hardware, traditional media, health care, finance, and business consulting. All the interviews were conducted on the web via a third-party platform, and each session lasted 30 to 60 minutes. The interview outline was developed based on the definitions of the 7 behavioral characteristics included in the capability model; its core content is presented in . Each interview started with warm-up questions with the goals of establishing a trusting relationship, easing tension, and gradually facilitating in-depth discussions while simultaneously collecting relevant background information regarding the participants. The study used the situation, task, action, and result principle to formulate probing questions for each behavioral event, thereby ensuring the completeness and authenticity of the scenario events discussed during the interviews. The interviews concluded when all the questions had been answered, and the interviewees were asked whether they had anything else to add regarding the questions and answers to confirm that no further information was needed.

The interview contents were transcribed from audio to text. Through manual screening, the interviews were categorized and organized based on the 7 behavioral characteristics outlined in the interview outline. Workplace behaviors often overlap and can reflect multiple characteristics, a result of the complex nature of work environments and decision-making processes; different perspectives and interpretations can also lead to varied understandings of individual behavior. For example, a leader berating an employee for failing to complete a task because of illness might be viewed as hostile from the leader’s perspective but callous from the employee’s perspective. Accordingly, the study focused on specific scenarios, and 34 scenarios that effectively reflect ASP traits were derived from the 9 interviews. These scenarios are detailed in . Specific behavioral characteristics were then extracted from each scenario based on the core content of the question design highlighted in ; each scenario could include 1 to 7 ASP traits.

After the 34 scenarios were organized and duplicates removed, 15 unique scenarios remained. The behaviors associated with each role in these scenarios were then extracted and matched with the 7 ASP traits, thus facilitating the identification of 24 typical workplace behaviors. The evidence model was then constructed, as illustrated in .

Table 1. Semistructured interview outline used to define antisocial personality behaviors according to the DSM-5a with workplace professionals (n=9).

Warm-up question: “Briefly introduce your career experiences in chronological order”

Manipulativeness: Not following company rules, regulations, or laws

Deceitfulness: Intentionally hiding a great deal of information; delivering or reporting false information

Impulsivity: Making workplace decisions driven by emotions or motives without fully considering relevant information

Hostility: Expressing strong hostility, anger, or dissatisfaction, and proactively attacking others

Risk taking: Demonstrating indifference to workplace safety and regulations, including taking actions that pose potential risks to both individuals and teams

Irresponsibility: Displaying dereliction of duty or irresponsibility, leading to an inability to complete one’s tasks on time

Callousness: Showing a lack of remorse for wrongdoings and an unwillingness to admit one’s mistakes and improve

Probing questions: Probing and supplementing relevant information based on the STARb principle

aDSM-5: Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition.

bSTAR: situation, task, action, and result.

Figure 1. Capability model and evidence model for antisocial personality assessment.

Constructing Assessment Tasks

We operationalized these behavioral characteristics in a game setting by incorporating 34 representative workplace behaviors into an interactive game consisting of 10 detailed subscenarios. We assigned each virtual character a basic name, departmental affiliation, and personal background information to enhance the realism of the game. In addition, we carefully designed transitions between scenes to ensure that the narrative flow was coherent. The relationships between each game scenario and the corresponding behavioral characteristics are presented in .

Three psychology experts with more than 3 years of workplace experience reviewed and revised the questions, options, and jump relationships based on the capability and evidence models until they reached a consensus. The finalized game consisted of 34 questions with 115 options (2-5 options per question), including 13 questions featuring logical jump relationships across 6 scenarios. This jump logic took 2 forms: question jump (eg, a situation in which choosing Q1-A jumps to Q3, whereas other options proceed to Q2) and progressive (eg, a situation in which Q1/Q2-D jumps directly to Q4). The logical design is illustrated in . Examples of the game’s item presentation are provided in .
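The two jump forms described above can be sketched as a simple branching lookup. The question IDs, options, and sequence below are illustrative placeholders, not the actual game content:

```python
# Hypothetical sketch of the two jump forms: "question jump" (choosing A on
# Q1 skips Q2 and goes straight to Q3) and "progressive" (choosing D on Q1
# or Q2 jumps directly to Q4). Any (question, option) pair not listed here
# simply proceeds to the next question in sequence.
JUMPS = {
    ("Q1", "A"): "Q3",  # question jump
    ("Q1", "D"): "Q4",  # progressive jump
    ("Q2", "D"): "Q4",  # progressive jump
}

SEQUENCE = ["Q1", "Q2", "Q3", "Q4"]

def next_question(current, option):
    """Return the next question given the current question and chosen option."""
    if (current, option) in JUMPS:
        return JUMPS[(current, option)]
    idx = SEQUENCE.index(current)
    return SEQUENCE[idx + 1] if idx + 1 < len(SEQUENCE) else None
```

A table of this shape also makes the branching auditable: each of the 13 jump questions contributes one or more entries, and every other question falls through to the default sequence.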

Table 2. Game scenarios and corresponding antisocial traits in the ASP-ECD-Ga design (10 game scenarios and 34 workplace behavior characteristics).

Project and proposal decision
- Workplace behavior characteristics: Preference for radical proposals and relative insensitivity to losses
- Antisocial personality traits: Risk taking

Proposal reporting
- Workplace behavior characteristics: Taking credit for others’ proposals or achievements; Seizing subordinates’ or newcomers’ proposals or achievements
- Antisocial personality traits: Callousness and deceitfulness

Project initiation meeting
- Workplace behavior characteristics: Retaliation against others who have harmed the focal individual; Proactive attacks when criticized; Immediate rebuttal when questioned; Publicly criticizing or denying others’ abilities or actions; Shifting blame for personal conflicts to others
- Antisocial personality traits: Hostility, impulsivity, and manipulativeness

Helping others
- Workplace behavior characteristics: Unreasonably refusing to help others at work
- Antisocial personality traits: Callousness

Delivering results
- Workplace behavior characteristics: Demanding or requiring employees to take on tasks or overwork; Shifting blame for personal conflicts to others; Ignoring complaints regarding overtime from subordinates; Failing to respond to criticism during critical periods
- Antisocial personality traits: Manipulativeness, callousness, and irresponsibility

Altering parameters
- Workplace behavior characteristics: Readily obeying orders from superiors; Lying or altering information for personal gain
- Antisocial personality traits: Risk taking and deceitfulness

Emergency situations
- Workplace behavior characteristics: Frequently switching project groups or teams based on outcomes; A preference for flexible and changeable work; Pushing others to take responsibility during team crises; Demanding or requiring employees to take on tasks or overwork; Allowing others to bear pressure in situations featuring team difficulties; Expecting more resources to complete tasks; Instinctive retreat, or avoidance, in response to difficulties
- Antisocial personality traits: Impulsivity, callousness, manipulativeness, and irresponsibility

Information leaks
- Workplace behavior characteristics: Sharing confidential information for personal gain; Lying or altering information for personal gain; Recommending that others should take risks based on false information
- Antisocial personality traits: Manipulativeness and deceitfulness

Audit visits
- Workplace behavior characteristics: Retaliation against others who have harmed the focal individual; Sharing confidential information for personal gain; Shifting blame for personal conflicts to others; Pushing others to take responsibility during team crises; Lying, or altering information, for personal gain; Preference for radical proposals and relative insensitivity to losses; Expecting more resources to complete tasks
- Antisocial personality traits: Hostility, manipulativeness, callousness, deceitfulness, impulsivity, and irresponsibility

aASP-ECD-G: Antisocial Personality Traits Evidence-Centered Design Gamified assessment tool.

Figure 2. Logical roadmap for the gamified assessment tool.

Assessment Model Construction

Dependent Variable Acquisition

During model training for the ASP-ECD-G, we used participants’ scores on the ASP items drawn from the simplified Chinese version of the Personality Inventory for DSM-5–Short Form (PID-5-SF) [] as labels for the training set (hereafter referred to as ASP scores). The PID-5-SF has exhibited good reliability and validity in previous research, as indicated by a Cronbach α coefficient of 0.916. The specific items are detailed in .

Independent Variable Encoding Methods

Traditional SJTs inspired us to assign scores to each option, although doing so exclusively on the basis of fixed rules was impractical. The scoring rules were therefore defined without considering the influence of prior questions, allowing this expert-scoring method to serve as a comparative scheme against the one-hot encoding approach from computer science.

Expert scoring involves assigning values to each option based on the number of ASP traits that it reflects (eg, 5 points for 1 option that reflects 5 out of 7 traits). Three experts assisted in the scoring process, and only the consensus scores obtained after multiple discussions were used. Because of varying question completion rates, the input of each participant was reshaped into a 34×2 matrix, in which 34 represented the total number of questions and 2 indicated question completion status and score; for example, (0,0) for an unanswered question and (1,2) for a score of 2. The final training dataset consisted of a 286×34×2 matrix based on data obtained from 286 participants.
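As a minimal sketch of this encoding (the dict-based input format is an illustrative assumption, not the study’s actual data structure), each participant’s expert-scored responses can be packed into the 34×2 matrix as follows:

```python
import numpy as np

N_QUESTIONS = 34

def encode_scores(responses):
    """Encode one participant's expert-scored responses as a 34x2 matrix.

    `responses` maps question index (0-33) to its expert consensus score;
    unanswered questions are simply absent. Column 0 is the completion
    flag, column 1 is the score: (0,0) = unanswered, (1,2) = score of 2.
    """
    mat = np.zeros((N_QUESTIONS, 2))
    for q, score in responses.items():
        mat[q] = (1, score)
    return mat

# Stacking all participants would yield the 286x34x2 training tensor:
# X = np.stack([encode_scores(r) for r in all_responses])
```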

In this study, categorized option data that lacked ordinal information were processed via one-hot encoding. Specifically, each question was encoded as a 5-dimensional vector: when option A was selected, the first dimension was set to 1 and the remaining dimensions to 0; for questions featuring fewer than 5 options (eg, Q1 featured only 4 options), the unused trailing dimensions were zero-padded; and for unanswered questions, an all-zero 5-dimensional vector was generated. Ultimately, the training dataset was represented as a 286×34×5 three-dimensional matrix.
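A corresponding sketch for the one-hot scheme (again assuming an illustrative dict of question index → chosen option index):

```python
import numpy as np

N_QUESTIONS, MAX_OPTIONS = 34, 5

def one_hot_encode(choices):
    """Encode one participant's option choices as a 34x5 one-hot matrix.

    `choices` maps question index to option index (0 for A, 1 for B, ...).
    Unanswered questions are absent and stay as all-zero rows; questions
    with fewer than 5 options are implicitly zero-padded in the trailing
    dimensions.
    """
    mat = np.zeros((N_QUESTIONS, MAX_OPTIONS))
    for q, opt in choices.items():
        mat[q, opt] = 1
    return mat
```

Unlike the expert-score encoding, this representation carries no ordinal assumption, which is why it suits purely categorical option data.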

Selection of Research Models

In our exploration of how game behaviors influence ASP traits, the linear regression (LR) model provides a preliminary framework to analyze the linear relationship between participants’ behavioral data and their levels of antisocial traits. However, deeper insights into decision-making require more complex machine learning methods.

The random forest (RF) model, which involves constructing multiple decision trees and aggregating predictions, captures nonlinear data relationships more accurately but may not fully reveal the interactions among different options. Artificial neural networks (ANNs), which are capable of simulating brain processing, excel at extracting nonlinear patterns when each game option is treated as an independent variable featuring interactive effects.

Because of the sequential and skip-based nature of the questions considered in this context, participant choices form time-series data, thus rendering recurrent neural networks (RNNs) suitable for capturing dynamic changes. However, standard RNNs face gradient issues involving long sequences. The gated recurrent unit (GRU) and long short-term memory (LSTM) models address this issue via gating mechanisms: GRUs simplify the structure by merging hidden states with gates, whereas LSTM models use 3 gates to manage information flow.

Previous research has relied on statistical scoring rules, but the question structure of the ASP-ECD-G made it challenging for experts to assign consistent scores to identical options or scenario paths, and the complexity of the behavioral data renders purely expert-based scoring impractical. This study therefore prioritized statistical and machine learning models (ie, LR, RF, ANN, RNN, GRU, and LSTM) for the analysis of gamified data, balancing computational efficiency and interpretability.

Model Training Process

Model training used the Adam optimization algorithm and the dropout technique, with performance optimized by adjusting the batch size, the number of epochs, and model complexity. Standard hyperparameters were tuned for each model: in the RF model, n_estimators and max_depth were tuned to balance performance and prevent overfitting; in the ANN, the number of layers and neurons and the activation function were optimized to improve learning; and in the RNN, GRU, and LSTM models, the hidden size, number of units, and dropout rate governed learning, memory, and generalization, and the learning rate was carefully chosen for the optimizer. Details regarding the parameters are presented in .
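As one illustration, the raw-score RF configuration reported in Table 3 could be reproduced in scikit-learn roughly as follows. The synthetic data are placeholders with the paper’s shapes (286 participants, 34 questions × 2 features, flattened for the tree model); the targets stand in for PID-5-SF ASP scores:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# RF hyperparameters for the raw-score encoding, as reported in Table 3.
rf = RandomForestRegressor(
    n_estimators=32,
    max_depth=4,
    min_samples_split=5,
    min_samples_leaf=1,
    max_features="log2",
    random_state=2024,
)

# Placeholder data: 286 participants, 34 questions x 2 features
# (completion flag, expert score), flattened to a 2D design matrix.
rng = np.random.default_rng(0)
X = rng.random((286, 34 * 2))
y = rng.random(286)  # stands in for PID-5-SF ASP scores

rf.fit(X, y)
preds = rf.predict(X)
```

The sequence models (RNN, GRU, LSTM) would instead consume the unflattened 34-step sequences, which is what lets them exploit the question order and jump paths.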

Table 3. Model hyperparameter tuning results.

LR model: (no hyperparameters tuned)

RF model
- Raw score assignment: n_estimators=32; random_state=2024; max_depth=4; min_samples_split=5; min_samples_leaf=1; max_features="log2"
- One-hot encoding of raw scores: n_estimators=5; random_state=2024; max_depth=6; min_samples_split=2; min_samples_leaf=1; max_features="log2"

ANN model
- Raw score assignment: layers: 2; neurons: 32; activation: "relu"; dropout: 0.5; optimizer: Adam (lr=0.001); loss: "mse"; epochs: 50; batch size: 10
- One-hot encoding of raw scores: layers: 2; neurons: 16; activation: "relu"; dropout: 0.5; optimizer: Adam (lr=0.001); loss: "mse"; epochs: 100; batch size: 10

RNN model
- Raw score assignment: layers: 1 (RNN, 32 units); activation: "relu"; dropout: 0.4; optimizer: Adam; loss: "mse"; epochs: 100; batch size: 10
- One-hot encoding of raw scores: layers: 1 (RNN, 32 units); activation: "relu"; dropout: 0.2; optimizer: Adam; loss: "mse"; epochs: 100; batch size: 50

GRU model
- Raw score assignment: layers: 1 (GRU, 64 units); activation: "relu"; optimizer: Adam (lr=0.01); loss: "mse"; epochs: 100; batch size: 50
- One-hot encoding of raw scores: layers: 1 (GRU, 32 units); activation: "relu"; optimizer: Adam (lr=0.001); loss: "mse"; epochs: 200; batch size: 10

LSTM model
- Raw score assignment: layers: 1 (LSTM, 16 units); activation: "relu"; optimizer: Adam; loss: "mse"; epochs: 200; batch size: 50
- One-hot encoding of raw scores: layers: 1 (LSTM, 64 units); activation: "relu"; optimizer: Adam; loss: "mse"; epochs: 200; batch size: 10

Abbreviations: LR: linear regression; RF: random forest; ANN: artificial neural network; RNN: recurrent neural network; GRU: gated recurrent unit; LSTM: long short-term memory; n_estimators: number of estimators; random_state: random state; max_depth: maximum depth; min_samples_split: minimum samples split; min_samples_leaf: minimum samples per leaf; max_features: maximum features; relu: rectified linear unit; lr: learning rate; mse: mean squared error.

Model Evaluation Metrics

The predictive performance of the ASP-ECD-G model was evaluated using common metrics (ie, root mean square error [RMSE], mean absolute error [MAE], and the criterion correlation r) to assess its accuracy and its correlation with the reference results. Lower RMSE and MAE values indicate higher accuracy, whereas an r value close to 1 signifies a strong positive correlation, thus validating the predictive ability of the model.
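These three metrics can be computed directly with NumPy; the function below is an illustrative sketch rather than the study’s actual evaluation code:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute RMSE, MAE, and the criterion correlation r between
    predicted and reference (PID-5-SF) scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # root mean square error
    mae = np.mean(np.abs(y_true - y_pred))            # mean absolute error
    r = np.corrcoef(y_true, y_pred)[0, 1]             # Pearson criterion correlation
    return rmse, mae, r
```

Note that RMSE and MAE penalize absolute disagreement, whereas r is scale free: a model whose predictions are systematically shifted can still achieve r close to 1 while showing nonzero RMSE and MAE.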

As part of this study, 10 participants who met the same criteria as those used in study 2 were recruited to assess the test-retest reliability of the ASP-ECD-G tool. The gamified assessment was administered to these participants twice, with a 1-month interval between the 2 administration periods. The sample size of 10 is in line with relevant guidelines for reliability testing in small-scale validation studies, which focus on within-individual consistency over time. The test-retest design focused on evaluating the stability of behavioral responses to identical scenarios. In both administrations, all 34 interactive questions and logical jump paths were retained.

Participants

As part of this study, the Credamo platform (Beijing Yishu Mofa Technology Co, Ltd) was used to digitalize the gamified assessment questionnaire. The questionnaire included demographic variables (ie, gender, age, highest level of education, marital status, employment status, and position), as well as the gamified assessment tool and the ASP questionnaire.

The sampling criteria used for this study were as follows: (1) participants aged ≥18 years; (2) participants who had at least 1 job or internship experience; and (3) nonpsychology majors.

Participants who did not meet these criteria were excluded via the platform’s custom filtering mechanism. The questionnaire was distributed primarily via online social networks such as WeChat groups and Moments. A total of 291 eligible questionnaires were ultimately collected for this study. After further filtering based on lie detection items, 286 valid questionnaires were obtained.

The descriptive statistics of the sample referenced in study 2 are presented in . Regarding gender distribution, this study included 166 (58%) male participants and 120 (42%) female participants out of the 286 participants. With regard to age distribution, 59.4% (170/286) of the participants were aged between 23 and 28 years, followed by 26.6% (76/286) aged between 19 and 22 years. Regarding education, the sample predominantly included individuals with undergraduate degrees (234/286, 81.8%), followed by those with master’s degrees (36/286, 12.6%). Regarding marital status, 68.9% (197/286) of the participants were single. With respect to employment status, 62.9% (180/286) of the participants were employed at the time of this study, and 24.1% (69/286) were serving as interns. Regarding position, 62.6% (179/286) of the participants were junior employees, followed by 21% (60/286) who occupied junior management positions.

Table 4. Demographic characteristics of the study 2 participants (n=286). Values are sample, n (%).

Sex
- Male: 166 (58)
- Female: 120 (42)
- Intersex: 0 (0)

Age (y)
- 19-22: 76 (26.6)
- 23-28: 170 (59.4)
- 29-35: 36 (12.6)
- 36-45: 4 (1.4)

Highest level of education
- High school or vocational school: 2 (0.7)
- Associate degree: 13 (4.5)
- Bachelor’s degree: 234 (81.8)
- Master’s degree: 36 (12.6)
- Doctorate: 1 (0.3)

Marital status
- Single: 197 (68.9)
- Engaged: 11 (3.8)
- Married, no children: 39 (13.6)
- Married, with children: 30 (10.5)
- In a relationship: 9 (3.1)

Employment status
- Employed: 180 (62.9)
- Interning: 69 (24.1)
- Formerly employed: 13 (4.5)
- Never employed: 21 (7.3)
- Other: 3 (1)

Position
- Junior employee: 179 (62.6)
- Junior management: 60 (21)
- Midlevel management: 30 (10.5)
- Senior management: 12 (4.2)
- Other: 5 (1.7)

Total: 286 (100)

Analysis of Assessment Properties

Study Design

We built on study 2 by incorporating items used to measure pleasure, interest, positive emotions, negative emotions, and immersive experience into the gamified assessment tasks and the ASP questionnaire. These items are scored on a scale from 1 to 9, in which context higher scores indicate stronger experiences. The specific items are detailed in .

We used a 2×2 mixed experimental design, in which the assessment format (ie, gamified assessment vs questionnaire assessment) was used as the within-subject variable, and participant incentive (ie, with vs without) was used as the between-subject variable. This study aimed to investigate the impacts of participant incentives on individual performance across different assessment formats via gamified motivational mechanisms.

After the participants completed the questionnaire, they were asked the following question: “If you were to participate in a company’s new employee psychological test, which method would you prefer?” This question aimed to gauge their future willingness to use gamified assessment versus traditional questionnaire assessment.

Participant incentives were introduced as an external motivator to stimulate achievement motivation, in line with previous research that has linked ASP traits to instrumental rationality, that is, the prioritization of strategic self-interest over social norms [,]. Individuals who exhibit high levels of manipulativeness or deceitfulness often engage in utility-maximizing behaviors, thereby justifying their self-serving actions as rational responses to their perceived circumstances []. Individuals who exhibit stronger tendencies toward an ASP rationalize their unethical behavior as a necessary form of self-preservation, reframing it as a logical choice rather than an antisocial tendency [].

Rather than directly incentivizing “antisocial tendencies,” which carry a strong social stigma, this study framed the incentive based on “rationality,” a positively valenced construct that aligns with participants’ self-perceived strategic competence. While external incentives can enhance individuals’ task focus in challenging situations, they may also lead to deceptive self-presentation []. The design of this study was based on the hypothesis that this framing would enhance social desirability effects in questionnaires, in which context participants could adjust their responses consciously. Moreover, participants’ gamified assessment scores, which were based on the behavioral choices made in the context of immersive scenarios, were expected to remain unaffected.

The experimental condition involved an increase in the participation payment (from RMB 9 to RMB 15 [US $1.25 to $2.10]) in exchange for “rational results,” without defining specific criteria; this approach aimed to mimic real-world strategic self-presentation (eg, job applicants tailoring answers to appear competent). Participants in the control group completed the gamified and questionnaire assessments in the usual manner, whereas those in the experimental group were presented with an incentive: “This study hopes that you can be rational. If your results show that you are rational, we will increase your participation payment (from 9 to 15 yuan).”

Participants

The participants were selected based on criteria consistent with those used in study 2, and the questionnaires were distributed via the Credamo platform. We used G*Power (version 3.1) to calculate the required sample size and ensure sufficient statistical power for the repeated-measures ANOVA conducted as part of this 2×2 mixed experimental design. For this mixed design, which featured 2 levels of the between-subject factor and 2 levels of the within-subject factor, we tested the main effects of both factor types. The analysis, which was based on a medium effect size (f=0.25), α=0.05, power=0.8, and an assumed within-subject correlation of 0.5, indicated minimum sample sizes of 98 for between-subject effects, 34 for within-subject effects, and 68 for interaction effects.

We collected 200 questionnaires across 4 groups with the goal of maximizing participation. A total of 148 valid responses remained after a screening process involving lie detection items. Each group (ie, the group with incentives and the group without incentives) included 74 participants.

Descriptive statistics concerning the sample are presented in . Of the 148 participants, 94 (63.5%) were female and 54 (36.5%) were male. The participants were predominantly aged between 29 and 35 years (54/148, 36.5%), followed by those aged between 23 and 28 years (45/148, 30.4%). Most participants had obtained bachelor’s degrees (98/148, 66.2%), followed by those who had obtained associate degrees (19/148, 12.8%). The majority of participants (95/148, 64.2%) were married with children. Regarding participants’ employment status, of the 148 participants, 143 (96.6%) were employed, whereas 5 (3.4%) were serving as interns. The participants’ job positions spanned from entry level to senior management, although entry-level employees represented the largest group (59/148, 39.9%).

Table 5. Demographic characteristics of the participants in study 3 (n=148).

Variable and category | Participants, n (%)
Sex
  Male: 54 (36.5)
  Female: 94 (63.5)
  Intersex: 0 (0)
Age range (y)
  19-22: 11 (7.4)
  23-28: 45 (30.4)
  29-35: 54 (36.5)
  36-45: 27 (18.2)
  46-50: 11 (7.4)
Education
  Junior high school: 3 (2)
  High school or vocational: 7 (4.7)
  Associate degree: 19 (12.8)
  Bachelor’s degree: 98 (66.2)
  Master’s degree: 21 (14.2)
Marital status
  Single: 40 (27)
  Engaged: 1 (0.7)
  Married, no children: 9 (6.1)
  Married with children: 95 (64.2)
  In a relationship: 3 (2)
Employment
  Employed: 143 (96.6)
  Intern: 5 (3.4)
Position
  Entry-level: 59 (39.9)
  Junior management: 36 (24.3)
  Middle management: 42 (28.4)
  Senior management: 11 (7.4)

Ethical Considerations

This study was approved by the ethics committee of Beijing Normal University (approval BNU202503310097), thus ensuring compliance with ethical guidelines. All the participants provided written informed consent. The informed consent form detailed the purpose of the study, the procedures used in this research, and the participants’ right to withdraw from the study without penalty; furthermore, it noted that the use of data was limited to the purposes of the study and that the data collected as part of this research would be anonymized for analysis. The participants’ privacy was protected via the deidentification of personal data, the secure storage of such data on password-protected computers, and the aggregated reporting of study results. All the participants received a basic participation payment of RMB 9, and those in the experimental group in study 3 received an additional RMB 6 as a reward for “rational outcomes”; these payments were provided via a digital platform and were not linked to trait scores to prevent bias. No identifiable information or images concerning participants are included in the manuscript or the multimedia appendices; thus, no additional consent for personal identification was needed. These measures are consistent with relevant ethical standards for informed consent, confidentiality, and participant well-being as outlined in the institutional and international research guidelines.


ResultsOverview

Study 1 involved semistructured interviews with 9 professionals (1 senior manager, 2 midlevel managers, and 6 frontline employees), each with >3 years of experience; each interview lasted 30 to 60 minutes. For study 2, the Credamo platform was used to distribute digital questionnaires (demographics, gamified assessment, and ASP) to participants (aged ≥18 years, with work or internship experience, and not psychology majors); 291 responses were collected, 286 of which were valid after lie detection screening. Recruitment for study 3 also used Credamo (with the same criteria as study 2); we collected 200 questionnaires, 148 of which were valid (n=74 per group, with or without incentives, after lie detection screening). The design process is shown in .

Figure 3. Research design flow diagram of the three studies.

Development of the Assessment Construct

The ASP-ECD-G is presented as a text-based game provided on a questionnaire platform. The storyline of the ASP-ECD-G simulates a workplace scenario in which players assume the role of an employee who has just completed a 1-month probation period and is trying to secure a permanent position. Players must solve various workplace problems, including conflicts with colleagues, crisis management, ethical dilemmas, and team performance management. Each choice made by the player influences the development of the plot; however, regardless of these choices, the scenario ultimately ends with the player receiving a dismissal notice from human resources.

The design of the ASP-ECD-G incorporates 3 game elements: narrative, immersion, and feedback. The narrative is integrated into the assessment content and is presented in the form of a workplace story in which participants are required to immerse themselves to answer questions. For each question, the game characters provide preset feedback based on the player’s choices, thereby driving the plot forward. Players must select the most suitable options from the choices available within the scenario, and they do not have the ability to save their progress or exit the game at a midway point. Incorrect selections can be rectified by returning to the previous page to reanswer the questions. The presentation of the ASP-ECD-G is illustrated in .

The ASP-ECD-G, similar to SJTs, is rooted in real-world scenarios and combines questions with options for assessment. However, in comparison with traditional scales and SJTs, the gamified assessment tool developed as part of this study exhibits 3 distinctive features. First, the scenarios included in the ASP-ECD-G follow a narrative sequence, and the options are characterized by logical transitions, thus violating the independence assumption among the items. Second, because of the logical transitions between questions, different participants may experience varying numbers of scenes during the course of the gamified assessment. Third, each question included in the ASP-ECD-G reflects 1 or more behavioral traits, thereby violating the 1-dimensionality assumption.
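The question-jump logic described above can be represented as a small directed graph in which each option maps to the next scene, so different participants traverse different numbers of scenes. The following minimal sketch illustrates the structure; all scene names, prompts, and options are hypothetical, not the actual ASP-ECD-G content.

```python
# Hypothetical scene graph: each node is a scene whose options map to
# the next scene, so different choices yield different paths.
scenes = {
    "S1": {"prompt": "A colleague claims credit for your report.",
           "options": {"raise_it_privately": "S2", "undermine_them": "S3"}},
    "S2": {"prompt": "Your manager asks for details.", "options": {}},
    "S3": {"prompt": "The team notices the conflict.", "options": {}},
}

def play(choices, start="S1"):
    """Walk the question-jump path implied by a sequence of choices."""
    node, path = start, [start]
    for choice in choices:
        node = scenes[node]["options"][choice]
        path.append(node)
    return path
```

Because scoring depends on the path traversed rather than on a fixed set of independent items, this structure is precisely what breaks the item-independence assumption of traditional scales.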

Because the ASP-ECD-G differs from traditional assessment methods in its data types, item interdependencies, and dimensional assumptions, this study details the findings of the multidimensional validation analyses, in which item response theory is used alongside the nominal response model, in for the benefit of readers seeking in-depth insights. These features make the assessment process of the ASP-ECD-G resemble a cohesive story rather than a collection of isolated questions. The ASP-ECD-G aims to replicate real workplace scenarios as closely as possible, and the logical transitions between options enhance the coherence and immersion of the scenario for participants during the assessment process.
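Concretely, the nominal response model assigns each unordered response option its own slope and intercept, so information is extracted at the option level rather than the item level. A minimal sketch of the category-probability computation (the parameter values used here are illustrative, not the fitted values from this study):

```python
import numpy as np

def nrm_probs(theta, slopes, intercepts):
    """Nominal response model (Bock): P(option k | theta) is a softmax
    over per-option logits a_k * theta + c_k."""
    logits = np.asarray(slopes) * theta + np.asarray(intercepts)
    logits -= logits.max()            # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

Options with larger slopes become more probable as the latent trait increases, which is how a single item can carry information about more than one trait level.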

Assessment Model ConstructionResults of the Expert-Assigned Coding

The expert panel often found it difficult to reach a consensus when scoring game options, which led to instability in the scoring results. Even when scores could be assigned to all paths, explaining the sources of score differences among participants remained challenging. The core issue was whether different participants who chose the same option in response to the same question should receive the same score. Further consideration revealed that treating a participant’s responses as a path made it difficult to account for the complexity and extensive information involved; the experts faced similar challenges when scoring both individual question options and the different paths within the same scenario.

After the random seed was fixed, the performance of the different models on the same dataset was comprehensively compared. The evaluation included the RMSE and MAE on the training and testing sets, as well as the correlation (r) between the predicted and reference results on the testing set.
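These three evaluation metrics can be computed directly from the predicted and reference scores; the arrays in the usage example below are illustrative, not study data.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, and Pearson correlation between reference and
    predicted scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),   # root mean square error
        "MAE": float(np.mean(np.abs(err))),          # mean absolute error
        "r": float(np.corrcoef(y_true, y_pred)[0, 1]),  # Pearson r
    }

# Illustrative usage with toy scores
m = regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])
```

Lower RMSE and MAE indicate smaller prediction errors, while a higher r indicates stronger agreement between predicted and reference rankings.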

As indicated in , none of the models exhibited significant overfitting or underfitting following parameter tuning. Among the testing set results, the GRU model exhibited the best performance, as indicated by RMSE and MAE values of 0.380 and 0.313, respectively, and a correlation of 0.676, thus indicating a high level of consistency between the predicted and observed results.
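For readers unfamiliar with the gated recurrent unit, the following NumPy sketch shows the single-step GRU update (Cho et al. gating convention) that underlies this class of model; the weights are zero-valued placeholders, not the trained model used in the study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One GRU update. p holds weight matrices and biases for the
    update gate (z), reset gate (r), and candidate state."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])             # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])             # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate
    return (1.0 - z) * h + z * h_tilde  # interpolate old and new state
```

The gating mechanism lets the model decide, at each choice in the sequence, how much of the earlier context to keep, which plausibly suits path-dependent response data of the kind produced by the ASP-ECD-G.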

The study further evaluated the performance of each model across different ASP score ranges in the testing set to assess the stability of the predictions. These results are presented in .

Table 6. Expert-assigned coding model performance for the prediction of antisocial personality scores (n=286; training-test split, 7:3).

Dataset and evaluation metric | LR^a model | RF^b model | ANN^c model | RNN^d model | GRU^e model | LSTM^f model
Training set
  RMSE^g | 0.415 | 0.432 | 0.438 | 0.436 | 0.363 | 0.447
  MAE^h  | 0.332 | 0.357 | 0.343 | 0.353 | 0.288 | 0.356
  r^i    | 0.637 | 0.694 | 0.601 | 0.598 | 0.762 | 0.559
Testing set
  RMSE   | 0.453 | 0.418 | 0.462 | 0.436 | 0.380 | 0.433
  MAE    | 0.356 | 0.336 | 0.362 | 0.340 | 0.313 | 0.344
  r      | 0.510 | 0.626 | 0.529 | 0.523 | 0.676 | 0.567

aLR: linear regression.

bRF: random forest.

cANN: artificial neural network.

dRNN: recurrent neural network.

eGRU: gated recurrent unit.

fLSTM: long short-term memory.

gRMSE: root mean square error.

hMAE: mean absolute error.

ir represents the correlation between the predicted results and the reference results for the best-performing model.

Table 7. RMSE^a values of expert-assigned coding models across different ASP^b score ranges (test set, N=59)^c.

ASP score range | Sample size | LR^d model | RF^e model | ANN^f model | RNN^g model | GRU^h model | LSTM^i model
1-1.5 | 1 | 0.594 |
