How to Distinguish Feigned from Genuine Depressive Symptoms: Response Patterns and Content Analysis of the SIMS Affective Disorder Scale

Participants

A total of 340 respondents participated in the study. The inclusion criteria were: (a) being aged 18 years or older, (b) living in Italy, and (c) being able to read questions on a computer monitor and understand their meaning. Data were collected over 15 days (November 16–30, 2020). The questionnaires were administered cross-sectionally on an online survey platform, which participants accessed via a designated link disseminated by email through convenience sampling. Nine participants were excluded because they did not understand the instructions (see the “Research Design” section) and 16 were excluded because they did not complete the questionnaires. The final sample consisted of 315 participants. All participants responded voluntarily and anonymously, and indicated their informed consent within the survey. The procedures were clearly explained, and participants could interrupt or quit the study at any point without giving a reason. They did not receive any compensation for their participation. The experimental procedure was approved by the local ethics committee (Board of the Department of Human Neuroscience, Faculty of Medicine and Dentistry, Sapienza University of Rome), in accordance with the Declaration of Helsinki.

Participants were grouped into three subsamples: Honest [H], Simulators [S], and Honest with Depressive Symptoms [HDS]. The Honest group included 110 participants, aged 23–68 years (M = 37.30, SD = 12.52). Most were male (n = 66, 60.0%), Italian citizens (n = 105, 95.5%), and residents of central Italy (n = 70, 63.6%); 50 held a high school diploma (44.5%) and the majority were employees (n = 56, 50.9%) and unmarried (n = 71, 64.5%). The Simulators group was composed of 161 participants, aged 24–75 years (M = 36.30, SD = 11.87). Most were female (n = 90, 55.9%), Italian citizens (n = 160, 99.4%), and residents of central Italy (n = 105, 65.2%); 79 held a high school diploma (49.1%), 75 were employees (46.6%), and most were unmarried (n = 104, 64.6%). The Honest with Depressive Symptoms group comprised 44 participants, aged 25–64 years (M = 35.59, SD = 12.60). Most were female (n = 28, 63.6%), Italian citizens (n = 44, 100%), and residents of central Italy (n = 26, 59.1%); 20 held a high school diploma (45.5%), 16 were employees (36.4%), and most were unmarried (n = 32, 72.7%).

The chi-squared test (χ²) revealed statistically significant differences between the Honest, Simulators, and Honest with Depressive Symptoms groups with respect to biological sex [χ²(2) = 9.666, p = 0.008], with more male participants in the Honest group compared to the other two groups; and employment status [χ²(8) = 17.547, p = 0.025], with more unemployed participants in the Honest with Depressive Symptoms group compared to the Honest group. No statistically significant differences emerged with respect to the other socio-demographic variables (see Table 1).
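As a check on the reported statistic, the sex-by-group test can be recomputed from the counts given in the group descriptions above (Honest: 66 male of 110; Simulators: 90 female of 161; Honest with Depressive Symptoms: 28 female of 44). The sketch below is not the authors' code. It assumes a plain Pearson chi-squared test without continuity correction, the function name is illustrative, and the closed-form p-value holds only for 2 degrees of freedom.

```python
import math

def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: Honest, Simulators, Honest with Depressive Symptoms; columns: male, female.
# Counts are derived from the group descriptions in the text.
sex_by_group = [[66, 44], [71, 90], [16, 28]]
stat, df = chi_squared(sex_by_group)

# For df = 2 only, the chi-squared survival function is exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi2({df}) = {stat:.3f}, p = {p_value:.3f}")  # chi2(2) = 9.666, p = 0.008
```

The recomputed values agree with the reported χ²(2) = 9.666, p = 0.008 to three decimals.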

Table 1 Descriptive Statistics for the Total Sample (N = 315) and Each Group (i.e., Honest, Simulators, Honest with Depressive Symptoms)

Measures

The Beck Depression Inventory – Second Edition (BDI-II)

The BDI-II (Beck et al., 1996; Ghisi et al., 2006) is one of the most widely used instruments for screening depressive symptomatology (von Glischinski et al., 2019). The self-administered test consists of 21 items that assess the cognitive, affective, motivational, and somatic symptoms of depression: sadness, pessimism, past failure, loss of pleasure, guilty feelings, punishment feelings, self-dislike, self-criticalness, suicidal ideation or wishes, crying, agitation, loss of interest, indecisiveness, feelings of worthlessness, loss of energy, change in sleeping patterns, irritability, change in appetite, concentration difficulty, tiredness or fatigue, and loss of interest in sex (Beck et al., 1996). Each item consists of four statements arranged in order of increasing severity, referring to a particular symptom of depression that respondents may have experienced during the prior 2 weeks. Each item is scored on a four-point scale ranging from 0 to 3, and the total score is the sum of the 21 item scores, with a maximum of 63. BDI-II items may be grouped into two subscales: the Somatic-Affective subscale, comprising 12 items that describe the affective, somatic, and vegetative symptoms of depression; and the Cognitive subscale, comprising 9 items that represent the cognitive symptoms of depression (Beck et al., 1996; Steer et al., 1999). The present study administered the official Italian adaptation of the test (Ghisi et al., 2006), and the internal consistency was excellent (α = 0.972).

The Structured Inventory of Malingered Symptomatology Affective Disorders Scale (SIMS AF)

The present study administered the Affective Disorders (AF) scale of the SIMS Italian adaptation (La Marca et al., 2011). The SIMS (Smith & Burger, 1997; Widows & Smith, 2005) is a multi-axial self-report questionnaire that aims to identify respondents’ feigning of psychiatric symptoms and/or cognitive deficits. It comprises 75 items describing implausible, rare, atypical, or extreme symptoms that respondents must endorse or reject. The measure has been validated in the clinical-forensic, psychiatric, and non-clinical fields (Harris & Merz, 2022; Monaro et al., 2018; Orrù et al., 2021, 2022). The SIMS AF scale consists of 15 items, each associated with a specific symptom of depression or anxiety; respondents report each symptom via a dichotomous response option (i.e., true vs. false). The SIMS AF scale has a cut-off of > 5 (La Marca et al., 2011). In the present study, it showed good internal consistency (α = 0.768).

Of note, although the total SIMS score suggests the presence of feigning, the subscale scores suggest the type of psychopathology that is being feigned (e.g., Shura et al., 2022; van Impelen et al., 2014). In the present study, we used only items from the AF scale, in order to mask the study aim. In more detail, the simulation design asked participants to simulate depressive symptoms. Therefore, the inclusion of all SIMS items may have made it immediately obvious which items should be endorsed, in order to comply with the feigning instructions, thereby hindering the content validity analysis. Furthermore, administering the SIMS in its entirety would have added a disproportionate number of non-believable and unrelated items, considering that the BDI-II is comprised of 21 items, whereas the SIMS and its AF scale include 75 and 15 items, respectively.

Research Design

A between-subjects design was implemented, and the survey platform randomly assigned participants to one of two experimental groups, defined by the manipulated factor of instruction: Honest [H] vs. Simulators [S]. In the first group [H], participants completed the tests (i.e., SIMS AF, BDI-II) with the instruction to respond honestly. In the Simulators group [S], participants completed the tests with the instruction to feign depression, according to the DSM-5 criteria for major depressive disorder (American Psychiatric Association, 2013). Of note, the experimental instructions provided to Simulators contained coaching elements, namely symptom preparation and warnings (Puente-López et al., 2022). Specifically, participants were instructed to attend not only to the symptoms of major depressive disorder, but also to the questionnaire features designed to detect feigning, as their aim was to respond in such a way that their deception would not be detected. We report the experimental instructions in the Appendix at the end of the manuscript.

At the end of the survey, a final question implemented as a manipulation check asked participants to describe how they responded to the items: “honestly,” “dishonestly,” or “I don’t remember.” Nine participants in the Simulators group were excluded from the analysis because they answered “honestly” to this question, suggesting that they may not have understood the instructions; and 16 were excluded because they did not complete the questionnaires.

Following the data collection, the descriptive statistics revealed that 28.57% of the participants instructed to respond honestly (n = 44 of 154) scored higher than the cut-off of 12 for the BDI-II total score. This result was not surprising, considering that the data collection occurred during the second wave of the COVID-19 pandemic in Italy, when an increase in depressive symptoms among the general population was observed (Mazza et al., 2022). According to the Italian technical manual (Ghisi et al., 2006, p. 67), this cut-off should be used as the initial interpretive criterion for research purposes. Following this recommendation, we grouped participants with a BDI-II total score higher than 12 into a third group, called Honest with Depressive Symptoms [HDS].
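The grouping rule described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function names and the instruction labels "honest" and "simulate" are hypothetical, and the respondents are invented. Only the scoring facts (21 items scored 0 to 3, research cut-off of 12) come from the text.

```python
BDI_II_CUTOFF = 12  # research cut-off from the Italian manual, as cited in the text

def bdi_total(item_scores):
    """Sum the 21 BDI-II item scores (each 0-3); the maximum total is 63."""
    if len(item_scores) != 21:
        raise ValueError("the BDI-II has 21 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item is scored 0, 1, 2, or 3")
    return sum(item_scores)

def assign_group(instruction, item_scores):
    """Return 'S', 'HDS', or 'H' (instruction labels are hypothetical)."""
    if instruction == "simulate":
        return "S"
    # Honest respondents above the cut-off form the third group.
    return "HDS" if bdi_total(item_scores) > BDI_II_CUTOFF else "H"

# Invented respondents for illustration only.
print(assign_group("honest", [1] * 13 + [0] * 8))   # total 13 -> HDS
print(assign_group("honest", [1] * 12 + [0] * 9))   # total 12 -> H
print(assign_group("simulate", [3] * 21))           # -> S
```

Note that a score exactly equal to 12 stays in the Honest group, because the rule uses a strict "higher than 12" comparison.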

Data Analysis

Response Pattern Analysis

A preliminary analysis was run to investigate the response patterns of the three experimental groups (i.e., Honest, Simulators, Honest with Depressive Symptoms) on the SIMS AF scale and the BDI-II (i.e., total score, Cognitive and Somatic-Affective scale scores). We calculated the correlation coefficient (r) between the four scales (i.e., SIMS AF, BDI-II total, BDI-II Cognitive, BDI-II Somatic-Affective) and computed a correlation matrix for each experimental group. Finally, the z-test for comparing sample correlation coefficients was applied to determine differences between groups (with significance set to α = 0.05 and the critical value for the z-statistic set to z = 1.96).
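The text does not spell out the form of the z-test. A common choice for comparing correlations from two independent samples is Fisher's r-to-z transformation, and the sketch below assumes that this is what was used. The function name and the example correlations are hypothetical; only the group sizes (110 and 161) and the critical value of 1.96 come from the text.

```python
import math

def compare_correlations(r1, n1, r2, n2, z_crit=1.96):
    """Two-sided z-test for two independent correlations via Fisher's transform."""
    z1, z2 = math.atanh(r1), math.atanh(r2)            # Fisher r-to-z
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    return z, abs(z) > z_crit

# Hypothetical correlations between two scales in the Honest (n = 110)
# and Simulators (n = 161) groups.
z, significant = compare_correlations(0.60, 110, 0.35, 161)
print(f"z = {z:.2f}, significant at alpha = 0.05: {significant}")
```

With three groups and four scales, the same comparison would be repeated for each pair of groups and each pair of scales.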

Machine Learning Model

To determine an interpretable decision model to differentiate the three groups (i.e., Honest, Simulators, Honest with Depressive Symptoms), we built a three-class decision tree using machine learning (ML) methodology. More specifically, we trained a J48 algorithm using tenfold cross-validation. J48 is an algorithm used to generate decision trees (Quinlan, 1993). While it is one of the simplest statistical classifiers, its output has a high degree of interpretability and explainability (i.e., transparency). In domains such as healthcare, the transparency of ML models is particularly important, especially when artificial intelligence is applied to support clinical decision-making (Adadi & Berrada, 2020). Different ML models entail different trade-offs between accuracy and transparency.
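J48 is WEKA's implementation of Quinlan's C4.5, which selects splits by gain ratio (information gain divided by the entropy of the split itself). The sketch below illustrates only that criterion for a categorical attribute. It is not the WEKA implementation, it omits the handling of numeric thresholds and pruning, and the attribute and labels are invented.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(attribute_values, labels):
    """Gain ratio of a categorical attribute with respect to class labels."""
    n = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    conditional = sum(len(s) / n * entropy(s) for s in subsets.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0

# Invented example: a binary flag (e.g., "scale score above cut-off")
# against three-class group labels.
flag = [1, 1, 1, 0, 0, 0, 0, 1]
group = ["S", "S", "S", "H", "H", "HDS", "HDS", "HDS"]
print(f"gain ratio = {gain_ratio(flag, group):.3f}")
```

At each node the tree would pick the attribute with the highest gain ratio, which is what makes the resulting rules easy to read off as a sequence of thresholds.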

In the present research, we chose a simple but highly interpretable classifier (i.e., J48), as our aim was to produce a model that could concretely support clinicians and forensic experts in their efforts to detect simulated depression. Moreover, we followed a tenfold cross-validation procedure in order to assess model generalization and increase the replicability of the results. The k-fold cross-validation consisted of randomly splitting the sample into training and validation sets and repeating the procedure across folds. Compared with a single training set and a single validation set, this resampling procedure reduces the variance of the performance estimate and limits the influence of overfitting on that estimate (Kohavi, 1995). The sample (N = 315) was partitioned into k = 10 subsamples of approximately equal size: 9 subsamples were used to train the model and the remaining subsample was used for its validation. This process was repeated 10 times, so that each of the 10 folds was used exactly once as a validation set. Finally, an estimated validation accuracy was generated by averaging the results obtained from the 10 folds. The ML model was run using WEKA 3.9 (Frank et al., 2016).
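The partitioning step can be sketched in a few lines. This is an illustration of plain k-fold splitting, not WEKA's routine: it does not stratify by class, the seed is arbitrary, and `evaluate` is a hypothetical callable that trains on the training indices and returns the accuracy on the validation indices.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

def cross_validated_accuracy(n, evaluate, k=10, seed=0):
    """Average the accuracy returned by `evaluate` over the k folds."""
    scores = [evaluate(train, val) for train, val in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)

splits = list(k_fold_indices(315))
print(sorted(len(val) for _, val in splits))
# [31, 31, 31, 31, 31, 32, 32, 32, 32, 32]
```

Because 315 is not divisible by 10, five folds hold 32 observations and five hold 31, and every observation appears in exactly one validation fold.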

Content Validity Analysis

To assess the content validity of SIMS AF items, the following analyses were performed, considering only the Simulators and Honest with Depressive Symptoms groups. First, given the imbalance between groups (Simulators: n = 161; Honest with Depressive Symptoms: n = 44), the Simulators group was subsampled by randomly extracting 44 observations. Subsequently, chi-squared tests were performed to evaluate the associations between group (Simulators vs. Honest with Depressive Symptoms) and responses to SIMS AF items (i.e., true vs. false). Concerning the magnitude of the associations, Cramér’s V ≤ 0.20 was considered indicative of a weak effect, 0.20 < Cramér’s V ≤ 0.60 of a moderate effect, and Cramér’s V > 0.60 of a strong effect. Finally, a forward logistic regression model was implemented with only the items that emerged as significant in the previous analysis, in order to determine which items discriminated between groups. Data analyses were computed using the Pandas software library (McKinney, 2010) and SPSS version 28 (IBM Corp, 2021).
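The subsampling and effect-size steps can be sketched as follows. The code is illustrative only: the respondent IDs, the seed, and the 2 × 2 item table are invented, the chi-squared statistic is computed without continuity correction (an assumption), and the effect labels simply apply the thresholds stated above.

```python
import math
import random

def cramers_v(table):
    """Cramér's V for an r x c contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (observed - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )
    return math.sqrt(chi2 / (n * (min(len(table), len(table[0])) - 1)))

def effect_label(v):
    """Label the effect using the thresholds given in the text."""
    if v <= 0.20:
        return "weak"
    return "moderate" if v <= 0.60 else "strong"

# Balance the groups: draw 44 of the 161 Simulators at random (hypothetical IDs).
simulator_ids = list(range(161))
subsample = random.Random(42).sample(simulator_ids, 44)

# Invented table for one item: rows = Simulators, HDS; columns = true, false.
item_table = [[30, 14], [10, 34]]
v = cramers_v(item_table)
print(f"V = {v:.3f} ({effect_label(v)})")  # V = 0.456 (moderate)
```

In the actual analysis this computation would be repeated for each of the 15 SIMS AF items, with the significant items then entered into the forward logistic regression.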
