Capacity to classify feigned symptoms of mental illness and/or cognitive impairment using the Structured Inventory of Malingered Symptomatology

Specific cut-off scores used in prior research to classify individuals as feigning did not work particularly well, suggesting that further research is needed before the Rare Symptoms [RS] and Symptom Combinations [SC] scales of the Structured Inventory of Malingered Symptomatology (SIMS) can be used in applied settings. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings, as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2020, Vol. 44, No. 2, 167-177

Classification Accuracy of the Rare Symptoms and Symptom Combinations Scales of the Structured Inventory of Malingered Symptomatology in Three Archival Samples

Authors

John F. Edens, Texas A&M University
Tiffany N. Truong, Texas A&M University
Randy K. Otto, University of South Florida

Abstract

Objective: The Structured Inventory of Malingered Symptomatology (SIMS) is a 75-item self-report measure intended to screen for potentially feigned symptoms of mental illness and/or cognitive impairment. We investigated the classification accuracy of 2 new detection scales (Rare Symptoms [RS] and Symptom Combinations [SC]) developed by Rogers, Robinson, and Gillard (2014) that appeared useful in identifying simulated mental disorder in their derivation sample of psychiatric inpatients. Hypothesis: We hypothesized that the rates of classification accuracy Rogers et al. reported for these 2 scales would generalize to other samples in which the utility of the SIMS previously has been investigated. Method: We computed RS and SC scores from archival SIMS data collected as part of 3 research projects investigating malingering detection methods: (a) general population prison inmates and inmates in a prison psychiatric unit receiving treatment for mental disorder (N = 115), (b) college students (N = 196), and (3) community-dwelling adults (N = 48). Results: Results supported the global classification accuracy of RS and SC but the suggested cut-score for both scales (>6) produced poor sensitivity. Lower potential cut-offs did, however, improve sensitivity to feigning somewhat while not excessively diminishing specificity. Conclusion: These results emphasize the importance of generalizability research when investigating the clinical utility of forensic mental health assessment methods, particularly specific decision rules used to classify individuals into discrete categories.

Keywords

Structured Inventory of Malingered Symptomatology, malingering, simulation designs, prisoners, classification accuracy

Summary of the Research

“The Structured Inventory of Malingered Symptomatology (SIMS) is a commercially published, self-report measure that is used as a screening tool to identify persons feigning symptoms associated with mental illness or cognitive impairment. It can be completed in 15 to 20 min and consists of 75 items that map onto five nonoverlapping, 15-item scales (Low Intelligence, Affective Disorders, Neurologic Impairment, Psychosis, and Amnestic Disorders) that can be combined into a total score” (p. 167).

“As is true with most psychological tests, it is unclear exactly how widely used the SIMS is in clinical-forensic contexts. In a survey of neuropsychologists from six European countries, [researchers] reported that approximately 13% of respondents reported using the SIMS. A somewhat more recent survey of North American neuropsychologists found that approximately 10% of respondents reported using the SIMS. Most recently, survey data from Canada indicated that 21% of forensic mental health professionals reported having used the SIMS within the preceding 5 years, with the mean number of administrations per examiner who reported use of the tool being 33. We performed a cursory review of U.S. appellate case law databases and found that the SIMS has been included in an array of criminal appeals in which potential malingered mental disorder has been a point of concern. For example, opinions based at least in part on the SIMS have been introduced in competence to stand trial, insanity, capital punishment, and competence to be executed proceedings. Clearly, the SIMS has made its way into the armamentarium of some experts, being used in various high-stakes cases involving the potential for significant loss of liberty (or even life). Numerous studies have examined the classification accuracy of the SIMS since its initial publication in 1997. In 2014, [researchers] published a meta-analysis of the available research, analyzing 31 studies that included 61 samples and 4,009 SIMS protocols. They concluded that, although the SIMS total score effectively differentiates feigners and honest responders and is resistant to coaching efforts, it is vulnerable to high false positive rates. In response they, as had other investigators, recommended increasing the SIMS total cut-score in an attempt to improve the tool’s specificity” (p. 167-168).

“In discussing some of the limitations of the SIMS, Rogers, Robinson, and Gillard (2014) noted that none of the original scales were developed using established detection strategies and that only its total score typically is used as a feigning screen. In response to these concerns, they constructed two new malingering detection scales based on data collected from 107 psychiatric inpatients participating in a simulation study” (p. 168).

“Based on these data, Rogers and colleagues (2014) developed two scales: Rare Symptoms (RS) and Symptom Combinations (SC). The RS scale is composed of SIMS items rarely endorsed by patients in the standard condition but likely to be endorsed by the patients instructed to feign, whereas the SC scale was constructed by identifying item pairs that were rarely endorsed by patients but likely to be endorsed (in combination) by those who were instructed to feign. The investigators concluded that cut-scores of greater than 6 yielded the best classification accuracy for both scales” (p. 168).

“These results provided preliminary but compelling support for the potential clinical utility of these new scales but, like any initial study, additional cross-validation and generalizability research is necessary. In the present investigation, we examined the utility of the experimental RS and SC SIMS scales developed by Rogers and his colleagues (2014) using archival data collected from three samples: (a) general population prison inmates and inmates in a prison psychiatric unit receiving treatment for mental disorder, (b) college students, and (c) community-dwelling adults” (p. 168).

“The results from Studies 1 and 2 indicated that the scales initially derived by Rogers and his colleagues (2014) worked relatively well at a global level, with two primary caveats: (a) college students instructed to feign depressive symptoms appeared relatively more difficult to accurately classify and, arguably more importantly, (b) performance specifically within the CSU prison sample was not significantly better than chance. However, it should be noted that when the SIRS was used as the criterion measure to identify malingering within the CSU sample (rather than relying on staff determinations regarding feigning), global performance (i.e., AUC values) for both scales appreciably improved” (p. 175).

“[T]hese results at a global level suggest that these new SIMS scales can identify feigned responding, but raise concerns about the generalizability of the provisional cut-scores created in the scale derivation process. We hope that the results from these three archival samples will spur further investigation into these promising new SIMS scales, both in terms of re-analyses of existing archival data sets as well as new research with more diverse samples and experimental designs” (p. 176).

Translating Research into Practice

“Despite the generally encouraging findings at the scale level, the cut-scores of >6 for the RS and SC scales recommended by Rogers et al. (2014) did not produce particularly useful rates of classification accuracy within these three samples. Among prisoners, a cut-score of >6 produced high specificity but low sensitivity for RS and SC, whereas a cut-score of >2 produced the highest overall correct classification rate. These results were generally consistent with results from the college student and community sample such that a cut-score of >2 yielded more optimal levels of sensitivity (for both samples) and specificity (for the college student sample). This modified cut-score in the college student sample appreciably improved sensitivity for psychosis (90%) and cognitive impairments (79%) but the scales’ performance in the depression condition (45%) continued to be relatively poor. Additionally, although specificity data were not available in the community sample, a cut-score of >6 resulted in the accurate identification of only 33 and 17% of participants as malingering on the RS and SC scales, respectively— even though these individuals produced significant elevations on the Psy-5 Psychoticism Scale. In contrast, a cut-score of >2 improved overall performance of these two scales (65 and 64%, respectively). Despite this relative improve in performance, 19% of community participants were still able to produce scores below two on the RS and SC scales” (p. 175).

Other Interesting Tidbits for Researchers and Clinicians

“The somewhat different pattern of effects we obtained relative to those reported by Rogers et al. (2014) could be because of several different (and nonmutually exclusive) factors. For example, it is possible that their item selection process identified some spurious effects that were specific to the distributional properties of these items in that particular inpatient sample. Alternately, the use of an inpatient sample of trauma patients to derive these new scales may have resulted in (nonspurious) results that are simply more specific to that particular population (i.e., psychiatric patients in a civil psychiatric setting). A more direct replication attempt with a similar sample that is provided similar instructional sets might result in better cross-validation results. In contrast to such a replication study, the current study investigated two nonclinical samples and a prisoner sample of inmates from both general population and a psychiatric unit. As such, it is possible that differences emerged because of the populations being sampled or the type and level of psychopathology evident among them” (p. 175).

“Additionally, participants in the Rogers et al. (2014) study were instructed to ‘simulate total disability’ (p. 459) in relation to their trauma symptoms, whereas the current study provided instructions to feign depression, psychosis, or cognitive impairments (student and community samples), and nonspecific major mental illness (prison sample) among simulation groups across our three archival data sets. The weaker performance of the RS scale in regards to the simulation of depression in particular could have resulted from its lack of inclusion of any of the SIMS’s Affective Disorders items, although five of the 13 SC item pairs include at least one Affective Disorders item. The developers applied empirically supported detection methods to their SIMS data, but their findings obviously were specific to items (RS) and item-pairs (SC) that were effective within the constraints of the simulation instructions provided to those particular patients residing within a specific inpatient trauma unit. It is curious, however, that the poorest performance was among the simulation subsample (depression) that is conceptually most similar to trauma or disability claimants—at least in comparison with the two other, more distinctive types of disorder (psychosis and cognitive impairment). Regardless, the generalizability of the accuracy of the particular items selected, particularly in relation to narrower or more specific types of feigned disorder that the SIMS was also designed to screen for (e.g., amnesia, neurological impairment)—as well as those it was not intended to identify (e.g., attention-deficit-hyperactivity disorder [ADHD])— remains an important area for future investigation” (p. 175-176).

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!