Clinical Experience May Affect The Predictive Validity of the PCL-R
Clinical experience may affect both the variability of PCL-R scores and how well those scores predict future violence. This is the bottom line of a recently published article in the International Journal of Forensic Mental Health. Below is a summary of the research and findings as well as a translation of this research into practice.
Featured Article | International Journal of Forensic Mental Health | 2017, Vol. 16, No. 2, 130-138
Does the Predictive Validity of Psychopathy Ratings Depend on the Clinical Experience of the Raters?
Marcus T. Boccaccini, Sam Houston State University
Katrina A. Rufino, University of Houston
Hyemin Jeon, Sam Houston State University
Daniel C. Murrie, Institute of Law, Psychiatry, and Public Policy, University of Virginia
We compared the predictive validity of Psychopathy Checklist-Revised (PCL-R; Hare, 2003) scores assigned to 80 sexual offenders by two trained graduate students (using file review) and at least one licensed clinician (using file review and interview). There was more variability among the scores assigned by the licensed clinicians than those assigned by the graduate students, and the scores assigned by licensed clinicians were stronger predictors of future offending than those assigned by the graduate students. Findings raise questions about a possible association between clinical experience and PCL-R score validity and highlight the need for more research in this area.
Psychopathy checklist; PCL-R; risk assessment; clinical experience
Summary of the Research
“International surveys show that the Psychopathy Checklist-Revised (Hare, 2003; PCL-R) is among the most widely used measures in risk assessment, and is often introduced as evidence in legal proceedings addressing risk. Meta-analyses tend to support the use of these Psychopathy Checklist (PCL) measures, consistently showing that PCL measure scores are small to moderate-sized predictors of recidivism and misconduct. There is, however, a significant amount of heterogeneity in PCL score predictive effects across studies, with some studies reporting much larger effects than others. Sample and study design features that explain some of this variability include sex, race, location of study (e.g., country), and whether scores are assigned on the basis of file review only or both file review and interview. Existing meta-analyses have not examined whether rater training or experience might also explain variability in predictive effects” (p. 130).
“The PCL-R manual does not require users to have any specific level of prior experience or training, although the PCL-R publisher (MHS) lists the PCL-R as a “Level C” product, which requires an advanced degree in psychology or psychiatry and completion of a graduate-level course in psychometrics. The PCL-R manual does however recommend that users have “extensive relevant experience” with forensic populations, demonstrated by “completion of a practicum or internship in a clinical-forensic setting, or several years of relevant work-related experience” (Hare, 2003, pp. 16–17). In one recent field study examining PCL-R scores in sex offender risk assessment, scores from evaluators who appear to have been more experienced (i.e., had conducted 35 or more assessments) were predictive of future violent offending, while scores from less experienced evaluators were not. It could be that the less forensically experienced clinicians were unduly influenced by crime details (i.e., sexual offenses) or based their scoring on a more global “nice- or bad-guy” impression of the offender, which the PCL-R manual describes as a common scoring bias among novice raters” (pp. 130–131).
“Ultimately, the extent to which clinical experience and training are related to PCL scoring accuracy is unclear because none of the existing PCL studies have compared predictive effects from more experienced to less experienced raters. It may be that predictive effects from studies with research assistant raters would be even stronger if researchers used experienced clinicians” (p. 131).
“In this study, we compare the predictive validity of PCL-R scores for 80 sexual offenders who had been scored by two graduate student raters—both of whom had ample PCL-R training and several years of supervised clinical experience—and at least one licensed clinician. The 80 offenders were part of a larger sample of 124 offenders in a treatment program for civilly committed sexual offenders. Each offender had been civilly committed as a sexually violent predator in the state of Texas, and each had been evaluated by a state-contracted psychologist or psychiatrist prior to commitment. These evaluators were required by statute to assess for psychopathy (Texas Health and Safety Code §841.023) and all of the evaluators used the PCL-R” (p. 131).
The two graduate student raters agreed closely with each other. Their agreement with the licensed clinicians who conducted the original evaluations, however, was only low to moderate. The licensed clinicians' scores also varied more than the graduate students' scores. Although the graduate students scored more reliably, only the PCL-R scores assigned by the licensed clinicians predicted offender outcomes.
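The pattern above, high agreement between raters alongside weaker prediction, can look paradoxical. The following is a minimal simulation sketch, not the authors' analysis and not based on their data. It assumes a hypothetical model in which two raters share a common source of non-diagnostic influence (for example, offense details in the file), while a third rater tracks the underlying trait more closely but with more independent noise. All names, weights, and the continuous outcome are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=5000, seed=0):
    """Simulate hypothetical raters; every weight below is an assumption."""
    rng = random.Random(seed)

    # Latent trait and a continuous outcome that depends on it (simplification).
    trait = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [t + rng.gauss(0, 1) for t in trait]

    # Non-diagnostic information that raters A and B both react to.
    shared = [rng.gauss(0, 1) for _ in range(n)]

    # Raters A and B: weak link to the trait, strong shared influence.
    rater_a = [0.3 * t + s + rng.gauss(0, 0.3) for t, s in zip(trait, shared)]
    rater_b = [0.3 * t + s + rng.gauss(0, 0.3) for t, s in zip(trait, shared)]

    # Rater C: strong link to the trait, independent noise only.
    rater_c = [t + rng.gauss(0, 0.8) for t in trait]

    return {
        "agreement_a_b": pearson(rater_a, rater_b),
        "agreement_a_c": pearson(rater_a, rater_c),
        "validity_a": pearson(rater_a, outcome),
        "validity_c": pearson(rater_c, outcome),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, raters A and B agree strongly with each other because they share the same non-diagnostic influence, yet rater C's scores correlate more strongly with the outcome. This illustrates only that reliability does not guarantee validity; it does not show which mechanism operated in the study.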
Translating Research into Practice
“In this study, PCL-R scores from licensed clinicians outperformed those from graduate student raters with MA degrees and several years of supervised clinical experience, suggesting a possible association between clinical experience and the validity of PCL-R scores. This finding seems unexpected when considered alongside the larger body of research examining the association between experience and accuracy in clinical psychology. Recent reviews show that there is only a very small association between experience and accuracy and graduate students tend to perform no better or worse than practicing clinicians in many contexts” (p. 134).
“One possible explanation for our findings and the recent PCL:YV findings is that PCL-R assessments require what one researcher has described as “skillful inquiry”; the ability to focus information gathering resources on diagnostically relevant issues. It may be that more experienced clinicians are better than less experienced clinicians at knowing what information to collect. In one of the few studies to examine the associations between training, experience, interview strategy, and diagnostic accuracy, graduate students and practicing doctoral level psychologists asked questions to a computer-simulated patient. The computer program used the content of the evaluator’s first question to generate one of 203 possible answers. The evaluators could ask as many follow-up questions as they wanted, each followed by a computer-generated response. Years of experience and level of training were associated with the number of diagnostically relevant questions evaluators asked and diagnostic accuracy, but not the number of non-diagnostically relevant questions (e.g., background, history) they asked. In other words, more experienced evaluators made more accurate diagnoses because they asked more diagnostically relevant questions” (p. 135).
“Of course, the concept of “skillful” inquiry need not apply narrowly, only to interviews, but seems relevant to the task of skillfully identifying relevant details amid lengthy records (a skill that may be especially relevant in this study given that the graduate student raters could not perform interviews). Skillful inquiry may help explain our finding that experienced clinicians showed more variability than graduate students in the scores they assigned. If experienced evaluators ask more diagnostically relevant questions and fewer non-diagnostically relevant questions, their scores should vary more than those from less experienced clinicians due to them picking up on valid indicators of psychopathic traits and being less affected by the types of nondiagnostic information that can lead to score inflation (e.g., offense details)” (p. 135).
“This study adds to a growing body of research that addresses the complexity that underlies real world PCL-R scoring. While there is evidence that the reliability and validity of PCL-R scores may be weaker in the field than in structured research studies, recent findings suggest that scores from some evaluators are more predictive than scores from other evaluators. This study examined clinician experience as one variable that may explain some of the variability in the predictive validity of PCL measure scores. Although findings from our study must be interpreted cautiously due to the more experienced raters having access to more data (i.e., clinical interview), our study adds to the small, but growing empirical literature suggesting that evaluator experience might matter in the context of risk assessment. Our findings, along with those from other recent studies, suggest that it is time to reexamine what we know about the role of experience in the accuracy of forensic assessment. Rather than answering complex questions about experience and accuracy, these exploratory findings should prompt further studies carefully designed to better explore the role of training and experience in assessment” (p. 136).
Other Interesting Tidbits for Researchers and Clinicians
“Although we found an association between experience and predictive validity, findings from this one study alone certainly do not provide conclusive evidence that PCL-R scores from more experienced raters outperform those from less experienced raters. Our findings are limited to a setting in which we have already documented especially large amounts of measurement error in PCL-R scores, and the graduate students were not able to interview offenders. Because those with more experience always had access to more data (i.e., interview), it is impossible to know the extent to which the differences in predictive validity we observed were attributable to rater experience, access to interview data, both experience and access, or some other factor (e.g., other rater characteristics). Thus, it is best to view our findings as preliminary, documenting the need for further research that examines the possible role of experience and conducting interviews in PCL-R and forensic assessment instrument scoring” (p. 134).
Authored by Amanda Reed
Amanda L. Reed is a doctoral student in John Jay College of Criminal Justice’s clinical psychology program. She is the Lab Coordinator for the Forensic Training Academy. Amanda received her Bachelor’s degree in psychology from Wellesley College and a Master’s degree in Forensic Psychology from John Jay College of Criminal Justice. Her research interests include evaluator bias and training in forensic evaluation.