Reevaluated Psychometric Properties of the Psychopathy Checklist-Revised
New research elucidates the psychometric properties of the Psychopathy Checklist-Revised (PCL-R) while avoiding sampling limitations. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.
Psychometric Properties of the Hare Psychopathy Checklist-Revised (PCL-R) in a Representative Sample of Canadian Federal Offenders | Law and Human Behavior | 2016, Vol. 40, No. 2, 136-146
Psychometric Properties of the Hare Psychopathy Checklist-Revised (PCL-R) in a Representative Sample of Canadian Federal Offenders
Jennifer E. Storey, Mid Sweden University
Stephen D. Hart, Simon Fraser University and University of Bergen
David J. Cooke and Christine Michie, Glasgow Caledonian University
The Hare Psychopathy Checklist-Revised (PCL-R; Hare, 2003) is a commonly used psychological test for assessing traits of psychopathic personality disorder. Despite the abundance of research using the PCL-R, the vast majority of research used samples of convenience rather than systematic methods to minimize sampling bias and maximize the generalizability of findings. This potentially complicates the interpretation of test scores and research findings, including the “norms” for offenders from the United States and Canada included in the PCL-R manual. In the current study, we evaluated the psychometric properties of PCL-R scores for all male offenders admitted to a regional reception center of the Correctional Service of Canada during a 1-year period (n = 375). Because offenders were admitted for assessment prior to institutional classification, they comprise a sample that was heterogeneous with respect to correctional risks and needs yet representative of all offenders in that region of the service. We examined the distribution of PCL-R scores, classical test theory indices of its structural reliability, the factor structure of test items, and the external correlates of test scores. The findings were highly consistent with those typically reported in previous studies. We interpret these results as indicating it is unlikely any sampling limitations of past research using the PCL-R resulted in findings that were, overall, strongly biased or unrepresentative.
Keywords: PCL-R, cohort sample, psychopathy, PCL-R norms
Summary of the Research
“One of the most widely used psychological tests of PPD [psychopathic personality disorder] is the Hare Psychopathy Checklist-Revised, or PCL-R. Briefly, the PCL-R is a 20-item symptom construct rating scale intended for use in forensic settings. Each item reflects a specific feature of PPD. The lifetime presence and severity of each feature is rated on a 3-point scale (0 = absent, 1 = partially present, 2 = present) on the basis of all available clinical data. Item scores are summed to yield facet, factor, and total scores. Total scores of 30 and higher (out of a maximum possible 40 points) are generally considered diagnostic of PPD. The PCL-R manual summarizes PCL-R ratings for various groups, including male and female correctional offenders and forensic psychiatric patients in Canada, the United States, the United Kingdom, and Sweden. Focusing specifically on correctional offenders in Canada and the United States, the manual presents ratings for a normative sample of 5,408 males who underwent “standard assessments”” (p. 136).
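The scoring arithmetic described in this passage can be sketched in a few lines of Python. This is only an illustration of the summation and the cutoff, assuming all 20 items have been rated; the function and constant names are hypothetical, and the sketch does not cover facet or factor scores or any handling of omitted items.

```python
from typing import Sequence

N_ITEMS = 20               # the PCL-R has 20 items
VALID_RATINGS = (0, 1, 2)  # 0 = absent, 1 = partially present, 2 = present
CUTOFF = 30                # total scores of 30+ are generally considered diagnostic


def pclr_total(ratings: Sequence[int]) -> int:
    """Sum the 20 item ratings into a total score (range 0 to 40)."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(ratings)}")
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("each item must be rated 0, 1, or 2")
    return sum(ratings)


def meets_cutoff(total: int, cutoff: int = CUTOFF) -> bool:
    """Return True when the total score reaches the cutoff."""
    return total >= cutoff


# Hypothetical ratings: 12 items present, 6 partially present, 2 absent.
example_ratings = [2] * 12 + [1] * 6 + [0] * 2
example_total = pclr_total(example_ratings)
print(example_total, meets_cutoff(example_total))
```

The example ratings are invented for illustration and do not come from the study.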
“The problem with the PCL-R “standard assessment” and “file review” normative samples for Canada and the United States is that they are not really normative samples at all. They were constructed by pooling samples of convenience, rather than by systematically sampling offenders or patients (e.g., using random, stratified random, or other procedures). For this reason, there is a very real possibility that the norms are not representative of people who may be found in any given setting…Only a handful of published studies have conducted detailed analysis of representative PCL-R ratings, that is, those gathered from offenders or patients selected using systematic sampling procedures” (p. 138).
“Ironically, then, there is a lack of systematic norms for the PCL-R for correctional offenders in Canada and the United States. Pooling samples of convenience—the strategy used in the PCL-R manual—does not necessarily yield results that are representative or unbiased. The lack of systematic norms complicates the interpretation of PCL-R scores in clinical forensic practice and research. In clinical forensic practice, the major concern is that the percentile ranks provided in the test manual and the recommended interpretation of score ranges (e.g., very low through very high; see Hare, 2003, p. 33) may be biased. Put simply, the test manual may give an inaccurate picture of what is a relatively high or low score on the PCL-R for male correctional offenders in Canada and the United States. In research, there are at least two major concerns. First, it is difficult to determine the extent to which sampling bias in the norms may have affected analyses of the structural reliability and factor structure of the PCL-R reported in the test manual. A pooled sample may yield a biased or unrepresentative picture of the psychometric properties (e.g., structural reliability, factor structure) of the PCL-R. Second, a pooled sample complicates comparison of PCL-R ratings obtained in Canada and the United States versus other countries for the purpose of cross-cultural validation of the test” (p. 139).
“To address limitations in past research, in the current study we evaluated the psychometric properties of PCL-R ratings for a sample of serious male offenders selected to be highly representative. We studied consecutive admissions to a reception center over a 1-year period. The reception center conducts assessments prior to institutional classification for the entire Pacific region of the Canadian federal prison service. The sample is therefore diverse with respect to correctional risks and needs, yet representative of the population of offenders in the Pacific region. We examined the distribution of PCL-R scores; classical test theory indexes of its structural reliability; the factor structure of test items; and the external correlates of test scores” (p. 139).
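For readers unfamiliar with classical test theory reliability indexes, one common example is Cronbach's alpha. The summary above does not say which indexes the authors computed, so treating alpha as the index is an assumption made here for illustration. The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent needs a rating for every item")

    # Sample variance of each item (column) across respondents.
    item_vars: List[float] = [variance(col) for col in zip(*scores)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (not from the study): three respondents, two perfectly consistent items.
toy_scores = [[0, 0], [1, 1], [2, 2]]
print(cronbach_alpha(toy_scores))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is often read as an index of internal consistency.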
“The findings from our study were very similar to those reported in the test manual for offenders assessed via file review. The similarities were observed for PCL-R total, factor, and facet scores. They were also observed with respect to score distribution; Classical Test Theory indexes of structural reliability; factor structure; and external correlates, such as age, correctional risks, violence, and self-harm. Overall, these findings are very reassuring. They suggest two things about the PCL-R manual. First, the pooling of diverse samples in the test manual did, in fact, result in “normative samples” of the United States and Canada that are likely representative of various offender populations in those countries. Second, psychometric evaluations of the “normative sample” are likely generalizable to samples from those same offender populations. In addition, the findings add more general support for the validity of psychopathy as a mental disorder, and the PCL-R as a measure of psychopathy” (p. 144).
Translating Research into Practice
“We interpret these results as indicating it is unlikely any sampling limitations of past research using the PCL-R resulted in findings that were, overall, strongly biased or unrepresentative” (Abstract). Therefore, practitioners might feel more confident utilizing the norms provided in the PCL-R manual as well as relying on previous research studies when administering the PCL-R and drawing conclusions. “It will also be important to replicate the current study via standard assessment (i.e., including interviews with offenders and patients), and at various points in time. Only then will we be able to form a more complete picture of the representativeness of the “normative samples” and the generalizability of the psychometric evaluations of the PCL-R presented in the test manual, particularly in light of the fact that the profile of correctional offenders and forensic patients may change in important ways over time” (p. 144).
Other Interesting Tidbits for Researchers and Clinicians
“Notwithstanding the similarities between our findings and those reported in the test manual highlighted in preceding text, we observed a few interesting differences. First, the structural reliabilities of Factor 1 and Facets 1 and 2 scores were somewhat higher in our sample than those reported in the test manual. This may reflect simple sampling variability, but it may also reflect the quantity and quality of file information available at the site where we collected data compared with the file information typically used for assessments via file review.
Second, we observed a better fit for Cooke and Michie’s three-factor hierarchical model with testlets than for Hare’s four-facet, two-factor hierarchical model. This finding is, on the surface, contrary to past research, especially that conducted by Hare and his colleagues. We suspect that the contradictory findings may be the result of the fact that most researchers who evaluate the Cooke and Michie model do not include testlets, which results in significantly worse fit than when testlets are included. Further, differing findings for the four-facet model may arise from confusion regarding which four-facet model is being tested (Cooke et al., 2007). It is important to appreciate that the various 4-facet models proposed have very different conceptual and statistical underpinnings.
Third, we observed higher correlations between PCL-R Factor 1 and Facets 1 and 2 scores and violence than were reported in some past research. This was especially true when examining any documented history of violence as opposed to violent index offenses. Again, sampling variability is one potential explanation for the contradictory findings. But it may also be that when violence is defined in terms of official criminality (e.g., arrest, charge, conviction) it will have higher correlations with scores on Factor 2 and in particular Facet 4, which are themselves heavily saturated with official criminality. For example, 3 of the 10 items that define Factor 2, and 3 of the 5 items that define Facet 4, are scored solely on the basis of official criminality, and official criminality counts heavily toward the scoring of one or two other items that define Factor 2 and Facet 4” (p. 144).
Join the Discussion
As always, please join the discussion below if you have thoughts or comments to add!
Authored by Marissa Zappala
Marissa Zappala is currently a second-year Master’s student in the Forensic Psychology program at John Jay College of Criminal Justice in New York. Her main research interests include cognitive biases, forensic assessment, and evaluator training and education. Following her Master’s, Marissa plans to pursue a doctoral degree in clinical psychology and an eventual career in psychological assessment.