But does it actually work? Field Validity of the STATIC-99 and STABLE-2007

Although there is no guarantee that measures developed in research studies will generalize to applied settings, this study found that two sexual recidivism risk tools (Static-99R and STABLE-2007) predicted reoffending in a large field validity study of men with a history of sexual offending. Variability in predictive accuracy across previous studies suggests that special efforts (e.g., appropriate training) are required to implement recidivism risk tools well. This is the bottom line of a recently published article in Psychological Assessment. Below is a summary of the research and findings, as well as a translation of this research into practice.


Featured Article | Psychological Assessment | 2021, Vol. 33, No. 7, 581-595

Field Validity of Static-99R and STABLE-2007 with 4,433 Men Serving Sentences for Sexual Offenses in British Columbia: New Findings and Meta-Analysis


L. Maaike Helmus, Simon Fraser University
R. Karl Hanson, Carleton University
Daniel C. Murrie, University of Virginia
Carmen L. Zabarauckas, Ministry of Attorney General (BC); Simon Fraser University


Many forensic assessment measures are developed and validated under research conditions but applied in the field, where professionals or paraprofessionals have varied training, unknown fidelity to administration procedures, and contextual pressures related to their institutions or legal system. Yet few studies examine the generalizability of psychometric properties of these scales as actually applied in field settings. This study examined 4,433 individuals assessed by probation officers on the Static-99R or STABLE-2007 sexual recidivism risk scales in British Columbia, Canada. Sexual, violent, and any recidivism were examined. Static-99R and STABLE-2007 had moderate accuracy in discriminating recidivists from non-recidivists, and both scales added incrementally in predicting all three outcomes (with Static-99R demonstrating higher accuracy). Organizing the items into constructs, sexual criminality, general criminality, and youthful stranger aggression incrementally predicted all three outcomes. For violent and any recidivism, the incremental effect of sexual criminality was in the negative direction (i.e., high sexual criminality was associated with relatively lower rates of violent and any recidivism). Calibration analyses indicated that recidivism rates were lower than what would be predicted by the norms for the scales. The current study also presented a meta-analysis of 15 field validity studies of Static-99R and 4 field validity studies of STABLE-2007. Results of the current study and meta-analysis support the field application of Static-99R and STABLE-2007, while emphasizing the importance of training and proper implementation.


Keywords: risk assessment, field validity, recidivism, predictive accuracy, sexual offences

Summary of the Research

“Research settings typically prioritize training, fidelity, and interrater reliability; raters who are not reliable are re-trained or replaced. In contrast, field practitioners may experience a number of subtle contextual pressures, due to the institutions they serve or their role in an adversarial legal system (i.e., adversarial allegiance; Murrie & Boccaccini, 2015). Where examinee participation is included, research studies typically afford confidentiality or anonymity and no consequences for refusal to participate or for the results of the assessment. In the field there is no anonymity and consequences are not just likely—they are indeed the point of the assessment (i.e., to inform decisions)” p. 582

“…the first independent, peer-reviewed field study of Static-99 was published, yielding discouraging results for the scale among individuals screened for civil commitment in Texas (AUC = .57). This large study raised concerns that Static-99 may not generalize well in field practice, particularly outside Canada. Since then, more field validity studies for Static-99R have accumulated, such that a meta-analysis is warranted. There are fewer field validity studies of STABLE-2007, but minimally enough for a meta-analysis.” p.582

“Combining seven samples with both Static-99R and Static-2002R items, Brouillette-Alarie et al. (2016) found evidence of three latent constructs: sexual criminality, general criminality, and youthful stranger aggression. Olver et al. (2016) replicated these constructs using Static-99R and static items from the VRS-SO. Brouillette-Alarie and Hanson (2015) examined these three constructs along with the items of STABLE-2007 and found that the STABLE items (except loneliness and capacity for relationship stability) could be mapped onto either sexual criminality or general criminality; STABLE-2007 items did not provide meaningful measurement of youthful stranger aggression.” p.582

“Recent years have seen an emphasis on field validity research for forensic measures. The current study used a large field validity sample (N = 4,433) of Static-99R and STABLE-2007 assessments as scored in British Columbia, Canada. To appropriately examine predictive validity, we looked at both discrimination and calibration properties. Furthermore, we provided the first meta-analysis of the field research on these scales.” p.583
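
For readers less familiar with the two properties named in the quote above, here is a minimal Python sketch. Discrimination is summarized with the AUC (the probability that a randomly selected recidivist has a higher score than a randomly selected non-recidivist), and calibration with an expected/observed (E/O) ratio. The scores, outcomes, and predicted probabilities are made-up toy values, not data from the study, and the E/O ratio is just one common way to summarize calibration; treating it as a stand-in for the article's calibration analyses is an assumption.

```python
from itertools import product


def auc(scores, outcomes):
    """Discrimination: share of (recidivist, non-recidivist) pairs in which
    the recidivist has the higher score; ties count as half a win."""
    recidivists = [s for s, y in zip(scores, outcomes) if y == 1]
    others = [s for s, y in zip(scores, outcomes) if y == 0]
    if not recidivists or not others:
        raise ValueError("need at least one recidivist and one non-recidivist")
    wins = sum(
        1.0 if r > o else 0.5 if r == o else 0.0
        for r, o in product(recidivists, others)
    )
    return wins / (len(recidivists) * len(others))


def expected_observed_ratio(predicted_probs, outcomes):
    """Calibration: expected number of recidivists (sum of predicted
    probabilities) divided by the observed number. Values above 1 mean
    the norms predicted more recidivism than actually occurred."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed recidivists; ratio is undefined")
    return sum(predicted_probs) / observed


# Hypothetical toy data (illustration only, not from the study).
scores = [1, 2, 2, 3, 4, 5, 6, 7]          # risk tool total scores
outcomes = [0, 0, 0, 1, 0, 0, 1, 1]        # 1 = recidivated during follow-up
predicted = [0.10, 0.20, 0.20, 0.30, 0.50, 0.60, 0.80, 0.90]  # normed rates

print(f"AUC = {auc(scores, outcomes):.2f}")                             # AUC = 0.87
print(f"E/O = {expected_observed_ratio(predicted, outcomes):.2f}")      # E/O = 1.20
```

In this toy example the scale ranks people well (high AUC) while still overpredicting recidivism (E/O above 1), which is the same direction of miscalibration described in the abstract. The two properties can diverge, which is why both need to be checked.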

Translating Research into Practice

“The findings from this routine/complete field sample support the continued use of both scales in British Columbia, and generally found that the scales predicted similar to the average of other field validity studies (and Static-99R predicted higher than average when the Texas samples are retained). All items of both scales significantly predicted sexual recidivism as intended, with the exception of the Static-99R item for index non-sexual violence. This is consistent with the meta-analysis of Helmus and Thornton (2015) who found this item to be the weakest Static-99R item, although it did predict in North American samples (not so much elsewhere)” p.591

“This meta-analysis also revealed that the quality of training and implementation are critical components for the accuracy of risk tools in field assessments. Although several of the studies included did not provide sufficient detail for a comprehensive assessment of implementation issues, a preliminary moderator analysis based on the information we did have found that studies with appropriate training systems in place for the officers administering the tools such as B.C. Corrections, found meaningfully and significantly higher accuracy than studies where the appropriateness of training was unknown. In contrast, there were no differences in predictive accuracy based on whether one of the co-authors of the scales was also a co-author on the study, suggesting no author allegiance effects, as reported in other reviews” p.591
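
To make the idea of a moderator analysis concrete, the sketch below pools effect sizes separately for studies with documented training and studies where training was unknown, using standard fixed-effect inverse-variance weighting. Everything specific here is an assumption for illustration: the effect sizes and variances are invented, the field names (`effect`, `variance`, `training`) are hypothetical, and the article's actual meta-analytic model may differ.

```python
import math


def pool_fixed_effect(effects, variances):
    """Fixed-effect pooled estimate: each study is weighted by the
    inverse of its sampling variance. Returns (pooled effect, standard error)."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return pooled, standard_error


def pool_by_group(studies, key):
    """Pool studies separately within each level of a moderator."""
    groups = {}
    for study in studies:
        groups.setdefault(study[key], []).append(study)
    return {
        level: pool_fixed_effect(
            [s["effect"] for s in members],
            [s["variance"] for s in members],
        )
        for level, members in groups.items()
    }


# Hypothetical studies (invented standardized effect sizes, e.g. Cohen's d).
studies = [
    {"effect": 0.80, "variance": 0.02, "training": "documented"},
    {"effect": 0.70, "variance": 0.04, "training": "documented"},
    {"effect": 0.50, "variance": 0.02, "training": "unknown"},
    {"effect": 0.40, "variance": 0.04, "training": "unknown"},
]

results = pool_by_group(studies, "training")
for level, (pooled, se) in results.items():
    print(f"{level}: pooled effect = {pooled:.2f} (SE = {se:.2f})")
```

A formal moderator test would also check whether the difference between the two pooled estimates is larger than expected by chance; this sketch only shows how the subgroup estimates themselves are formed.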

“The construct validity analyses reinforced the view that risk for sexual recidivism is multidimensional. Sexual crimes are crimes; consequently, the factors associated with general rule violation (e.g., prior criminal convictions, lifestyle impulsivity) also increase the risk of sex crime recidivism. There are, however, sex-crime-specific risk factors, which are only weakly related to other types of rule violation. In the current study, the sex-crime-specific construct displayed weak positive associations with violent and general recidivism in the univariate analyses; however, in the multivariate analyses, they were negatively associated with these outcomes after controlling for general criminality and youthful stranger aggression. A similar negative association has been previously observed, which motivated the Static Development Team to recommend against using Static-99R to predict nonsexual violent and general recidivism; instead, they recommend that violent and general recidivism be assessed by young age and the general criminality factor from the Static-2002R” p. 592

“The current results suggest that information may be lost by only considering total scores. It may be possible to improve predictive accuracy by considering the constructs assessed by risk tools. Rather than a list of risk factors, future risk tools could include subscales addressing latent constructs, and the overall assessment of risk could be based on combining subscale scores. Such an approach also has the potential of identifying psychologically meaningful propensities relevant to intervention and risk management strategies” p.592
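
A small sketch can show why a single total score may hide information. The item names, the item-to-construct mapping, and the weights below are all hypothetical placeholders chosen for illustration; they are not the scoring rules of Static-99R or STABLE-2007. The negative weight on sexual criminality simply mirrors the direction the authors report for violent and any recidivism.

```python
# Hypothetical mapping of items to the three constructs named in the article.
CONSTRUCTS = {
    "sexual_criminality": ["item_a", "item_b"],
    "general_criminality": ["item_c", "item_d"],
    "youthful_stranger_aggression": ["item_e"],
}

# Hypothetical weights, as might come from a regression predicting
# violent recidivism (invented values, for illustration only).
WEIGHTS = {
    "sexual_criminality": -0.2,
    "general_criminality": 0.6,
    "youthful_stranger_aggression": 0.4,
}


def total_score(items):
    """Conventional approach: add up every item."""
    return sum(items.values())


def subscale_scores(items):
    """Sum the items belonging to each construct."""
    return {
        construct: sum(items[name] for name in names)
        for construct, names in CONSTRUCTS.items()
    }


def weighted_index(items):
    """Combine subscale scores using construct-specific weights."""
    subscales = subscale_scores(items)
    return sum(WEIGHTS[c] * subscales[c] for c in CONSTRUCTS)


# Two hypothetical people with the same total score but different profiles.
person_1 = {"item_a": 2, "item_b": 1, "item_c": 0, "item_d": 0, "item_e": 0}
person_2 = {"item_a": 0, "item_b": 0, "item_c": 2, "item_d": 1, "item_e": 0}

for label, person in [("person_1", person_1), ("person_2", person_2)]:
    print(label, total_score(person), subscale_scores(person),
          round(weighted_index(person), 2))
```

Both hypothetical people have a total score of 3, yet their weighted indices differ sharply because their scores come from different constructs. That is the kind of information the authors suggest is lost when only totals are considered.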

Other Interesting Tidbits for Researchers and Clinicians

“These results are in contrast to previous reviews finding that field reliability/validity tend to be weaker compared to instrument development or research studies. We suspect two primary reasons for this discrepancy. One is that much of this previous research has focused on the Psychopathy Checklist-Revised (PCL-R), which involves more subjectivity and inferences in scoring, especially compared to Static-99R. Consequently, the PCL-R may be vulnerable to greater decreases in reliability or administration fidelity, compared to Static-99R and the STABLE-2007, when applied in routine practice. Conversely, however, all the studies in our meta-analysis examined uses of the scales in correctional settings, whereas much PCL-R field research (and Static-99R field interrater research) has considered more adversarial court settings, particularly civil commitment in the United States. It is possible that adversarial contexts add distinct external pressures, which lower the reliability/validity of forensic measures. Most likely, both explanations are credible. One randomized study found that Static-99R is susceptible to adversarial allegiance biases in civil commitment proceedings, albeit markedly less so than the PCL-R” p.591

“One interesting finding was that for violent and any recidivism, the overall predictive accuracy (discrimination) was higher when the items were organized by constructs than when they were organized by type of variable (i.e., demographic and criminal history variables form the total score of Static-99R; STABLE-2007 total scores are based on evaluators’ ratings of psychological and community adjustment, largely from interview)” p.592

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!