Keep out of trouble: Validation of a risk assessment measure in a correctional sample

Despite high interrater reliability and relative ease of administration, caution is advised when using the VRAG–R measure to predict and manage recidivism risk. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2017, Vol. 41, No. 5, 507–518

A Cross-Validation of the Violence Risk Appraisal Guide—Revised (VRAG–R) Within a Correctional Sample


Anthony J.J. Glover, Correctional Services Canada, Kingston, Ontario, Canada
Frances P. Churcher, Carleton University
Andrew L. Gray, Simon Fraser University
Jeremy F. Mills, Carleton University
Diane E. Nicholson, Correctional Services Canada, Kingston, Ontario, Canada


The Violence Risk Appraisal Guide—Revised (VRAG–R) was developed to replace the original VRAG based on an updated and larger sample with an extended follow-up period. Using a sample of 120 adult male correctional offenders, the current study examined the interrater reliability and predictive and comparative validity of the VRAG–R to the VRAG, the Psychopathy Checklist—Revised, the Statistical Information on Recidivism—Revised, and the Two-Tiered Violence Risk Estimate over a follow-up period of up to 22 years postrelease. The VRAG–R achieved moderate levels of predictive validity for both general and violent recidivism that was sustained over time as evidenced by time-dependent area under the curve (AUC) analysis. Further, moderate predictive validity was evident when the Antisociality item was both removed and then subsequently replaced with a substitute measure of antisociality. Results of the individual item analyses for the VRAG and VRAG–R revealed that only a small number of items are significant predictors of violent recidivism. The results of this study have implications for the application of the VRAG–R to the assessment of violent recidivism among correctional offenders.


Keywords: VRAG–R, risk assessment, violence, recidivism, offenders

Summary of the Research

“Risk assessment of offenders, particularly the assessment of violence risk, has long played a role within the criminal justice process. Use of structured risk assessment measures is increasing among clinicians, with 50% to 75% of clinicians using structured risk measures during forensic assessments. […] Structured risk assessment should serve four goals. First, salient risk factors for an individual should be identified. Second, an appropriate level of risk, known as a risk estimate, should be determined. Third, clinicians should identify strategies to reduce or manage risk. Finally, risk information should be effectively communicated.” (p. 507)

“Actuarial risk assessment measures are commonly used to appraise risk for various forms of recidivism (e.g., sexual, violent, and general). For the purposes of the current study, actuarial methods will be defined as measures that use empirically relevant items where their aggregate scores are then associated with a probability of future recidivism.” (p. 507)

“A recent update of the VRAG (i.e., the Violence Risk Appraisal Guide—Revised [VRAG–R; Rice, Harris, & Lang, 2013]) was undertaken to simplify scoring, integrate the VRAG and an actuarial measure designed to predict sexual recidivism (i.e., the Sex Offender Risk Appraisal Guide [SORAG; Quinsey et al., 2006]), and reduce time spent on scoring items.” (p. 508)

“A revised version of the VRAG, referred to as the VRAG–R, was recently developed, and has since been incorporated into clinical practice. […] A major strength of the revision was the extended length of the follow-up period for the sample (which ranged up to 49 years in length), which now afforded the inclusion of several participants who had yet to be released at the time of the earlier follow-up studies. […] Preliminary evaluations have found similar predictive validity for the VRAG–R relative to the VRAG. In the validation sample the VRAG–R obtained an AUC value of .75 for violent recidivism and an AUC [area under the curve] value of .76 for the entire sample. […] These values were similar to those obtained in using the VRAG in the same sample group. Furthermore, the authors tested the predictive validity of the VRAG–R after removing the Antisociality item, as this item requires training to score and may not always be readily available using file data. The VRAG–R obtained an AUC value of .75, indicating that its predictive accuracy is not limited if this item is missing. In contrast, however, preliminary research of the VRAG–R in psychiatric samples has shown that it is not predictive of inpatient aggression. Given the mixed results, it is important that the VRAG–R undergo cross-validation if it is to be used by clinicians in a broader forensic context.” (p. 508)
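For readers less familiar with AUC values, the short Python sketch below may help. It uses one common reading of the AUC: the probability that a randomly chosen recidivist has a higher risk score than a randomly chosen non-recidivist, with ties counted as half. The function name `auc` and the scores are hypothetical and invented purely for illustration; they are not data from the study, and this is not necessarily how the authors computed their values.

```python
def auc(scores_recidivists, scores_nonrecidivists):
    """Share of recidivist/non-recidivist pairs in which the recidivist
    has the higher score; tied pairs count as half a win."""
    wins = 0.0
    for r in scores_recidivists:
        for n in scores_nonrecidivists:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(scores_recidivists) * len(scores_nonrecidivists))


# Hypothetical risk scores, invented for illustration only (not study data).
recidivists = [12, 7, 15, 3]
non_recidivists = [5, 1, 9, 3]

example_auc = auc(recidivists, non_recidivists)
print(f"AUC = {example_auc:.2f}")
```

Under this reading, a value of .50 means the measure does no better than chance at ranking the two groups, and 1.0 means every recidivist scored higher than every non-recidivist.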

“The current study is a cross-validation of the VRAG–R in a correctional sample of adult male offenders that includes a comparative analysis with existing risk assessment measures (i.e., the VRAG, PCL–R, SIR–R1, and the Two-Tiered Violence Risk Estimates) […] In addition, our study will evaluate the interrater reliability of the VRAG–R among trained clinicians, which has not been previously examined for this measure. Establishing interrater reliability is important as it examines the consistency of the scoring and poor interrater reliability has been found to be associated with lower predictive accuracy. Finally, we will examine the predictive utility of the VRAG–R without the Antisociality (Facet 4) item, as well as with a substitute measure of antisociality.” (pp. 508–509)

The sample included 120 federal male offenders from Canadian correctional facilities. The majority were Caucasian (78.3%), with ages ranging from 19 to 48 years (M = 30.37, SD = 7.48). A little over 49% of the sample had an index offense of robbery. At the time of the outcome data collection, 71.7% had completed their sentence. In addition to the aforementioned measures, recidivism information was collected from Canadian Police Information Centre records, and time-at-risk was calculated as the number of days from the offender’s release to the date of the first postrelease conviction. The first author scored the items for all the measures apart from the SIR–R1 during the original incarceration. The SIR–R1 was administered at the time of admission by the parole staff. The TTV was scored using archival information postrelease by one of the authors. The VRAG–R was scored similarly to the TTV by the lead author. An independent rater coded 30 randomly selected files to assess interrater reliability.
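The time-at-risk calculation described above can be sketched in a few lines of Python. The function name `time_at_risk_days` and the dates are hypothetical. Counting offenders with no new conviction up to the date of data collection is an assumption made here for illustration; the summary does not say how such cases were handled.

```python
from datetime import date


def time_at_risk_days(release, first_conviction, data_collection):
    """Days from release to the first postrelease conviction.

    Assumption (not stated in the source): when there is no new
    conviction, time-at-risk runs to the data collection date.
    """
    end = first_conviction if first_conviction is not None else data_collection
    return (end - release).days


# Hypothetical dates, invented for illustration only.
reoffended = time_at_risk_days(date(2000, 1, 1), date(2000, 3, 1), date(2001, 1, 1))
no_new_conviction = time_at_risk_days(date(2000, 1, 1), None, date(2001, 1, 1))
print(reoffended, no_new_conviction)
```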

“Results of the current study demonstrated an overall modest predictive validity of the VRAG–R within our correctional sample, but failed to support its application using the associated risk likelihood bins. Although the VRAG–R showed a high level of association with other measures utilizing historical items, it demonstrated only a moderate degree of predictive validity for both general and violent recidivism. […] It is interesting to note that little change in predictive validity was observed when Facet 4 was both removed from the VRAG–R, as well as replaced with the ARE of the TTV suggesting that the Antisociality item of the VRAG–R could be removed without changing the predictive utility of the measure.” (p. 514)

“When the predictive validity of the VRAG and VRAG–R was examined over time, both measures displayed poor short-term predictive accuracy. […] Despite the performance of the two measures appearing to increase over time and maintaining a relatively moderate level of predictive accuracy, the poor short-term performance of the two measures is worrisome as the greatest proportion of recidivism occurs early after the initial release from an institution. It may be that the fluctuation in predictive validity seen within the short-term is reflective of the impact of environmental factors on risk (e.g., community supervision, short-term treatment effects). Such factors may diminish with the passage of time, resulting in greater predictive accuracy in the long-term due to the influence of the underlying risk (i.e., static risk) posed by the offender (e.g., the offender reaches the expiry of his sentence and is no longer under the jurisdiction of the criminal justice system).” (p. 514)

“The VRAG–R’s high level of interrater reliability in the present study was consistent with the values found for actuarial measures in previous prediction studies. The items of the VRAG–R are clearly defined, easy to score, and less prone to scoring error. Moreover, the ability to remove the Antisociality item from the measure without compromising predictive accuracy could facilitate more efficient administration and less need for intensive training (e.g., PCL–R training). […] As the VRAG–R has replaced [the total PCL-R score] with the simpler Facet 4 (Antisociality) score, it may prove to have more consistent scoring between raters. Similarly, the VRAG–R does not contain the diagnostic items of the original VRAG such as schizophrenia and personality disorder which, like the PCL–R, require clinical judgment.” (p. 514)

Translating Research into Practice

“The VRAG–R may hold some promise in terms of clinical practice for risk assessment purposes. Much like the SIR–R1, it identifies salient historical risk factors that contribute to an offender’s likelihood of risk, provides a risk estimate of future offending, and effectively communicates this risk estimate by stating it as a percentage of reoffending at two future time points. However, as it is a measure that relies solely on static risk factors, the VRAG–R does not meet the criteria of helping to provide strategies for managing or reducing an offender’s level of risk, and is therefore unsuitable for this purpose. It must therefore be used in conjunction with a measure that would provide this information.” (p. 515)

“Overall, while providing some support for the use of the VRAG–R with male offenders, results of the current study have implications for clinical practice. With respect to positive aspects of the VRAG–R, first, results of the current study demonstrate that the predictive validity of the revised VRAG is comparable to that of the original version. Second, our results replicate earlier research findings regarding the limited utility of the PCL–R as part of the VRAG. Third, the strong interrater reliability of the measure between trained clinicians shows that the VRAG–R is both relatively easy to score and can be scored consistently across raters. This is important, as this consistent scoring reflects the stringent scoring criteria intended by the authors as described by Harris et al. (2015). Despite these positive aspects, caution is warranted when interpreting the results for short-term outcomes given the low AUC values observed for both the VRAG and VRAG–R following initial release from custody. However, given the increase in AUC values over time, clinicians may be somewhat more confident in using the VRAG and VRAG–R for making long-term predictions. However, we recommend that cross-validation with a larger sample is required before the VRAG–R can be adopted for clinical use in correctional settings.” (p. 515)

Other Interesting Tidbits for Researchers and Clinicians

“There are several limitations in the current study. For instance, the use of file information to retrospectively code some of the measures for the current study may limit the usefulness of the results due to missing information or a lack of opportunity to clarify file information. Despite this, every effort was made to ensure that all data could be accurately coded. […] Larger sample sizes will be required to provide reliable estimates of risk among correctional offenders.” (p. 515)

“Concerning statistical power, attempts were made to account for the sample size through the statistical methods selected (e.g., nonparametric statistical analyses). […] The sample size for the current study was sufficient for these types of analyses. Indeed, statistical significance was achieved for effect sizes considered small to moderate in magnitude and the sample size of the current study is not unlike the sample sizes in applied risk assessment studies previously conducted with Canadian offenders.” (p. 515)

“Another potential limitation concerns the generalizability of the results, which may be limited due to the homogenous nature of the sample given that the majority of the offenders within the current cross-validation sample were Caucasian. Validations with samples that are more racially diverse are needed before conclusions about the breadth of effectiveness of the VRAG–R can be drawn.” (p. 515)

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add! To read the full article, click here.

Authored by Kseniya Katsman

Kseniya Katsman is a Master’s student in the Forensic Psychology program at John Jay College of Criminal Justice. Her interests include forensic application of dialectical behavior therapy, cultural competence in forensic assessment, and risk assessment, specifically suicide risk. She plans to continue her education and pursue a doctoral degree in clinical psychology.
