Can We Really Be the Judges? A Rare Look into the Use of Performance Validity Tests with Chinese-Speaking Immigrants in The United States


A Study of the TOMM and DCT in Chinese-Speaking Immigrants with Limited English Proficiency in the United States | 2023 Vol. 22, No. 1, 1-13


Yi-Ting Chang; Department of Psychology, Fordham University, Bronx, New York, USA; Department of Psychology, University of North Texas, Denton, Texas, USA

Barry Rosenfeld; Department of Psychology, Fordham University, Bronx, New York, USA

Wai-Cheong C. Tam; Department of Psychology, Chung Yuan Christian University, Taoyuan City, Taiwan

Cheng-Yun Teng; Department of Psychology, Palo Alto University, Palo Alto, California, USA; Department of Applied Psychology, New York University, New York, New York, USA

Ying Han; Department of Psychology, Fordham University, Bronx, New York, USA


The accuracy of performance validity tests (PVTs) with culturally diverse populations has increasingly been questioned. Some PVTs have shown high false positive rates among culturally and linguistically diverse individuals, both within the U.S. and internationally. No study to date has investigated the accuracy of PVTs with Chinese-speaking immigrants (CSI) in the U.S. The current study evaluated two PVTs, the Test of Memory Malingering (TOMM) and the Dot Counting Test (DCT), to determine their accuracy in a community sample of CSI with limited English proficiency. Both measures were administered in a simulation design that contrasted 52 participants instructed to respond honestly with 22 participants instructed to feign incompetence to stand trial. Results demonstrated that scores on TOMM Trial 1 and Trial 2 were effective in distinguishing honest responders from simulators, whereas the DCT E-score did not differentiate the groups better than chance. However, false positive rates for TOMM Trial 1, Trial 2, and the DCT E-score were relatively low. Only one honest responder (1.9%) was classified as exerting insufficient effort on TOMM Trial 1 and on the DCT E-score, and TOMM Trial 2 did not misclassify any honest responders. Implications and cautionary statements are discussed.


Classification accuracy; cognitive effort; cross-cultural validity; feigning; malingering

Summary of the Research

“The assessment of feigned cognitive impairment is considered an essential part of forensic evaluations, given the high prevalence of malingering that has been seen in clinical and medical populations…Performance validity tests (PVTs) were first used to investigate suspicious cognitive deficits and identify poor motivation in the mid-1900s…Numerous PVTs are readily available to forensic clinicians, such as the Test of Memory Malingering (TOMM; Tombaugh, 1996)…and the Dot Counting Test (DCT; Boone et al., 1996)…despite the widespread use of PVTs, relatively little research has examined their utility outside of Western, English-speaking populations…Considering that the accuracy of PVTs has been found to be negatively affected by a range of linguistic and cultural factors…the accuracy of forensic mental health evaluations that rely on these measures with culturally diverse individuals may be questioned…Several studies have also highlighted the importance of assessing the degree of acculturation when evaluating the performance of PVTs in cross-cultural research…” (p. 1-3).

“Relatively little research has examined PVTs in Chinese samples…To date, no published research has examined the utility of PVTs with CSI [Chinese-speaking immigrants] living in the United States, despite the fact that these measures have been commonly used in forensic evaluations for decades. The present study investigated the utility of two commonly used cognitive effort tests, the TOMM and DCT, in a community sample of CSI residing in New York who had limited English proficiency. In addition, potential contributors to poor performance in participants who exerted normal effort were also examined, including age, education, and level of acculturation…The sample was recruited through flyers posted in public locations in the neighborhoods of New York City serving a predominately CSI population or were referred from a community-based organization (CBO) that provides a range of behavioral health services to Asian, mostly Chinese, immigrants. Participants were from Mainland China, Taiwan or Hong Kong, spoke Mandarin fluently, and had limited English proficiency (based on self-report)…” (p. 3).

“…Contrary to past findings (Nijdam-Jones et al., 2017; Weiss & Rosenfeld, 2010, 2017), false positive rates of the TOMM Trial 1, Trial 2, and the DCT E-score were relatively low in this CSI sample with limited English proficiency…analyses indicated that the DCT E-score did not differentiate feigners from honest responders better than chance, and TOMM Trial 1 and Trial 2 generated only moderate classification accuracy. Specifically, the DCT only identified about a quarter (n = 6 of 22) of feigners, while both TOMM trial 1 and Trial 2 identified half (n = 11 of 22). These sensitivity rates are roughly comparable to past research with the TOMM, regardless of whether the tool is used with culturally diverse participants…The low sensitivity of the DCT E-score found in the current and past studies suggest that research investigating other indicators of DCT is warranted. Despite only modest overall predictive accuracy, the low rate of false positive classifications (high specificity) is encouraging and supports further research using the TOMM and DCT in culturally and linguistically diverse samples…” (p. 8).
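The sensitivity and false positive figures quoted here follow directly from the group counts. As a minimal sketch, the Python below recomputes them using only the numbers given in this summary: 11 and 6 of 22 simulators flagged (above), and 1, 0, and 1 of 52 honest responders flagged on TOMM Trial 1, Trial 2, and the DCT E-score (from the abstract). The `REPORTED` table and the `rates` function are illustrative names, not part of the study's analysis.

```python
# Group sizes from the simulation design described in this summary.
N_FEIGNERS = 22   # participants instructed to feign
N_HONEST = 52     # participants instructed to respond honestly

# measure -> (feigners flagged as insufficient effort, honest responders flagged)
REPORTED = {
    "TOMM Trial 1": (11, 1),
    "TOMM Trial 2": (11, 0),
    "DCT E-score": (6, 1),
}


def rates(flagged_feigners: int, flagged_honest: int) -> dict[str, float]:
    """Return sensitivity, specificity, and false positive rate as proportions."""
    sensitivity = flagged_feigners / N_FEIGNERS  # true positives / all feigners
    fpr = flagged_honest / N_HONEST              # false positives / all honest responders
    specificity = 1 - fpr                        # true negatives / all honest responders
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": fpr,
    }


if __name__ == "__main__":
    for measure, (tp, fp) in REPORTED.items():
        r = rates(tp, fp)
        print(f"{measure}: sensitivity {r['sensitivity']:.1%}, "
              f"specificity {r['specificity']:.1%}, "
              f"FPR {r['false_positive_rate']:.1%}")
```

With these counts, both TOMM trials come out at 50.0% sensitivity and the DCT E-score at 27.3%. A single misclassified honest responder corresponds to 98.1% specificity (a 1.9% false positive rate), which is the "high specificity" the authors describe.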

“The low false positive rates may also reflect the impact of education in this sample. In the current study, 84.6% of the honest responder participants…had completed at least nine years of formal education and 44.2%...had completed the equivalent of a high school degree…The effects of age and presence of a mental disorder on classification accuracy of PVTs in the present study also warrant note. In this sample, older genuine responders had a higher risk of being misclassified as feigning by the DCT and participants with a mental disorder diagnosis were more likely to be misclassified both by the TOMM (Trial 1) and the DCT. The results were in accordance with previous studies using the TOMM…and the DCT...Regarding the effect of acculturation on classification accuracy, we found that the longer the participants lived in the U.S., the higher the likelihood of being incorrectly classified as exerting insufficient effort in both the TOMM and DCT…in the current study, years living in the U.S. was positively associated with age, negatively associated with education, and corresponded to a greater likelihood of a mental disorder, all of which were associated with a higher risk of being misclassified as feigners…” (p. 9).

Translating Research into Practice

“…Considering the differences observed between the current and past studies, future research should continue to investigate the influence of education on the accuracy of PVTs, along with the possible impact of translators and the quality of rapport between evaluators and study participants…special attention should be paid to immigrants with a history of mental illness and have limited interaction with (Anglo) American people, as they appear to have a higher risk of being misclassified as feigning. Similarly…match between examiners and examinees as well as examinees’ level of education should also be noted, and may reduce the risk of classification errors in actual clinical settings” (p. 9-10).

“We also acknowledge that not all practitioners provide forensic assessments using evaluees’ native language, and hence we strongly encourage working with a qualified and skillful translator who ideally shares the same culture as the individual being evaluated, even when translated PVTs are available. In addition, it is suggested that evaluators consider the evaluees’ level of acculturation to American culture, and in particular, the frequency with which they interact with (Anglo) Americans. These considerations may help further reduce the risk of erroneous conclusions in evaluations with CSI” (p. 10).

“More research investigating feigning measures in individuals with a diverse cultural and linguistic background is undoubtedly needed. Future research should address whether the ‘match’ between examiners and examinees impacts the accuracy of PVTs and other assessment techniques. This can be evaluated by, for example, using second or third-generation Chinese-American researchers who speak little Mandarin and identify themselves as more similar to American mainstream culture, or even using translators to evaluate the effect of these practices. Likewise, a more sophisticated analysis of acculturation is clearly needed to better understand the impact of familiarity with Western culture on the accuracy of test results” (p. 10).

Other Interesting Tidbits for Researchers and Clinicians

“The TOMM (Tombaugh, 1996) was created to detect feigned or exaggerated memory deficits. It includes two learning trials (Trial 1 and Trial 2) and one optional retention trial. Only Trial 1 and Trial 2 were administered in the current study. In each learning trial, 50 line-drawings of common objects are presented for three seconds each, followed by 50 recognition panels with two objects on each panel, one of which was previously presented along with a new (not previously shown) drawing. The examinee is asked to select the object that was previously shown during the learning trial…The DCT was developed by Andre Rey in the 1940s to detect insufficient cognitive effort and was adapted by Boone et al. (2002) using normative data for several diagnostic groups. The test is comprised of 12 cards with dots printed on each, that are exposed to the examinee one at a time…The examinee is asked to count the dots as quickly as possible. The Effort Index or E-score was introduced by Boone and colleagues to improve detection of insufficient cognitive effort, based on the average time spent to count the ungrouped cards plus the average time spent to count the grouped cards and the total number of counting errors made…” (p. 4).

“Two individual items of the SMAS [Stephenson Multigroup Acculturation Scale] were also associated with PVT scores: I attend social functions with (Anglo) American people, and I have many (Anglo) American acquaintances. Based on these findings, more frequent interactions with Anglo Americans corresponded to a lower likelihood of being misclassified by the TOMM and DCT. It remains unexplored whether frequent exposure to American culture would result in better performance on PVTs among individuals from diverse cultural backgrounds. Counterintuitively, other aspects of acculturation such as frequency of English use (i.e., I speak English at home or I think in English) were not correlated with performance on the PVTs. It is yet premature to assert a clear relationship between acculturation and the accuracy of the TOMM and DCT…” (p. 9).
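The scoring logic in the TOMM and DCT descriptions on p. 4 can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions. A TOMM learning trial is scored as the number correct out of 50 two-choice panels. The DCT E-score is computed as described: mean ungrouped counting time plus mean grouped counting time plus total errors. The TOMM cutoff of 45 is assumed from the conventional manual guideline because this summary does not state the cutoffs the study applied. No DCT cutoff is assumed, since the passage notes that norms differ across diagnostic groups. The 0/1 response encoding, the 6 ungrouped plus 6 grouped card split in the demo, and all function names are hypothetical.

```python
from statistics import mean

TOMM_ITEMS = 50  # recognition panels per learning trial, per the passage above


def tomm_trial_score(choices: list[int], targets: list[int]) -> int:
    """Count correct two-alternative choices on one TOMM learning trial.

    Hypothetical encoding: each entry is 0 or 1, marking which of the two
    drawings on a panel was picked (choices) or was previously shown (targets).
    """
    if len(choices) != TOMM_ITEMS or len(targets) != TOMM_ITEMS:
        raise ValueError(f"expected {TOMM_ITEMS} recognition panels")
    return sum(c == t for c, t in zip(choices, targets))


def tomm_flag(score: int, cutoff: int = 45) -> bool:
    """Flag possible insufficient effort when the score falls below the cutoff.

    45 is an assumed conventional cutoff, not the study's stated value.
    """
    return score < cutoff


def dct_e_score(ungrouped_secs: list[float], grouped_secs: list[float],
                errors: int) -> float:
    """E-score = mean ungrouped counting time + mean grouped time + total errors."""
    return mean(ungrouped_secs) + mean(grouped_secs) + errors


def dct_flag(e_score: float, cutoff: float) -> bool:
    """Higher E-scores (slower counting, more errors) suggest insufficient effort.

    The cutoff must be supplied; appropriate values depend on the normative group.
    """
    return e_score >= cutoff


if __name__ == "__main__":
    # Made-up responses: the first 47 panels answered correctly, the last 3 not.
    targets = [i % 2 for i in range(TOMM_ITEMS)]
    choices = targets[:47] + [1 - t for t in targets[47:]]
    score = tomm_trial_score(choices, targets)
    print(score, tomm_flag(score))  # 47 False

    # Made-up counting times in seconds for 6 ungrouped and 6 grouped cards.
    print(dct_e_score([3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [2.0] * 6, errors=1))  # 8.5
```

The E-score rises with both slowness and errors, so higher values are treated as more suspect. The TOMM trial score runs the other way, with fewer correct recognitions indicating possible insufficient effort.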

