
Effective faking of verbal deception detection with target‐aligned adversarial attacks

Kleinberg, Bennett
Loconte, Riccardo
Verschuere, Bruno
Abstract
Background: Deception detection through analysing language is a promising avenue using both human judgements and automated machine learning judgements. For both forms of credibility assessment, automated adversarial attacks that rewrite deceptive statements to appear truthful pose a serious threat. Methods: We used a dataset of 243 truthful and 262 fabricated autobiographical stories in a deception detection task for humans and machine learning models. A large language model was tasked with rewriting deceptive statements so that they appear truthful. In Study 1, humans who made a deception judgement or used the detailedness heuristic and two machine learning models (a fine‐tuned language model and a simple n‐gram model) judged original or adversarial modifications of deceptive statements. In Study 2, we manipulated the target alignment of the modifications, that is, tailoring the attack to whether the statements would be assessed by humans or computer models. Results: When adversarial modifications were aligned with their target, human (d = −0.07 and d = −0.04) and machine judgements (51%) dropped to the chance level. When the attack was not aligned with the target, both human heuristic judgements (d = 0.30 and d = 0.36) and machine learning predictions (63%–78%) were significantly better than chance. Conclusions: Easily accessible language models can effectively help anyone fake deception detection efforts both by humans and machine learning models. Robustness against adversarial modifications for humans and machines depends on that target alignment. We close with suggestions on advancing deception research with adversarial attack designs and techniques.
Date
2025-07-01
Keywords
deception, faking, heuristics, machine learning, natural language processing, verbal lie detection
Citation
Kleinberg, B, Loconte, R & Verschuere, B 2025, 'Effective faking of verbal deception detection with target‐aligned adversarial attacks', Legal and Criminological Psychology. https://doi.org/10.1111/lcrp.70001