Pre-trained language models (PTLMs) have become integral to modern natural language processing (NLP), yet their reuse exposes the model supply chain to risks such as backdoor attacks. Existing studies assume that attackers target specific downstream tasks, overlooking how a backdoored PTLM behaves when fine-tuned for unrelated applications. In practice, such unintended adaptation can produce anomalous and inconsistent predictions, revealing the backdoor and compromising its stealthiness. We define this phenomenon as backdoor complications, i.e., unintended behavioral side effects that emerge on non-target tasks. This work presents the first systematic quantification and mitigation of backdoor complications. Through extensive experiments on three widely used PTLMs and 15 benchmark datasets, we show that complications are pervasive across both single- and multi-task attack settings, causing triggered inputs to be mapped to arbitrary classes. To address this issue, we propose the Complication-Suppressed Backdoor Attack (CSBA), a task-agnostic, multi-objective framework that leverages auxiliary non-target datasets to suppress backdoor complications. CSBA effectively suppresses complications on unseen downstream tasks while maintaining near-perfect attack success rates. Our work reveals a critical side effect of backdooring PTLMs and provides a new perspective on stealthiness and robustness in model supply chain security.
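To make the multi-objective idea concrete, below is a minimal PyTorch sketch of one training step that combines a clean-task loss, a backdoor-injection loss, and a complication-suppression loss on auxiliary non-target data. Everything here is an illustrative assumption rather than the paper's implementation: the names (`csba_step`, `trigger_fn`, the `lam_*` weights), the use of KL divergence for suppression, and the assumption that `model` returns classification logits are all hypothetical; CSBA's actual objectives, trigger design, and training setup may differ.

```python
import torch
import torch.nn.functional as F

def csba_step(model, clean_batch, poisoned_batch, aux_batch, trigger_fn,
              lam_bd=1.0, lam_sup=1.0):
    """One hypothetical CSBA-style optimization step combining three objectives:
    1) clean-task loss on target-task data (utility),
    2) backdoor loss mapping triggered target-task inputs to the attacker's label,
    3) suppression loss on auxiliary non-target data, pushing predictions on
       triggered inputs toward the model's own clean predictions.
    Assumes `model` maps input tensors to logits and `trigger_fn` inserts the
    backdoor trigger into a batch of inputs.
    """
    # (1) Standard task loss on clean target-task data.
    x, y = clean_batch
    loss_clean = F.cross_entropy(model(x), y)

    # (2) Backdoor loss: triggered inputs should predict the attacker's label.
    x_poisoned, y_attacker = poisoned_batch
    loss_bd = F.cross_entropy(model(x_poisoned), y_attacker)

    # (3) Suppression loss: on auxiliary non-target data, triggered and clean
    # inputs should yield (near-)identical output distributions, so the
    # backdoor stays invisible when the PTLM is reused for unrelated tasks.
    x_aux = aux_batch
    with torch.no_grad():
        p_clean = F.softmax(model(x_aux), dim=-1)          # reference distribution
    log_p_triggered = F.log_softmax(model(trigger_fn(x_aux)), dim=-1)
    loss_sup = F.kl_div(log_p_triggered, p_clean, reduction="batchmean")

    # Weighted sum of the three objectives; the caller backpropagates this.
    return loss_clean + lam_bd * loss_bd + lam_sup * loss_sup
```

The design intuition, under these assumptions, is that the third term is what distinguishes a complication-suppressed attack from a conventional one: without it, the trigger's effect leaks into non-target tasks and triggered outputs collapse into arbitrary classes, which is exactly the complication the abstract describes.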