2026

Rethinking Assessments of Prompt Injection Attacks

Summary

Prompt injection attacks have been recognized in recent years as one of the primary risks facing LLM-integrated applications. However, common evaluation frameworks remain insufficient, lacking both comprehensiveness and real-world relevance. To bridge this gap, we revisit the common evaluation framework and conduct an extensive evaluation across eight evaluation settings, spanning 37 real-world applications, 185 injected tasks, 21 attack instructions, and a total of 143,745 queries. The evaluation yields several findings. For example, real-world applications are more vulnerable to prompt injection attacks than the applications used in research settings. Moreover, although complex attack instructions are more sophisticated, they are less effective than simple ones. To uncover the root causes of these phenomena, we further investigate the model’s internal representations during attacks, offering insights into the underlying dynamics of these attacks. Additionally, we assess both prompt-level and model-level defense mechanisms and highlight their limitations in real-world applications. By exploring more diverse scenarios across different dimensions, our framework provides a solid foundation for assessing vulnerabilities in LLM-integrated applications and evaluating the efficacy of defensive strategies.
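The reported query total equals the full cross product of the three counts above (37 × 185 × 21 = 143,745). The summary does not say how queries were built or how the eight settings partition them, so the following is only a minimal sketch under stated assumptions. It enumerates one query per (application, injected task, attack instruction) triple. The helper names `build_query` and `query_grid`, the placeholder prompts, and the way the injection is appended to the application's data are all hypothetical illustrations, not the authors' framework.

```python
from itertools import product
from typing import Iterable, Iterator, NamedTuple, Tuple

# Counts reported in the summary.
NUM_APPLICATIONS = 37
NUM_INJECTED_TASKS = 185
NUM_ATTACK_INSTRUCTIONS = 21


class Query(NamedTuple):
    app_prompt: str  # the application's own instruction (system prompt)
    data: str        # external data the application processes, now carrying the injection


def build_query(app_prompt: str, clean_data: str,
                attack_instruction: str, injected_task: str) -> Query:
    # Hypothetical composition: the attacker appends an attack instruction
    # followed by the injected task to the data the application consumes.
    injected_data = f"{clean_data}\n{attack_instruction} {injected_task}"
    return Query(app_prompt, injected_data)


def query_grid(apps: Iterable[Tuple[str, str]],
               tasks: Iterable[str],
               attacks: Iterable[str]) -> Iterator[Query]:
    # One query per (application, injected task, attack instruction) triple.
    for (app_prompt, clean_data), task, attack in product(apps, tasks, attacks):
        yield build_query(app_prompt, clean_data, attack, task)


# Placeholder inputs standing in for real applications, tasks, and attacks.
apps = [(f"App {i} system prompt", f"App {i} input data") for i in range(NUM_APPLICATIONS)]
tasks = [f"Injected task {j}" for j in range(NUM_INJECTED_TASKS)]
attacks = [f"Attack instruction {k}" for k in range(NUM_ATTACK_INSTRUCTIONS)]

total = sum(1 for _ in query_grid(apps, tasks, attacks))
print(total)  # 143745
```

Generating queries lazily keeps memory flat even though the grid holds roughly 144k entries; in a real harness each yielded query would be sent to the model and the response checked for whether the injected task was carried out.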

Conference Paper

Annual Meeting of the Association for Computational Linguistics (ACL)

Date published

2026

Date last modified

2026-06-29