Annual Meeting of the Association for Computational Linguistics (ACL)
Peering Behind the Shield: Guardrail Identification in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL)
InferPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents
Annual Meeting of the Association for Computational Linguistics (ACL)
Rethinking Assessments of Prompt Injection Attacks
ACM Conference on Computer and Communications Security (CCS)
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images
Usenix Security Symposium (USENIX-Security)
Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications
Usenix Security Symposium (USENIX-Security)
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
Usenix Security Symposium (USENIX-Security)
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
Usenix Security Symposium (USENIX-Security)
Foundations and Trends® in Privacy and Security
Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
Conference on Empirical Methods in Natural Language Processing (EMNLP)
The Death and Life of Great Prompts: Analyzing the Evolution of LLM Prompts from the Structural Perspective