Annual Meeting of the Association for Computational Linguistics (ACL)
The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training
Annual Meeting of the Association for Computational Linguistics (ACL)
ACM Conference on Computer and Communications Security (CCS)
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images
USENIX Security Symposium (USENIX-Security)
From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models
Annual Meeting of the Association for Computational Linguistics (ACL)
When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs
Annual Meeting of the Association for Computational Linguistics (ACL)
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
IEEE Symposium on Security and Privacy (S&P)
GPTracker: A Large-Scale Measurement of Misused GPTs
IEEE Symposium on Security and Privacy (S&P)
On the Effectiveness of Prompt Stealing Attacks on In-The-Wild Prompts
USENIX Security Symposium (USENIX-Security)
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns