Conference on Neural Information Processing Systems (NeurIPS)
Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Conference on Neural Information Processing Systems (NeurIPS)
Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms
USENIX Security Symposium (USENIX Security)
Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data
International Conference on Learning Representations (ICLR)
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
ACM Conference on Computer and Communications Security (CCS)
ZeroFake: Zero-Shot Detection of Fake Images Generated and Edited by Text-to-Image Generation Models