Conference on Neural Information Processing Systems (NeurIPS)
Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms
International Conference on Learning Representations (ICLR)
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
USENIX Security Symposium (USENIX-Security)
Two-in-One: A Model Hijacking Attack Against Text Generation Models
ACM Conference on Computer and Communications Security (CCS)