CISPA Helmholtz Center for Information Security

2026-07-01

DE-CLIP: Few-Shot Anomaly Detection via Difference-Guided Embedding Editing

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)



2026-04-09

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)


Authors

Rui Zhang
Hongwei Li
Yun Shen
Xinyue Shen
Wenbo Jiang
Guowen Xu
Yang Liu
Michael Backes
Yang Zhang


2026

Open Schrödinger’s Closed Box: Identifying Retrieval Augmented Generation in API-Accessible Large Language Model Services

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)



2025-10-15

UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

Conference / Medium

ACM Conference on Computer and Communications Security (CCS)

Tags

Trustworthy Information Processing



2025-08-13

From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language

Conference / Medium

Usenix Security Symposium (USENIX-Security)

Tags

Trustworthy Information Processing

Authors

Yihan Ma
Xinyue Shen
Yiting Qu
Ning Yu
Michael Backes
Savvas Zannettou
Yang Zhang


2025-07-27

When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)

Tags

Trustworthy Information Processing

Authors