2026-07-02

Open Schrödinger’s Closed Box: Identifying Retrieval Augmented Generation in API-Accessible Large Language Model Services

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)

Tags

Trustworthy Information Processing

Authors

2026-07-01

DE-CLIP: Few-Shot Anomaly Detection via Difference-Guided Embedding Editing

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)

Tags

Authors

2026-04-09

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)

Tags

Authors

Rui Zhang
Hongwei Li
Yun Shen
Xinyue Shen
Wenbo Jiang
Guowen Xu
Yang Liu
Michael Backes
Yang Zhang

2025-10-15

UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

Conference / Medium

ACM Conference on Computer and Communications Security (CCS)

Tags

Trustworthy Information Processing

Authors

2025-08-13

From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language

Conference / Medium

USENIX Security Symposium (USENIX-Security)

Tags

Trustworthy Information Processing

Authors

Yihan Ma
Xinyue Shen
Yiting Qu
Ning Yu
Michael Backes
Savvas Zannettou
Yang Zhang

2025-07-27

When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)

Tags

Trustworthy Information Processing

Authors