CISPA Helmholtz Center for Information Security

2026-07-14

GEO-Detective:
Unveiling Location Privacy Risks in Images with LLM Agents

Conference / Medium

European Conference on Computer Vision (ECCV)

Tags

Trustworthy Information Processing

2026-07-07

Peering Behind the Shield:
Guardrail Identification in Large Language Models

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)

Tags

Trustworthy Information Processing

2026

InferPilot:
Autonomous Inference Attacks Against ML Services With LLM-Based Agents

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)

2026

Rethinking Assessments of Prompt Injection Attacks

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)

Tags

Trustworthy Information Processing

2025-10-15

UnsafeBench:
Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

Conference / Medium

ACM Conference on Computer and Communications Security (CCS)

Tags

Trustworthy Information Processing

2025-08-14

Synthetic Artifact Auditing:
Tracing LLM-Generated Synthetic Data Usage in Downstream Applications

Conference / Medium

USENIX Security Symposium (USENIX-Security)

Authors

Yixin Wu

2025-08-13

On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts

Conference / Medium

USENIX Security Symposium (USENIX-Security)

Authors

Yixin Wu
Ning Yu
Michael Backes
Yun Shen
Yang Zhang

2025-05-08

HateBench:
Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns

Conference / Medium

USENIX Security Symposium (USENIX-Security)

Tags

Trustworthy Information Processing

2025

Safety at Scale:
A Comprehensive Survey of Large Model and Agent Safety

Article

Foundations and Trends® in Privacy and Security

Tags

Trustworthy Information Processing

Authors

Xingjun Ma
Yifeng Gao
Yixu Wang
Ruofan Wang
Xin Wang
Ye Sun
Yifan Ding
Hengyuan Xu
Yunhao Chen
Yunhao Zhao
Hai Huang
Yige Li
Yixin Wu
Jiaming Zhang
Xiang Zheng
Yang Bai
Yiming Li
Zheyun Wu
Xipeng Qiu
Jingfeng Zhang
Xudong Han
Haonan Li
Jun Sun
Cong Wang
Jindong Gu
Baoyuan Wu
Siheng Chen
Tianwei Zhang
Yiyong Liu
Mingming Gong
Tongliang Liu
Shirui Pan
Cihang Xie
Tianyu Pang
Yinpeng Dong
Ruoxi Jia
Yang Zhang
Shiqing Ma
Xiangyu Zhang
Neil Gong
Chaowei Xiao
Sarah Erfani
Tim Baldwin
Bo Li
Masashi Sugiyama
Dacheng Tao
James Bailey
Yu-Gang Jiang

Yixin Wu

Publications by Yixin Wu
