Address

Im Oberen Werk 1
66386 St. Ingbert (Germany)

Awards (selection)

2022: Busy Beaver Award for "Privacy of Machine Learning"

2019: Best paper award at NDSS 

Short Bio

Dr. Yang Zhang is Faculty at CISPA. His research concentrates on trustworthy machine learning (privacy, safety, and security). Moreover, he works on measuring and understanding misinformation and unsafe content like hateful memes on the Internet. Over the years, he has published multiple papers at top venues in computer science, including CCS, NDSS, Oakland, and USENIX Security. His work has received the NDSS 2019 distinguished paper award and the CCS 2022 best paper award runner-up.

CV: Last stations

Since 2020
Faculty at CISPA Helmholtz Center for Information Security
2019 - 2020
Research Group Leader at CISPA Helmholtz Center for Information Security
2017 - 2018
Postdoctoral Researcher - Host: Michael Backes - CISPA, Saarland University
2012 - 2016
Ph.D. in Computer Science at University of Luxembourg, highest honor

Publications by Yang Zhang

Year 2026

Conference / Medium

European Association for Computational Linguistics (EACL)
Defeating Cerberus: Privacy-Leakage Mitigation in Vision Language Models

Article

IEEE Transactions on Dependable and Secure Computing
Backdoor Complications: A Comprehensive Analysis and Mitigation of the Unforeseen Consequences of Backdoor Attacks

Conference / Medium

National Conference of the American Association for Artificial Intelligence (AAAI)
SL-CBM: Enhancing Concept Bottleneck Models with Semantic Locality for Better Interpretability

Year 2025

Conference / Medium

Conference on Neural Information Processing Systems (NeurIPS)
Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

Conference / Medium

Conference on Neural Information Processing Systems (NeurIPS)
Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms

Conference / Medium

Conference on Empirical Methods in Natural Language Processing (EMNLP)
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification

Conference / Medium

IEEE International Conference on Computer Vision (ICCV)
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions

Conference / Medium

ACM Conference on Computer and Communications Security (CCS)
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

Article

IEEE Transactions on Dependable and Secure Computing
Revealing the Risk of Hyper-parameter Leakage in Deep Reinforcement Learning Models

Conference / Medium

USENIX Security Symposium (USENIX-Security)
Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning