E-mail senden E-Mail Adresse kopieren
2026-03-24

Defeating Cerberus: Privacy-Leakage Mitigation in Vision Language Models

Zusammenfassung

Vision Language Models (VLMs) have demonstrated remarkable capabilities in processing multimodal data, but their advanced abilities also raise significant privacy concerns, particularly regarding Personally Identifiable Information (PII) leakage. While relevant research has been conducted on single-modal language models to some extent, the vulnerabilities in the multimodal setting have yet to be fully investigated. Our work assesses these emerging risks and introduces a concept-guided mitigation approach. By identifying and modifying the model’s internal states associated with PII-related content, our method guides VLMs to refuse PII-sensitive tasks effectively and efficiently, without requiring re-training or finetuning. We also address the current lack of multimodal PII datasets by constructing various ones that simulate real-world scenarios. Experimental results demonstrate the method can achieve on average 93.3% refusal rate for various PII-related tasks with minimal impact on unrelated model performances. We further examine the mitigation’s performance under various conditions to show the adaptability of our proposed method.

Konferenzbeitrag

European Association for Computational Linguistics (EACL)

Veröffentlichungsdatum

2026-03-24

Letztes Änderungsdatum

2026-03-04