
Towards Understanding Unsafe Video Generation

Summary

Video generation models (VGMs) have demonstrated the capability to synthesize high-quality output. It is important to understand their potential to produce unsafe content, such as violent or terrifying videos. In this work, we provide a comprehensive understanding of unsafe video generation. First, to confirm that these models can indeed generate unsafe videos, we use unsafe-content generation prompts collected from 4chan and Lexica together with three open-source SOTA VGMs to generate unsafe videos. After filtering out duplicates and poorly generated content, we create an initial set of 2112 unsafe videos from an original pool of 5607 videos. Through clustering and thematic coding analysis of these generated videos, we identify 5 unsafe video categories: Distorted/Weird, Terrifying, Pornographic, Violent/Bloody, and Political. With IRB approval, we then recruit online participants to help label the generated videos. Based on the annotations submitted by 403 participants, we identify 937 unsafe videos from the initial video set. With the labeled information and the corresponding prompts, we create the first dataset of unsafe videos generated by VGMs. We then study possible defense mechanisms to prevent the generation of unsafe videos. Existing defense methods in image generation focus on filtering either the input prompt or the output results. We propose a new approach called Latent Variable Defense (LVD), which works within the model's internal sampling process. Since LVD can detect unsafe samples at the initial phase of the inference process, it achieves a defense accuracy of 0.90 while reducing time and computational resources by 10× when sampling a large number of unsafe prompts. Our experiment includes three open-source SOTA video diffusion models, on which LVD achieves accuracy rates of 0.99, 0.92, and 0.91, respectively.
Additionally, our method is tested with adversarial prompts and on image-to-video diffusion models, achieving above 0.90 accuracy in both settings. Our method is also interoperable: combining it with other defenses improves their performance. We will publish our constructed video dataset and code.
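The abstract describes LVD only at a high level: screen the latent early in the diffusion sampling loop and abort before spending the remaining compute if it looks unsafe. The sketch below illustrates that control flow with toy stand-ins; `denoise_step`, `unsafe_score`, and all names and thresholds here are hypothetical placeholders, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(latent, step):
    """Toy denoising update (placeholder for the real VGM sampler step)."""
    return latent * 0.9 + rng.normal(0, 0.01, size=latent.shape)

def unsafe_score(latent):
    """Placeholder safety classifier over an intermediate latent.

    In the paper's setting this would be a lightweight detector trained
    on early-step latents; here it is a toy threshold rule on the mean.
    """
    return 1.0 if latent.mean() > 0.5 else 0.0

def sample_with_lvd(init_latent, total_steps=50, check_step=5, threshold=0.5):
    """Run the sampler, but screen the latent at one early step and
    abort the remaining (expensive) steps if it is flagged as unsafe."""
    latent = init_latent
    for step in range(total_steps):
        latent = denoise_step(latent, step)
        if step == check_step and unsafe_score(latent) >= threshold:
            return None, step + 1  # rejected early; most compute saved
    return latent, total_steps

safe_out, safe_steps = sample_with_lvd(np.zeros((4, 4)))
bad_out, bad_steps = sample_with_lvd(np.full((4, 4), 2.0))
print(safe_out is not None, safe_steps)  # True 50 (runs to completion)
print(bad_out is not None, bad_steps)    # False 6 (aborted early)
```

Aborting after a handful of steps instead of running the full schedule is what yields the large compute saving the abstract reports when many unsafe prompts are sampled.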

Conference Paper

Network and Distributed System Security Symposium (NDSS)

Date published

2024-07-17

Date last modified

2025-03-06