2026-06-17

Watermark Degradation Across Model Iterations

Abstract

Modern generative image models produce photorealistic images that are increasingly indistinguishable from real data. As these images are published online, they are often scraped into the training sets of new generative models. This practice of training on generated data has been shown to degrade model performance and amplify existing biases. A possible mitigation is to embed watermarks into generated content so that synthetic content can be identified and its provenance traced. However, the persistence of watermarks through iterative training-generation cycles remains poorly understood. In this work, we investigate how image watermarks degrade across multiple generations of model training. We adopt a dataset inference framework that aggregates weak per-sample signals via statistical hypothesis testing, enabling reliable detection even when watermark traces are subtle. We observe disparate behavior across watermarking methods: one remains highly detectable in subsequent generations, even when watermarked images constitute only 1% of the training data, while the others degrade severely across generations and have a finite lifetime. Our findings highlight the need for more robust watermarking methods that are designed to withstand retraining, so that data provenance remains reliable.
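
To illustrate why aggregating weak per-sample signals can yield a reliable dataset-level decision, the following is a minimal sketch and not the framework used in this work. It assumes a hypothetical watermark detector that outputs one real-valued score per image, and it swaps in a simple one-sided Welch-style test with a normal approximation; the function name `aggregate_p_value`, the synthetic Gaussian scores, and the shift of 0.2 standard deviations are all illustrative assumptions.

```python
import math
import random
from statistics import mean, variance


def aggregate_p_value(suspect_scores, reference_scores):
    """One-sided two-sample test (normal approximation).

    Null hypothesis: suspect scores are not higher on average than
    reference scores. A small p-value suggests the suspect set carries
    watermark traces. Both inputs are hypothetical detector scores.
    """
    n_s, n_r = len(suspect_scores), len(reference_scores)
    # Standard error of the difference in means (unequal variances allowed)
    se = math.sqrt(variance(suspect_scores) / n_s
                   + variance(reference_scores) / n_r)
    z = (mean(suspect_scores) - mean(reference_scores)) / se
    # Upper-tail probability of a standard normal, P(Z >= z)
    return 0.5 * math.erfc(z / math.sqrt(2))


rng = random.Random(0)
# Synthetic stand-ins for detector scores: the per-image shift is small
# (0.2 standard deviations), so a single image is nearly uninformative.
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
suspect = [rng.gauss(0.2, 1.0) for _ in range(5000)]

p_small = aggregate_p_value(suspect[:20], reference[:20])  # few samples
p_dataset = aggregate_p_value(suspect, reference)          # full set
print(f"p-value with 20 images:   {p_small:.3g}")
print(f"p-value with 5000 images: {p_dataset:.3g}")
```

Because the standard error shrinks with the square root of the sample count, a shift that is invisible in any single image can still become statistically significant once thousands of images are pooled, which is the intuition behind dataset-level inference.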