
Neeko: Model Hijacking Attacks Against Generative Adversarial Networks

Abstract

Generative models have garnered significant interest in machine learning but are costly to produce and face growing regulatory constraints, requiring resource-heavy training and collaboration with various stakeholders, especially data providers. Such collaborative environments have given rise to a new threat known as model hijacking attacks: adversaries can tamper with the training process to embed a hidden task, allowing them to train/hijack high-end models at minimal cost or even to sidestep regulations. In this paper, we extend the scope of model hijacking from classifiers to generative models by introducing the first model hijacking attack tailored to Generative Adversarial Networks (GANs), namely Neeko. Neeko is built on a novel U-Net-based Disguiser and allows a compromised GAN to generate authentic-looking images from its original distribution that, when downscaled, visually transform into images from the hijacking dataset's distribution. Through experiments on different image benchmark datasets, we demonstrate the efficacy and stealthiness of Neeko. Neeko exposes security and accountability risks of training public GANs on potentially malicious or illegal datasets and raises concerns about evading regulations addressing deepfakes and synthetic images.
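
To make the dual objective concrete, below is a minimal PyTorch sketch of the idea the abstract describes: a generator is pushed to fool a discriminator at full resolution (the overt task) while its downscaled output matches samples from the hijacking dataset (the covert task). The function name, downscaling factor, loss choices, and weight lam are illustrative assumptions; the paper's actual attack works through its U-Net-based Disguiser, whose details are not reproduced here.

    import torch
    import torch.nn.functional as F

    # Hypothetical setup: G maps latent noise to full-resolution images,
    # D scores realism on the original (cover) distribution. Any standard
    # GAN generator/discriminator pair would fit this sketch.

    def hijacking_generator_step(G, D, z, hidden_target, opt_g,
                                 scale=4, lam=10.0):
        """One generator update combining the overt GAN objective with a
        covert low-resolution objective. A sketch of the hijacking idea,
        not the paper's exact Disguiser-based formulation.

        z             : latent noise batch
        hidden_target : batch from the hijacking dataset, already at the
                        downscaled resolution (H/scale, W/scale)
        """
        fake = G(z)  # full-resolution output; should look like cover data

        # Overt task: standard non-saturating GAN loss against D.
        logits = D(fake)
        adv_loss = F.binary_cross_entropy_with_logits(
            logits, torch.ones_like(logits))

        # Covert task: after bilinear downscaling, the same image should
        # resemble a sample from the hijacking distribution.
        low = F.interpolate(fake, scale_factor=1.0 / scale,
                            mode="bilinear", align_corners=False)
        hidden_loss = F.l1_loss(low, hidden_target)

        loss = adv_loss + lam * hidden_loss
        opt_g.zero_grad()
        loss.backward()
        opt_g.step()
        return loss.item()

The weighting lam trades off stealthiness (full-resolution images must remain plausible to inspectors and the discriminator) against fidelity of the hidden task revealed under downscaling.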

Conference paper

IEEE International Conference on Multimedia and Expo (ICME)

Publication date

2025-06-30

Last modified

2025-11-14