2024-10-28

Prompt stealing: CISPA researcher discovers new attack scenario for text-to-image generation models

In a study entitled "Prompt Stealing Attacks Against Text-to-Image Generation Models", presented at the USENIX Security Symposium 2024 this summer, CISPA researcher Xinyue Shen demonstrates that AI-generated images can be reverse-engineered as well. Using PromptStealer, a tool she developed, Shen and her colleagues were able to reconstruct the original prompt from an AI-generated image. In doing so, she uncovers a new attack scenario for text-to-image generation models and, at the same time, provides a protection mechanism called PromptShield.

Due to the immense increase in the quality of their generated results, AI image generators have received a lot of attention recently. Most image generators, such as Stable Diffusion or DALL-E, are text-to-image generators. To generate a perfect image, precise text input, the so-called prompt, is a crucial factor. As this requires highly specialized knowledge, a new profession has evolved: that of the prompt engineer. However, there is another peculiarity with AI images, explains CISPA researcher Xinyue Shen: "To get an image in a certain style, you need a precise description as well as a so-called modifier, which describes the style of the image. Without this, the results tend to be arbitrary."
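To make this distinction concrete, a prompt is typically written as a subject followed by comma-separated style modifiers. The snippet below is a made-up illustration; the subject and modifiers are not taken from the study.

```python
# Made-up illustration of prompt anatomy: a subject plus style modifiers.
subject = "a lighthouse on a cliff at sunset"
modifiers = ["digital painting", "highly detailed", "trending on artstation"]

# Text-to-image prompts are commonly written as the subject followed by
# comma-separated modifiers that pin down the style.
prompt = ", ".join([subject] + modifiers)
print(prompt)
# a lighthouse on a cliff at sunset, digital painting, highly detailed, trending on artstation
```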

But there was more that caught Shen's attention: "I realized that as a result of the importance of prompts, a new market has emerged," says Shen. "Prompt engineers sell their text input for generating AI images on platforms like Promptbase." With just a few clicks and some euros, anyone who is interested can purchase a prompt for a specific image and save themselves time-consuming trial and error. However, digital marketplaces often bring with them new attack scenarios. "We wanted to find out if there was a way to obtain the prompts without paying for them," explains Shen. "We called this scenario prompt stealing." According to the researchers, this means that the prompt is extracted from an AI-generated image without the consent of the prompt engineer, depriving platforms such as Promptbase of their commercial basis.

A new tool named PromptStealer

Initial attempts to recover the prompt using a text decoder did not bring the desired results, so Shen decided to develop her own tool. She found that both the image's subject and its specific modifiers are crucial for a precise prompt. "We named the new method PromptStealer," says Shen. "As both the subject and the modifiers are important, we solve the issue step by step in our tool. First, we use a subject generator to obtain the subject depicted in the picture. Then we use a modifier detector to predict the modifiers precisely too." The CISPA researcher conducted both quantitative and qualitative analyses to show that PromptStealer provides better results than existing methods such as image captioning or CLIP Interrogator. The AI images generated from PromptStealer's prompts were the closest to the original images.
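In code, the two-stage idea can be sketched roughly as follows. This is not the authors' released implementation: a BLIP captioner stands in for the subject generator, a CLIP image encoder with an untrained linear head stands in for the trained multi-label modifier detector, and the modifier vocabulary is a made-up excerpt.

```python
# Minimal sketch of a PromptStealer-style two-stage attack (illustrative
# stand-ins, not the released code).
import torch
from PIL import Image
from transformers import (BlipForConditionalGeneration, BlipProcessor,
                          CLIPModel, CLIPProcessor)

MODIFIER_VOCAB = [  # illustrative only; the real vocabulary is far larger
    "highly detailed", "digital painting", "trending on artstation",
    "octane render", "concept art",
]

blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

# Multi-label head over CLIP image features; in the paper's setting such a
# detector is trained on prompt-image pairs, here it has random weights.
modifier_head = torch.nn.Linear(clip.config.projection_dim, len(MODIFIER_VOCAB))

@torch.no_grad()
def steal_prompt(image: Image.Image, threshold: float = 0.5) -> str:
    # Stage 1: subject generator -- caption what the image depicts.
    ids = blip.generate(**blip_proc(images=image, return_tensors="pt"),
                        max_new_tokens=30)
    subject = blip_proc.decode(ids[0], skip_special_tokens=True)
    # Stage 2: modifier detector -- one independent sigmoid score per
    # modifier, kept if it exceeds the threshold.
    feats = clip.get_image_features(**clip_proc(images=image, return_tensors="pt"))
    scores = torch.sigmoid(modifier_head(feats)).squeeze(0)
    modifiers = [m for m, s in zip(MODIFIER_VOCAB, scores.tolist())
                 if s > threshold]
    # Reconstructed prompt: subject followed by the detected modifiers.
    return ", ".join([subject] + modifiers)
```

A prompt recovered this way can then be fed back into a text-to-image model; in the study, the images regenerated from stolen prompts were the closest to the originals among all compared methods.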

PromptShield against attacks

Being cybersecurity researchers, Shen and her colleagues have also been thinking about how to prevent prompt stealing attacks. "An obvious idea was to consider ways to reduce the performance of machine learning models," the CISPA researcher explains. "Our aim was to prevent models such as PromptStealer from recognizing the modifiers used. That's because recognizing the exact modifiers is crucial for a precise prompt." She successfully prevented this by adding perturbations to the AI-generated image. The relevance of the attack scenario Shen discovered is underlined by the fact that the software company Microsoft has already included it in its Vulnerability Severity Classification for AI Systems. The CISPA researcher has made the data from her study freely available on the Internet, including a curated dataset of 61,467 AI-generated images from the Lexica platform as well as the code for her tool PromptStealer.
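The defensive idea can be sketched along the same lines. What follows is a generic adversarial-perturbation sketch in the spirit of PromptShield, not the published implementation: the PGD-style loop, the toy detector, and all hyperparameters (epsilon, step count) are illustrative assumptions.

```python
# Sketch of a PromptShield-style defense: add a small, bounded perturbation
# to the generated image so that a modifier detector no longer recognizes
# the modifiers that were actually used. `detector` can be any
# differentiable model mapping images to per-modifier logits.
import torch

def shield(image: torch.Tensor, detector: torch.nn.Module,
           true_modifiers: torch.Tensor, epsilon: float = 8 / 255,
           steps: int = 10) -> torch.Tensor:
    """image: (1, 3, H, W) in [0, 1]; true_modifiers: (1, M) multi-hot
    labels of the modifiers used in the prompt."""
    delta = torch.zeros_like(image, requires_grad=True)  # perturbation
    alpha = 2 * epsilon / steps                          # per-step size
    bce = torch.nn.functional.binary_cross_entropy_with_logits
    for _ in range(steps):
        # How well does the detector still recognize the true modifiers?
        loss = bce(detector(image + delta), true_modifiers)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()            # ascend the loss
            delta.clamp_(-epsilon, epsilon)               # stay barely visible
            delta.copy_((image + delta).clamp(0, 1) - image)  # valid pixels
            delta.grad.zero_()
    return (image + delta).detach()

# Toy end-to-end usage with an untrained surrogate detector (for
# demonstration only; weights and shapes are arbitrary).
toy_detector = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 64 * 64, 5),  # 5 modifiers, 64x64 images
)
img = torch.rand(1, 3, 64, 64)
labels = torch.tensor([[1., 0., 1., 0., 0.]])
protected = shield(img, toy_detector, labels)
```

Intuitively, the loop climbs the detector's loss on the true modifiers, so its confidence in them drops, while the perturbation itself stays within the budget epsilon and leaves the image visually unchanged.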