
On the Effectiveness of Prompt Stealing Attacks on In-The-Wild Prompts

Abstract

Large Language Models (LLMs) have increased the demand for high-quality prompts, which are now considered valuable commodities in prompt marketplaces. However, this demand has also led to the emergence of prompt stealing attacks, in which an adversary attempts to infer prompts from generated outputs, threatening the intellectual property and business models of these marketplaces. Previous research has primarily examined prompt stealing on academic datasets, leaving a key question unanswered: Do these attacks genuinely threaten in-the-wild prompts curated by real-world users? In this paper, we provide the first systematic study of the efficacy of prompt stealing attacks against in-the-wild prompts. Our analysis shows that in-the-wild prompts differ significantly from academic ones in length, semantics, and topics. Our evaluation subsequently reveals that current prompt stealing attacks perform poorly in this context. To improve attack efficacy, we employ a Text-Gradient-based method to iteratively refine prompts so that they better reproduce the target outputs. This leads to enhanced attack performance, as evidenced by METEOR score improvements from 0.207 to 0.253 for prompt recovery and from 0.323 to 0.440 for output recovery. Despite these improvements, we show that fundamental challenges persist, highlighting the need for further research to improve and evaluate the effectiveness of prompt stealing attacks in practical scenarios.
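To make the refinement loop mentioned in the abstract concrete, the sketch below shows one way a Text-Gradient-style iteration for output recovery could be structured. It is an illustrative assumption, not the authors' implementation: `query_llm` is a hypothetical stand-in for any LLM API, `similarity` is a crude lexical proxy for a metric such as METEOR, and the feedback/rewrite prompt wording is invented for clarity.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with an actual API client."""
    raise NotImplementedError


def similarity(a: str, b: str) -> float:
    """Token-overlap (Jaccard) score as a simple stand-in for METEOR."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)


def refine_prompt(candidate: str, target_output: str, steps: int = 5) -> str:
    """Iteratively rewrite `candidate` so its output better matches `target_output`."""
    best_prompt = candidate
    best_score = similarity(query_llm(best_prompt), target_output)
    current = candidate

    for _ in range(steps):
        produced = query_llm(current)
        # Ask the model for natural-language "gradient" feedback on the mismatch.
        feedback = query_llm(
            "The prompt below produced the first text, but the goal is the second text.\n"
            f"Prompt: {current}\nProduced: {produced}\nGoal: {target_output}\n"
            "Briefly explain what to change in the prompt."
        )
        # Apply the feedback to obtain a revised candidate prompt.
        current = query_llm(
            f"Rewrite this prompt according to the feedback.\n"
            f"Prompt: {current}\nFeedback: {feedback}"
        )
        # Keep the best-scoring candidate seen so far.
        score = similarity(query_llm(current), target_output)
        if score > best_score:
            best_prompt, best_score = current, score

    return best_prompt
```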

Conference Paper

IEEE Symposium on Security and Privacy (S&P)

Publication Date

2025-05-12

Last Modified

2025-05-08