Large language model (LLM) powered agents, particularly OpenAI's GPTs, have revolutionized how AI is customized, deployed, and used. However, misuse of GPTs has emerged as a critical, yet largely underexplored, issue within OpenAI's GPT Store. In this paper, we present the first large-scale measurement study on misused GPTs. We introduce GPTracker, a framework designed to continuously collect GPTs from the official GPT Store and automate interactions with them. As of the submission of this paper, GPTracker has collected 755,297 GPTs and 28,464 GPT conversation flows over eight months. Using an LLM-driven scoring system combined with human review, we identify 2,051 misused GPTs across ten forbidden scenarios. Through both static and dynamic analyses, we explore the landscape of these misused GPTs, including their trends, builders, operation mechanisms, and effectiveness. We find that builders of misused GPTs employ various tactics to bypass OpenAI's review system, such as integrating external APIs, hiding malicious intent in descriptions, and URL redirection. Notably, GPTs that invoke external APIs are more likely than other misused GPTs to answer inappropriate queries, with an average 22.81\% higher answer rate in the Illegal Activity scenario. Leveraging VirusTotal, we identify 50 malicious domains referenced by 446 GPTs: 33 are labeled as phishing, 28 as malware, and 2 as spam, with some domains receiving multiple labels. We responsibly disclosed our findings to OpenAI on September 11, 2024, and November 12, 2024; of the 1,804 GPTs reported in the first disclosure, 1,316 had been removed by September 25, 2024. Our study sheds light on the alarming misuse of GPTs in the emerging GPT marketplace and offers actionable recommendations for stakeholders to mitigate future misuse.
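To make the VirusTotal domain-labeling step concrete, below is a minimal Python sketch of how such a check can be performed against VirusTotal's public v3 "domains" endpoint. The API-key placeholder, the helper name `check_domain`, and the verdict-aggregation logic are our own illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# Minimal sketch of a VirusTotal domain check (assumptions noted above).
import requests

VT_API_KEY = "YOUR_VIRUSTOTAL_API_KEY"  # hypothetical placeholder key
VT_DOMAIN_URL = "https://www.virustotal.com/api/v3/domains/{}"


def check_domain(domain: str) -> dict:
    """Query VirusTotal for a domain and summarize engine verdicts."""
    resp = requests.get(
        VT_DOMAIN_URL.format(domain),
        headers={"x-apikey": VT_API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    attrs = resp.json()["data"]["attributes"]

    # Collect labels (e.g., "phishing", "malware") from engines that flag
    # the domain; a single domain may accumulate several distinct labels.
    labels = set()
    for verdict in attrs["last_analysis_results"].values():
        if verdict["category"] in ("malicious", "suspicious") and verdict.get("result"):
            labels.add(verdict["result"])

    stats = attrs["last_analysis_stats"]
    return {
        "domain": domain,
        "malicious_engines": stats["malicious"],
        "labels": sorted(labels),
    }


if __name__ == "__main__":
    print(check_domain("example.com"))
```

A domain would then be counted as malicious once the aggregated engine verdicts cross whatever threshold the analysis adopts; the thresholding policy is left out of this sketch.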
IEEE Symposium on Security and Privacy (S&P)