"In the beginning, everything happened very quickly," CISPA researcher Sahar Abdelnabi recounts. "In mid-February, we were approached by Kai Greshake and Dr. Christoph Endres. They had this hypothesis that integrated LLM applications were affected by security vulnerabilities. After they shared with us their initial idea, we ran the first experiments that sucessfully demonstrated the vulnerability, and within only a week we wrote up our initial investigation. We wanted to do this as quickly as possible because it was a very prominent topic at the time." It quickly became apparent that the hypotheses were right. "Later after Bing Chat and GPT-4 release, we confirmed the attacks on actual real-world applications and performed a detailed and deeper investigation of the threats," Sahar continues.
With their experiments, the research group was able to show how unwanted instructions can be given to LLMs by manipulating the data they access. This process is known as "indirect prompt injections" because these attacks are carried out without the users’ knowledge. In concrete terms, this means that users of AI chatbots such as BingChat, which are integrated into other applications, could be provided with false information or seduced into taking unwise actions. The vulnerabilities were disclosed to vendors such as OpenAI and Microsoft, but due to their complexity could not be fully fixed yet.
Federal Office for Information Security takes up issue
In February 2023, the CISPA researchers published the vulnerabilities that had been found in a first report. This led, among other things, to the German Federal Office for Information Security (BSI) publishing a position paper on the topic entitled "Large AI Language Models - Opportunities and Risks for Industry and Authorities." In mid-July, the BSI again raised awareness of the issue, giving specific recommendations for companies working with the technology. "When integrating LLMs into applications, a systematic risk analysis should be performed that explicitly assesses the risk posed by Indirect Prompt Injections," the report states. In addition, the BSI recommends that "human control and authorization is performed" before potentially critical AI actions are executed.
Reactions in the media and EU policy
The findings of the CISPA researchers and IT-security consultants were met with much media attention. Numerous national and international IT portals such as Heise, Wired and Hackaday as well as various news media picked up on the topic. Via the EU research network ELSA (European Lighthouse on Safe and Secure AI), which is coordinated by CISPA-Faculty Professor Dr. Mario Fritz, the topic was also taken to EU level. The AI Networks of Excellence Centres (NoEs), of which ELSA is a member, have proposed to develop the concept for a "European AI Challenge" together with the European Commission in the coming months. Mario pins high hopes on this: "We can only make the innovation opportunities offered by AI, Foundation Models and Large Language Models a sustainable success if Europe catches up with the global leaders and uses its unique approach to safe and trustworthy AI as a locational advantage."
Advantages of cooperation between science and practice
From a Saarland perspective, the project is above all an excellent example of successful regional cooperation between CISPA as a science partner and sequire technology as an IT service provider. Kai Greshake, one of the first authors of the study, works for sequire, which offers services in the field of security, including consulting on the secure use of AI systems. For CISPA researcher Sahar Abdelnabi, the benefits of collaborations such as this are obvious. "Collaboration with companies and startups offers more perspectives on the practical applications of these models for end-users," Sahar recounts. Dr. Christoph Endres of sequire technology appreciates the proximity to top-level research in Saarbrücken: "Only a few hours after Kai's discovery, we had found competent contacts at CISPA who evaluated, confirmed and further developed his results - in an uncomplicated and unbureaucratic manner. We are very grateful for this."
The full results of the study are available in a detailed scientific publication in English, which can be downloaded here. Most recently, the work has been presented at a workshop at the International Conference on Machine Learning (ICML) as well as the BlackHat conference, the world's largest industry conference in IT security.