The ACM Conference on Computer and Communications Security (CCS) is the flagship annual conference of the Special Interest Group on Security, Audit and Control (SIGSAC) of the Association for Computing Machinery (ACM). The conference brings together information security researchers, practitioners, developers, and users from all over the world to explore cutting-edge ideas and results.
The misuse of large language models (LLMs) has drawn significant attention from the general public and LLM vendors. One particular type of adversarial prompt, known as the jailbreak prompt, has emerged as the main attack vector to bypass safeguards and elicit harmful content from LLMs. In this paper, employing our new framework JailbreakHub, we conduct a comprehensive analysis of 1,405 jailbreak prompts spanning December 2022 to December 2023. We identify 131 jailbreak communities and discover unique characteristics of jailbreak prompts and their major attack strategies, such as prompt injection and privilege escalation. We also observe that jailbreak prompts increasingly shift from online Web communities to prompt-aggregation websites and that 28 user accounts have consistently optimized jailbreak prompts over 100 days. To assess the potential harm caused by jailbreak prompts, we create a question set comprising 107,250 samples across 13 forbidden scenarios. Leveraging this dataset, our experiments on six popular LLMs show that their safeguards cannot adequately defend against jailbreak prompts in all scenarios. In particular, we identify five highly effective jailbreak prompts that achieve attack success rates of 0.95 on ChatGPT (GPT-3.5) and GPT-4, and the earliest of them has persisted online for over 240 days. We hope that our study can help the research community and LLM vendors promote safer and better-regulated LLMs.
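To make the measurement concrete, the sketch below shows how an attack success rate (ASR) of the kind reported above can be computed; query_llm and the keyword-based refusal heuristic are hypothetical placeholders rather than the JailbreakHub pipeline itself.

```python
# Minimal ASR sketch, assuming a hypothetical black-box query function and a
# crude keyword-based refusal heuristic (real studies use stronger evaluators).
from typing import Callable, Iterable

REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "as an ai")

def is_refusal(response: str) -> bool:
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def attack_success_rate(jailbreak_prompt: str,
                        forbidden_questions: Iterable[str],
                        query_llm: Callable[[str], str]) -> float:
    """Fraction of forbidden questions answered without a refusal."""
    questions = list(forbidden_questions)
    answered = sum(
        not is_refusal(query_llm(f"{jailbreak_prompt}\n\n{question}"))
        for question in questions
    )
    return answered / len(questions)
```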
Fine-tuning pre-trained models for downstream tasks has led to a proliferation of open-sourced task-specific models. Recently, Model Merging (MM) has emerged as an effective approach to facilitate knowledge transfer among these independently fine-tuned models. MM directly combines multiple fine-tuned task-specific models into a merged model without additional training, and the resulting model shows enhanced capabilities across multiple tasks. Although MM provides great utility, it may come with security risks because an adversary can exploit MM to affect multiple downstream tasks. However, the security risks of MM have barely been studied. In this paper, we first find that MM, as a new learning paradigm, introduces unique challenges for existing backdoor attacks due to the merging process. To address these challenges, we introduce BadMerging, the first backdoor attack specifically designed for MM. Notably, BadMerging allows an adversary to compromise the entire merged model by contributing as few as one backdoored task-specific model. BadMerging comprises a two-stage attack mechanism and a novel feature-interpolation-based loss to enhance the robustness of embedded backdoors against changes in the merging parameters. Considering that a merged model may incorporate tasks from different domains, BadMerging can jointly compromise the tasks provided by the adversary (on-task attack) and by other contributors (off-task attack), solving the corresponding unique challenges with novel attack designs. Extensive experiments show that BadMerging achieves remarkably effective attacks against various MM algorithms. Our ablation study demonstrates that the proposed attack designs progressively contribute to the attack performance. Finally, we show that prior defense mechanisms fail to defend against our attacks, highlighting the need for more advanced defenses.
We propose the first constructions of anonymous tokens with decentralized issuance. Namely, we consider a dynamic set of signers/issuers; a user can obtain a token from any subset of the signers, and the token is publicly verifiable and unlinkable to the issuance process. To realize this new primitive, we formalize the notion of blind multi-signatures (BMS), which allow a user to interact with multiple signers to obtain a (compact) signature; even if all the signers collude, they are unable to link a signature to an interaction with any of them. We then present two BMS constructions, one based on BLS signatures and a second based on discrete logarithms without pairings. We prove the security of both constructions in the Algebraic Group Model. We also provide a proof-of-concept implementation and show that it has low-cost verification, which is the most critical operation in blockchain applications.
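As an illustration of the interaction pattern only, the following interface sketch shows how a user might obtain an anonymous token from a chosen subset of signers; all class and method names are hypothetical, and none of the cryptography of the paper's BLS- or discrete-log-based constructions is implemented here.

```python
# Interface sketch of a blind multi-signature (BMS) issuance flow; names are
# illustrative placeholders and no actual cryptography is implemented.
from dataclasses import dataclass
from typing import Protocol, Sequence

@dataclass
class BlindedRequest:
    payload: bytes          # blinded value; reveals nothing about the message

@dataclass
class SignatureShare:
    signer_id: int
    payload: bytes

class Signer(Protocol):
    def sign_blind(self, request: BlindedRequest) -> SignatureShare: ...

class User(Protocol):
    def blind(self, message: bytes) -> BlindedRequest: ...
    def unblind_and_aggregate(self, shares: Sequence[SignatureShare]) -> bytes: ...

def issue_token(user: User, signers: Sequence[Signer], message: bytes) -> bytes:
    """One issuance: blind, collect shares from any chosen subset, aggregate."""
    request = user.blind(message)                      # hides the message
    shares = [s.sign_blind(request) for s in signers]  # chosen subset of signers
    return user.unblind_and_aggregate(shares)          # compact, unlinkable token
```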
A recent trend towards running more demanding web applications, such as video games or client-side LLMs, in the browser has led to the adoption of the WebGPU standard, which provides a cross-platform API exposing the GPU to websites. This opens up a new attack surface: untrusted web content is passed through to the GPU stack, which traditionally has been optimized for performance rather than security. Worsening the problem, most of WebGPU cannot be run in the tightly sandboxed process that manages other web content, which eases the attacker's path to compromising the client machine. Despite its importance, WebGPU shader processing has received surprisingly little attention from the automated testing community. Part of the reason is that shader translators expect highly structured and statically typed input, which renders typical fuzzing mutations ineffective. Complicating testing further, shader translation consists of a complex multi-step compilation pipeline, each stage presenting unique requirements and challenges. In this paper, we propose DarthShader, the first language fuzzer that combines mutators based on an intermediate representation with those using a more traditional abstract syntax tree. The key idea is that the individual stages of the shader compilation pipeline are susceptible to different classes of faults, requiring entirely different mutation strategies for thorough testing. By fuzzing the full pipeline, we ensure that we maintain a realistic attacker model. In an empirical evaluation, we show that our method outperforms state-of-the-art fuzzers in terms of code coverage. Furthermore, an extensive ablation study validates our key design decisions. DarthShader found a total of 39 software faults across all modern browsers (Chrome, Firefox, and Safari) that prior work missed. For 15 of them, the Chrome team assigned a CVE, acknowledging the impact of our results.
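A rough sketch of the fuzzing loop implied above is given below; the AST/IR mutators and the pipeline runner are hypothetical stand-ins, and coverage feedback is omitted for brevity.

```python
# High-level fuzzing loop: alternate structure-aware AST and IR mutations and
# run each candidate through the full shader translation pipeline.
import random
from typing import Callable, List

def fuzz_shaders(seeds: List[str],
                 mutate_ast: Callable[[str], str],
                 mutate_ir: Callable[[str], str],
                 run_pipeline: Callable[[str], bool],   # False -> crash/fault
                 iterations: int = 10_000) -> List[str]:
    corpus = list(seeds)
    crashes: List[str] = []
    for _ in range(iterations):
        seed = random.choice(corpus)
        # Different pipeline stages are sensitive to different fault classes,
        # so pick one of the two mutation strategies at random.
        candidate = random.choice([mutate_ast, mutate_ir])(seed)
        if run_pipeline(candidate):
            corpus.append(candidate)    # keep valid inputs for further mutation
        else:
            crashes.append(candidate)
    return crashes
```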
A randomness beacon is a source of continuous and publicly verifiable randomness, which is of crucial importance for many applications. Existing works on distributed randomness beacons suffer from at least one of the following drawbacks: (i) security only against a static/non-adaptive adversary, (ii) each epoch takes many rounds of communication, or (iii) reliance on computationally expensive tools such as Proof-of-Work (PoW) or Verifiable Delay Functions (VDFs). In this paper, we introduce GRandLine, the first adaptively secure randomness beacon protocol that overcomes all these limitations while preserving simplicity and optimal resilience in the synchronous network setting. We achieve our result in two steps. First, we design a novel distributed key generation (DKG) protocol, GRand, that runs in O(λn² log n) bits of communication but, unlike most conventional DKG protocols, outputs both secret and public keys as group elements. Here, λ denotes the security parameter and n the number of parties. Second, following the termination of GRand, parties can use their keys to derive a sequence of randomness beacon values, where each random value costs only a single asynchronous round and O(λn²) bits of communication. We implement GRandLine and evaluate it using a network of up to 64 parties running in geographically distributed AWS instances. Our evaluation shows that GRandLine can produce about 2 beacon outputs per second in a network of 64 parties. We compare our protocol to state-of-the-art randomness beacon protocols in the same setting and observe that it vastly outperforms them.
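The two-step structure can be summarized with the following sketch; run_dkg, beacon_share, and aggregate are hypothetical placeholders for the protocol's actual subroutines, and the loop is a centralized simulation of what each party does locally.

```python
# Structural sketch only: a one-time DKG followed by one cheap round per
# beacon value; the three callbacks are hypothetical, not the real protocol.
from typing import Callable, Dict, List, Tuple

def randomness_beacon(parties: List[int],
                      run_dkg: Callable[[List[int]], Tuple[Dict[int, bytes], bytes]],
                      beacon_share: Callable[[bytes, int], bytes],
                      aggregate: Callable[[List[bytes], bytes], bytes],
                      epochs: int) -> List[bytes]:
    secret_keys, public_key = run_dkg(parties)       # expensive setup, runs once
    outputs: List[bytes] = []
    for epoch in range(epochs):
        # Each beacon value costs a single round: every party sends one share.
        shares = [beacon_share(secret_keys[p], epoch) for p in parties]
        outputs.append(aggregate(shares, public_key))
    return outputs
```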
We introduce Helium, a novel framework that supports scalable secure multiparty computation (MPC) for lightweight participants and tolerates churn. Helium relies on multiparty homomorphic encryption (MHE) as its core building block. While MHE schemes have been well studied in theory, prior works fall short of addressing considerations critical for adoption, such as supporting resource-constrained and unstably connected participants. In this work, we systematize the requirements of MHE-based MPC protocols through a practical lens, and we propose a novel execution mechanism that addresses those considerations. We implement this execution mechanism in Helium, which makes it the first implemented framework to support MPC under network churn based solely on cryptographic assumptions. We show that a Helium network of 30 parties connected with 100 Mbit/s links and experiencing a system-wide churn rate of 40 failures per minute can compute the product between a fixed 512 × 512 secret matrix (e.g., a collectively trained private model) and a fresh secret vector (e.g., a feature vector) 8.3 times per second. This is ∼1500 times faster than a state-of-the-art MPC framework operating under no churn.
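For context, a quick back-of-the-envelope computation of what the reported ∼1500x speedup implies for the baseline:

```python
# Implied baseline throughput, using only the figures quoted above.
helium_ops_per_second = 8.3
speedup = 1500
baseline_ops_per_second = helium_ops_per_second / speedup
print(f"baseline: {baseline_ops_per_second:.4f} products/s, "
      f"i.e., one product every {1 / baseline_ops_per_second:.0f} s")  # ~181 s
```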
Text-to-image models, such as Stable Diffusion (SD), undergo iterative updates to improve image quality and address concerns such as safety. Improvements in image quality are straightforward to assess. However, how model updates resolve existing concerns and whether they raise new ones remains unexplored. This study takes an initial step in investigating the evolution of text-to-image models from the perspectives of safety, bias, and authenticity. Our findings, centered on Stable Diffusion, indicate that model updates paint a mixed picture. While updates progressively reduce the generation of unsafe images, the bias issue, particularly in gender, intensifies. We also find that negative stereotypes either persist within the same Non-White race group or shift towards other Non-White race groups through SD updates, yet with minimal association of these traits with the White race group. Additionally, our evaluation reveals a new concern stemming from SD updates: state-of-the-art fake image detectors, initially trained on earlier SD versions, struggle to identify fake images generated by updated versions. We show that fine-tuning these detectors on fake images generated by updated versions achieves at least 96.6% accuracy across various SD versions, addressing this issue. Our insights highlight the importance of continued efforts to mitigate biases and vulnerabilities in evolving text-to-image models.
Key Encapsulation Mechanisms (KEMs) are a critical building block for hybrid encryption and modern security protocols, notably in the post-quantum setting. Given the asymmetric public key of a recipient, the primitive establishes a shared secret key between sender and recipient. In recent years, a large number of abstract designs and concrete implementations of KEMs have been proposed, e.g., in the context of the NIST process for post-quantum primitives. In this work, we (i) establish stronger security notions for KEMs, and (ii) develop a symbolic analysis method to analyze security protocols that use KEMs. First, we generalize existing security notions for KEMs in the computational setting, introduce several stronger security notions and prove their relations. Our new properties formalize in which sense outputs of the KEM uniquely determine, i.e., bind, other values. Our new binding properties can be used, e.g., to prove the absence of attacks that were not captured by prior security notions. Among these, we identify a new class of attacks that we coin re-encapsulation attacks. Second, we develop a family of fine-grained symbolic models that correspond to our hierarchy of computational security notions, and are suitable for the automated analysis of KEM-based security protocols. We encode our models as a library in the framework of the Tamarin prover. Given a KEM-based protocol, our approach can automatically derive the minimal binding properties required from the KEM; or, if also given a concrete KEM, can analyze if the protocol meets its security goals. In case studies, Tamarin automatically discovers, e.g., that the key exchange protocol proposed in the original Kyber paper [12] requires stronger properties from the KEM than were proven in [12].
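The KEM interface in question, and the informal flavor of question that binding notions ask, can be sketched as follows; the Protocol and the probe below are illustrative simplifications and do not reproduce the paper's formal definitions.

```python
# Illustrative KEM interface plus a toy probe for one informal binding
# question: can a single ciphertext decapsulate to the same key under an
# unrelated key pair? A simplification, not the paper's security notions.
from typing import Optional, Protocol, Tuple

class KEM(Protocol):
    def keygen(self) -> Tuple[bytes, bytes]: ...             # (pk, sk)
    def encaps(self, pk: bytes) -> Tuple[bytes, bytes]: ...  # (ciphertext, key)
    def decaps(self, sk: bytes, ct: bytes) -> Optional[bytes]: ...

def same_key_under_other_pair(kem: KEM) -> bool:
    """Heuristic probe: a True result suggests the shared key does not bind
    the public key, the kind of gap re-encapsulation attacks exploit."""
    pk1, _sk1 = kem.keygen()
    _pk2, sk2 = kem.keygen()
    ct, k1 = kem.encaps(pk1)
    k2 = kem.decaps(sk2, ct)
    return k2 is not None and k2 == k1
```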
Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy attacks under realistic assumptions remains largely unexplored. In this work, we present the first membership inference attack tailored for ICL, relying solely on generated texts without their associated probabilities. We propose four attack strategies tailored to various constrained scenarios and conduct extensive experiments on four popular large language models. Empirical results show that our attacks can accurately determine membership status in most cases, e.g., a 95% accuracy advantage against LLaMA, indicating that the associated risks are much higher than those shown by existing probability-based attacks. Additionally, we propose a hybrid attack that synthesizes the strengths of the aforementioned strategies, achieving an accuracy advantage of over 95% in most cases. Furthermore, we investigate three potential defenses targeting data, instruction, and output. Results demonstrate that combining defenses from orthogonal dimensions significantly reduces privacy leakage and offers stronger privacy assurances.
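A generic, text-only membership test of this kind can be sketched as follows; query_icl_model is a hypothetical black-box interface, and the similarity heuristic is a simplification rather than one of the paper's four strategies.

```python
# Sketch of a text-only membership signal against in-context learning: feed
# the model a prefix of the candidate and score how closely its generation
# matches the candidate's continuation. Probabilities are never needed.
import difflib
from typing import Callable

def membership_score(candidate: str,
                     query_icl_model: Callable[[str], str]) -> float:
    """Higher score -> candidate more likely part of the ICL prompt."""
    split = len(candidate) // 2
    completion = query_icl_model(candidate[:split])   # generated text only
    return difflib.SequenceMatcher(None, completion, candidate[split:]).ratio()

def is_member(candidate: str,
              query_icl_model: Callable[[str], str],
              threshold: float = 0.8) -> bool:
    return membership_score(candidate, query_icl_model) >= threshold
```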
Nowadays, powerful large language models (LLMs) such as ChatGPT have demonstrated revolutionary capabilities in a variety of natural language processing (NLP) tasks such as text classification, sentiment analysis, language translation, and question answering. Consequently, the detection of machine-generated texts (MGTs) is becoming increasingly crucial as LLMs become more advanced and prevalent. These models can generate human-like language, making it challenging to discern whether a text is authored by a human or a machine. This raises concerns regarding authenticity, accountability, and potential bias. However, existing methods for detecting MGTs are evaluated using different model architectures, datasets, and experimental settings, resulting in the lack of a comprehensive evaluation framework that encompasses various methodologies. Furthermore, it remains unclear how existing detection methods would perform against powerful LLMs. In this paper, we fill this gap by proposing the first benchmark framework for MGT detection against powerful LLMs, named MGTBench. Extensive evaluations on public datasets with curated texts generated by various powerful LLMs such as ChatGPT-turbo and Claude demonstrate the effectiveness of different detection methods. Our ablation study shows that a larger number of words generally leads to better performance and that most detection methods can achieve similar performance with far fewer training samples. Additionally, our findings reveal that metric-based and model-based detection methods exhibit better transferability across different LLMs and datasets, respectively. Furthermore, we delve into a more challenging task: text attribution, where the goal is to identify the originating model of a given text, i.e., whether it is a specific LLM or authored by a human. Our findings indicate that model-based detection methods still perform well in the text attribution task. To investigate the robustness of different detection methods, we consider three adversarial attacks, namely paraphrasing, random spacing, and adversarial perturbations. We discover that these attacks can significantly diminish detection effectiveness, underscoring the critical need for more robust detection methods. We envision that MGTBench will serve as a benchmark tool to accelerate future investigations involving the evaluation of powerful MGT detection methods on their respective datasets and the development of more advanced MGT detection methods.
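As one concrete example of the metric-based family such a benchmark covers, the snippet below thresholds the perplexity of a text under GPT-2 via the transformers library; the threshold value is illustrative, and this is a generic detector of that class rather than MGTBench itself.

```python
# Metric-based MGT detection example: perplexity under a reference model.
# Machine-generated text tends to have lower perplexity than human text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss  # mean LM loss
    return float(torch.exp(loss))

def looks_machine_generated(text: str, threshold: float = 30.0) -> bool:
    # The threshold is illustrative; in practice it is tuned on labeled data.
    return perplexity(text) < threshold
```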
Network-facing applications are commonly exposed to all kinds of attacks, especially when connected to the internet. As a result, web servers like Nginx or client applications such as curl make every effort to secure and harden their code to rule out memory safety violations. One would expect this to include regular fuzz testing, as fuzzing has proven to be one of the most successful approaches to uncovering bugs in software. Yet, surprisingly little research has focused on fuzzing network applications. When studying the underlying reasons, we find that the interactive nature of communication, its statefulness, and the protection of exchanged messages (e.g., via encryption or cryptographic signatures) render typical fuzzers ineffective. Attempts to replay recorded messages or modify them on the fly only work for specific targets and often lead to early termination of communication. In this paper, we discuss these challenges in detail, highlighting how the focus of existing work on protocol state space promises little relief. We propose a fundamentally different approach that relies on fault injection rather than modifying messages. Effectively, we force one of the communication peers into a weird state where its output no longer matches the expectations of the target peer, potentially uncovering bugs. Importantly, this weird peer can still properly encrypt/sign the protocol message, overcoming a fundamental challenge of current fuzzers. In effect, we leave the communication system intact but introduce small corruptions. Since we can turn either the server or the client into the weird peer, our approach is the first that can effectively test client-side network applications. In an extensive evaluation of 16 targets, we show that our prototype Fuzztruction-Net significantly outperforms other fuzzers in terms of coverage and bugs found. Overall, Fuzztruction-Net uncovered 23 new bugs in well-tested software, such as the web servers Nginx and Apache HTTPd and the OpenSSH client.
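The core idea of the weird peer can be illustrated in a few lines: inject the fault into the message before the cryptographic layer processes it, so authentication still succeeds while the semantics no longer match the receiver's expectations. The HMAC-based framing below is a simplified stand-in for a real protocol stack, not Fuzztruction-Net's implementation.

```python
# Sketch: corrupt the plaintext *before* authentication, so the message still
# verifies but its content is "weird". HMAC stands in for the protocol's real
# encryption/signing layer.
import hashlib
import hmac
import random
from typing import Callable

def flip_random_bit(data: bytes) -> bytes:
    buf = bytearray(data)
    pos = random.randrange(len(buf))
    buf[pos] ^= 1 << random.randrange(8)
    return bytes(buf)

def send_weird_message(plaintext: bytes,
                       mac_key: bytes,
                       send: Callable[[bytes], None]) -> None:
    corrupted = flip_random_bit(plaintext)                    # fault injection
    tag = hmac.new(mac_key, corrupted, hashlib.sha256).digest()
    send(corrupted + tag)                                     # still authenticates

# Usage sketch (hypothetical names):
# send_weird_message(b"HELLO v1.0", session_key, sock.sendall)
```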
Browser extensions are third-party add-ons that provide a myriad of features to their users while browsing the Web. Extensions often interact with the websites a user visits and perform various operations such as DOM-based manipulation, script injection, and so on. However, this also enables nefarious websites to track their visitors by fingerprinting extensions. Researchers in the past have shown that extensions are susceptible to fingerprinting based on the resources they include, the styles they deploy, or the DOM-based modifications they perform. Fortunately, the current extension ecosystem contains safeguards against many such known issues through appropriate defense mechanisms. We present the first study to investigate the fingerprinting characteristics of extension-injected code in pages' JavaScript namespace and through other observable side effects like changed cookies. In doing so, we find that many extensions inject JavaScript that pollutes the applications' global namespace by registering variables. Injection also enables an attacker application to monitor the execution of the injected code by overwriting JavaScript APIs and capturing execution traces, e.g., the stack trace or the set of APIs invoked. Further, extensions also store data on the client side and perform event-driven functionalities that aid in attribution. Through our tests, we find 2,747 Chrome and 572 Firefox extensions to be susceptible to fingerprinting. Unfortunately, none of the existing defense mechanisms prevent extensions from being fingerprinted through our proposed vectors. Therefore, we also suggest potential measures for developers and browser vendors to safeguard the extension ecosystem against such fingerprinting attempts.
Most existing membership inference attacks (MIAs) utilize metrics (e.g., loss) calculated on the model's final state, while recent advanced attacks leverage metrics computed at various stages, including both intermediate and final stages, throughout model training. Nevertheless, these attacks often process multiple intermediate states of the metric independently, ignoring their time-dependent patterns. Consequently, they struggle to effectively distinguish between members and non-members that exhibit similar metric values, which in particular results in a high false-positive rate. In this study, we delve deeper into new membership signals in the black-box scenario. We identify a new, more integrated membership signal: the Pattern of Metric Sequence, derived from the various stages of model training. We contend that current signals provide only partial perspectives of this new signal: it encompasses both the model's intermediate and final states, with a greater emphasis on the temporal patterns among them. Building upon this signal, we introduce a novel attack method called Sequential-metric based Membership Inference Attack (SeqMIA). Specifically, we utilize knowledge distillation to obtain a set of distilled models representing various stages of the target model's training. We then assess multiple metrics on these distilled models in chronological order, creating distilled metric sequences. Finally, we integrate the distilled multi-metric sequences as a sequential multiformat and employ an attention-based RNN attack model for inference. Empirical results show that SeqMIA outperforms all baselines, in particular achieving an order-of-magnitude improvement in TPR @ 0.1% FPR. Furthermore, we delve into the reasons why this signal contributes to SeqMIA's high attack performance and assess various defense mechanisms against SeqMIA.
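For intuition, an attention-based RNN over per-checkpoint metric sequences might look like the PyTorch sketch below; the architecture details (hidden size, pooling) are illustrative and may differ from the paper's model.

```python
# Sketch of an attention-based RNN membership classifier over per-epoch metric
# sequences (e.g., losses measured on a series of distilled checkpoints).
import torch
import torch.nn as nn

class SeqAttackModel(nn.Module):
    def __init__(self, num_metrics: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.LSTM(num_metrics, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_checkpoints, num_metrics), ordered chronologically
        out, _ = self.rnn(x)                            # (B, T, H)
        weights = torch.softmax(self.attn(out), dim=1)  # attention over time
        pooled = (weights * out).sum(dim=1)             # (B, H)
        return torch.sigmoid(self.head(pooled)).squeeze(-1)  # P(member)

# Usage sketch: scores = SeqAttackModel(num_metrics=3)(torch.randn(8, 10, 3))
```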
The video game market is one of the biggest markets for software products. Over the last decades, video game development has grown into a complex and multifaceted endeavor. Games-as-a-Service significantly impacted distribution and gameplay, requiring providers and developers to consider factors beyond game functionality, including security and privacy. New security challenges emerged, including authentication, payment security, and the protection of user data and assets. However, the security community lacks in-depth insights into the security experiences, challenges, and practices of modern video game development. This paper aims to address this gap in research and highlights how critical it is to consider security in the development process. Therefore, we conducted 20 qualitative, semi-structured interviews with professional and skilled video game development experts in a variety of roles, investigating awareness, priorities, knowledge, and practices regarding security in the industry through their first-hand experiences. We find that stakeholders are aware of the urgency of security and related issues. However, they often face obstacles, including a lack of money, time, and knowledge, which force them to give security issues lower priority. We conclude our work by recommending how the game industry can incorporate security into its development processes while balancing other resources and priorities, and by illustrating ideas for future research.
This work addresses the verification gap between formal protocol specifications and their real-world implementations by monitoring compliance with formal specifications. We achieve this by instrumenting the networking and cryptographic libraries used by applications to generate event streams, even without access to the source code. An efficient algorithm is then employed to match these event streams against valid traces defined in the formal specification. Unlike previous approaches, our algorithm is capable of handling non-determinism, allowing it to support multiple concurrent sessions. Furthermore, our method introduces minimal overhead, as demonstrated through experiments on the WireGuard userspace implementation and a case study based on prior work. Notably, we find that the reference Tamarin model for WireGuard requires only minor adjustments, such as defining wire formats and correcting small inaccuracies uncovered during our case study. Finally, we provide formal proofs of soundness and completeness for our algorithm, ensuring that it accepts only event streams that are valid according to the specification and that all such valid streams are recognized.
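Handling non-determinism can be illustrated by tracking the set of all specification states consistent with the events observed so far, as in the sketch below; the toy transition relation stands in for the traces allowed by a formal (e.g., Tamarin) model and is not the paper's algorithm.

```python
# Sketch of non-deterministic trace matching: keep every specification state
# consistent with the event stream and reject only when none remain.
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable
Event = Hashable
Transitions = Dict[Tuple[State, Event], Set[State]]

def monitor(events: Iterable[Event],
            initial: Set[State],
            transitions: Transitions) -> bool:
    """True iff the event stream is a prefix of some valid trace."""
    current: Set[State] = set(initial)
    for event in events:
        current = {nxt
                   for state in current
                   for nxt in transitions.get((state, event), set())}
        if not current:
            return False   # no specification trace explains the stream
    return True

# Toy example: two concurrent sessions make the step after "init" ambiguous.
spec: Transitions = {("idle", "init"): {"s1", "s2"},
                     ("s1", "msg_a"): {"done"},
                     ("s2", "msg_b"): {"done"}}
assert monitor(["init", "msg_a"], {"idle"}, spec)
```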
Metaverses are virtual worlds where users can engage in social exchanges, collaborate, or play games. Their clients are now JavaScript programs that run inside modern web browsers. They implement functionalities typical of multiplayer video games, such as 3D and physics engines, requiring them to maintain complex data structures of objects in the browser's memory. Unfortunately, these objects can be accessed and manipulated by malicious users, allowing them to learn about events beyond those rendered on screen or to hijack the physics of the metaverse to spy on other users. In this paper, we propose one of the first comprehensive security assessments of web clients of metaverse platforms. We begin with a survey and selection of three metaverse platforms and introduce a software-centric threat modeling approach designed to identify the security-relevant entities. Then, we propose a JavaScript global object snapshot diffing technique to identify in-memory objects correlated with the attributes of interest and design 10 attacks, eight of which executed successfully against at least one of the metaverses. These attacks enable a malicious user to perform, among other things, audio/video surveillance or continuous position tracking of other users, capabilities that could exacerbate current threats posed by stalkers and online abusers. Finally, we discuss the implications of our attacks should the metaverse become a business tool, as well as possible solutions.
The rapid advancements of large language models (LLMs) have raised public concerns about the privacy leakage of personally identifiable information (PII) within their extensive training datasets. Recent studies have demonstrated that an adversary can extract highly sensitive private data from the training data of LLMs with carefully designed prompts. However, these attacks suffer from the model's tendency to hallucinate and from catastrophic forgetting (CF) during the pre-training stage, rendering the divulged PII largely unreliable. In our research, we propose a novel attack, Janus, which exploits the fine-tuning interface to recover forgotten PII from the pre-training data of LLMs. We formalize the privacy leakage problem in LLMs and explain why forgotten PII can be recovered through an empirical analysis of open-source language models. Based upon these insights, we evaluate the performance of Janus on both open-source language models and two of the latest LLMs, namely GPT-3.5-Turbo and LLaMA-2-7b. Our experimental results show that Janus amplifies privacy risks by over 10 times compared with the baseline and significantly outperforms state-of-the-art privacy extraction attacks, including prefix attacks and in-context learning (ICL). Furthermore, our analysis validates that existing fine-tuning APIs provided by OpenAI and Azure AI Studio are susceptible to our Janus attack, allowing an adversary to conduct such an attack at low cost.
Following the recent release of AI assistants, such as OpenAI’s ChatGPT and GitHub Copilot, the software industry quickly utilized these tools for software development tasks, e.g., generating code or consulting AI for advice. While recent research has demonstrated that AI-generated code can contain security issues, how software professionals balance AI assistant usage and security remains unclear. This paper investigates how software professionals use AI assistants in secure software development, what security implications and considerations arise, and what impact they foresee on secure software development. We conducted 27 semi-structured interviews with software professionals, including software engineers, team leads, and security testers. We also reviewed 190 relevant Reddit posts and comments to gain insights into the current discourse surrounding AI assistants for software development. Our analysis of the interviews and Reddit posts finds that despite many security and quality concerns, participants widely use AI assistants for security-critical tasks, e.g., code generation, threat modeling, and vulnerability detection. Their overall mistrust leads to checking AI suggestions in similar ways to human code, although they expect improvements and, therefore, a heavier use for security tasks in the future. We conclude with recommendations for software professionals to critically check AI suggestions, AI creators to improve suggestion security and capabilities for ethical security tasks, and academic researchers to consider general-purpose AI in software development.
Text-to-image generation models have attracted significant interest from both the academic and industrial communities. These models generate images based on given prompt descriptions. Their potent capabilities, while beneficial, also present risks. Previous efforts relied on training binary classifiers to detect generated fake images, an approach that is inefficient, poorly generalizable, and non-robust. In this paper, we propose a novel zero-shot detection method, called ZeroFake, that distinguishes fake images from real ones using a perturbation-based DDIM inversion technique. ZeroFake is inspired by the finding that fake images are more robust than real images under DDIM inversion and reconstruction. Specifically, for a given image, ZeroFake first generates noise with DDIM inversion guided by adversary prompts. Then, ZeroFake reconstructs the image from the generated noise. Subsequently, it compares the reconstructed image with the original image to determine whether it is fake or real. By exploiting the differential response of fake and real images to the adversary prompts during the inversion and reconstruction process, our method offers a more robust and efficient way to detect fake images without extensive data and training costs. Extensive results demonstrate that ZeroFake achieves strong performance in fake image detection, fake artwork detection, and fake edited image detection. We further illustrate the robustness of ZeroFake by showcasing its resilience against potential adversarial attacks. We hope that our solution can better assist the community in moving toward more efficient and fair AGI.
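The decision rule can be summarized by the following sketch; ddim_invert and ddim_reconstruct are placeholders for a diffusion model's inversion and sampling routines, and the threshold is illustrative rather than taken from the paper.

```python
# Structural sketch of the inversion-reconstruction detection rule: images
# that reconstruct with low error after DDIM inversion are flagged as fake.
from typing import Callable
import numpy as np

def zero_shot_fake_score(image: np.ndarray,
                         prompt: str,
                         ddim_invert: Callable[[np.ndarray, str], np.ndarray],
                         ddim_reconstruct: Callable[[np.ndarray, str], np.ndarray]
                         ) -> float:
    noise = ddim_invert(image, prompt)            # guided by an adversary prompt
    recon = ddim_reconstruct(noise, prompt)
    return float(np.mean((recon - image) ** 2))   # low error -> likely fake

def is_fake(image: np.ndarray, prompt: str,
            ddim_invert: Callable[[np.ndarray, str], np.ndarray],
            ddim_reconstruct: Callable[[np.ndarray, str], np.ndarray],
            threshold: float = 0.01) -> bool:
    # Fake images reconstruct more faithfully under inversion than real ones.
    return zero_shot_fake_score(image, prompt, ddim_invert, ddim_reconstruct) < threshold
```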