2023-05-05

Sentence Embedding Encoders are Easy to Steal but Hard to Defend

Summary

Self-supervised learning (SSL) has become the predominant approach to training on large amounts of data when no labels are available. Because the corresponding model architectures are usually large, training is costly and relies on dedicated, expensive hardware. As a consequence, not every party can train such models from scratch. Instead, new APIs offer paid access to pre-trained SSL models. We consider transformer-based SSL sentence encoders and show that they can be efficiently extracted (stolen) from behind these APIs through black-box query access. Our stealing attack requires up to 40x fewer queries than the victim has training data points, and much less computation. This large gap between low attack costs and high victim training costs strongly incentivizes attackers to steal encoders. To protect transformer-based sentence encoders against stealing, we propose embedding secret downstream tasks into their training, where they serve as watermarks. In general, our work highlights that sentence embedding encoders are easily stolen but hard to defend.
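
As a rough illustration of what extraction through black-box query access means, the toy sketch below queries a hidden encoder and fits a copy to the returned embeddings. Everything in it is an assumption made for illustration: the victim is a small secret linear map rather than a transformer, the inputs are random vectors rather than sentences, and the squared-error objective and the names make_victim, steal, and held_out_mse are hypothetical. It is not the attack evaluated in the paper.

```python
import random

DIM_IN, DIM_OUT = 4, 2  # toy sizes, chosen only for illustration


def embed(weights, x):
    """Apply a linear map (list of rows) to an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def make_victim(seed=0):
    """Hypothetical stand-in for a paid embedding API.

    The secret weights are captured in a closure, so callers can only
    observe outputs for the inputs they submit (black-box access).
    """
    rng = random.Random(seed)
    secret = [[rng.uniform(-1, 1) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]
    return lambda x: embed(secret, x)


def steal(query, n_queries=200, epochs=50, lr=0.05, seed=1):
    """Fit a student encoder to the victim's outputs with plain SGD on squared error."""
    rng = random.Random(seed)
    inputs = [[rng.uniform(-1, 1) for _ in range(DIM_IN)] for _ in range(n_queries)]
    targets = [query(x) for x in inputs]  # the only interaction with the victim
    student = [[0.0] * DIM_IN for _ in range(DIM_OUT)]
    for _ in range(epochs):
        for x, target in zip(inputs, targets):
            pred = embed(student, x)
            for j in range(DIM_OUT):
                err = pred[j] - target[j]
                for i in range(DIM_IN):
                    # gradient of (pred_j - target_j)^2 with respect to student[j][i]
                    student[j][i] -= lr * 2 * err * x[i]
    return student


def held_out_mse(student, query, n=100, seed=2):
    """Mean squared error between student and victim embeddings on fresh inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(DIM_IN)]
        pred = embed(student, x)
        total += sum((p - t) ** 2 for p, t in zip(pred, query(x))) / DIM_OUT
    return total / n


if __name__ == "__main__":
    victim = make_victim()
    stolen = steal(victim)
    print("held-out MSE:", held_out_mse(stolen, victim))
```

The attacker in this sketch never reads the secret weights and only pays for 200 queries. In this noiseless linear toy the copy becomes nearly exact, which is not something a real transformer encoder would guarantee; the point is only to show the query-then-imitate loop.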