How attackers can hijack machine learning models

A novel attack allows adversaries to inject new tasks into machine learning models undetected and make the models work for them. Ahmed Salem developed this so-called "Model Hijacking Attack" during his PhD studies at CISPA. He will present his paper "Get a Model! Model Hijacking Attack Against Machine Learning Models" at the renowned IT security conference NDSS.

Machine learning is considered a key technology of artificial intelligence. It is used in many sensitive areas, for example autonomous driving, financial applications, and authentication solutions such as Face ID. With the help of learning algorithms, complex models are now being developed that learn from experience and can thus make predictions and decisions independently. Similar to a person who educates themselves by reading many books, a model needs a lot of data input in the training phase to work well later. "Because training data is hard to come by and sometimes millions of data sets are needed for training, it is becoming increasingly common for multiple users to work together to train a model with the appropriate data sets. However, merging data from different sources carries risks," Salem says.
"Data poisoning attacks" can be used to manipulate data, and therefore the models, during the training process. "For example, the image of an apple could be labeled as a banana, misleading the model," Salem explains. Such attacks are usually relatively easy to detect, because the manipulated data sets look different and the models no longer perform their original task well after being manipulated. While it is possible to disguise the manipulation of data sets cleverly, Ahmed Salem has demonstrated something stronger: the model can be made to perform an external task in addition to its original one, without the model owners noticing.
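To make the label-flipping idea concrete, here is a minimal toy sketch of the kind of data poisoning described above (relabeling some "apple" samples as "banana" before training). The function name, dataset format, and parameters are invented for illustration and are not from Salem's paper:

```python
import random

def poison_labels(dataset, target_label, poison_label, fraction, seed=0):
    """Flip a fraction of samples carrying `target_label` to `poison_label`.

    Toy illustration of a label-flipping data poisoning attack: the
    mislabeled samples mislead any model later trained on the data.
    """
    rng = random.Random(seed)
    poisoned = []
    for features, label in dataset:
        if label == target_label and rng.random() < fraction:
            label = poison_label  # mislabeled sample enters the training set
        poisoned.append((features, label))
    return poisoned

# Toy dataset of (feature vector, label) pairs.
clean = [([0.1 * i], "apple") for i in range(10)]
poisoned = poison_labels(clean, "apple", "banana", fraction=0.3, seed=42)
flipped = sum(1 for _, lbl in poisoned if lbl == "banana")
print(f"{flipped} of {len(poisoned)} labels flipped")
```

As the article notes, such crude flipping is easy to spot by inspecting the data; a hijacking attack instead disguises the injected samples so that the secondary task stays hidden.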
Attackers can thus not only hijack the now enormously expensive models and use them for free; the attack can also create legal risks for model owners, who may unknowingly lend their models to illegal or unethical purposes. "Attackers could, for example, manipulate a model to mass-produce toxic comments, which can then be distributed across social media," Salem explains.
According to the researcher, defense mechanisms that are effective against model hijacking attacks do already exist. "Unfortunately, these mechanisms also often significantly reduce the functionality of the models," Salem says. Finding solutions that both work against the attacks and interfere as little as possible with the models' functionality is, he says, an exciting field of research.
Ahmed Salem completed his PhD studies at CISPA and has been working as a postdoc at Microsoft Research in Cambridge since February 2022. Besides data protection and privacy in machine learning models, he is also interested in applied cryptography.

translated by Oliver Schedler