Our new Faculty: Dr. Adam Dziedzic, expert in trustworthy machine learning

"There are two approaches to research: One is to be an expert in a particular field and use your expertise, which acts as a kind of hammer, to hit the nails you find," explains Adam Dziedzic. By nails, he means the problems in the field of machine learning (ML) that are currently popping up like mushrooms due to rapid developments. "The SprintML research group has a different motivation. We want to know: What are the main problems of our time? And then we look for solutions to them. My ultimate goal is that users of ML models can really trust them," says the researcher. Since September 1st 2023, Dziedzic has been a senior scientist at CISPA, addressing almost all the major challenges of machine learning with his research: privacy, confidentiality, robustness, interpretability, security, and the supreme discipline of trustworthiness, which subsumes all the other challenges. What sounds rather simple – enabling trustworthy artificial intelligence – almost requires fakir skills, to use the same metaphor. For AI to be trustworthy, we have to understand its decisions, it has to act fairly, make precise predictions, and may not disclose secret data.

Trust needs a solid foundation

"In order to approach our goal, our work is currently based on three main pillars: The protection of privacy, confidentiality, and the robustness of the models," says Dziedzic. According to the researcher, many of today's ML models are already amazingly powerful: They are able to learn from large data sets without human supervision or prior labeling of the data, recognize patterns in them and later solve a wide variety of downstream tasks. Anyone who has ever interacted with chatbots like ChatGPT knows that there are hardly any questions to which the machine does not have an answer. Even to the most sensitive ones. As a result, many people unintentionally reveal a lot about themselves through the inputs they provide in the chat. "This is one of my major research topics: the security of data when using chatbots," says Dziedzic. However, this is far from the only topic that drives the scientist. For ML models to be trustworthy, in addition to the protection of data when it is used, the integrity and origin of the models are also important. Therefore, Dziedzic is also concerned with the question of how to prevent ML models from being copied or manipulated. "The developers of ML models have often invested a lot of time, money, computing power, and enormous amounts of data to train their models. Unfortunately, those models are frequently stolen. Either for the purpose of using the model without monetary or computational effort or in order to carry out further attacks on it by means of precise knowledge of the model," explains Dziedzic.

According to the researcher, a number of defense mechanisms against theft attacks already exist for models that are only trained on data that has previously been labeled by humans. In contrast, so-called "self-supervised models", that is, models that can also learn from unlabeled data, have often been fair game for attackers. "We have therefore recently presented a new approach that can read the data set used for training like a kind of signature and thus allows conclusions about theft and copies," Dziedzic explains. He considers this approach very promising for better protecting the integrity of models in the future.

The robustness of the models

Now let's move on to the third pillar: the robustness of the models. "This refers primarily to their stability in the event of attacks and data manipulation." At the very latest when artificial intelligence is used to tackle major tasks such as diagnosing diseases, we will no longer be able to tolerate its errors. Currently, Dziedzic is particularly interested in the robustness of so-called collaborative machine learning. Collaborative learning is the umbrella term for various ML methods. They are all designed so that several machines collaborate in order to achieve better results. "Just as doctors communicate with colleagues to improve their diagnostic skills, ML models likewise learn better when working together." That said, collaborative learning, which often involves sharing training data, also poses risks, especially in highly sensitive fields such as healthcare: "Private data could accidentally be leaked between learners, or the misjudgments of some participants could affect the robustness of the joint predictions," the researcher explains. Previous approaches to address these risks have often only focused on individual problem areas or have led to a significant loss of model performance.

Dziedzic wants to enable collaborative learning that is effective and trustworthy

"It is our goal to take collaborative learning to a whole new level. Instead of training models from scratch, we want to use so-called Open Foundation Models (OFM) as a basis." OFMs are large pre-trained ML models that are extremely good at extracting features from raw data, which can then be processed by the downstream adaptation. A few tweaks is all it takes to get them ready for special fields of application. To make this work, we need methods that allow us to provide data protection guarantees for such collaborative models. Moreover, we want to ensure that private information is not inadvertently leaked by the models and improve the robustness of such models against attacks. This holds great potential for the use of AI in medicine, for example." An ambitious project that the researcher will be working on together with his research group SprintML Lab. He is co-leading this research group together with Dr. Franziska Boenisch, who is also a Faculty at CISPA.

Dziedzic’s professional background

Before joining CISPA as a Faculty, Dziedzic was a postdoctoral fellow at the Vector Institute and the University of Toronto, where he was a member of the CleverHans Lab. He completed his PhD at the University of Chicago. Dziedzic received his Bachelor's and Master's degrees from the Warsaw University of Technology. He also studied at the Technical University of Denmark and conducted research at EPFL in Switzerland. After that, he worked at CERN in Geneva, at Barclays Investment Bank in London, Microsoft Research (Redmond, USA), and Google (Madison, USA). According to him, there were many reasons why he decided to join CISPA last year: "One is that CISPA is growing fast and is still in the development phase. I'm looking forward to contributing to the center's growth and success story. On top of that, CISPA is already one of the world's leading centers for security and trustworthiness, especially in the field of machine learning. CISPA is a great environment for us researchers to flourish. This can be seen in the development of our SprintML Lab research group. It is growing well and we have excellent applicants. CISPA's good reputation allows us to find ambitious students. It is a real pleasure to work with them."

If you are interested in Adam's research and related topics, check out his website: https://adam-dziedzic.com/ and the website of the SprintML group: https://sprintml.com/