
2022-03-03
Annabelle Theobald

How devices learn together from distributed data

"Federated learning, in which many devices jointly train a machine learning (ML) model, offers enormous advantages compared to centralized approaches in terms of data protection and the required computing power," says Dr. Sebastian Stich. A new faculty member at CISPA, Stich's work currently focuses on optimization algorithms that can further improve the performance and security of federated learning. The mathematician explains how it works and why this field of research is so exciting.

We all incessantly produce data that is collected by the smartphones in our pockets and the smartwatches on our wrists: our pulse as we climb stairs, our blood pressure, or the oxygen level in our blood. With ML methods, we could analyze these vast amounts of data effectively and thus enormously improve, for example, the diagnosis, early detection, and treatment of diseases. However, securely combining large volumes of data for processing on a central server poses not only technical problems but, above all, data protection issues. That is where federated learning comes in: "In federated learning, the collected data sets remain stored on the devices and are not shared. Instead, the devices evaluate the data locally and use the results to jointly train a centrally stored ML model," Stich explains. The technology is already widely used; Google, for example, uses federated learning to improve the autocorrect function of its keyboards. But how does ML actually work, and why does it make analyzing the collected data easier?

"First, a model must be created that can use mathematical functions to express a prediction, calculate probabilities, or identify correlations, for example. ML algorithms can independently adjust the parameters of the model so that the models can learn and transfer patterns from the sample data," the 36-year-old explains.

With federated learning, he says, an additional challenge is that communication between the small models on each device and the jointly trained central model is costly. Trying to keep communication as low as possible often leads to the problem known as overfitting. "To keep the communication overhead low, the models exchange information only once a day. But if the models only ever see their own data, they fit too closely to it and often no longer fit the other data in the network properly," says Stich, who addresses this problem and many other federated learning issues in his research. In his paper "SCAFFOLD: Stochastic Controlled Averaging for Federated Learning," accepted at the prestigious International Conference on Machine Learning (ICML), he proposed an improvement to the Federated Averaging (FedAvg) algorithm commonly used in federated learning. "With a kind of correction to the trend line in the data analysis on the local devices, you can make sure that the devices stay reasonably synchronized."
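
The sketch below illustrates this correction idea on the tiny model from above, with invented and deliberately heterogeneous data. It follows the spirit of SCAFFOLD's control-variate updates but is not a faithful reimplementation of the paper: each device keeps an estimate of its own typical update direction, the server keeps the average, and the difference corrects every local step so the devices do not drift apart between communication rounds.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented, deliberately heterogeneous data: each device observes a
# different slope, so purely local training would drift to different models.
def make_device_data(slope, n=100):
    x = rng.uniform(-1.0, 1.0, n)
    return x, slope * x + 0.1 * rng.normal(size=n)

devices = [make_device_data(s) for s in (1.0, 1.5, 2.0, 2.5, 3.0)]

def grad(w, x, y):
    return np.mean((w * x - y) * x)  # gradient of the local squared error

LR, LOCAL_STEPS, ROUNDS = 0.5, 10, 50

# Control-variate idea: each device keeps an estimate c_i of its own typical
# gradient, the server keeps the average c, and every local step is corrected
# by (c - c_i) so the devices stay roughly synchronized despite seeing
# very different data between the rare communication rounds.
w_global, c_global = 0.0, 0.0
c_local = [0.0] * len(devices)

for _ in range(ROUNDS):
    delta_w, delta_c = [], []
    for i, (x, y) in enumerate(devices):
        w = w_global
        for _ in range(LOCAL_STEPS):
            w -= LR * (grad(w, x, y) - c_local[i] + c_global)
        # Update this device's control variate from the progress it just made.
        c_new = c_local[i] - c_global + (w_global - w) / (LOCAL_STEPS * LR)
        delta_w.append(w - w_global)
        delta_c.append(c_new - c_local[i])
        c_local[i] = c_new
    w_global += float(np.mean(delta_w))  # server averages the local changes
    c_global += float(np.mean(delta_c))

print(f"jointly trained parameter: {w_global:.3f} "
      f"(average of the devices' individual optima: about 2.0)")
```

In this toy run the jointly trained parameter ends up near the average of the devices' individual optima, which is exactly the "staying synchronized" effect described above.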

Another ongoing research topic for the mathematician is how to compress the messages sent back and forth between the models in order to reduce communication overhead. "I'm sure that federated learning will remain a relevant topic in the future. But I imagine there will be a lot more to come in terms of algorithms."
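
One common family of approaches, sketched below with invented numbers rather than anything from Stich's papers, is to transmit only the largest entries of each update and to remember the part that was dropped so it can be sent in a later round (often called error feedback or memory).

```python
import numpy as np

rng = np.random.default_rng(2)

def top_k(update, k):
    """Keep only the k entries with the largest magnitude; zero out the rest."""
    compressed = np.zeros_like(update)
    idx = np.argsort(np.abs(update))[-k:]
    compressed[idx] = update[idx]
    return compressed

# Invented example: a device has computed a 1000-dimensional model update
# but only transmits its 10 largest entries each round (a 100x saving).
dim, k = 1000, 10
memory = np.zeros(dim)  # error feedback: the part not yet transmitted

for round_ in range(5):
    update = rng.normal(size=dim)        # stand-in for a locally computed update
    to_send = top_k(update + memory, k)  # compress the update plus leftover error
    memory = update + memory - to_send   # remember what was dropped this round
    # ...only `to_send` (10 non-zero values) would go over the network...

print(f"entries transmitted per round: {np.count_nonzero(to_send)} of {dim}")
```

The memory term is what keeps the compression from silently discarding information: anything left out of one message eventually accumulates and gets sent later.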

The Swiss native moved to CISPA from EPFL Lausanne in December and sees his research at the intersection of mathematics, computer science, and statistics. "Working at CISPA is so interesting for me because I can devote myself almost exclusively to my research here. Some of my new colleagues work on decentralized systems, as I do, but come from very different backgrounds, so I can imagine exciting collaborations."

translated by Oliver Schedler