X-rays, MRI scans, the genomic makeup of a tumor, lab results, and individual medical histories - all of these data play a critical role in the early detection, diagnosis, and treatment of diseases. Studies show that artificial intelligence (AI) can analyze these many types of complex data, in some cases even more reliably than doctors with decades of experience.
Medicine is not the only area in which AI could have a revolutionary effect. But it is the best example to explain why fundamental change is still a little way off: the dream of a world with fewer diseases and better treatment options stands in contrast to the nightmare of a world without privacy. For machine learning models to be able to support physicians, they must be trained on large amounts of data. Such data is not only highly sensitive but also complex, and so far it has been difficult to obtain, because the few existing data sets cannot be made publicly available to the research community due to privacy concerns. The good news is that researchers worldwide have long been working on innovative approaches to reconciling progress with privacy protection.
One such researcher is Dingfan Chen. The Ph.D. student sees the artificial production of new data as one possible way to train models well for their tasks. "The artificial data is based purely on the structural understanding of real datasets and cannot contain identifying features." These artificial representations of original data sets are generated by novel algorithms in so-called deep generative models. Unlike their counterparts, discriminative models, generative models can not only decide whether a photo shows a horse but also produce a realistic-looking photo of a horse - or, for that matter, a new data set. This process is known as data synthesis. "In fact, privacy guarantees can be incorporated into the data synthesis pipeline, resulting in much stronger privacy than simply anonymizing data. But unfortunately, with the methods that already exist, we still struggle to produce data that is useful for real-world application scenarios."
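The idea of building a privacy guarantee into the synthesis pipeline itself can be sketched in a few lines. The toy example below is an illustration under simplifying assumptions, not the method discussed here: it fits a trivial one-parameter generative model (a Gaussian over a hypothetical "patient age" attribute) using a differentially private mean computed with the Laplace mechanism, then samples synthetic records from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_mean(values, lo, hi, epsilon, rng):
    """Differentially private mean via the Laplace mechanism.

    Clipping every value to [lo, hi] bounds any single record's
    influence on the mean to (hi - lo) / n; noise calibrated to that
    sensitivity then hides each individual's contribution.
    """
    clipped = np.clip(values, lo, hi)
    sensitivity = (hi - lo) / len(clipped)
    return clipped.mean() + rng.laplace(scale=sensitivity / epsilon)

# Hypothetical sensitive attribute: simulated patient ages, not real data.
real_ages = rng.normal(loc=52, scale=12, size=5_000)

# Step 1: fit the generative model under a privacy budget (epsilon = 1).
# For simplicity the spread is treated as public knowledge; a full
# pipeline would also have to privatize it.
mu = dp_mean(real_ages, lo=0, hi=110, epsilon=1.0, rng=rng)

# Step 2: sample synthetic records from the fitted model. They mirror
# the population's structure without reproducing any one patient's record.
synthetic_ages = rng.normal(loc=mu, scale=12, size=5_000)
print(round(mu, 1))  # near 52, offset only by calibrated noise
```

Real systems use far richer models (for instance, generative networks trained with differentially private optimization), but the principle is the same: the guarantee attaches to the model's parameters, so everything sampled from the model inherits it.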
This is where the research finds itself chasing its own tail: "If we want to give a strong theoretical guarantee of privacy compliance for synthesized data, we need a lot of training data for an appropriate data synthesis model. Meanwhile, the more features these data contain about an individual, such as age, gender, or date of birth, the better we can work with them. But of course, more data protection risks are also associated with their analysis and processing. And in turn, it is more difficult for us to generate synthetic data that doesn't violate individuals' privacy."
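The trade-off described in the quote can be made concrete with a toy differential-privacy calculation. The sketch below is illustrative only and uses simple composition of the Laplace mechanism, not any technique from the project: under a fixed overall privacy budget, every additional feature recorded about an individual leaves less budget per feature, so the released statistics become noisier and less useful.

```python
import numpy as np

rng = np.random.default_rng(1)
n, total_epsilon = 1_000, 1.0
errors = {}

for d in (2, 16, 128):
    # n individuals, d features each, every feature clipped to [0, 1];
    # the sensitivity of each per-feature mean is then 1 / n.
    data = rng.uniform(size=(n, d))
    # Basic composition: the total budget is split evenly across the
    # d features, so each feature's mean gets noisier as d grows.
    per_feature_eps = total_epsilon / d
    noise = rng.laplace(scale=(1 / n) / per_feature_eps, size=d)
    dp_means = data.mean(axis=0) + noise
    errors[d] = float(np.abs(noise).mean())
    print(f"d = {d:3d}  ->  average error of the private means: {errors[d]:.4f}")
```

The printed error grows roughly linearly with the number of features: richer data about each individual means either weaker utility or a larger privacy budget, which is exactly the tension the quote describes.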
That's where Dingfan's research comes in. The CISPA researcher plans to approach the problem from three angles. "There are already several approaches to tailoring the training objectives of generative models more precisely to privacy-compliant data synthesis. So far, however, these have lacked a unified view of the different types of models, methods, and data modalities. Only when we have that can we systematically explore new architectures and leverage the strengths of the different methods." The second measure that Dingfan says could bring improvement is to look at which tasks the generators have been designed for so far. "Training the models on complex data sets without limiting their task scope is tough. Therefore, we should clearly define the task and exploit task-discriminative knowledge." A further improvement the researcher suggests is to make data from different sources more usable. "So far, little work has been done with such data in deep generative models. This is where I want to find new approaches."
The researcher is pleased to receive the Qualcomm Award and the funding for her research project. "The award is a recognition of the value of my research and shows that it is needed. In the best case, my work will lead to further research in different directions. Of course, I will also need the funding, for example, to buy more computing power or to hire collaborators." Her previous research experience makes her confident about the new project, which is initially planned to run for one year. "I think I can do a good job advancing research in this area."
About the Qualcomm Innovation Fellowship Europe:
The Qualcomm Innovation Fellowship is an annual program focused on recognizing, rewarding, and promoting the most innovative engineering doctoral students in Europe, India, and the United States. The European program rewards excellent young researchers in artificial intelligence and cybersecurity with individual awards of $40,000, dedicated mentors from the Qualcomm Technologies team, and the opportunity to present their work in person to an audience of technical leaders at the company's San Diego headquarters.
translated by Oliver Schedler