As the world becomes ever more data-driven, the question "what's going on in my data?" has become one of the most important questions in both industry and science – and, not surprisingly, also in cybersecurity. After all, before we can determine whether something is strange, we need to determine what is normal. Before we can trust an algorithm or its result, we need it to be interpretable, explainable, fair, and transparent. Before we can release any data, we need guarantees that the data is useful, but also that it contains no hidden associations or dependencies that could break privacy or trade secrets. Hence, we work on each of these problems, laying the foundations for trustworthy processing of empirical data.
Patterns and Anomalies. We develop methods that can explain data, and the processes behind it, in easily interpretable terms. In particular, we study how to efficiently extract the most important and statistically significant patterns from data, and how to use these patterns to identify, score, and explain both what is normal and what is strange in the data.
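The core idea can be illustrated with a toy sketch (not our actual methods): mine frequent patterns from the data, then score a new record by how common the patterns it contains are. Records covered only by rare or unseen patterns are flagged as strange. All names here are illustrative; a real pattern miner and a proper significance test would replace the simple pair counting below.

```python
from collections import Counter
from itertools import combinations
import math

def mine_patterns(transactions, min_support=2):
    """Count frequent item pairs -- a toy stand-in for a real pattern miner."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

def anomaly_score(transaction, patterns, n_transactions):
    """Average negative log-frequency of the patterns a transaction contains;
    transactions covered only by rare (or no) patterns score higher."""
    pairs = list(combinations(sorted(set(transaction)), 2))
    if not pairs:
        return 0.0
    score = 0.0
    for pair in pairs:
        # 0.5 is a smoothing count for pairs never seen in the data
        freq = patterns.get(pair, 0.5) / n_transactions
        score += -math.log(freq)
    return score / len(pairs)

data = [["a", "b", "c"], ["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"]]
patterns = mine_patterns(data)
normal = anomaly_score(["a", "b"], patterns, len(data))  # frequent pattern: low score
odd = anomaly_score(["x", "y"], patterns, len(data))     # unseen pattern: high score
```

The same scores double as explanations: the patterns that drive a high score tell the analyst *why* a record looks strange.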
Tracing Back Rumours. We investigate algorithms that, given only highly noisy, partial information, are able to reliably trace back what happened in the past; for example, to determine who started a fake news rumour, the most likely starting points of an outbreak of an actual virus, or the most important nodes in a botnet.
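One simple heuristic from this line of work can be sketched in a few lines (a toy illustration, not our published algorithms): given the set of infected nodes in a network, estimate the source as the infected node whose maximum distance to any other infected node is smallest, i.e. the Jordan center of the infected subgraph.

```python
from collections import deque

def bfs_dist(graph, start):
    """Hop distances from start to all reachable nodes (breadth-first search)."""
    dist = {start: 0}
    q = deque([start])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def estimate_source(graph, infected):
    """Jordan-center heuristic: pick the infected node whose maximum
    distance to any other infected node (its eccentricity) is smallest."""
    best, best_ecc = None, float("inf")
    for u in infected:
        dist = bfs_dist(graph, u)
        ecc = max(dist[v] for v in infected)
        if ecc < best_ecc:
            best, best_ecc = u, ecc
    return best

# Path graph 0-1-2-3-4; if everyone is infected, the centre is the best guess.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
source = estimate_source(graph, {0, 1, 2, 3, 4})
```

Real settings are harder: only a noisy subset of the infected nodes is observed, which is exactly where probabilistic source estimators come in.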
Causal Inference. We study one of the most fundamental questions in science: how to determine cause from effect given only empirical data. That is, we develop theory and methods for the grand goal of determining the true causes of a phenomenon, and the conditions that give rise to it, given only a limited dataset that may not include all relevant features. Although parts of this problem are provably unsolvable, this has so far not deterred us from making large strides towards a solution.
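A classic idea in this area, shown here as a toy sketch rather than our actual methods, is the additive noise model: if Y = f(X) + noise with the noise independent of X, then regressing in the causal direction yields residuals independent of the input, while regressing in the anti-causal direction does not. The crude dependence score below (correlating squared residuals with squared inputs) is an illustrative stand-in for a proper independence test.

```python
import random
import math

def fit_residuals(xs, ys):
    """Least-squares line ys ~ a*xs + b; return the residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return [y - (a * x + b) for x, y in zip(xs, ys)]

def corr(u, v):
    """Pearson correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u) / n)
    sv = math.sqrt(sum((b - mv) ** 2 for b in v) / n)
    return cov / (su * sv)

def dependence(xs, ys):
    """Crude dependence score between regression residuals and the input:
    correlating squared values picks up dependence that plain correlation,
    which is zero by construction for least squares, cannot see."""
    res = fit_residuals(xs, ys)
    return abs(corr([r * r for r in res], [x * x for x in xs]))

# Ground truth: cause -> effect with independent, non-Gaussian (uniform) noise.
random.seed(0)
cause = [random.uniform(-1, 1) for _ in range(5000)]
noise = [random.uniform(-1, 1) for _ in range(5000)]
effect = [c + e for c, e in zip(cause, noise)]

forward = dependence(cause, effect)   # residuals are just the noise: independent
backward = dependence(effect, cause)  # residuals stay entangled with the effect
```

The asymmetry between the two directions is what makes the causal direction identifiable here; with Gaussian noise and a linear mechanism it would vanish, which is one of the provably unsolvable cases mentioned above.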