When considering a data set, it is often unknown how complex it is, and hence it is difficult to assess how rich a model for the data should be. Often these choices are swept under the carpet, ignored, or left to the domain expert, but in practice this is highly unsatisfactory: domain experts know no better than we do how to set $k$, which prior to choose, or how many degrees of freedom are optimal. The Minimum Description Length~(MDL) principle answers the model selection problem from the intuitively appealing and clear viewpoint of information theory and data compression. In a nutshell, it asserts that the best model is the one that minimizes the combined length of the description of the model and of the data encoded with that model. MDL not only implies the best strategy for model selection but also offers a unifying viewpoint for designing optimal data mining algorithms, and it has been applied very successfully to a wide range of data mining tasks, including pattern mining, clustering, classification, text mining, graph mining, anomaly detection, and causal inference. In this tutorial we give an introduction to the basics of model selection, show important properties of MDL-based modelling, and present successful examples as well as pitfalls of applying MDL to data mining problems. We also introduce advanced topics on important new concepts in modern MDL (e.g., normalized maximum likelihood (NML), sequential NML, decomposed NML, and MDL change statistics) and emerging applications in dynamic settings.
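To make the compression view of model selection concrete, here is a minimal sketch, not taken from the tutorial itself. For a binary sequence $x$ of length $n$ with $k$ ones, it compares a parameter-free fair-coin model, which costs exactly $n$ bits, against the Bernoulli model encoded with normalized maximum likelihood (NML), whose code length is $-\log_2 P(x \mid \hat\theta(x)) + \log_2 C(n)$ with parametric complexity $C(n) = \sum_{j=0}^{n} \binom{n}{j} (j/n)^j (1 - j/n)^{n-j}$. The model with the shorter total code length is selected. The function names are hypothetical, and the direct summation for $C(n)$ is only meant for small $n$.

```python
from math import comb, log2


def bernoulli_nml_complexity(n: int) -> float:
    """Parametric complexity C(n) of the Bernoulli model, summed directly (small n only)."""
    if n == 0:
        return 1.0
    total = 0.0
    for k in range(n + 1):
        p = k / n
        # Maximized likelihood of any sequence with k ones; Python evaluates 0.0 ** 0 as 1.0.
        # comb(n, k) is converted to float here, so very large n would overflow.
        total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


def neg_log2_max_likelihood(k: int, n: int) -> float:
    """-log2 P(x | theta_hat) for a binary sequence with k ones out of n (0 log 0 = 0)."""
    bits = 0.0
    for count in (k, n - k):
        if count > 0:
            bits -= count * log2(count / n)
    return bits


def nml_code_length(x: list[int]) -> float:
    """Bernoulli NML code length in bits: ML code length plus log2 C(n)."""
    n, k = len(x), sum(x)
    return neg_log2_max_likelihood(k, n) + log2(bernoulli_nml_complexity(n))


def select_model(x: list[int]) -> str:
    """Pick the model with the shorter code length; ties go to the simpler fair coin."""
    fair_bits = float(len(x))  # fair coin: 1 bit per symbol, no parameters to encode
    return "fair" if fair_bits <= nml_code_length(x) else "bernoulli"


print(select_model([1] * 20))     # → bernoulli (heavily biased sequence)
print(select_model([0, 1] * 10))  # → fair (balanced sequence)
```

The extra $\log_2 C(n)$ bits are the price the Bernoulli model pays for its free parameter: a strongly biased sequence compresses well enough to pay it, while a balanced one does not, so MDL falls back to the simpler model without any hand-set threshold.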
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)