E-mail senden E-Mail Adresse kopieren
2026

Snor: Simpler Descriptions Through Overlapping Patterns

Zusammenfassung

The pattern explosion is a well-known problem that refers to the humongous numbers of results discovered by traditional pattern mining algorithms. One line of research that combats this, pioneered by Krimp, is to employ the minimum description length (MDL) principle for selecting small sets of characteristic patterns. When Krimp was proposed twenty years ago, it was shown to work well but also had several limitations. Over time its major shortcomings have been addressed, except for one: the cover function still only considers non-overlapping patterns to describe transactions. While efficient, this may result in redundant pattern sets, as we need combinations of patterns rather than just the true patterns to succinctly describe the data. In this paper, we propose Snor, a cover function that enables simpler descriptions by allowing overlapping patterns in the cover of a transaction. Our key observation is that the cover problem is an instance of (weighted) set cover, and we can hence use a greedy method to approximate it. Snor can be straightforwardly combined with existing search algorithms for finding itemset code tables, such as Krimp and Slim. In experiments on 18 benchmark datasets, we demonstrate that Snor leads to simpler descriptions, i.e., allowing overlap leads to smaller code tables while compression stays the same. Further, classification accuracy and computation time remain almost the same.

Kapitel

Veröffentlichungsdatum

2026

Letztes Änderungsdatum

2025-10-29