We are interested in discovering patterns whose empirical frequency in the data differs significantly from what is expected. To avoid spurious results while achieving high statistical power, we propose to sequentially control for false discoveries during the search. To avoid redundancy, we propose to update our expectations whenever we discover a significant pattern. To efficiently consider the exponentially sized search space, we employ an easy-to-compute upper bound on significance and propose an effective search strategy for sets of significant patterns. Through an extensive set of experiments on synthetic data, we show that our method, Spass, recovers the ground truth reliably and efficiently, without redundancy. On real-world data, we show that it works well on both single and multiple classes and on low- and high-dimensional data, and through case studies we show that it discovers meaningful results.
ACM International Conference on Knowledge Discovery and Data Mining (KDD)
2022-08-14
2024-12-27
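To make the idea of sequentially controlling false discoveries concrete, the sketch below implements alpha-investing (Foster and Stine), a generic sequential testing scheme that controls the marginal false discovery rate as hypotheses arrive one at a time. This is an illustrative assumption: it is not claimed to be the exact procedure Spass uses. The class name `AlphaInvesting` and the spending policy `spend_fraction` (bet a fixed share of the current wealth) are hypothetical choices. The significance upper bound used for pruning is not shown.

```python
from dataclasses import dataclass


@dataclass
class AlphaInvesting:
    """Hypothetical sketch of alpha-investing for sequential testing."""

    alpha: float = 0.05          # target level; also the payout per discovery
    wealth: float = 0.05         # initial alpha-wealth W(0)
    spend_fraction: float = 0.5  # assumed policy: bet this share of current wealth

    def test(self, p_value: float) -> bool:
        # Cost of this test, i.e. alpha_j / (1 - alpha_j) in Foster and Stine.
        cost = self.spend_fraction * self.wealth
        # Solve alpha_j / (1 - alpha_j) = cost for the test level alpha_j.
        level = cost / (1.0 + cost)
        if p_value <= level:
            # Discovery: earn the payout, wealth grows.
            self.wealth += self.alpha
            return True
        # No discovery: pay the cost of the test.
        self.wealth -= cost
        return False


def sequential_discoveries(p_values: list[float]) -> list[bool]:
    """Test p-values in arrival order, e.g. candidate patterns during a search."""
    tester = AlphaInvesting()
    return [tester.test(p) for p in p_values]
```

For example, `sequential_discoveries([1e-4, 0.5, 1e-4])` accepts the first and third candidates and rejects the second. Each discovery replenishes the wealth, which keeps later tests powerful; each failed test drains it, which limits how many spurious patterns can be accepted.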