2022-08-14

Discovering Significant Patterns under Sequential False Discovery Control

Summary

We are interested in discovering patterns whose empirical frequency in the data differs significantly from what we expect. To avoid spurious results while retaining high statistical power, we propose to control false discoveries sequentially during the search. To avoid redundancy, we propose to update our expectations whenever we discover a significant pattern. To explore the exponentially large search space efficiently, we employ an easy-to-compute upper bound on significance and propose an effective search strategy for sets of significant patterns. Through an extensive set of experiments on synthetic data, we show that our method, Spass, recovers the ground truth reliably, efficiently, and without redundancy. On real-world data, we show that it works well for both single and multiple classes and for both low- and high-dimensional data, and, through case studies, that it discovers meaningful results.
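The summary does not say which sequential procedure Spass uses to control false discoveries. To illustrate the general idea of testing candidate patterns one at a time while keeping false discoveries in check, the sketch below implements alpha-investing (Foster and Stine), a well-known sequential scheme. It is not necessarily the procedure used in Spass. The function name `alpha_investing` and the parameters `eta` and `spend_fraction` are illustrative assumptions, and the spending policy (spend a fixed fraction of the current wealth) is only one of many valid choices.

```python
def alpha_investing(p_values, alpha=0.05, eta=0.95, spend_fraction=0.5):
    """Hypothetical sketch of alpha-investing over a stream of p-values.

    Returns one boolean per p-value: True if that hypothesis (e.g. "this
    pattern's frequency matches expectation") is rejected as significant.
    """
    wealth = alpha * eta  # initial alpha-wealth W(0) <= alpha * eta
    payout = alpha        # wealth earned for each rejection (omega <= alpha)
    decisions = []
    for p in p_values:
        # Testing at level a costs a / (1 - a) if we fail to reject.
        # Spend a fixed fraction of the current wealth on this test.
        cost = spend_fraction * wealth
        level = cost / (1.0 + cost)  # solves level / (1 - level) == cost
        if p <= level:
            decisions.append(True)
            wealth += payout   # a discovery earns back alpha-wealth
        else:
            decisions.append(False)
            wealth -= cost     # a non-discovery consumes the wager
    return decisions
```

For example, `alpha_investing([0.001, 0.9, 0.5])` rejects only the first hypothesis. Because each rejection replenishes the wealth, a search that keeps finding significant patterns can continue testing at useful levels, whereas a long run of non-significant candidates steadily shrinks the per-test level.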