2022-08-14

Discovering Significant Patterns under Sequential False Discovery Control

Summary

We are interested in discovering those patterns in data whose empirical frequency differs significantly from what we expect. To avoid spurious results yet achieve high statistical power, we propose to sequentially control for false discoveries during the search. To avoid redundancy, we propose to update our expectations whenever we discover a significant pattern. To efficiently explore the exponentially sized search space, we employ an easy-to-compute upper bound on significance and propose an effective search strategy for sets of significant patterns. Through an extensive set of experiments on synthetic data, we show that our method, Spass, recovers the ground truth reliably, efficiently, and without redundancy. On real-world data, we show that it works well on both single- and multi-class data and on both low- and high-dimensional data, and through case studies we show that it discovers meaningful results.
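The summary combines two ingredients: a test of whether a pattern's observed frequency differs from its expected frequency, and sequential control of false discoveries as patterns are tested one after another. The Python sketch below illustrates both under simplifying assumptions that are ours, not the paper's. The expected frequency comes from a fixed independence model (the product of item frequencies) that is never updated. Candidates are all item pairs, tested exhaustively without any upper-bound pruning. Significance is an exact two-sided binomial test. False discoveries are controlled with generic alpha-investing (Foster and Stine), which is not necessarily the procedure Spass uses. All function and class names, parameter values, and the synthetic data are hypothetical.

```python
import math
import random
from itertools import combinations


def binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial probability P(X = i), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Two-sided p-value: twice the smaller tail probability, capped at 1."""
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
    lower = sum(pmf[: k + 1])  # P(X <= k)
    upper = sum(pmf[k:])       # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


def pattern_p_value(pattern: frozenset, transactions: list) -> float:
    """Compare a pattern's support with the support expected under item independence.

    Hypothetical expectation model: product of item marginals, never updated
    (the paper instead updates its expectations after each discovery).
    """
    n = len(transactions)
    observed = sum(1 for t in transactions if pattern <= t)
    expected_prob = math.prod(
        sum(1 for t in transactions if item in t) / n for item in pattern
    )
    return binom_two_sided_p(observed, n, expected_prob)


class AlphaInvesting:
    """Generic alpha-investing: spend alpha-wealth on tests, earn it back on discoveries."""

    def __init__(self, alpha: float = 0.05, spend_fraction: float = 0.5):
        self.wealth = alpha * (1 - alpha)  # illustrative initial wealth
        self.payout = alpha                # wealth earned per discovery
        self.spend_fraction = spend_fraction

    def test(self, p_value: float) -> bool:
        # Largest level whose cost level / (1 - level) does not exceed the wealth.
        max_level = self.wealth / (1 + self.wealth)
        level = self.spend_fraction * max_level
        if p_value <= level:
            self.wealth += self.payout
            return True
        self.wealth -= level / (1 - level)  # stays positive since level < max_level
        return False


def make_transactions(n: int = 1000, seed: int = 0) -> list:
    """Hypothetical synthetic data: {a, b} is planted; c and d are independent noise."""
    rng = random.Random(seed)
    transactions = []
    for _ in range(n):
        t = set()
        if rng.random() < 0.3:
            t |= {"a", "b"}  # planted co-occurrence
        else:
            if rng.random() < 0.2:
                t.add("a")
            if rng.random() < 0.2:
                t.add("b")
        for item in ("c", "d"):
            if rng.random() < 0.3:
                t.add(item)
        transactions.append(frozenset(t))
    return transactions


def sequential_discovery(transactions: list, alpha: float = 0.05) -> list:
    """Test every item pair in a fixed order, deciding each one as it arrives."""
    items = sorted(set().union(*transactions))
    investor = AlphaInvesting(alpha)
    discoveries = []
    for pair in combinations(items, 2):
        if investor.test(pattern_p_value(frozenset(pair), transactions)):
            discoveries.append(pair)
    return discoveries


if __name__ == "__main__":
    print(sequential_discovery(make_transactions()))
```

Running the module should typically report the planted pair ('a', 'b'). The independent items c and d may occasionally yield a chance discovery; alpha-investing controls the rate of such false discoveries rather than ruling them out. The design choice worth noting is that every non-discovery costs wealth while every discovery earns it back, so decisions are made one pattern at a time during the search instead of correcting a fixed batch of tests afterwards.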

Conference Paper

ACM International Conference on Knowledge Discovery and Data Mining (KDD)

Date published

2022-08-14

Date last modified

2024-11-15