2019-03-01

Approximating Algorithmic Conditional Independence for Discrete Data

Summary

Testing for conditional independence is a core part of constraint-based causal discovery. It is mainly used to find associations, but can in special cases also be applied to infer causal arrows. We specifically focus on discrete data. Although commonly used tests are perfect in theory, due to data sparsity they often fail to reject independence in practice—especially when conditioning on multiple variables. We propose a new conditional independence test based on the notion of algorithmic independence. To instantiate this ideal formulation in practice, we use stochastic complexity. We show that our proposed test SCI is an asymptotically unbiased estimator for conditional mutual information (CMI) as well as L2 consistent. Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well given only limited data. Empirical evaluation shows that SCI has a lower type II error than commonly used tests, which leads to a higher recall when we use it in causal discovery algorithms; without compromising the precision.