What is it that makes an app malicious? One important factor is that malicious apps treat sensitive data differently from benign apps. To capture such differences, we mined 2,866 benign Android applications for their data flow from sensitive sources, and compare these flows against those found in malicious apps. We find that (a) for every sensitive source, the data ends up in a small number of typical sinks; (b) these sinks differ considerably between benign and malicious apps; (c) these differences can be used to flag malicious apps due to their abnormal data flow; and (d) malicious apps can be identified by their abnormal data flow alone, without requiring known malware samples. In our evaluation, our mudflow prototype correctly identified 86.4% of all novel malware, and 90.1% of novel malware leaking sensitive data.
Proceedings of the 37th International Conference on Software Engineering