2024-07-22

Down to earth! Guidelines for DGA-based Malware Detection

Summary

Successful malware campaigns rely on Command-and-Control (C2) infrastructure, enabling attackers to extract sensitive data and give instructions to bots. As a resilient mechanism to obtain C2 endpoints, attackers can employ Domain Generation Algorithms (DGAs), which automatically generate C2 domains instead of relying on static ones. Thus, researchers have proposed network-level detection approaches that reveal DGA usage by differentiating between non-DGA and generated domains. Recent approaches train machine learning (ML) models to recognize DGA domains using pattern recognition at the domain's character level. In this paper, we review network-level DGA detection from a meta-perspective. In particular, we survey 38 DGA detection papers in light of nine popular assumptions that are critical for the approaches to be practical. The assumptions range from foundational ones to assumptions about experiments and deployment of the detection systems. We then revisit if these assumptions hold, showing that most DGA detection approaches operate on a fragile basis. To prevent these issues in the future, we describe the technical security concepts underlying each assumption and indicate best practices for obtaining more reliable results.