2026-07-06

SoK: A Systematic Review of Integration and Reproducibility of Fuzzing Research into AFL++

Abstract

Fuzzing has become one of the most effective automated bug discovery techniques. Despite extensive research covering every aspect of the fuzzing process, it is difficult to assess how much progress fuzzing has actually made over the years. In this paper, we present a large-scale empirical analysis of fuzzing progress using AFL++, the state-of-the-art fuzzer that continuously integrates research-driven improvements. Using 645,000 CPU-hours of experiments, we comprehensively measure how fuzzing has improved in terms of code coverage and evaluate the impact of various features over the years. Surprisingly, we find a plateau in exploring new program behavior: while some techniques yield isolated performance gains, overall progress in exercising new coverage has largely stalled. To test whether these observations generalize beyond AFL++, we also study LibAFL and Fuzzilli and find that they hold across all three fuzzers. To better understand this stagnation, we complement our empirical study with a survey of 405 peer-reviewed fuzzing papers published between 2018 and 2024 at the leading security and software engineering venues. We identify 60 papers that extend AFL/AFL++ and, for each, study how feasible it is to integrate into the baseline and whether the baseline fuzzer has adopted it. We observe little adoption in practice; the main barriers are irreproducible results, reliance on complex external dependencies, and limited practical benefit. Based on discussions of our results with the AFL++ maintainers, we identify a growing disconnect between academic research and real-world adoption, underscoring the need for stronger reproducibility standards and more realistic benchmarking of proposed improvements.