2022-02

From Input Coverage to Code Coverage: Systematically Covering Input Structure with k-Paths

Summary

Grammar-based testing uses a given grammar to produce syntactically valid inputs. Intuitively, to cover program features, it is necessary to also cover input features. We present a measure of input coverage called k-path coverage, which takes into account the coverage of individual syntactic elements as well as their combinations up to a given depth k. A k-path coverage with k = 1 prescribes that all individual symbols be covered; k-path coverage with k = 2 dictates that all symbols in the context of all their parents be covered; and so on. Using the k-path measure, we make a number of contributions. (1) We provide an \emph{algorithm for grammar-based production} that constructively covers a given k-path measure. In our evaluation, using k-path during production results in a significantly higher code coverage than state-of-the-art approaches that ignore input coverage. (2) We show on a selection of real-world subjects that coverage of input elements, as measured by k-path, correlates with code coverage. As a consequence, k-path coverage can also be used to predict code coverage. (3) We show that one can learn associations between individual k-path features and coverage of specific locations: "Method `distrule` is invoked whenever both `+` and `*` occur in an expression." Developers can interpret these associations to create suitable inputs that focus on selected methods, or have tools generate inputs that immediately target these methods. The above approaches have been implemented in the \tribble and \codeine prototypes, and evaluated on a number of processors for JSON, CSV, URLs, and Markdown. All tools and data are available as open source.