Program failures are often caused by faulty inputs (e.g. due to data corruption). When an input induces a failure, one needs to debug the input data, i.e. isolate the faults to obtain valid input data. Typically, debuggers focus on diagnosing faults in the program rather than in the input. This talk instead presents an approach that automatically repairs faults in the input data, without requiring program analysis. In addition, we present empirical data on the causes and prevalence of invalid inputs in practice: we found that four percent of inputs in the wild are invalid. We present a general-purpose algorithm called ddmax that automatically isolates faults in invalid inputs and recovers the maximal valid input data. The aim of ddmax is to (1) identify which parts of the input data prevent processing by the program, and (2) recover as much of the (valuable) input data as possible. Given a program and an invalid input, ddmax recovers and repairs as much data as possible by running a series of experiments on the program. The difference between the original failing input and the “maximized” passing input includes all input fragments that could not be processed, i.e. the fault. This approach is useful for automatically debugging and repairing invalid inputs.
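To make the idea of maximizing a passing input concrete, the following is a minimal Python sketch of a greedy, delta-debugging-style maximization loop. It is a simplified illustration and not the exact ddmax algorithm from the talk, which may include further steps. The names `ddmax_sketch` and `toy_program_accepts` are hypothetical, the input is treated as a sequence of characters, and the sketch assumes that the empty input is processed successfully.

```python
from typing import Callable, List, Tuple


def ddmax_sketch(data: str, passes: Callable[[str], bool]) -> Tuple[str, str]:
    """Greedily grow a passing input from `data` (simplified, hypothetical sketch).

    Returns (recovered, removed): the maximized passing input and the
    characters that could not be added without making the program fail.
    """
    def build(indices: List[int]) -> str:
        return "".join(data[i] for i in indices)

    # Assumption: the empty input is accepted by the program under test.
    if not passes(""):
        raise ValueError("sketch assumes the empty input passes")

    kept: List[int] = []                      # indices known to pass together
    remaining: List[int] = list(range(len(data)))  # indices not yet recovered
    n = 2                                     # current number of chunks

    while remaining:
        n = min(n, len(remaining))
        size = -(-len(remaining) // n)        # ceiling division
        chunks = [remaining[i:i + size] for i in range(0, len(remaining), size)]

        added = False
        for chunk in chunks:
            trial = sorted(kept + chunk)
            if passes(build(trial)):          # one experiment on the program
                kept = trial
                chunk_set = set(chunk)
                remaining = [i for i in remaining if i not in chunk_set]
                added = True
                break

        if added:
            n = max(n - 1, 2)                 # retry with coarser chunks
        elif size == 1:
            break                             # no single character can be added
        else:
            n = min(2 * n, len(remaining))    # refine the granularity

    return build(kept), build(remaining)


def toy_program_accepts(text: str) -> bool:
    """Hypothetical stand-in for a program: accepts only digits and commas."""
    return all(ch.isdigit() or ch == "," for ch in text)


if __name__ == "__main__":
    recovered, removed = ddmax_sketch("12,3x4,5#6", toy_program_accepts)
    print(recovered)  # 12,34,56
    print(removed)    # x#
```

In this toy run, the recovered string is the maximized passing input and the removed characters are the difference to the original failing input, i.e. the fragments the stand-in program could not process. A real program's validity check is usually not character-independent, so the result there would depend on the order of experiments.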
Software Engineering (SE)
2021-02-22
2024-10-08