“If you control the input as well as the output, you control the program.”
Errors are common in software development. To keep them from causing system crashes or security vulnerabilities, researchers often test their programs before release using so-called fuzzers. “These tools produce huge numbers of random inputs to see how a program will behave in live operation. However, it is difficult to produce inputs that can test a program’s deeper functionality,” explains CISPA researcher Dominic Steinhöfel.
The data languages that computers process, and in which program inputs are expressed, are structured much like human languages. This is why not only grammar (syntax) but also meaning (semantics) plays an important role. In the 1950s, the US linguist Noam Chomsky illustrated the difference between the two with a striking example: “Colorless green ideas sleep furiously.” The sentence is grammatically flawless, yet semantically it makes no sense at all.
The thing about semantics
When a fuzzer produces an input that is grammatically correct but carries no meaningful message for the program under test, the parser rejects it. The parser is the part of a program that checks whether an input is intelligible to it. If it is, the parser converts the input into a form suitable for further processing. If it is not, the parser reports an error and discards the input entirely. “With inputs like this, you can only test the quality of the parser, not the stability of the program itself,” Steinhöfel explains. Some fuzzers produce smarter inputs and thus get past the parser. “This is often where the process ends, because this is also where the more complex properties at the semantic level come into play.”
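To make the gap between syntax and semantics concrete, here is a minimal Python sketch. It is not taken from ISLa or CISPA’s tooling. It assumes a hypothetical toy assignment language such as `x := 1 ; y := x`, whose (assumed) semantic rule is that a variable must be assigned before it is read. A grammar-based fuzzer always produces syntactically valid programs, but many of them read an unassigned variable, so the parser rejects them before any deeper program logic runs. The grammar and the names `generate`, `parse`, and `acceptance_rate` are illustrative assumptions.

```python
import random
import re

VARIABLES = "abcdefghijklmnopqrstuvwxyz"

def generate(rng: random.Random, max_statements: int = 4) -> str:
    """Grammar-based fuzzer: every output is syntactically valid.

    Hypothetical toy grammar (not ISLa syntax):
        program ::= stmt (" ; " stmt)*
        stmt    ::= var " := " rhs
        rhs     ::= var | digit
    """
    statements = []
    for _ in range(rng.randint(1, max_statements)):
        lhs = rng.choice(VARIABLES)
        # Right-hand side: a variable or a single digit, 50/50.
        rhs = rng.choice(VARIABLES) if rng.random() < 0.5 else str(rng.randint(0, 9))
        statements.append(f"{lhs} := {rhs}")
    return " ; ".join(statements)

STMT = re.compile(r"([a-z]) := ([a-z]|[0-9])")

def parse(program: str) -> list[tuple[str, str]]:
    """Syntax check plus a semantic (assign-before-read) check.

    Raises ValueError for malformed syntax or for a variable
    that is read before it has been assigned.
    """
    defined: set[str] = set()
    result = []
    for stmt in program.split(" ; "):
        match = STMT.fullmatch(stmt)
        if match is None:
            raise ValueError(f"syntax error in {stmt!r}")
        lhs, rhs = match.groups()
        if rhs.isalpha() and rhs not in defined:
            raise ValueError(f"semantic error: {rhs!r} read before assignment")
        defined.add(lhs)
        result.append((lhs, rhs))
    return result

def acceptance_rate(trials: int = 1000, seed: int = 0) -> float:
    """Fraction of grammar-valid inputs that also pass the semantic check."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        try:
            parse(generate(rng))
            accepted += 1
        except ValueError:
            pass  # rejected: the program under test would never see this input
    return accepted / trials

if __name__ == "__main__":
    print(parse("x := 1 ; y := x"))  # [('x', '1'), ('y', 'x')]
    print(acceptance_rate())
```

With this setup, well under half of the grammar-valid inputs survive the semantic check; the rejected ones exercise nothing beyond the parser.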
New specification language is key
ISLa (short for Input Specification Language), Steinhöfel’s new specification language, could become a game changer here. “ISLa allows us to understand inputs with unprecedented precision and thus to test programs deeply and thoroughly.” According to Steinhöfel, the key lies in a very general formalism that makes almost any program accessible. “But we do need an input description. We can either write it manually or learn it from an existing program.” Doing so, however, is complicated and often possible only approximately. “There will always be programs that are too large or too complex to be understood completely. But we can keep getting better at it.”
ISLa is a powerful tool for this: not only can it generate inputs, it can also test, repair, and mutate them. ISLa also allows researchers to describe a program’s output. “If we can describe the input as well as the output, we can describe the behavior of the entire program. This allows us to do a great deal: we can specify how a program is meant to behave, analyze how it actually behaves, and make it behave the way we want. In short: if you control the input as well as the output, you control the program.” CISPA faculty member Andreas Zeller, with whom Steinhöfel has worked closely, emphasizes the importance of this research: “ISLa opens up entire worlds for the testing of systems.”
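The following minimal Python sketch illustrates what checking, repairing, and mutating an input against a semantic constraint can mean. It does not use ISLa and does not reproduce ISLa’s algorithms. It assumes a hypothetical toy assignment language (`x := 1 ; y := x`) in which a variable must be assigned before it is read; the list-of-pairs representation and the functions `violations`, `repair`, and `mutate` are illustrative assumptions.

```python
import random

# Hypothetical representation: [("x", "1"), ("y", "x")] stands for
# the toy program "x := 1 ; y := x". Not ISLa's data model.
Program = list[tuple[str, str]]

def violations(program: Program) -> list[int]:
    """Check: indices of statements that read a variable before it is assigned."""
    defined: set[str] = set()
    bad = []
    for i, (lhs, rhs) in enumerate(program):
        if rhs.isalpha() and rhs not in defined:
            bad.append(i)
        defined.add(lhs)
    return bad

def repair(program: Program) -> Program:
    """Repair: replace each offending read with an already assigned variable,
    or with the constant "0" if nothing has been assigned yet."""
    defined: list[str] = []
    fixed = []
    for lhs, rhs in program:
        if rhs.isalpha() and rhs not in defined:
            rhs = defined[-1] if defined else "0"
        fixed.append((lhs, rhs))
        if lhs not in defined:
            defined.append(lhs)
    return fixed

def mutate(program: Program, rng: random.Random) -> Program:
    """Mutate: change one right-hand side at random, then repair it so the
    mutant still satisfies the constraint. `program` must be non-empty."""
    mutant = list(program)
    i = rng.randrange(len(mutant))
    lhs, _ = mutant[i]
    mutant[i] = (lhs, rng.choice(["x", "y", "z", str(rng.randint(0, 9))]))
    return repair(mutant)

if __name__ == "__main__":
    broken = [("y", "x"), ("x", "1"), ("z", "y")]
    print(violations(broken))  # [0]
    print(repair(broken))      # [('y', '0'), ('x', '1'), ('z', 'y')]
    print(mutate(repair(broken), random.Random(1)))
```

Repairing after every mutation is only one simple design choice for keeping mutants valid; a dedicated tool might instead produce constraint-satisfying mutations directly.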
From theory to practice
In the near future, Steinhöfel will build on ISLa to develop practical approaches for testing relevant software systems. Among other things, he will focus on learning complex input and output descriptions and on stateful systems such as databases and servers. He also intends to investigate whether already established testing methods can be combined.
Dominic Steinhöfel, a postdoctoral researcher in the group of CISPA faculty member Prof. Dr. Andreas Zeller, earned both his degree and his PhD at TU Darmstadt. “Without knowing exactly where it would lead me, I have been working towards ISLa since 2021.” His pride in this achievement shines through in these words, and rightly so.
ISLa plays an important role in the research project “S3 - Semantics of Software Systems”, which is funded by an ERC Grant.