The automated scientist

When a program crashes, it can take days and nights for programmers to find the problem. Now, researchers at CISPA have developed a robot that acts like a scientist. ALHAZEN observes the error circumstances, generates hypotheses using machine learning (ML) methods and refines and refutes them through experiments until it has determined the error causes fully automatically: "The program crashes because the street name was longer than 48 characters."

To make debugging even more powerful in the future, CISPA faculty member Prof. Dr. Andreas Zeller and Prof. Dr. Lars Grunske of HU Berlin, both experts in automated software engineering, plan to further develop the ALHAZEN approach in the EMPEROR project. The German Research Foundation (DFG) is funding their work with around half a million euros.

When a computer program shows an error, software developers begin the arduous search for the cause. They observe the program run, formulate hypotheses about the cause of the error, and conduct many experiments to narrow it down more precisely. "This is similar to how scientists get to the bottom of the causes of natural phenomena, but all these observations and experiments take a lot of time," says Zeller. Together with his doctoral students Alexander Kampmann and Nikolas Havrikov, he is developing automated methods for troubleshooting. ALHAZEN is one of them and, according to Zeller, enormously promising. "How the machine learns from the machine is quite new."

The namesake for this innovative approach to automatic software debugging is a polymath who lived in the 11th century. Alhazen, whose full name was Abu Ali al-Hasan ibn al-Hasan al-Haitham, was primarily concerned with mathematics, optics, and astronomy but was also interested in chemistry, physics, medicine, music, and poetics. He is considered by many to be the founder of the experimental method in science. "We borrowed his name quite unabashedly," says Zeller, "precisely because the tool, like the inventor, takes a scientific, systematic approach." ALHAZEN provides not only the exact error conditions but also an arbitrarily large number of tests that programmers can use to check whether their fixes work.

Computer programs work with input. This can be, for example, the classic mouse click of a user, which triggers the execution of a specific task, or an input coming from another computer. During debugging, ALHAZEN compares successful inputs with inputs where the error occurs. The robot then breaks down the program inputs into individual elements and determines features such as the length of an entered password or specific numeric values.
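
To make the idea of input features concrete, here is a minimal sketch in Python. It is an illustration only, not ALHAZEN's code: the function name extract_features, the feature names, and the flat field-to-value record format are assumptions made for this example.

```python
def extract_features(record: dict[str, str]) -> dict[str, float]:
    """Turn one program input into numeric features.

    Hypothetical helper for illustration: the input is assumed to be
    already split into named elements (field -> raw text).
    """
    features: dict[str, float] = {}
    for field, value in record.items():
        # Length of every input element, e.g. of a password or street name.
        features[f"len({field})"] = float(len(value))
        # Numeric value of the element, if it parses as a number.
        try:
            features[f"num({field})"] = float(value)
        except ValueError:
            pass  # not numeric, so only the length feature is recorded
    return features


if __name__ == "__main__":
    print(extract_features({"street": "Main St", "zip": "66123"}))
```

Each input, passing or failing, becomes one row of such features, which is the form a learning algorithm can work with.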

Using an ML algorithm called a decision tree learner, the tool then generates so-called decision trees. These models can automatically determine which properties of the input are related to the occurrence of the error. "This is a hypothesis, just like a scientist or a programmer would posit," says Zeller. Then the robot systematically generates more inputs to experiment with in order to refine or disprove its hypothesis. In this way, ALHAZEN arrives at a theory of why and under what conditions a particular error occurred. "For small programs, ALHAZEN needs between 12 and 100 tests to do this. They take only milliseconds, and a complete analysis is usually done in under a minute," says Zeller.
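
The cycle of hypothesis and experiment can be sketched in a few lines of Python. This is a heavily simplified illustration, not the tool itself: instead of a full decision tree over many features it learns a decision stump (a one-level tree) over a single feature, the input length. The program under test (crashes) and the helper names learn_threshold and refine are hypothetical, and the sketch assumes that the failure depends only on length and that the starting observations contain at least one passing and one failing input.

```python
def crashes(street: str) -> bool:
    """Hypothetical program under test: fails on street names over 48 characters."""
    return len(street) > 48


def learn_threshold(observations: list[tuple[int, bool]]) -> int:
    """Learn a decision stump: the t for which 'length > t' best predicts failure."""
    candidates = sorted({length for length, _ in observations})
    best_t, best_errors = candidates[0], len(observations) + 1
    for t in candidates:
        # Count observations that the rule 'length > t' would misclassify.
        errors = sum((length > t) != failed for length, failed in observations)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def refine(observations, run):
    """Run new experiments between the longest passing and shortest failing input."""
    observations = list(observations)
    while True:
        max_pass = max(n for n, failed in observations if not failed)
        min_fail = min(n for n, failed in observations if failed)
        if min_fail - max_pass <= 1:
            break  # no untested length is left between the two sides
        probe = (max_pass + min_fail) // 2
        # Experiment: generate an input of the probed length and run it.
        observations.append((probe, run("x" * probe)))
    return learn_threshold(observations), observations


if __name__ == "__main__":
    seed = [(len("Main Street"), crashes("Main Street")), (200, crashes("x" * 200))]
    threshold, log = refine(seed, crashes)
    print(f"Hypothesis: fails when the street name is longer than {threshold} characters")
    print(f"Experiments run: {len(log) - len(seed)}")
```

From only the two starting observations, the first hypothesis is far too coarse. Each new experiment either confirms it or forces a revision, until the learned rule matches the kind of diagnosis quoted at the beginning of the article.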

The machine learning method used in ALHAZEN is relatively simple. In joint research with Professor Grunske, Zeller wants to use more complex ML methods, among other things, and thus improve ALHAZEN even further. After all, his human research colleagues are also constantly evolving.

translated by: Oliver Schedler
