News 2020

Research Spotlight: Software Engineering for Machine Learning

Due to the impressive advances in Machine Learning and the unlimited availability of data, neural networks are rapidly becoming prevalent in our everyday lives, for instance by assisting in image-classification or decision-making tasks. As a result, there is growing concern regarding the reliability of neural networks in performing these tasks. In particular, it could be disastrous if an autonomous vehicle misclassifies a street sign, or if a recidivism-risk algorithm, which predicts whether a criminal is likely to re-offend, …
Due to the impressive advances in Machine Learning and the unlimited availability of data, neural networks are rapidly becoming prevalent in our everyday lives, for instance by assisting in image-classification or decision-making tasks. As a result, there is growing concern regarding the reliability of neural networks in performing these tasks. In particular, it could be disastrous if an autonomous vehicle misclassifies a street sign, or if a recidivism-risk algorithm, which predicts whether a criminal is likely to re-offend, is unfair with respect to race.

In the Practical Formal Methods group at MPI-SWS, we have recently focused on applying techniques from Software Engineering, including static analysis and test generation, to validate and verify properties of neural networks, such as robustness and fairness. In the following, we give a brief overview of three research directions we have been pursuing in this setting.

Blackbox Fuzzing of Neural Networks

By now, it is well known that even very subtle perturbations of a correctly classified image, such as a street sign, could cause a neural network to classify the new image differently. Such perturbed images are referred to as adversarial inputs and pose a critical threat to important applications of Machine Learning, like autonomous driving.

In our group, we recently developed DeepSearch [1], a blackbox-fuzzing technique that generates adversarial inputs for image-classification neural networks. Starting from a correctly classified image, DeepSearch strategically mutates its pixels such that the resulting image is more likely to be adversarial. By using spatial regularities of images, DeepSearch is able to generate adversarial inputs, while only querying the neural network very few times, which entails increased performance of our technique. Moreover, through a refinement step, DeepSearch further reduces the already subtle pixel perturbations of an adversarial input.

Adversarial-Input Detection for Neural Networks

To protect neural networks against adversarial inputs, we have developed RAID [2], a runtime-monitoring technique for detecting whether an input to a neural network is adversarial. Our technique consists of training a secondary classifier to identify differences in neuron activation values between correctly classified and adversarial inputs. RAID is effective in detecting adversarial inputs across a wide range of adversaries even when it is completely unaware of the type of adversary. In addition, we show that there is a simple extension to RAID that allows it to detect adversarial inputs even when these are generated by an adversary that has access to our detection mechanism.

Fairness Certification of Neural Networks

Several studies have recently raised concerns about the fairness of neural networks. To list a few examples, commercial recidivism-risk and health-care systems have been found to be racially biased. There is also empirical evidence of gender bias in image searches, for instance when searching for “CEO”. And facial-recognition systems, which are increasingly used in law enforcement, have been found biased with respect to both gender and race. Consequently, it is critical that we design tools and techniques for certifying fairness of neural networks or characterizing their bias.

We make an important step toward meeting these needs by designing the LIBRA static-analysis framework [3] for certifying causal fairness of neural networks used for classification of tabular data. In particular, given input features considered sensitive to bias, a neural network is causally fair if its output classification is not affected by different values of the sensitive features. On a high level, our approach combines a forward and a backward static analysis. The forward pass aims to divide the input space into independent partitions such that the backward pass is able to effectively determine fairness of each partition. For the partitions where certification succeeds, LIBRA provides definite (in contrast to probabilistic) fairness guarantees; otherwise, it describes the input space for which bias occurs. We have designed this approach to be sound and configurable with respect to scalability and precision, thus enabling pay-as-you-go fairness certification.

References

[1] Fuyuan Zhang, Sankalan Pal Chowdhury and Maria Christakis. DeepSearch: Simple and Effective Blackbox Fuzzing of Deep Neural Networks. CoRR abs/1910.06296, 2019.

[2] Hasan Ferit Eniser, Maria Christakis and Valentin Wüstholz. RAID: Randomized Adversarial-Input Detection for Neural Networks. CoRR abs/2002.02776, 2020.

[3] Caterina Urban, Maria Christakis, Valentin Wüstholz and Fuyuan Zhang. Perfectly Parallel Fairness Certification of Neural Networks. CoRR abs/1912.02499, 2019.
Read more

Research Spotlight: Logic and Learning

Software systems have become ubiquitous in our modern world and, consequently, so have bugs and glitches. While many software failures are harmless and often merely annoying, some can have catastrophic consequences. Just imagine the dire results of an autonomous car failing to stop at a red traffic light or a plane's control system becoming unresponsive during takeoff or landing.

In our research, we address these problems and develop intelligent tools that help engineers to build safe and reliable hardware, …
Software systems have become ubiquitous in our modern world and, consequently, so have bugs and glitches. While many software failures are harmless and often merely annoying, some can have catastrophic consequences. Just imagine the dire results of an autonomous car failing to stop at a red traffic light or a plane's control system becoming unresponsive during takeoff or landing.

In our research, we address these problems and develop intelligent tools that help engineers to build safe and reliable hardware, software, and cyber-physical systems. To this end, we employ a unique and promising strategy, which has recently also regained major attention in the artificial intelligence community: combining inductive techniques from the area of machine learning and deductive techniques from the area of mathematical logic (e.g., see the recent Dagstuhl seminar on "Logic and Learning", which was co-organized by one of our group members). Specifically, our research revolves around three topics, to which the remainder of this article is devoted: verification, synthesis, and formal specification languages.

Verification


Verification is an umbrella term referring to tools and techniques that formally prove that a given system satisfies its specification. In the context of software, a popular approach is deductive verification. The idea is easy to describe: first, the given program is augmented with annotations (typically loop invariants, pre-/post-conditions of method calls, and shape properties of data structures), which capture the developer's intent and facilitate the deductive reasoning in a later step; second, the program, together with its annotations, is translated into formulas in a suitable logic, called verification conditions; third, the verification conditions are checked for validity using automated theorem provers. Thanks to brilliant computer scientists, such as Edsger Dijkstra and Tony Hoare, as well as recent advances in constraint solving, the latter two steps can be (almost) entirely automated. However, the first step still remains a manual, error-prone task that requires significant training, experience, and ingenuity. In fact, this is one of the main obstacles preventing a widespread adaptation of formal verification in practice.

To also automate the challenging first step, we have developed a novel approach, called ICE learning [1], which intertwines inductive and deductive reasoning. The key idea is to pit a (deductive) program verifier against an (inductive) learning algorithm, whose goal is to infer suitable annotations from test-runs of the program and failed verification attempts. The actual learning proceeds in rounds. In each round, the learning algorithm proposes candidate annotations based on the data it has gathered so far. The program verifier then tries to prove the program correct using the proposed annotations. If the verification fails, the verifier reports data back to the learning algorithm explaining why the verification has failed. Based on this new information, the learning algorithm refines its conjecture and proceeds to the next round. The loop stops once the annotations are sufficient to prove the program correct.

ICE learning has proven to be a very powerful approach that allows fully automatic verification of a wide variety of programs, ranging from recursive and concurrent programs over numeric data types [1], to algorithms manipulating dynamically allocated data structures [2], to industry-size GPU kernels [3]. In addition, the principles underlying ICE learning can be lifted to other challenging verification tasks, such as the verification of parameterized systems [4] as well as—in ongoing research—to the verification of cyber-physical and AI-based systems. You might want to try a demo immediately in your browser.

Synthesis


Synthesis goes beyond verification and could be considered the holy grail of computer science. In contrast to checking whether a hand-crafted program meets its specification, the dream is to fully automatically generate software (or a circuit for that matter) from specifications in a correct-by-construction manner.

Although this dream is unrealistic in its whole generality, there exist various application domains in which automated synthesis techniques have been applied with great success. In our own research, for instance, we have developed techniques for synthesizing safety controllers for reactive, cyber-physical systems that have to interact with a complex–and perhaps only partially known–environment [5, 6]. Moreover, we have proposed a general framework for generating loop-free code from input-output examples and specifications written in first-order logic [7]. Similar to ICE learning, these methods combine inductive and deductive reasoning, thereby unveiling and exploiting synergies of modern machine learning algorithms and highly-optimized symbolic reasoning engines.

Formal Specification Languages


Both verification and synthesis rely on the ability to write correct formal specifications, which have to precisely capture the engineer’s intuitive understanding of the system in question. In practice, however, formalizing the requirements of a system is notoriously difficult, and it is well known that the use of standard formalisms such as temporal logics requires a level of sophistication that many users might never develop.

We have recently started a new research project to combat this serious obstacle. Its main objective is to design algorithms that learn formal specifications in interaction with human engineers. As a first step towards this goal, we have developed a learning algorithm for the specification language “Linear Temporal Logic (LTL)”, which is the de facto standard in many verification and synthesis applications. You might think of this algorithm as a recommender system for formal specifications: the human engineer provides examples of the desired and undesired behavior of the system in question, while the recommender generates a series of LTL specifications that are consistent with the given examples; the engineer can then either chose one of the generated specifications or provide additional examples and rerun the recommender.

In ongoing research, we are extending our learning algorithm to a wide range of other specification languages, including Computational Tree Logic, Signal Temporal Logic, and the Property Specification Language. Moreover, we are developing feedback mechanisms that allow for a tighter integration of the human engineer into the loop. Again, you can try our technology immediately in your browser.

References


[1] D’Souza, Deepak; Ezudheen, P.; Garg, Pranav; Madhusudan, P.; Neider, Daniel: Horn-ICE Learning for Synthesizing Invariants and Contracts. In: Proceedings of the ACM on Programming Languages (PACMPL), volume 2 issue OOPSLA, pages 131:1–131:25. ACM, 2018.

[2] Neider, Daniel; Madhusudan, P.; Garg, Pranav; Saha, Shambwaditya; Park, Daejun: Invariant Synthesis for Incomplete Verification Engines. In: 24th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2018), volume 10805 of Lecture Notes in Computer Science, pages 232–250. Springer, 2018

[3] Neider, Daniel; Saha, Shambwaditya; Garg, Pranav; Madhusudan, P.: Sorcar: Property-Driven Algorithms for Learning Conjunctive Invariants. In: 26th International Static Analysis Symposium (SAS 2019), volume 11822 of Lecture Notes in Computer Science, pages 323–346. Springer, 2019

[4] Neider, Daniel; Jansen, Nils: Regular Model Checking Using Solver Technologies and Automata Learning. In: 5th International NASA Formal Method Symposium (NFM 2013), volume 7871 of Lecture Notes in Computer Science, pages 16–31. Springer, 2013

[5] Neider, Daniel; Topcu, Ufuk: An Automaton Learning Approach to Solving Safety Games over Infinite Graphs. In: 22nd International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2016), volume 9636 of Lecture Notes in Computer Science, pages 204–221. Springer, 2016

[6] Neider, Daniel; Markgraf, Oliver: Learning-based Synthesis of Safety Controllers. In: 2019 International Conference on Formal Methods in Computer Aided Design (FMCAD 2019), pages 120–128. IEEE, 2019

[7] Neider, Daniel; Saha, Shambwaditya; Madhusudan, P.: Synthesizing Piece-wise Functions by Learning Classifiers. In: 22nd International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2016), volume 9636 of Lecture Notes in Computer Science, pages 186–203. Springer, 2016

[8] Neider, Daniel; Gavran, Ivan: Learning Linear Temporal Properties. In: 2018 International Conference on Formal Methods in Computer Aided Design (FMCAD 2018), pages 148–157. IEEE, 2018
Read more