Due to the impressive advances in Machine Learning and the unlimited availability of data, neural networks are rapidly becoming prevalent in our everyday lives, for instance by assisting in image-classification or decision-making tasks. As a result, there is growing concern regarding the reliability of neural networks in performing these tasks. In particular, it could be disastrous if an autonomous vehicle misclassifies a street sign, or if a recidivism-risk algorithm, which predicts whether a criminal is likely to re-offend,…
Due to the impressive advances in Machine Learning and the unlimited availability of data, neural networks are rapidly becoming prevalent in our everyday lives, for instance by assisting in image-classification or decision-making tasks. As a result, there is growing concern regarding the reliability of neural networks in performing these tasks. In particular, it could be disastrous if an autonomous vehicle misclassifies a street sign, or if a recidivism-risk algorithm, which predicts whether a criminal is likely to re-offend, is unfair with respect to race.
In the Practical Formal Methods group at MPI-SWS, we have recently focused on applying techniques from Software Engineering, including static analysis and test generation, to validate and verify properties of neural networks, such as robustness and fairness. In the following, we give a brief overview of three research directions we have been pursuing in this setting.
Blackbox Fuzzing of Neural Networks
By now, it is well known that even very subtle perturbations of a correctly classified image, such as a street sign, could cause a neural network to classify the new image differently. Such perturbed images are referred to as adversarial inputs and pose a critical threat to important applications of Machine Learning, like autonomous driving.
In our group, we recently developed DeepSearch , a blackbox-fuzzing technique that generates adversarial inputs for image-classification neural networks. Starting from a correctly classified image, DeepSearch strategically mutates its pixels such that the resulting image is more likely to be adversarial. By using spatial regularities of images, DeepSearch is able to generate adversarial inputs, while only querying the neural network very few times, which entails increased performance of our technique. Moreover, through a refinement step, DeepSearch further reduces the already subtle pixel perturbations of an adversarial input.
Adversarial-Input Detection for Neural Networks
To protect neural networks against adversarial inputs, we have developed RAID , a runtime-monitoring technique for detecting whether an input to a neural network is adversarial. Our technique consists of training a secondary classifier to identify differences in neuron activation values between correctly classified and adversarial inputs. RAID is effective in detecting adversarial inputs across a wide range of adversaries even when it is completely unaware of the type of adversary. In addition, we show that there is a simple extension to RAID that allows it to detect adversarial inputs even when these are generated by an adversary that has access to our detection mechanism.
Fairness Certification of Neural Networks
Several studies have recently raised concerns about the fairness of neural networks. To list a few examples, commercial recidivism-risk and health-care systems have been found to be racially biased. There is also empirical evidence of gender bias in image searches, for instance when searching for “CEO”. And facial-recognition systems, which are increasingly used in law enforcement, have been found biased with respect to both gender and race. Consequently, it is critical that we design tools and techniques for certifying fairness of neural networks or characterizing their bias.
We make an important step toward meeting these needs by designing the LIBRA static-analysis framework  for certifying causal fairness of neural networks used for classification of tabular data. In particular, given input features considered sensitive to bias, a neural network is causally fair if its output classification is not affected by different values of the sensitive features. On a high level, our approach combines a forward and a backward static analysis. The forward pass aims to divide the input space into independent partitions such that the backward pass is able to effectively determine fairness of each partition. For the partitions where certification succeeds, LIBRA provides definite (in contrast to probabilistic) fairness guarantees; otherwise, it describes the input space for which bias occurs. We have designed this approach to be sound and configurable with respect to scalability and precision, thus enabling pay-as-you-go fairness certification.
 Fuyuan Zhang, Sankalan Pal Chowdhury and Maria Christakis. DeepSearch: Simple and Effective Blackbox Fuzzing of Deep Neural Networks. CoRR abs/1910.06296, 2019.
 Hasan Ferit Eniser, Maria Christakis and Valentin Wüstholz. RAID: Randomized Adversarial-Input Detection for Neural Networks. CoRR abs/2002.02776, 2020.
 Caterina Urban, Maria Christakis, Valentin Wüstholz and Fuyuan Zhang. Perfectly Parallel Fairness Certification of Neural Networks. CoRR abs/1912.02499, 2019.