Automatically Detecting and Mitigating Issues in Program Analyzers
Numair Mansur
Max Planck Institute for Software Systems
25 Jul 2022, 1:00 pm - 2:00 pm
Virtual talk
SWS Student Defense Talks - Thesis Proposal
Recent years have seen tremendous progress in the development and industrial
adoption of
rigorous techniques and tools to automatically detect bugs in software and
ensure its correctness.
Static program analysis is one such technique to automatically reason about the
runtime behavior
of a program without actually running it. It is a powerful approach that can be
used to detect a wide
range of security vulnerabilities and enables program optimization in
compilers, automatic parallelization,
and, if accurate enough, ...
Recent years have seen tremendous progress in the development and industrial
adoption of
rigorous techniques and tools to automatically detect bugs in software and
ensure its correctness.
Static program analysis is one such technique to automatically reason about the
runtime behavior
of a program without actually running it. It is a powerful approach that can be
used to detect a wide
range of security vulnerabilities and enables program optimization in
compilers, automatic parallelization,
and, if accurate enough, correctness verification. Static program analyzers are
tools that implement
program analysis techniques and are used to analyze other programs for flaws.
Decades of research
have yielded the discovery of novel algorithms, data structures, and design
principles that make static
program analysis tools and techniques more precise, scalable, and faster than
ever before.
While the potential benefits of analyzers are obvious, their usability and effectiveness in mainstream software development workflows still often comes into question. This is because the undecidability of the halting problem makes formal reasoning about program properties difficult. Therefore, one common theme in static analysis is that to remain computable, one has to sacrifice either soundness or completeness. This means that, in practice, a static analysis technique might either miss a bug (false negative) or report correct code as having a bug (false positive), might not handle certain language features, or might only report certain types of bugs.
Practical program analyzers have to make tradeoffs (e.g., in soundness, precision, or performance) to maintain a delicate balance between returning the minimum number of false negatives/positives, scaling to very large industrial codebases, being highly automatic, and having minimum overhead for the developers. Designing, implementing, and deploying program analyzers, therefore, is an extremely challenging task. This makes them extremely complicated pieces of software with a high likelihood of having correctness bugs themselves (challenge 1) or tradeoffs not suitable for every piece of code under every usage scenario (challenge 2).
In this dissertation, we focus our attention on improving the state of the art in the above two challenge areas that can prevent typical software development teams from using these tools to their full potential. In the first part of the dissertation, we present algorithms to detect correctness bugs in fundamental program analysis components such as SMT solvers and Datalog engines. Correctness issues in program analyzers can have disastrous consequences e.g. when analyzing software used for electronic voting, financial systems, transportation, or secure communication. In the second part, we introduce techniques to reduce friction in the integration of program analyzers in everyday software development workflows. Smoothly integrating and tuning an advanced program analyzer for a particular codebase under certain resource constraints and different usage scenarios can be a challenging task for software developers, most of whom lack an advanced understanding of these analyzers.
Read more
While the potential benefits of analyzers are obvious, their usability and effectiveness in mainstream software development workflows still often comes into question. This is because the undecidability of the halting problem makes formal reasoning about program properties difficult. Therefore, one common theme in static analysis is that to remain computable, one has to sacrifice either soundness or completeness. This means that, in practice, a static analysis technique might either miss a bug (false negative) or report correct code as having a bug (false positive), might not handle certain language features, or might only report certain types of bugs.
Practical program analyzers have to make tradeoffs (e.g., in soundness, precision, or performance) to maintain a delicate balance between returning the minimum number of false negatives/positives, scaling to very large industrial codebases, being highly automatic, and having minimum overhead for the developers. Designing, implementing, and deploying program analyzers, therefore, is an extremely challenging task. This makes them extremely complicated pieces of software with a high likelihood of having correctness bugs themselves (challenge 1) or tradeoffs not suitable for every piece of code under every usage scenario (challenge 2).
In this dissertation, we focus our attention on improving the state of the art in the above two challenge areas that can prevent typical software development teams from using these tools to their full potential. In the first part of the dissertation, we present algorithms to detect correctness bugs in fundamental program analysis components such as SMT solvers and Datalog engines. Correctness issues in program analyzers can have disastrous consequences e.g. when analyzing software used for electronic voting, financial systems, transportation, or secure communication. In the second part, we introduce techniques to reduce friction in the integration of program analyzers in everyday software development workflows. Smoothly integrating and tuning an advanced program analyzer for a particular codebase under certain resource constraints and different usage scenarios can be a challenging task for software developers, most of whom lack an advanced understanding of these analyzers.