Upcoming events

What knowledge bases know (and what they don't)

Simon Razniewski
Joint Lecture Series
04 Oct 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Knowledge bases such as Wikidata, YAGO, or the Google Knowledge Graph are increasingly used in applications like entity resolution, structured search, or question answering. The correctness of these knowledge bases is generally well understood and at the center of attention in automated construction techniques. About their completeness, in contrast, most knowledge is anecdotal: completeness usually correlates with the popularity of a topic, yet understanding whether a knowledge base has complete information on a specific topic is often difficult.

In this talk, I will introduce the problem of understanding the completeness of knowledge bases. I will present three approaches: (i) data-inherent inference about completeness using association rule mining, (ii) extraction of cardinality information from text sources, and (iii) various tools we developed to help knowledge base curators and consumers map and understand the completeness of knowledge bases.

Discrimination in Algorithmic Decision Making: From Principles to Measures and Mechanisms

Muhammad Bilal Zafar
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Proposal
09 Oct 2017, 4:00 pm - 5:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
Algorithmic decision making systems are increasingly being used to assist or even replace human decision making in a large number of application domains. These systems rely on complex learning methods and vast amounts of training data to optimize prediction accuracy, which in some applications has been shown to surpass even human accuracy. However, there is a growing concern that algorithmic decision making systems can, even in the absence of intent, lead to discriminatory outcomes.

Incorporating nondiscrimination goals into the design of algorithmic decision making systems (or classifiers) has proven to be quite challenging. These challenges arise mainly due to (i) the computational complexities involved in incorporating nondiscrimination goals and (ii) the inadequacy of existing measures to computationally capture discrimination in certain situations. The goal of this thesis is to tackle these two problems.

First, we aim to design mechanisms to incorporate two existing measures of discrimination, disparate treatment and disparate impact, into the design of well-known classifiers. To this end, we introduce a novel and intuitive mechanism of decision boundary covariance. This mechanism can be included in the formulation of any convex boundary-based classifier in the form of convex constraints, without increasing the classifier's complexity. It also allows for fine-grained control over the degree of (non)discrimination, often at a small cost in accuracy.

Second, we propose alternative measures of discrimination that avoid the shortcomings of existing measures in certain situations. Our first proposed measure, disparate mistreatment, is useful in situations where the validity of the historical decisions in the training data can be ascertained. The remaining two measures, preferred treatment and preferred impact, are useful when the feature and class distributions of the different groups subject to the decision making differ significantly; they can additionally help reduce the cost of nondiscrimination. We extend our decision boundary covariance mechanism and incorporate the newly proposed nondiscrimination measures into the formulations of convex boundary-based classifiers, this time as convex-concave constraints. The resulting formulations can be solved efficiently using recent advances in convex-concave programming.

Fine-Grained Complexity: Hardness for a Big Data World

Karl Bringmann
Joint Lecture Series
08 Nov 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
For many computational problems we know some polynomial-time algorithm, say in quadratic time, but it is open whether faster algorithms exist. In a big data world it is essential to close this gap: if the true complexity of the problem is indeed quadratic, then it is intractable on data arising in areas such as DNA sequencing or social networks. On such data, essentially only near-linear time algorithms are feasible. Unfortunately, classic hardness assumptions such as P != NP are too coarse to explain the gap between linear and quadratic time.

Fine-grained complexity comes to the rescue by providing conditional lower bounds via fine-grained reductions from certain hard core problems. For instance, assuming a certain strengthening of P != NP, it allowed us to rule out truly subquadratic algorithms for the Longest Common Subsequence problem (used, e.g., in the diff file comparison tool). In this talk we will further discuss applications to Edit Distance, Subset Sum, and RNA Folding.

Recent events

Geometric Complexity Theory: An ambitious approach towards P versus NP

Christian Ikenmeyer
Joint Lecture Series
06 Sep 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Computational complexity theory is concerned with the study of the inherent complexity of computational problems. Its flagship conjecture is the famous P versus NP conjecture, one of the seven Millennium Problems of the Clay Mathematics Institute.

To this day, several thousand computational problems have been classified as NP-complete, i.e., they have a polynomial-time algorithm if and only if P equals NP. The practical importance of the P vs NP conjecture is at least twofold: first, many NP-complete problems are of high practical relevance and have to be solved every day in commercial and scientific applications; second, if NP turned out to equal P, this would break all of today's cryptographic ciphers.

Geometric complexity theory is an ambitious program initiated in 2001 by Ketan Mulmuley and Milind Sohoni towards solving the P vs NP problem using methods from algebraic geometry and representation theory. During the last few years there has been a significant amount of research activity related to this program. In this talk I will explain some basic ideas and recent developments.

Combinatorial Constructions for Effective Testing

Filip Niksic
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Proposal
14 Aug 2017, 3:00 pm - 4:00 pm
Kaiserslautern building G26, room 113
simultaneous videocast to Saarbrücken building E1 5, room 029
Testing is the predominant approach to finding bugs and gaining assurance in the correctness of concurrent and distributed systems with complex state spaces. In this thesis, we use combinatorial methods to develop new techniques for systematic and randomized testing. We answer several theoretical questions about the size of test sets that achieve full coverage, and propose a practical tool for testing actor-based programs.

We start with a general construction, obtained using the probabilistic method from combinatorics, which shows that whenever a random test covers a fixed testing goal with sufficiently high probability, there is a small set of random tests that achieves full coverage with high probability. We use this construction in the context of testing distributed systems under network partition faults: it explains why random testing tools in this context achieve good coverage--and hence find bugs--quickly.

In the context of testing concurrent programs, we introduce the notion of hitting families of schedules. It suffices to test schedules from a hitting family if the goal is to expose bugs of "small depth"--bugs that depend only on the relative ordering of a small number of specific events in the program. We study the size of optimal hitting families and give explicit constructions of small hitting families when the events in the program are partially ordered as a tree. In particular, for balanced trees, our constructions are polylogarithmic in the number of events.

Finally, we propose a testing tool for finding bugs of small depth in actor-based programs. The tool is based on an existing randomized scheduler for shared-memory multithreaded programs; adapting the scheduler to actor-based programs poses an interesting underlying theoretical problem.

Techniques to enforce security policies on untrusted applications

Anjo Vahldiek-Oberwagner
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Proposal
07 Aug 2017, 5:00 pm - 6:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
As our dependence on ever-present computer systems increases, so does the potential harm when software or hardware deviates from user expectations: users lose data or find it illicitly leaked. Existing reference monitors fail to (1) protect the confidentiality and integrity of persistent data and (2) efficiently and robustly mediate untrusted applications. In this proposal we present two reference monitors targeting these shortcomings, describing the design, implementation, and evaluation of Guardat and ERIM.

The policies protecting persistent data, and the mechanisms for their enforcement, are spread over many software components and configuration files, increasing the risk of policy violation due to bugs, vulnerabilities, and misconfiguration. In Guardat, users, developers, and administrators specify file protection policies declaratively, concisely, and separately from code, and Guardat enforces these policies by mediating I/O in the storage layer. Policy enforcement relies only on the integrity of the Guardat controller and any external policy dependencies. We show experimentally that Guardat's overhead is low.

While Guardat enforces policies at the storage layer, it cannot enforce policies over the in-memory state of untrusted applications. In contrast to existing techniques, ERIM efficiently mediates an application's execution by isolating a reference monitor in the same address space. By using Intel Memory Protection Keys in combination with static binary rewriting, ERIM isolates the monitor's state from strong malicious adversaries. We propose binary rewriting rules to harden existing executable files, and detail use cases in which prior art relied on less robust protection at similar performance.