Events 2017

Formal and Compositional Reasoning about Object Capabilities

David Swasey Max Planck Institute for Software Systems
08 Dec 2017, 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Proposal
In scenarios such as web programming, where code is linked together from multiple sources, object capability patterns (OCPs) provide an essential safeguard, enabling programmers to protect the private state of their objects from corruption by unknown and untrusted code. However, the benefits of OCPs in terms of security and program verification are barely understood. Our proposed thesis bridges this gap, advancing our understanding of the security and functional properties of OCP-based programs, systems, and programming language implementations.

Building on the recently developed Iris framework for separation logic, we propose OCPL, the first formal system for compositionally specifying and verifying OCPs in a simple but representative language with closures and mutable state. The key idea of OCPL is to account for the boundary between verified and untrusted code by adopting a well-known idea from the literature on security protocol verification, namely robust safety. Programs that export only properly wrapped values to their environment can be proven robustly safe, meaning that their untrusted environment cannot violate their internal invariants. Using OCPL, we can give the first general, compositional, and machine-checked specs for several commonly-used OCPs—including the dynamic sealing, membrane, and caretaker patterns—which we can then use to verify robust safety for representative client code.
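For readers unfamiliar with these patterns, the caretaker mentioned above is essentially a revocable forwarding proxy: the owner hands untrusted code a proxy to an object rather than the object itself, and can later cut off all access. The sketch below is a minimal illustration in Python under our own naming; it is not the OCPL formalization from the talk.

```python
def make_caretaker(target):
    """Caretaker pattern (sketch): a revocable forwarding proxy.

    Untrusted code receives only the proxy; the trusted owner keeps the
    revoke() capability and can sever access to `target` at any time.
    """
    cell = {"target": target}

    class Proxy:
        def __getattr__(self, name):
            current = cell["target"]
            if current is None:
                raise PermissionError("capability revoked")
            return getattr(current, name)

    def revoke():
        cell["target"] = None

    return Proxy(), revoke

# Usage: pass `proxy` to untrusted code, keep `revoke` private.
# proxy, revoke = make_caretaker(sensitive_object)
```

The point of OCPL is to give such wrappers general, compositional specifications that remain valid no matter what the untrusted code holding the proxy does.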

A second aspect of our thesis is that separation logic and robust safety scale to reasoning about the far more sophisticated OCPs arising in practical systems. We propose to model the Firefox web browser and its script interpreter, which uses an automatic security membrane to enforce the web's same-origin policy. After extending OCPL to account for the browser and the membrane-aware scripts it interprets, we can formalize how, under certain trust assumptions, Firefox supports the illusion (critical to most script authors) of "normal object reasoning" for membrane-unaware scripts.

The final element in our thesis is that robust safety and separation logic ease the burden of developing certified programming language implementations. To benefit from OCPs, a programmer must choose a language implementation that preserves the functional and security properties of her capability-wrapped code. We propose to develop a correct and secure compiler (targeting WebAssembly, an interesting new language for running low-level code in web browsers) based on the ideas of preserving OCPL specifications and robust safety. This is meaningful because a compiler focused on such concrete goals is easier to certify and can generate more efficient code, compared to compilers organized around generic notions of correctness and security.

Fundamental Algorithmic Problems and Challenges in Dynamical and Cyber-Physical Systems

Joël Ouaknine Max Planck Institute for Software Systems
06 Dec 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Dynamical and cyber-physical systems, whose continuous evolution is subject to differential equations, permeate vast areas of engineering, physics, mathematics, and computer science. In this talk, I consider a selection of fundamental algorithmic problems for such systems, such as reachability, invariant synthesis, and controllability. Although the decidability and complexity of many of these problems are open, some partial and conditional results are known, occasionally resting on certain number-theoretic hypotheses such as Schanuel’s conjecture. More generally, the study of algorithmic problems for dynamical and cyber-physical systems draws from an eclectic array of mathematical tools, ranging from Diophantine approximation to algebraic geometry. I will present a personal and select overview of the field and discuss areas of current active research.

Blocking Analysis of Spin Locks under Partitioned Fixed-Priority Scheduling

Alexander Wieder Max Planck Institute for Software Systems
29 Nov 2017, 4:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Partitioned fixed-priority scheduling is widely used in embedded multicore real-time systems. In multicore systems, spin locks are one well-known technique used to synchronize conflicting accesses from different processor cores to shared resources (e.g., data structures). The use of spin locks can cause blocking. Accounting for blocking is a crucial part of static analysis techniques to establish correct temporal behavior.

In this thesis, we consider two aspects inherent to the partitioned fixed-priority scheduling of tasks sharing resources protected by spin locks: (1) the assignment of tasks to processor cores to ensure correct timing, and (2) the blocking analysis required to derive bounds on the blocking.

Heuristics commonly used for task assignment fail to produce assignments that ensure correct timing when shared resources protected by spin locks are used. We present an optimal approach that is guaranteed to find such an assignment if it exists (under the original MSRP analysis). Further, we present a well-performing and inexpensive heuristic.

For most spin lock types, no blocking analysis is available in prior work, which renders them unusable in real-time systems. We present a blocking analysis approach that supports eight different types and is less pessimistic than prior analyses, where available. Further, we show that allowing nested requests for FIFO- and priority-ordered locks renders the blocking analysis problem NP-hard.

Formal Proof of Polynomial-Time Complexity with Quasi-Interpretations

Hugo Férée University of Kent
22 Nov 2017, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
SWS Colloquium
Formal reasoning about computational complexity turns out to be quite cumbersome as it often requires manipulating explicit time bounds for a specific machine model. Implicit Computational Complexity is a research area which provides techniques and complexity class characterisations to avoid this. We have formally verified the soundness (and part of the completeness) of such a technique − called quasi-interpretations − using the Coq proof assistant. In particular, we turned this into a tool to help guarantee the polynomial complexity of programs (here, term rewriting systems).

Understanding & Controlling User Privacy in Social Media via Exposure

Mainack Mondal Max Planck Institute for Software Systems
20 Nov 2017, 4:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
The recent popularity of Online Social Media sites (OSM) like Facebook and Twitter has led to a renewed discussion about user privacy. In fact, numerous recent news reports and research studies on user privacy stress the OSM users' urgent need for better privacy control mechanisms. Thus, today, a key research question is: how do we provide improved privacy protection to OSM users for their social content? In my thesis, we propose a systematic approach to address this question.

We start with the access control model, the dominant privacy model in OSMs today. We show that, while useful, the access control model does not capture many theoretical and practical aspects of privacy. Thus, we propose a new model, which we term the exposure control model. We define exposure for a piece of content as the set of people who actually view the content. We demonstrate that our model is a significant improvement over access control in capturing users' privacy requirements. Next, we investigate the effectiveness of our model in protecting users' privacy in three real-world scenarios: (1) understanding and controlling exposure using social access control lists (SACLs), (2) controlling exposure by limiting large-scale social data aggregators, and (3) understanding and controlling longitudinal exposure in OSMs, i.e., how users control exposure of their old OSM content. We show that, in each of these cases, the exposure control-based approach helps us to design improved privacy control mechanisms.

Program Logic for Weak Memory Concurrency

Marko Doko Max Planck Institute for Software Systems
17 Nov 2017, 3:00 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Proposal
Most of the verification techniques for concurrency assume the sequentially consistent memory model, i.e., the executions of a concurrent program consist of all the possible interleavings of the instructions of its threads. On the other hand, in order to improve performance or conserve energy, modern hardware implementations give us what is known as weak memory models; that is, models of concurrency that allow some executions which cannot be explained by sequential consistency. Moreover, modern programming languages provide their own language-level memory models which strive to allow all the behaviors allowed by the various hardware-level memory models and, additionally, to allow some behaviors that can occur as a result of desired compiler optimizations.

These weak memory models are often quite intricate, and it can be quite difficult for programmers to be aware of all the possible behaviors of their programs. For this reason, it is very useful to have an abstraction layer over the model, so that we can make sure our programs are correct without understanding all the messy details of the underlying memory model. Program logics are one way of constructing such an abstraction—they can present us with clean and simple syntactical rules that we can use to reason about our programs, while not requiring any understanding of the memory model for which the logic has been proven sound.

In this thesis I develop FSL, a logic for reasoning about programs under the memory model introduced by the 2011 standards for C and C++ programming languages. FSL provides a set of clean and simple rules that abstract the intricacies of the underlying model, while still being powerful enough to reason about algorithms that appear in practice.

The main novelty of FSL is the ability to reason about synchronization via memory fences, an advanced synchronization feature of the C11 memory model that was not supported by any prior logic.

Correct Compilation of Relaxed Memory Concurrency

Soham Chakraborty Max Planck Institute for Software Systems
16 Nov 2017, 2:00 pm - 3:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
Compilation of shared memory concurrent programs faces many challenges in the presence of relaxed memory models. On the one hand, relaxed memory models enable multiple transformations on shared memory accesses and fences. On the other hand, not all transformations which are correct for sequential programs are correct in the concurrent setting. Thus, compiler writers have to perform careful analysis to determine which transformations are correct.

In my thesis, I explore the correctness of the multiple steps during the compilation of concurrent C/C++ programs (termed C11). First, I consider the correctness of C11 source to source transformations. Next, I study the compilation of C11 programs by LLVM, a state-of-the-art optimizing compiler. LLVM performs a number of optimizing transformations before generating the target code for architectures such as X86 and PowerPC. To perform these transformations, LLVM follows an informal refinement of C11 concurrency semantics. I formalize the LLVM concurrency semantics and study at the abstract level the correctness of various LLVM transformations. Finally, I develop a validator to check if LLVM optimizations are indeed performing only transformations that are correct according to the LLVM and/or the C11 model.

Toward Data-Driven Education

Rakesh Agrawal EPFL
13 Nov 2017, 10:00 am - 11:30 am
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Distinguished Lecture Series
An educational program of study can be viewed as a knowledge graph consisting of learning units and relationships between them. Such a knowledge graph provides the core data structure for organizing and navigating learning experiences. We address three issues in this talk. First, how can we synthesize the knowledge graph, given a set of concepts to be covered in the study program? Next, how can we use data mining to identify and correct deficiencies in a knowledge graph? Finally, how can we use data mining to form study groups with the goal of maximizing overall learning? We conclude by pointing out some open research problems.

Beyond Distributive Fairness in Algorithmic Decision Making: Feature Selection for Procedurally Fair Learning

Nina Grgić-Hlača Max Planck Institute for Software Systems
10 Nov 2017, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Qualifying Exam
With widespread usage of machine learning methods in numerous domains involving human subjects, several studies have raised questions about the potential for unfairness towards certain individuals or groups.

A number of recent works have proposed methods to measure and eliminate unfairness from machine learning methods. However, most of this work has focused on only one dimension of fair decision making: distributive fairness, i.e., the fairness of the decision outcomes. In this work, we leverage the rich literature on organizational justice and focus on another dimension of fair decision making: procedural fairness, i.e., the fairness of the decision making process.

We propose measures for procedural fairness that consider the input features used in the decision process, and evaluate the moral judgments of humans regarding the use of these features. We operationalize these measures on two real world datasets using human surveys on the Amazon Mechanical Turk (AMT) platform, demonstrating that we capture important properties of procedurally fair decision making. We provide fast submodular mechanisms to optimize the tradeoff between procedural fairness and prediction accuracy. On our datasets, we observe empirically that procedural fairness may be achieved with little cost to outcome fairness, but that some loss of accuracy is unavoidable.

StaRVOOrS: Combined Static and Runtime Verification of Object-Oriented Software

Wolfgang Ahrendt Chalmers University
09 Nov 2017, 11:00 am - 12:30 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Static verification techniques are used to analyse and prove properties about programs before they are deployed. In particular, deductive verification techniques often work directly on the source code and are used to verify data-oriented properties of all possible executions. In contrast, runtime verification techniques have been extensively used for control-oriented properties, and analyse concrete executions that occur in the deployed program. I present an approach in which data-oriented and control-oriented properties may be stated in a single formalism amenable to both static and dynamic verification techniques. The specification language enhances a control-oriented property language with data-oriented pre/postconditions. We show how such specifications can be analysed using a combination of the deductive verification system KeY and the runtime verification tool LARVA. Verification is performed in two steps: first, KeY performs fully automated proof attempts; the resulting partial proofs are then analysed and used to optimize the specification for efficient runtime checking.

Quantifying & Characterizing Information Diets of Social Media Users

Juhi Kulshrestha Max Planck Institute for Software Systems
08 Nov 2017, 4:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
With an increasing number of people around the world relying on online social media platforms like Twitter and Facebook to consume news and information about the world around them, there has been a paradigm shift in the way news and information is exchanged in our society -- from traditional mass media to online social media. This paradigm shift has led to three key changes in the way that people are exchanging information: (i) Unlike the subscribers of mass media, online social media users are not just passive consumers of information, but they are also active publishers of content on these platforms. (ii) Social media users curate personalized word of mouth channels for themselves by creating social links to their preferred sources, and therefore unlike broadcast mass media where all subscribers receive the same information, individual users on social media might receive information that is very different from what other users are consuming. (iii) Social media users often rely on automated retrieval algorithms like personalized recommendation and search systems deployed by social media platform providers to discover interesting information from the deluge of user-generated content shared on these platforms.

While the area of media studies has traditionally focused mostly on broadcast mass media, with the changing environment it's important to study the news and information consumption of social media users and to also audit how the automated algorithms are modifying what the social media users consume. In this thesis, we fulfill this high-level goal by following a two-fold approach. First, we propose the concept of information diets -- which is the composition of information being produced or consumed -- to measure and reason about the bias and diversity in information production and consumption on social media platforms. We then quantify the diversity and bias in the information diets that social media users consume via the three main consumption channels on social media platforms: (a) word of mouth channels that users select for themselves by creating social links, (b) recommendations that the social media platform providers give to the users, and (c) the search systems that users use to find interesting information on these platforms. We measure the information diets of social media users along three different dimensions of topics, geographic source diversity, and political perspectives.

Our work is aimed at making social media users more aware of the potential biases in their consumed diets, and at encouraging the development of novel mechanisms for mitigating the effect of these biases through better information discovery and exchange systems on social media.

Fine-Grained Complexity: Hardness for a Big Data World

Karl Bringmann MPI-INF - D1
08 Nov 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
For many computational problems we know some polynomial time algorithm, say in quadratic time, but it is open whether faster algorithms exist. In a big data world, it is essential to close this gap: If the true complexity of the problem is indeed quadratic, then it is intractable on data arising in areas such as DNA sequencing or social networks. On such data essentially only near-linear time algorithms are feasible. Unfortunately, classic hardness assumptions such as P!=NP are too coarse to explain the gap between linear and quadratic time.

Fine-grained complexity comes to the rescue by providing conditional lower bounds via fine-grained reductions from certain hard core problems. For instance, it allowed us to rule out truly subquadratic algorithms for the Longest Common Subsequence problem (used e.g. in the diff file comparison tool), assuming a certain strengthening of P!=NP. In this talk we will further discuss applications to Edit Distance, Subset Sum, and RNA Folding.
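As a concrete reference point, the Longest Common Subsequence problem mentioned above is solved by the textbook dynamic program below, whose two nested loops give the quadratic running time in question; the fine-grained lower bound rules out algorithms running in time O(n^(2-ε)) for any ε > 0, under the stated assumption. This is the standard algorithm, not code from the talk.

```python
def lcs_length(a: str, b: str) -> int:
    """Classic O(n*m) dynamic program for Longest Common Subsequence,
    using a rolling one-dimensional table."""
    n, m = len(a), len(b)
    dp = [0] * (m + 1)          # dp[j] = LCS length of a[:i] and b[:j]
    for i in range(1, n + 1):
        prev_diag = 0           # value of dp[i-1][j-1]
        for j in range(1, m + 1):
            prev_row = dp[j]    # value of dp[i-1][j] before overwriting
            if a[i - 1] == b[j - 1]:
                dp[j] = prev_diag + 1
            else:
                dp[j] = max(dp[j], dp[j - 1])
            prev_diag = prev_row
    return dp[m]

# Example: "ACGGT" is a longest common subsequence of the two strings below.
assert lcs_length("ACCGGT", "ACGGT") == 5
```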

Discrimination in Algorithmic Decision Making: From Principles to Measures and Mechanisms

Muhammad Bilal Zafar Max Planck Institute for Software Systems
09 Oct 2017, 4:00 pm - 5:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Proposal
Algorithmic decision making systems are increasingly being used to assist or even replace human decision making in a large number of application domains. These systems rely on complex learning methods and vast amounts of training data to optimize prediction accuracy. The accuracy of these systems has been shown to even surpass human accuracy in some applications. However, there is a growing concern that algorithmic decision making systems can lead, even in the absence of intent, to discriminatory outcomes.

Incorporating nondiscrimination goals into the design of algorithmic decision making systems (or, classifiers) has proven to be quite challenging. These challenges arise mainly due to (i) the computational complexities involved in incorporating nondiscrimination goals, and (ii) the inadequacy of existing measures to computationally capture discrimination in certain situations. The goal of this thesis is to tackle these two problems.

First, we aim to design mechanisms to incorporate two existing measures of discrimination, disparate treatment and disparate impact, into the design of well-known classifiers. To this end, we introduce a novel and intuitive mechanism of decision boundary covariance. This mechanism can be incorporated into the formulation of any convex boundary-based classifier in the form of convex constraints without increasing the classifier’s complexity. It also allows for fine-grained control over the degree of (non)discrimination, often at a small cost in terms of accuracy.
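To give a rough sense of the mechanism (a hedged sketch under our own assumptions, not the exact formulation from the proposal): for a linear boundary parameterized by theta, the empirical covariance between a sensitive attribute z and the signed distance theta^T x to the boundary is affine in theta, so bounding its magnitude adds only a convex constraint to, for example, a logistic-regression objective. The function name and the threshold c below are illustrative.

```python
import numpy as np
import cvxpy as cp

def covariance_constrained_logreg(X, y, z, c=0.1):
    """Sketch: logistic regression whose decision-boundary covariance with a
    sensitive attribute is bounded by c (smaller c = stricter constraint).

    X: (N, d) features; y: labels in {-1, +1}; z: sensitive attribute (N,).
    """
    N, d = X.shape
    theta = cp.Variable(d)
    # Logistic loss: sum_i log(1 + exp(-y_i * theta^T x_i))
    loss = cp.sum(cp.logistic(-cp.multiply(y, X @ theta)))
    # Empirical covariance between z and the signed distance to the boundary;
    # affine in theta, so |cov| <= c is a convex constraint.
    cov = cp.sum(cp.multiply(z - z.mean(), X @ theta)) / N
    cp.Problem(cp.Minimize(loss), [cp.abs(cov) <= c]).solve()
    return theta.value

# Toy usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=200) > 0, 1.0, -1.0)
z = (X[:, 1] > 0).astype(float)
theta = covariance_constrained_logreg(X, y, z, c=0.05)
```

Tightening c trades some accuracy for a weaker correlation between the sensitive attribute and the resulting decisions, which is the fine-grained control mentioned above.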

Second, we propose alternative measures of discrimination that can avoid shortcomings of existing measures in certain situations. Our first proposed measure, disparate mistreatment, is useful in situations when the validity of historical decisions in the training data can be ascertained. The remaining two measures, preferred treatment and preferred impact, are useful in situations when feature and class distributions of different groups subject to the decision making are significantly different, and can additionally help reduce the cost of nondiscrimination. We extend our decision boundary covariance mechanism and incorporate the newly proposed nondiscrimination measures into the formulations of convex boundary-based classifiers, this time as convex-concave constraints. The resulting formulations can be solved efficiently using recent advances in convex-concave programming.


What knowledge bases know (and what they don't)

Simon Razniewski MPI-INF - D5
04 Oct 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Knowledge bases such as Wikidata, YAGO, or the Google Knowledge Graph are increasingly used in applications like entity resolution, structured search or question answering. The correctness of these knowledge bases is generally well understood and at the center of attention in automated construction techniques. About their completeness, in contrast, most knowledge is anecdotal: Completeness is usually correlated with popularity of topics, yet understanding whether a knowledge base has complete information on a specific topic is often difficult.

In this talk I will introduce the problem of understanding the completeness of knowledge bases. I will present three approaches: (i) data-inherent inferences about completeness using association rule mining, (ii) extraction of cardinality information from text sources, and (iii) various tools we developed to help knowledge base curators and consumers in mapping and understanding the completeness of knowledge bases.
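As a toy illustration of approach (ii), in our own words rather than the tools from the talk: a cardinality extracted from text, such as "X has three children", can be compared against the number of objects the knowledge base actually stores for that subject and predicate, giving a recall estimate for that slice of the KG.

```python
def estimated_recall(kg_facts, text_cardinalities):
    """Sketch: estimate per-(subject, predicate) recall of a KG by comparing
    the objects it stores against cardinalities extracted from text.

    kg_facts: dict mapping (subject, predicate) -> set of known objects
    text_cardinalities: dict mapping (subject, predicate) -> count from text
    """
    recall = {}
    for key, expected in text_cardinalities.items():
        known = len(kg_facts.get(key, set()))
        recall[key] = min(known / expected, 1.0) if expected else 1.0
    return recall

# Toy example (entities and counts invented for illustration):
kg = {("Jane_Doe", "hasChild"): {"Ann_Doe"}}
counts = {("Jane_Doe", "hasChild"): 3}
print(estimated_recall(kg, counts))  # ~0.33: the KG knows 1 of 3 children
```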

Geometric Complexity Theory: An ambitious approach towards P versus NP

Christian Ikenmeyer MPI-INF - D1
06 Sep 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Computational complexity theory is concerned with the study of the inherent complexity of computational problems. Its flagship conjecture is the famous P versus NP conjecture, which is one of the seven Millennium Problems of the Clay Mathematics Institute.

To this day several thousands of computational problems are classified as NP-complete, i.e., they have a polynomial time algorithm if and only if P equals NP. The practical importance of the P vs NP conjecture is at least twofold: First of all, many NP-complete problems are of high practical relevance and have to be solved every day in commercial and scientific applications. Secondly, if NP turned out to be P, then this would break all of today's cryptographic ciphers.

Geometric complexity theory is an ambitious program initiated in 2001 by Ketan Mulmuley and Milind Sohoni towards solving the P vs NP problem by using methods from algebraic geometry and representation theory. During the last few years there has been a significant amount of research activity related to this program. In this talk I explain some basic ideas and recent developments.

Combinatorial Constructions for Effective Testing

Filip Nikšić Max Planck Institute for Software Systems
14 Aug 2017, 3:00 pm - 4:00 pm
Kaiserslautern building G26, room 113
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
Testing is the predominant approach to finding bugs and gaining assurance in correctness of concurrent and distributed systems with complex state spaces. In this thesis we use combinatorial methods to develop new techniques for systematic and randomized testing. We answer several theoretical questions about the size of test sets that achieve full coverage, and propose a practical tool for testing actor-based programs.

We start with a general construction, obtained using the probabilistic method from combinatorics, that shows that whenever a random test covers a fixed testing goal with sufficiently high probability, there is a small set of random tests that achieves full coverage with high probability. We use this construction in the context of testing distributed systems under network partition faults: our construction explains why random testing tools in this context achieve good coverage--and hence, find bugs--quickly.
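One way to see why such a small set suffices is a standard union-bound calculation (our illustration, not necessarily the construction from the thesis). Assuming each of m testing goals is covered by a single random test with probability at least p, then k independent random tests achieve full coverage with probability at least 1 - δ as soon as k ≥ ln(m/δ)/p:

```latex
\Pr[\text{some goal left uncovered}]
  \;\le\; m\,(1-p)^{k}
  \;\le\; m\,e^{-pk}
  \;\le\; \delta
  \qquad\text{whenever}\qquad
  k \;\ge\; \frac{\ln(m/\delta)}{p}.
```

The number of tests therefore grows only logarithmically in the number of goals, which matches the observation that random partition-fault testing reaches good coverage, and finds bugs, after relatively few trials.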

In the context of testing concurrent programs, we introduce a notion of hitting families of schedules. It suffices to test schedules from a hitting family if the goal is to expose bugs of "small depth"--bugs that only depend on the relative ordering of a small number of specific events in the program. We study the size of optimal hitting families, and give explicit constructions of small hitting families when the events in the program are partially ordered as a tree. In particular, for balanced trees, our constructions are polylogarithmic in the number of events.

Finally, we propose a testing tool for finding bugs of small depth in actor-based programs. The tool is based on an existing randomized scheduler for shared-memory multithreaded programs. Adapting the scheduler to actor-based programs poses an interesting underlying theoretical problem.

Techniques to enforce security policies on untrusted applications

Anjo Vahldiek-Oberwagner Max Planck Institute for Software Systems
07 Aug 2017, 5:00 pm - 6:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
SWS Student Defense Talks - Thesis Proposal
As the dependence on ever-present computer systems increases, so does the potential harm when software or hardware deviates from user expectations: users lose data or find it illicitly leaked. Existing reference monitors, which aim to prevent such inadvertent behavior, fail to (1) protect the confidentiality and integrity of persistent data, and (2) efficiently and robustly mediate untrusted applications. In this proposal we present two reference monitors, Guardat and ERIM, that target these shortcomings, and demonstrate their design, implementation, and evaluation.

The policies protecting persistent data and the mechanisms for their enforcement are spread over many software components and configuration files, increasing the risk of policy violation due to bugs, vulnerabilities and misconfiguration. In Guardat, users, developers and administrators specify file protection policies declaratively, concisely and separately from code, and Guardat enforces these policies by mediating I/O in the storage layer. Policy enforcement relies only on the integrity of the Guardat controller and any external policy dependencies. We show experimentally that Guardat's overhead is low.

While Guardat enforces policies at the storage layer, it cannot enforce policies over the in-memory state of untrusted applications. In contrast to existing techniques, ERIM efficiently mediates an application's execution by isolating a reference monitor in the same address space. Using Intel Memory Protection Keys in combination with static binary rewriting, ERIM isolates the monitor's state from strong, malicious adversaries. We propose binary rewriting rules to harden existing executable files and detail use cases in which prior art relied on less robust protection at similar performance.

Towards Use-Ability in Private Data Analytics

Reinhard Munz Max Planck Institute for Software Systems
04 Aug 2017, 1:30 pm - 2:30 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 105
SWS Student Defense Talks - Thesis Proposal
Private data analytics systems preferably provide required analytic accuracy to analysts and specified privacy to individuals whose data is analyzed. Devising a general system that works for a broad range of datasets and analytic scenarios has proven to be difficult. Despite the advent of differentially private systems with proven formal privacy guarantees, industry still uses their inferior ad-hoc predecessors that are able to provide better analytic accuracy. Differentially private mechanisms often need to add large amounts of noise to statistical results, which impairs their use-ability. In my thesis I follow two approaches to improve use-ability of private data analytics systems in general and differentially private systems in particular. First, I revisit ad-hoc mechanisms and explore the possibilities of systems that do not provide Differential Privacy or only a weak version thereof. In my second approach I use the insights gained before to propose UniTraX, the first differentially private analytics system that allows analyzing part of a protected dataset without affecting other parts and without having to give up on guaranteed accuracy.

Fairer and more accurate, but for whom?

Alexandra Chouldechova CMU
25 Jul 2017, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 005
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Complex statistical models are increasingly being used or considered for use in high-stakes decision-making pipelines in domains such as financial services, health care, criminal justice and human services. These models are often investigated as possible improvements over more classical tools such as regression models or human judgement. While the modeling approach may be new, the practice of using some form of risk assessment to inform decisions is not. When determining whether a new model should be adopted, it is essential to be able to compare the proposed model to the existing approach across a range of task-relevant accuracy and fairness metrics. In this talk I will describe a subgroup analysis approach for characterizing how models differ in terms of fairness-related quantities such as racial or gender disparities. I will also talk about an ongoing collaboration with the Allegheny County Department of Health and Human Services on developing and implementing a risk assessment tool for use in child welfare referral screening.

https://arxiv.org/abs/1707.00046

http://www.post-gazette.com/local/region/2017/04/09/Allegheny-County-using-algorithm-to-assist-in-child-welfare-screening/stories/201701290002

Decision Making and The Value of Explanation

Kathy Strandburg NYU Law
07 Jul 2017, 10:00 am - 11:30 am
Saarbrücken building E1 5, room 005
SWS Colloquium
Much of the policy and legal debate about algorithmic decision-making has focused on issues of accuracy and bias. Equally important, however, is the question of whether algorithmic decisions are understandable by human observers: whether the relationship between algorithmic inputs and outputs can be explained. Explanation has long been deemed a crucial aspect of accountability, particularly in legal contexts. By requiring that powerful actors explain the bases of their decisions — the logic goes — we reduce the risks of error, abuse, and arbitrariness, thus producing more socially desirable decisions. Decision-making processes employing machine learning algorithms complicate this equation. Such approaches promise to refine and improve the accuracy and efficiency of decision-making processes, but the logic and rationale behind each decision often remains opaque to human understanding. Indeed, at a technical level, it is not clear that all algorithms can be made explainable and, at a normative level, it is an open question when and if the costs of making algorithms explainable outweigh the benefits. This presentation will begin to map out some of the issues that must be addressed in determining in what contexts, and under what constraints, machine learning approaches to governmental decision-making are appropriate.

Your photos expose your social life - Social relation recognition in 5 social domains

Qianru Sun MPI-INF - D4
05 Jul 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Social relations are the foundation of human social life. Developing techniques to analyze such relations in visual data, such as photos, bears great potential to build machines that better understand people at a social level. Social domain-based theory from social psychology is a great starting point to systematically approach social relation recognition. The theory provides coverage of all aspects of social relations and is equally concrete and predictive about the visual attributes and behaviors defining the relations in each social domain. We proposed the first photo dataset built on this holistic conceptualization of social life, composed of a hierarchical label space of social domains and social relations, and contributed the first models to recognize such domains and relations, finding superior performance for attribute-based features. Beyond the encouraging performance, we find interpretable features that are in accordance with predictions from the social psychology literature. By interleaving visual recognition and social psychology theory, this work has the potential to complement the theoretical work in the area with empirical and data-driven models of social life.

Finding Fake News

Giovanni Luca Ciampaglia Indiana University Network Science Institute
26 Jun 2017, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
SWS Colloquium
Two-thirds of all American adults access the news through social media. But social networks and social media recommendations lead to information bubbles, and personalization and recommendations, by maximizing the click-through rate, lead to ideological polarization. Consequently, rumors, false news, conspiracy theories, and now even fake news sites are increasingly worrisome phenomena. While media organizations (Snopes.com, PolitiFact, FactCheck.org, et al.) have stepped up their efforts to verify news, political scientists tell us that fact-checking efforts may be ineffective or even counterproductive. To address some of these challenges, researchers at Indiana University are working on an open platform for the automatic tracking of both online fake news and fact-checking on social media. The goal of the platform, named Hoaxy, is to reconstruct the diffusion networks induced by hoaxes and their corrections as they are shared online and spread from person to person.

Discrimination in Online Advertising

Till Speicher Max Planck Institute for Software Systems
14 Jun 2017, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Qualifying Exam
Online targeted advertising has been subject to many complaints for allowing advertisers to discriminate against users belonging to sensitive groups, which has led Facebook, for instance, to disallow the use of sensitive features such as ethnic affinity in certain types of ads. In this talk I argue that such measures are not sufficient and that the problem of discrimination in targeted advertising is much more pernicious. I show how a malicious advertiser can create discriminatory ads without using sensitive features. I argue that discrimination measures should be based on the targeted population and not on the attributes used for targeting, and I propose ways to mitigate discriminatory targeting.

Comprehensive and Practical Policy Compliance in Data Retrieval Systems

Eslam Elnikety Max Planck Institute for Software Systems
14 Jun 2017, 1:00 pm - 2:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
SWS Student Defense Talks - Thesis Proposal
Data use policies govern how data retrieval systems process data items coming from many different sources, each subject to its own integrity and confidentiality requirements. Ensuring compliance with these requirements despite bugs, misconfigurations, and operator errors in a large, complex, and fast evolving system is a major challenge.

In this thesis, we present comprehensive and practical compliance systems to address this challenge. To be comprehensive, compliance systems must be able to enforce policies specific to individual data items or to a particular client's data items, the service provider's own policies, and policies that capture legal requirements. To be practical, compliance systems need to meet stringent requirements: runtime overhead must be low; existing applications can run with few modifications; and bugs, misconfigurations, compromises in application code, or actions by unprivileged operators cannot violate policies.

We present the design and implementation of two comprehensive and practical compliance systems: Thoth and Shai. At a high level, data use policies are stated in a declarative language separate from application code, and a small reference monitor ensures compliance with these policies. Thoth and Shai differ in enforcement techniques. Thoth tracks data flows through the system at runtime by intercepting I/O at process boundaries, and enforces the associated policies. Shai, on the other hand, combines static flow analysis and light-weight runtime monitoring (sandboxes and capabilities) to ensure compliance of data flows. We demonstrate the practicality of these systems using a prototype search engine based on the popular Apache Lucene.

Towards an Approximating Compiler for Numerical Computations

Eva Darulova Max Planck Institute for Software Systems
07 Jun 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Computing resources are fundamentally limited and sometimes an exact solution may not even exist. Thus, when implementing real-world systems, approximations are inevitable, as are the errors introduced by them. The magnitude of errors is problem-dependent but higher accuracy generally comes at a cost in terms of memory, energy or runtime, effectively creating an accuracy-efficiency tradeoff. To take advantage of this tradeoff, we need to ensure that the computed results are sufficiently accurate, otherwise we risk disastrously incorrect results or system failures. Unfortunately, the current way of programming with approximations is mostly manual, and consequently costly, error prone and often produces suboptimal results.

In this talk I will present our vision and efforts so far towards an approximating compiler for numerical computations. Such a compiler would take as input exact high-level code with an accuracy specification and automatically synthesize an approximated implementation which is as efficient as possible, but verifiably computes accurate enough results.

Quantifying and Reducing Polarization on Social media

Kiran Garimella Aalto University
10 May 2017, 9:45 am - 11:15 am
Saarbrücken building E1 5, room 005
SWS Colloquium
Social media has brought a revolution in how people are exposed to information and how they consume news. Beyond the undoubtedly large number of advantages and capabilities brought by social-media platforms, a point of criticism has been the creation of filter bubbles or echo chambers, caused by social homophily as well as by algorithmic personalization and recommendation in content delivery. In this talk, I will present the methods we developed to (i) detect and quantify the existence of polarization on social media, (ii) monitor the evolution of polarization over time, and finally, (iii) overcome the effects caused by increased polarization. We build on existing studies and ideas from social science, together with principles from graph theory, to design algorithms that are language independent, domain agnostic, and scalable to a large number of users.

An Effectful Way to Eliminate Addiction to Dependence

Pierre-Marie Pédrot University of Ljubljana
08 May 2017, 10:30 am - 11:30 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
We define a syntactic monadic translation of type theory, called the weaning translation, that allows for a large range of effects in dependent type theory, such as exceptions, non-termination, non-determinism, or writing operations. Through the lens of a call-by-push-value decomposition, we explain why the traditional approach fails with type dependency and justify our approach. Crucially, the construction requires that the universe of algebras of the monad itself forms an algebra. The weaning translation applies to a version of the Calculus of Inductive Constructions with a restricted version of dependent elimination, dubbed Baclofen Type Theory, which we conjecture is the proper generic way to mix effects and dependence. This provides the first effectful version of CIC, which can be implemented as a Coq plugin.

Intelligent Control Systems

Sebastian Trimpe Max Planck Institute for Intelligent Systems
04 May 2017, 11:00 am - 12:00 pm
Kaiserslautern building G26, room 113
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Due to modern computer and data technology, we can today collect, store, process, and share more data than ever before. This data revolution opens fundamentally new ways to think about the classical concept of feedback control as a basis for building future (artificial) intelligent systems, which interact with the physical world. In this talk, I will provide an overview of our recent research on intelligent control systems, which leverages machine learning and modern communication networks for control. I will present algorithms that enable systems to (i) autonomously learn from data, (ii) interconnect in cooperative networks, and (iii) use their resources efficiently. Throughout the talk, the developed algorithms and theory are highlighted with experiments on humanoid robots and a self-balancing dynamic sculpture.

Digital Knowledge: From Facts to Rules and Back

Daria Stepanova MPI-INF - D5
03 May 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Knowledge Graphs (KGs) are huge collections of primarily encyclopedic facts, which are automatically extracted from the Web. Prominent examples of KGs include Yago, DBpedia, and the Google Knowledge Graph. We all use KGs when posing simple queries like "capital of Saarland" to Google. Internally, such queries are translated into machine-readable representations, which are then issued against the KG stored at the backend of the search engine. Instead of syntactically relevant Web pages, the actual answer to the above query, "Saarbrücken", is then output to the user as a result.

However, since KGs are automatically constructed, they are often inaccurate and incomplete. In this talk, I will investigate how deductive and inductive reasoning services could be used to address these crucially important issues. More specifically, first, I will present an approach for repairing inconsistencies in hybrid logical systems that can be built on top of KGs. Second, I will describe a method for inductive learning of rules with exceptions from KGs and show how these are applied for deriving missing facts.

On Rationality of Nonnegative Matrix Factorization

James Worrell University of Oxford
02 May 2017, 10:30 am - 11:30 am
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 112
SWS Distinguished Lecture Series
The nonnegative rank of a nonnegative m x n matrix M is the smallest number d such that M can be written as the product M = WH of a nonnegative m x d matrix W and a nonnegative d x n matrix H.  The notions of nonnegative rank and nonnegative matrix factorization have a wide variety of applications, including bioinformatics, computer vision, communication complexity, document clustering, and recommender systems. A longstanding open problem is whether, when M is a rational matrix, the factors W and H in a rank decomposition M=WH can always be chosen to be rational.  In this talk we resolve this problem negatively and discuss consequences of this result for the computational complexity of computing nonnegative rank.

This is joint work with Dmitry Chistikov, Stefan Kiefer, Ines Marusic, and Mahsa Shirmohammadi.
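
To make the definition of nonnegative rank concrete, the following small Python check (an illustrative example, unrelated to the construction used in the talk) exhibits a 4 x 4 nonnegative matrix together with a nonnegative 4 x 3 and 3 x 4 factor pair, witnessing nonnegative rank at most 3.

# Illustration of the definition of nonnegative rank (example matrices are made up).
# M = W @ H with W (4x3) and H (3x4) entrywise nonnegative witnesses
# that the nonnegative rank of M is at most 3.
import numpy as np

W = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]], dtype=float)
H = np.array([[1, 2, 0, 1],
              [0, 1, 1, 0],
              [2, 0, 1, 1]], dtype=float)
M = W @ H

assert (W >= 0).all() and (H >= 0).all() and (M >= 0).all()
print(M)   # a nonnegative 4x4 matrix of nonnegative rank at most 3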

Securing enclaves with formal verification

Andrew Baumann Microsoft Research, Redmond
26 Apr 2017, 4:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 112
SWS Colloquium
Moore's Law may be slowing, but, perhaps as a result, other measures of processor complexity are only accelerating. In recent years, Intel's architects have turned to an alphabet soup of instruction set extensions such as MPX, SGX, MPK, and CET as a way to sell CPUs through new security features. SGX in particular promises powerful security: user-mode "enclaves" protected against both physical attacks and privileged software adversaries. To achieve this, SGX's designers implemented an isolation mechanism approaching the complexity of an OS microkernel in the ISA, using an inscrutable mix of silicon and microcode. However, while CPU-based security mechanisms are harder to attack, they are also harder to deploy and update, and already a number of significant flaws have been identified. Worse, the availability of new SGX features is dependent on the slowing deployment of new CPUs.

In this talk, I'll describe an alternative approach to providing SGX-like enclaves: decoupling the underlying hardware mechanisms such as memory encryption, address-space isolation and attestation from a privileged software monitor which implements enclaves. The monitor's trustworthiness is guaranteed by a machine-checked proof of both functional correctness and high-level security properties. The ultimate goal is to achieve security that is equivalent or better than SGX while decoupling enclave features from the underlying hardware.

Comprehensive deep linking for mobile apps

Oriana Riva Microsoft Research, Redmond
10 Apr 2017, 10:30 am
Kaiserslautern building G26, room 112
simultaneous videocast to Saarbrücken building E1 5, room 005
SWS Colloquium
Web deep links are instrumental to many fundamental user experiences such as navigating to one web page from another, bookmarking a page, or sharing it with others. Such experiences are not possible with individual pages inside mobile apps, since historically mobile apps did not have links equivalent to web deep links. Mobile deep links, introduced in recent years, still lack many important properties of web deep links. Unlike web links, mobile deep links must be explicitly built into apps by developers, cover only a small number of predefined pages, and are defined statically: they can navigate to a page for a given link, but cannot dynamically generate a link for a given page. Another problem with state-of-the-art deep links is that, once exposed, they are hard to discover, thus limiting their usage in both first- and third-party experiences.

In this talk, I'll give an overview of two new deep linking mechanisms that address these problems. First, we implemented an application library that transparently tracks data- and UI-event-dependencies of app pages and encodes the information in links to the pages; when a link is invoked, the information is utilized to recreate the target page quickly and accurately. Second, using static and dynamic analysis we prototyped a tool that can automatically discover links that are exposed by an app; in addition, it can discover many links that are not explicitly exposed. The goal is to obtain links to every page in an app automatically and precisely.

Graph Decomposition Problems in Image Analysis

Björn Andres D2
05 Apr 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
A large part of image analysis is about breaking things into pieces. Decompositions of a graph are a mathematical abstraction of the possible outcomes. Breaking things (like an image itself) into pieces in a controlled way is hard. Optimization problems whose feasible solutions define decompositions of a graph are a mathematical abstraction of this task. One example is the correlation clustering problem whose feasible solutions relate one-to-one to the decompositions of a graph, and whose objective function puts a cost or reward on neighboring nodes ending up in distinct components. This talk shows work by the Combinatorial Image Analysis research group that applies this problem and proposed generalizations to diverse image analysis tasks. It sketches the algorithms introduced by the group for finding feasible solutions for large instances in practice, solutions that are often superior in the metrics of application-specific benchmarks. Finally, it sketches the algorithm introduced by the group for finding lower bounds and points to new findings and open problems of polyhedral geometry in this context.
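
As a minimal illustration of the correlation clustering objective described above (a toy brute-force search, not the group's algorithms, which target large instances), the following Python sketch enumerates all decompositions of a four-node graph and reports the cheapest one; the edge costs and rewards are invented.

# Minimal brute-force illustration of the correlation clustering objective.
# Each edge carries a cost (positive) or reward (negative) that is incurred
# when its endpoints end up in distinct components of the decomposition.
from itertools import product

nodes = ["a", "b", "c", "d"]
edges = {("a", "b"): -2.0,   # reward for separating a and b
         ("b", "c"): 3.0,    # cost for separating b and c
         ("c", "d"): 1.0,
         ("a", "d"): -1.0}

def cost(labels):
    return sum(w for (u, v), w in edges.items() if labels[u] != labels[v])

best = None
for assignment in product(range(len(nodes)), repeat=len(nodes)):
    labels = dict(zip(nodes, assignment))
    if best is None or cost(labels) < cost(best):
        best = labels
print("best decomposition:", best, "cost:", cost(best))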

Local Reasoning for Concurrency, Distribution and Web Programming

Azalea Raad Imperial College
03 Apr 2017, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
In this talk I will present my research in developing local reasoning techniques in both concurrent and sequential settings.

On the concurrency side, I’ll present my work on the program logic of CoLoSL (Concurrent Local Subjective Logic) and its application to various fine-grained concurrent algorithms. A key difficulty in verifying concurrent programs is reasoning compositionally about each thread in isolation. CoLoSL is the first program logic to introduce the general composition and framing of interference relations (describing how shared resources may be manipulated by each thread) in the spirit of resource composition and framing in separation logic. This in turn enables local reasoning and allows for more concise specifications and proofs.

I will then present preliminary work on extending CoLoSL to reason about distributed database applications running under the snapshot isolation (SI) consistency model. SI is a commonly used consistency model for transaction processing, implemented by most distributed databases. Existing work focuses on the SI semantics, and verification techniques for client-side applications remain unexplored. To fill this gap, I look into extending CoLoSL towards a program logic for client-side reasoning under SI.

On the sequential side, I’ll briefly discuss my work on specification and verification of web programs. My research in this area includes: a compositional specification of the DOM (Document Object Model) library in separation logic; integrating this DOM specification with the JaVerT (JavaScript Verification Toolchain) framework for automated DOM/JavaScript client verification; as well as ongoing work towards extending JaVerT to reason about higher-order JavaScript programs.

Understanding & Controlling User Privacy in Social Media via Exposure

Mainack Mondal Max Planck Institute for Software Systems
31 Mar 2017, 4:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Proposal
The popularity of online social media sites (OSMs) like Facebook and Twitter has led to a renewed discussion about user privacy concerns today. These sites now mediate the sharing of personal information, photos, status updates, and contacts of billions of users around the world; some OSMs even serve as the de-facto internet portal for a significant fraction of the world's population. However, in OSMs, users themselves are burdened with managing the privacy of the large amount of social content they post. As a result, user-specified privacy settings often do not match user expectations. Consequently, in recent years numerous news media reports as well as research studies have captured users' concern with protecting the privacy of their social media content. These reports and studies boldly underline the users' urgent need for better privacy control mechanisms. Thus, today, the key research challenge in this space is: How do we provide improved privacy protection to OSM users for their social content? In this thesis, we propose approaches to address this question.

In this talk, I will first motivate my thesis research with privacy violation examples from the real world which are not captured by access control, the dominant privacy model today. Then I will propose exposure control, a more inclusive privacy model to capture such violations. We define the exposure of a piece of content as the set of people who actually view the content. Next, I will present a brief overview of our work on understanding and controlling exposure in three different real-world scenarios: (1) understanding and controlling exposure using social access control lists (SACLs), (2) controlling exposure by limiting large-scale social data aggregators, and (3) understanding and controlling longitudinal exposure in OSMs, i.e., how users control the exposure of their old content. Finally, I will expand upon longitudinal exposure in OSMs and present our ongoing work on designing mechanisms to better control it.

Combining Computing, Communications and Controls in Safety Critical Systems

Professor Richard Murray Caltech
31 Mar 2017, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Distinguished Lecture Series
Flight critical subsystems in aerospace vehicles must achieve probability of failure rates of less than 1 failure in 10^9 flight hours (i.e. less than 1 failure per 100,000 years of operation).  Systems that achieve this level of reliability are hard to design, hard to verify, and hard to validate, especially if software is involved.  In this talk, I will talk about some of the challenges that the aerospace community faces in designing systems with this level of reliability and how tools from formal methods and control theory might help.  I will also describe some of my group’s work in synthesis of reactive protocols for hybrid systems and its applications to the design of safety critical systems.
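
As a quick sanity check on the figure quoted above: 10^9 flight hours divided by 24 x 365 hours per year is roughly 1.14 x 10^5 years, so one expected failure in 10^9 flight hours indeed corresponds to more than 100,000 years of continuous operation.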

Hardening cloud and datacenter systems against configuration errors

Tianyin Xu University of California
22 Mar 2017, 10:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Configuration errors are among the dominant causes of service-wide, catastrophic failures in today's cloud and datacenter systems. Despite the wide adoption of fault-tolerance and recovery techniques, these large-scale software systems still fail to effectively deal with configuration errors. In fact, even tolerance/recovery mechanisms are often misconfigured and thus crippled in reality.

In this talk, I will present our research efforts towards hardening cloud and datacenter systems against configuration errors. I will start with work that seeks to understand the fundamental causes of misconfigurations. I will then focus on two of my approaches, PCheck and Spex, that enable software systems to anticipate and defend against configuration errors. PCheck generates checking code to help systems detect configuration errors early, and Spex exposes bad system reactions to configuration errors based on constraints inferred from source code.

A New Approach to Network Functions

Aurojit Panda University of California, Berkeley
17 Mar 2017, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Modern networks do far more than just deliver packets: they provide network functions -- including firewalls, caches, and WAN optimizers -- that are crucial for scaling networks, ensuring security and enabling new applications. Network functions were traditionally implemented using dedicated hardware middleboxes, but in recent years they are increasingly being deployed as VMs on commodity servers. While many herald this move towards network function virtualization (NFV) as a great step forward, I argue that accepted virtualization techniques are ill-suited to network functions. In this talk I describe NetBricks -- a new approach to building and running virtualized network functions that speeds development and increases performance. I end the talk by discussing the implications of being able to easily create and insert new network functions.

A formal model for capability machines: Towards secure compilation to CHERI

Akram El-Korashy Max Planck Institute for Software Systems
16 Mar 2017, 3:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Qualifying Exam
Vulnerabilities in computer systems often arise due to a mismatch between the abstractions that a high-level programming language offers and the actual low-level environment in which the compiled machine code executes. A low-level environment, for example, may offer only lax memory safety compared to the memory-safety guarantees that the high-level programming language gives.

To address this mismatch, architectures with hardware or instruction-level support for protection mechanisms can be useful. One trend in computer systems protection is hardware-supported enforcement of security guarantees/policies. Capability-based machines are one instance of hardware-based protection mechanisms.

In this talk, I discuss a formal model of the CHERI architecture, a capability machine architecture, with the aim of formal reasoning about the security guarantees that it offers. I discuss the notions of capability unforgeability and memory compartmentalization, which we showed can be achieved with the help of CHERI's protection mechanisms.

The motivation for this work is to highlight the potential of using CHERI as a target architecture for secure compilation, i.e., compilation that preserves security properties.

Relational Cost Analysis

Ezgi Cicek Max Planck Institute for Software Systems
15 Mar 2017, 3:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Proposal
Research in the programming languages community has made great progress on statically estimating the execution cost of a program. The execution cost could be the number of execution steps or any other abstract metric of resource usage. Static execution cost analysis techniques based on program analysis, type systems and abstract interpretation are well-studied. However, if we are interested in establishing bounds on the execution cost of a program relative to another program's cost, naively combining worst- and best-case execution costs of the two programs does not work well in many cases. The reason is that such unary analysis techniques analyze the two programs in isolation, ignoring the similarities between the programs and their inputs.

In this thesis proposal, I propose the first relational cost analysis framework that establishes the relative cost of one program with respect to another, possibly similar program by taking advantage of the similarities in their inputs and their code. Among other things, such a relational cost analysis can be used to prove statements like: 1) Of two similar algorithms to solve a problem, one is faster than the other, without having to establish the exact complexity of either algorithm; 2) The execution cost of a program is independent of a secret input, so nothing can be inferred about the secret input by observing the cost of executing the program; and 3) The cost of a program varies very little with input changes; this can aid resource allocation and scheduling. I show that in cases where the two programs and their inputs are closely related, relational cost analysis is superior to non-relational analysis both in terms of precision and simplicity. Specifically, I show that a refinement type and effect system provides a precise and compact mechanism for proving the relative cost of two programs. The idea behind relational cost analysis can be extended to other nontrivial domains. In particular, I show that relational cost analysis can be adapted to prove upper bounds on the update costs of incremental programs.  Finally, I briefly present a bidirectional technique to typecheck relational cost bounds.
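
As a minimal, hypothetical illustration of statement 2) above (and not of the proposed type-and-effect system), the following Python sketch counts loop iterations as an abstract cost metric: an early-exit string comparison has a cost that depends on the secret, whereas a constant-time variant has zero relative cost across secrets, so its cost reveals nothing about the secret.

# Toy illustration of point (2) above, counting loop iterations as the cost metric.
def compare_early_exit(secret, guess):
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            break                      # cost depends on the secret
    return steps

def compare_constant_time(secret, guess):
    steps, differ = 0, False
    for s, g in zip(secret, guess):
        steps += 1
        differ |= (s != g)             # cost independent of the secret
    return steps

guess = "aaaa"
for secret in ("abcd", "aabb", "aaab"):
    print(secret,
          compare_early_exit(secret, guess),      # cost varies with the secret: 2, 3, 4
          compare_constant_time(secret, guess))   # always 4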

Event time series analysis and its applications to social media analysis

Ryota Kobayashi National Institute of Informatics, Japan
14 Mar 2017, 2:09 pm - 3:39 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
SWS Colloquium
We first present an approach for analyzing event time series, namely, the times of event occurrences. Interestingly, event time series appear in various fields, including neuroscience (action potentials of a neuron), social media analysis (Twitter, Facebook), and so on. Then, we develop a framework for forecasting retweet activity through the analysis of a Twitter dataset (Kobayashi & Lambiotte, ICWSM, 2016). This is joint work with Renaud Lambiotte.

Privacy as a Service

Raymond Cheng University of Washington
13 Mar 2017, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Current cloud services are vulnerable to hacks and surveillance programs that undermine user privacy and personal liberties. In this talk, I present how we can build practical systems for protecting user privacy from powerful persistent threats. I will discuss Talek, a private publish-subscribe protocol. With Talek, application developers can send and synchronize data through the cloud without revealing any information about data contents or communication patterns of application users. The protocol is designed with provable security guarantees and practical performance, achieving 3-4 orders of magnitude better throughput than other systems with comparable security goals. I will also discuss Radiatus, a security-focused web framework to protect web apps against external intrusions, and uProxy, an Internet censorship circumvention tool in deployment today.

Randomized Algorithms Meets Formal Verification

Justin Hsu University of Pennsylvania
08 Mar 2017, 10:00 am - 11:30 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Algorithms and formal verification are two classical areas of computer science. The two fields apply rigorous mathematical proof to seemingly disparate ends---on the one hand, analyzing computational efficiency of algorithms; on the other, designing techniques to mechanically show that programs are correct.

In this talk, I will present a surprising confluence of ideas from these two areas. First, I will show how coupling proofs, used to analyze random walks and Markov chains, correspond to proofs in the program logic pRHL (probabilistic Relational Hoare Logic). This connection enables formal verification of novel probabilistic properties, and provides a structured understanding of proofs by coupling. Then, I will show how an approximate version of pRHL, called apRHL, points to a new, approximate version of couplings closely related to differential privacy. The corresponding proof technique---proof by approximate coupling---enables cleaner proofs of differential privacy, both for humans and for formal verification. Finally, I will share some directions towards a possible "Theory AB", blending ideas from both worlds.

Constrained Counting and Sampling: Bridging the Gap between Theory and Practice

Kuldeep Meel Rice University
07 Mar 2017, 10:00 am - 11:30 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Constrained counting and sampling are two fundamental problems in Computer Science with numerous applications, including network reliability, privacy, probabilistic reasoning, and constrained-random verification. In constrained counting, the task is to compute the total weight, subject to a given weighting function, of the set of solutions of the given constraints. In constrained sampling, the task is to sample randomly, subject to a given weighting function, from the set of solutions to a set of given constraints. In this talk, I will introduce a novel algorithmic framework for constrained sampling and counting that combines the classical algorithmic technique of universal hashing with the dramatic progress made in Boolean reasoning over the past two decades. This has allowed us to obtain breakthrough results in constrained sampling and counting, providing a new algorithmic toolbox in machine learning, probabilistic reasoning, privacy, and design verification. I will demonstrate the utility of the above techniques on various real applications including probabilistic inference, design verification and our ongoing collaboration in estimating the reliability of critical infrastructure networks during natural disasters.
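
The following toy Python sketch illustrates only the core universal-hashing idea mentioned above (it is not the algorithms from the talk): intersecting the solution set of a small constraint with m random XOR (parity) constraints and scaling the surviving count by 2^m yields an estimate whose expectation equals the exact count.

# Toy sketch of counting with universal (XOR) hashing -- the core idea only.
# We estimate the number of satisfying assignments of a small constraint by
# intersecting it with m random parity constraints and scaling by 2**m.
import itertools, random

n = 12
def constraint(x):                 # example constraint: x0 OR x1
    return x[0] or x[1]

solutions = [x for x in itertools.product((0, 1), repeat=n) if constraint(x)]
print("exact count:", len(solutions))          # 3 * 2**(n-2) = 3072

m = 5                                          # number of random XOR constraints
parities = [([random.randint(0, 1) for _ in range(n)], random.randint(0, 1))
            for _ in range(m)]

def survives(x):
    return all(sum(a * xi for a, xi in zip(coeffs, x)) % 2 == b
               for coeffs, b in parities)

survivors = sum(1 for x in solutions if survives(x))
print("hash-based estimate:", survivors * 2**m)   # unbiased: equals the exact count in expectation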

Variational Bayes In Private Settings

Mijung Park Amsterdam Machine Learning Lab
03 Mar 2017, 10:00 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Bayesian methods are frequently used for analysing privacy-sensitive datasets, including medical records, emails, and educational data, and there is a growing need for practical Bayesian inference algorithms that protect the privacy of individuals' data. To this end, we provide a general framework for privacy-preserving variational Bayes (VB) for a large class of probabilistic models, called the conjugate exponential (CE) family. Our primary observation is that when models are in the CE family, we can privatise the variational posterior distributions simply by perturbing the expected sufficient statistics of the complete-data likelihood. For widely used non-CE models with binomial likelihoods (e.g., logistic regression), we exploit the Polya-Gamma data augmentation scheme to bring such models into the CE family, such that inferences in the modified model resemble the original (private) variational Bayes algorithm as closely as possible. The iterative nature of variational Bayes presents a further challenge for privacy preservation, as each iteration increases the amount of noise needed. We overcome this challenge by combining: (1) a relaxed notion of differential privacy, called concentrated differential privacy, which provides a tight bound on the privacy cost of multiple VB iterations and thus significantly decreases the amount of additive noise; and (2) the privacy amplification effect of subsampling mini-batches from large-scale data in stochastic learning. We empirically demonstrate the effectiveness of our method in CE and non-CE models including latent Dirichlet allocation (LDA), Bayesian logistic regression, and Sigmoid Belief Networks (SBNs), evaluated on real-world datasets.
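
As a generic, hedged illustration of the "perturb the expected sufficient statistics" idea for a conjugate-exponential model (not the algorithm presented in the talk; the clipping bound and noise scale below are made-up assumptions), consider private posterior inference for a Gaussian mean in Python:

# Generic illustration only. Model: Gaussian data with known variance 1,
# Gaussian prior N(0, 1) on the mean; sufficient statistics: (sum x, count).
import numpy as np

rng = np.random.default_rng(0)
data = np.clip(rng.normal(2.0, 1.0, size=500), -5, 5)   # clip to bound sensitivity (assumed bound)

sum_x, count = data.sum(), float(len(data))
sigma = 5.0                                   # assumed noise scale for privacy
noisy_sum = sum_x + rng.normal(0, sigma)
noisy_count = count + rng.normal(0, sigma)

# Conjugate posterior over the mean, computed from (noisy) sufficient statistics.
def posterior(s, n, prior_var=1.0, lik_var=1.0):
    var = 1.0 / (1.0 / prior_var + n / lik_var)
    return var * s / lik_var, var

print("non-private posterior mean/var:", posterior(sum_x, count))
print("private posterior mean/var:    ", posterior(noisy_sum, noisy_count))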

The Diffix Framework: Noise Revisited, Again

Paul Francis Max Planck Institute for Software Systems
01 Mar 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
A longstanding open problem in Computer Science is that of how to get high quality statistics through direct queries to databases containing information about individuals without revealing information specific to those individuals.  It has long been recognized that the key to making this work is to add noise to query answers.  The problem has been how to do this without either adding a great deal of noise to answers or limiting the number of answers an analyst can obtain.  This talk presents Diffix, a new framework for anonymous database query.  Diffix adds noise in such a way that repeated answers produce the same noise: it cannot be averaged away.  This "fixed noise" mechanism, however, creates new opportunities for attacks.  Diffix pro-actively tests potential alternate queries to discover and prevent these attacks.  In this talk, we describe the Diffix framework and present a system design that provides basic query logic and statistical operations.  We will give a brief demo of a more advanced Diffix system that operates as a commercial product.  
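
A minimal sketch of the fixed-noise idea, assuming a hypothetical secret seed and a simple Gaussian noise model (this is a simplification, not the actual Diffix mechanism): deriving the noise deterministically from the query text means that repeating the same query returns the same noisy answer, so the noise cannot be averaged away.

# Simplified illustration of "fixed noise": the noise is a deterministic
# function of the query text and a secret seed, so identical queries get
# identical noisy answers.
import hashlib
import numpy as np

SECRET_SEED = b"assumed-secret"           # hypothetical per-deployment secret

def noisy_count(query_text, true_count, sd=2.0):
    digest = hashlib.sha256(SECRET_SEED + query_text.encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return true_count + rng.normal(0, sd)

q = "SELECT count(*) FROM patients WHERE age > 60"
print(noisy_count(q, 1234))   # same noisy answer every time the same query is asked
print(noisy_count(q, 1234))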

Learning With and From People

Adish Singla ETH Zürich
28 Feb 2017, 10:00 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
People are becoming an integral part of computational systems, fueled primarily by recent technological advancements as well as deep-seated economic and societal changes. Consequently, there is a pressing need to design new data science and machine learning frameworks that can tackle challenges arising from human participation (e.g. questions about incentives and users’ privacy) and can leverage people’s capabilities (e.g. ability to learn).

In this talk, I will share my research efforts at the confluence of people and computing to address real-world problems. Specifically, I will focus on collaborative consumption systems (e.g. shared mobility systems and sharing economy marketplaces like Airbnb) and showcase the need to actively engage users, who would otherwise act primarily in their own interest, in shaping demand. The main idea of engaging users is to incentivize them to switch to alternate choices that would improve the system’s effectiveness. To offer optimized incentives, I will present novel multi-armed bandit algorithms and online learning methods in structured spaces for learning users’ costs for switching between different pairs of available choices. Furthermore, to tackle the challenges of data sparsity and to speed up learning, I will introduce hemimetrics as a structural constraint over users’ preferences. I will show experimental results of applying the proposed algorithms on two real-world applications: incentivizing users to explore unreviewed hosts on services like Airbnb and tackling the imbalance problem in bike sharing systems. In collaboration with an ETH Zurich spinoff and a public transport operator in the city of Mainz, Germany, we deployed these algorithms via a smartphone app among users of a bike sharing system. I will share the findings from this deployment.
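
For readers unfamiliar with multi-armed bandits, the following plain epsilon-greedy sketch in Python illustrates the general idea of learning which incentive to offer from accept/reject feedback; it is a deliberately simple stand-in, not the structured algorithms presented in the talk, and the acceptance probabilities are invented.

# Epsilon-greedy bandit choosing which (hypothetical) incentive to offer.
import random

incentives = ["small discount", "bonus minutes", "free ride"]
true_accept_prob = {"small discount": 0.2, "bonus minutes": 0.35, "free ride": 0.5}

counts = {a: 0 for a in incentives}
values = {a: 0.0 for a in incentives}
epsilon = 0.1

for t in range(10_000):
    if random.random() < epsilon:
        arm = random.choice(incentives)                 # explore
    else:
        arm = max(incentives, key=lambda a: values[a])  # exploit current estimate
    reward = 1.0 if random.random() < true_accept_prob[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # running mean of rewards

print("estimated acceptance rates:", values)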

A New Verified Compiler Backend for CakeML

Magnus Myreen Chalmers University of Technology, Göteborg
22 Feb 2017, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 112
SWS Colloquium
The CakeML project has recently produced a verified compiler which we believe to be the most realistic verified compiler for a functional programming language to date. In this talk I'll give an overview of the CakeML project with focus on the new compiler, in particular how the compiler is structured, how the intermediate languages are designed and how the proofs are carried out. The talk will stay at a fairly high-level, but I am happy to dive into details for any of the parts that I know well.

The CakeML project is currently a collaboration between six sites across three continents. The new compiler is due to: Anthony Fox (Cambridge, UK), Ramana Kumar (Data61, Sydney Australia), Magnus Myreen (Chalmers, Sweden), Michael Norrish (Data61, Canberra Australia), Scott Owens (Kent, UK), and Yong Kiam Tan (CMU, USA).

Safe, Real-Time Software Reference Architectures for Cyber-Physical Systems

Renato Mancuso University of Illinois at Urbana-Champaign
21 Feb 2017, 10:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
There has been an uptrend in the demand and need for complex Cyber-Physical Systems (CPS), such as self-driving cars, unmanned aerial vehicles (UAVs), and smart manufacturing systems for Industry 4.0. CPS often need to accurately sense the surrounding environment by using high-bandwidth acoustic, imaging and other types of sensors; and to take coordinated decisions and issue time-critical actuation commands. Hence, temporal predictability in sensing, communication, computation, and actuation is a fundamental attribute. Additionally, CPS must operate safely even in spite of software and hardware misbehavior to avoid catastrophic failures. To satisfy the increasing demand for performance, modern computing platforms have substantially increased in complexity; for instance, multi-core systems are now mainstream, and partially re-programmable systems-on-chip (SoCs) have just entered production. Unfortunately, extensive and unregulated sharing of hardware resources directly undermines the ability of guaranteeing strong temporal determinism in modern computing platforms. Novel software architectures are needed to restore temporal correctness of complex CPS when using these platforms. My research vision is to design and implement software architectures that can serve as a reference for the development of high-performance CPS, and that embody two main requirements: temporal predictability and robustness. In this talk, I will address the following questions concerning modern multi-core systems: Why can application timing be highly unpredictable? What techniques can be used to enforce safe temporal behaviors on multi-core platforms? I will also illustrate possible approaches for time-aware fault tolerance to maximize CPS functional safety. Finally, I will review the challenges faced by the embedded industry when trying to adopt emerging computing platforms, and I will highlight some novel directions that can be followed to accomplish my research vision.

Computational fair division and mechanism design

Simina Branzei Hebrew University of Jerusalem
20 Feb 2017, 10:00 am - 11:30 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
The emergence of online platforms has brought about a fundamental shift in economic thinking: the design of economic systems is now a problem that computer science can tackle. For the first time we are able to move from the study of economic systems as natural systems to carefully designing and executing them on a computer. Prominent examples of digital market mechanisms include auctions for ads (run by companies such as Google) and electromagnetic spectrum (used by the US government). I will discuss several recent developments in fair division and mechanism design. I will start with a dictatorship theorem for fair division (cake cutting), showing that requiring truthfulness gives rise to a dictator. Afterwards, I will discuss the theme of simplicity and complexity in mechanism design, and more generally the interplay between economics and computation and learning.

Adventures in Systems Reliability: Replication and Replay

Ali Mashtizadeh Stanford University
17 Feb 2017, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
The past decade has seen a rapid acceleration in the development of new and transformative applications in many areas including transportation, medicine, finance, and communication. Most of these applications are made possible by the increasing diversity and scale of hardware and software systems.

While this brings unprecedented opportunity, it also increases the probability of failures and the difficulty of diagnosing them. Increased scale and transience has also made management increasingly challenging. Devices can come and go for a variety of reasons including mobility, failure and recovery, and scaling capacity to meet demand.

In this talk, I will be presenting several systems that I built to address the resulting challenges to reliability, management, and security.

Ori is a reliable distributed file system for devices at the network edge. Ori automates many of the tasks of storage reliability and recovery through replication, taking advantage of fast LANs and low cost local storage in edge networks.

Castor is a record/replay system for multi-core applications with predictable and consistently low overheads. This makes it practical to leave record/replay on in production systems, to reproduce difficult bugs when they occur, and to support recovering from hardware failures through fault tolerance.

Cryptographic CFI (CCFI) is a dynamic approach to control flow integrity. Unlike previous CFI systems that rely purely on static analysis, CCFI can classify pointers based on dynamic and runtime characteristics. This limits the attacks to only actively used code paths, resulting in a substantially smaller attack surface.

Type-Driven Program Synthesis

Nadia Polikarpova MIT CSAIL, Cambridge USA
15 Feb 2017, 10:00 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Modern programming languages safeguard developers from many typical errors, yet more subtle errors—such as violations of security policies—still plague software. Program synthesis has the potential to eliminate such errors, by generating executable code from concise and intuitive high-level specifications. Traditionally, program synthesis failed to scale to specifications that encode complex behavioral properties of software: these properties are notoriously hard to check even for a given program, and so it’s not surprising that finding the right program within a large space of candidates has been considered very challenging. My work tackles this challenge through the design of synthesis-friendly program verification mechanisms, which are able to check a large set of candidate programs against a complex specification at once, thereby efficiently pruning the search space.

Based on this principle, I developed Synquid, a program synthesizer that accepts specifications in the form of expressive types and uses a specialized type checker as its underlying verification mechanism. Synquid is the first synthesizer powerful enough to automatically discover provably correct implementations of complex data structure manipulations, such as insertion into Red-Black Trees and AVL Trees, and normal-form transformations on propositional formulas. Each of these programs is synthesized in under a minute. Going beyond textbook algorithms, I created a language called Lifty, which uses type-driven synthesis to automatically rewrite programs that violate information flow policies. In our case study, Lifty was able to enforce all required policies in a prototype conference management system.

On the Security and Scalability of Proof of Work Blockchains

Arthur Gervais ETH Zurich
08 Feb 2017, 10:00 am - 11:30 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
The security properties of blockchain technology allow for the shifting of trust assumptions, e.g., to remove trusted third parties; they, however, create new challenges for security and scalability, which have not yet been fully understood and that we investigate in this talk. The blockchain’s security, for example, affects the ability of participants to exchange monetary value or to participate in the network communication and the consensus process. Our first contribution provides a quantitative framework to objectively compare the security and performance characteristics of Proof of Work-based blockchains under adversaries with optimal strategies. Our work allows us to increase Bitcoin’s transaction throughput by a factor of ten, given only one parameter change and without deteriorating the security of the underlying blockchain. In our second contribution, we highlight previously unconsidered impacts of the PoW blockchain’s scalability on its security and propose design modifications that are now implemented in the primary Bitcoin client. Because blockchain technology is still in its infancy, we conclude the talk with an outline of future work towards an open, scalable, privacy-preserving and decentralized blockchain.

Guiding program analyzers toward unsafe executions

Dr. Maria Christakis University of Kent
06 Feb 2017, 10:00 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Most static program analysis techniques do not fully verify all possible executions of a program. They leave executions unverified when they do not check certain properties, fail to verify properties, or check properties under certain unsound assumptions, such as the absence of arithmetic overflow. In the first part of the talk, I will present a technique to complement partial verification results by automatic test case generation. In contrast to existing work, our technique supports the common case that the verification results are based on unsound assumptions. We annotate programs to reflect which executions have been verified, and under which assumptions. These annotations are then used to guide dynamic symbolic execution toward unverified program executions, leading to smaller and more effective test suites.

In the second part of the talk, I will describe a new program simplification technique, called program trimming. Program trimming is a program pre-processing step to remove execution paths while retaining equi-safety (that is, the generated program has a bug if and only if the original program has a bug). Since many program analyzers are sensitive to the number of execution paths, program trimming has the potential to improve their effectiveness. I will show that program trimming has a considerable positive impact on the effectiveness of two program analysis techniques, abstract interpretation and dynamic symbolic execution.

Inverse Rendering

Shida Beigpour MPI-INF - D4
01 Feb 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Photographs as well as video frames are two-dimensional projections of three-dimensional real-world scenes. Photorealistic Computer Graphic Imagery (CGI) is usually generated by "rendering" a sophisticated three-dimensional model of a realistic scene onto a virtual camera sensor. Such a process is not invertible due to the high amount of data loss. Still, both physically captured photographs and synthesized images (CGI) contain a considerable amount of information about the characteristics of the scene itself. Even naïve human viewers are able to infer important information about the scene, e.g., shape, material, illumination, distance, and actions, solely from a single image. This is indeed much more difficult for computer systems. Inverse rendering is one of the hot topics in computer vision, in which the goal is to estimate and model the three-dimensional scene and its illumination automatically, given only one or a sparse set of images of the scene. This is by nature a highly under-constrained problem, which makes it very complex to solve. Yet the advancements in computation and imaging technologies (e.g., depth sensors and light-field cameras) open new horizons in this field. Inverse rendering makes many interesting applications possible, including creating novel views of the scene/objects, re-lighting, detecting and altering object materials in photographs, and augmented reality. This talk provides a brief overview of some of the state-of-the-art inverse rendering techniques and datasets, with a strong focus on addressing the complexities of real-world scenarios.
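As a toy illustration of the inverse problem (far simpler than the techniques surveyed in the talk), one can invert a Lambertian shading model: given known surface normals and observed pixel intensities, the light direction scaled by albedo follows from a linear least-squares fit over the unshadowed pixels.

    import numpy as np

    rng = np.random.default_rng(0)

    # Known (or separately estimated) unit surface normals for 100 pixels.
    N = rng.normal(size=(100, 3))
    N /= np.linalg.norm(N, axis=1, keepdims=True)

    true_l = np.array([0.3, 0.5, 0.8])          # unknown light direction scaled by albedo
    I = np.clip(N @ true_l, 0.0, None)          # forward (Lambertian) rendering: I = max(0, n . l)

    lit = I > 0                                  # ignore self-shadowed pixels
    l_est, *_ = np.linalg.lstsq(N[lit], I[lit], rcond=None)
    print("recovered light * albedo:", np.round(l_est, 3))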

Time for Text Mining and Information Retrieval

Jannik Strötgen MPI-INF - D5
11 Jan 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Temporal information is an important aspect of any information space and has characteristics that make it well suited to being exploited in diverse text mining and information retrieval scenarios. For quite a long time, only the metadata of text documents (e.g., publication dates) has been considered, but with recent improvements in natural language processing (NLP), temporal expressions occurring in the content of documents (e.g., "January 2016", "next week") can be extracted and interpreted efficiently and with high quality. This is particularly valuable because, in many types of documents, temporal expressions not only occur frequently but also play a crucial role, e.g., to anchor events in time.

In this talk, after explaining the basics and challenges of the NLP task "temporal tagging", I present our approach to enriching documents with temporal information. Then, I showcase several application scenarios in which we exploit automatically extracted temporal annotations for search and exploration tasks. These range from temporal information retrieval for news articles, through Wikipedia-based event extraction, to the exploration of fictitious happenings in literary texts in the context of digital humanities research.
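A minimal illustrative temporal tagger, far simpler than the systems discussed in the talk: it extracts explicit expressions such as "January 2016" and resolves a relative expression such as "next week" against a reference (e.g., publication) date. The regex, normalization format, and function name are all assumptions made for this sketch.

    import re
    from datetime import date, timedelta

    MONTHS = {m: i for i, m in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"], start=1)}

    def tag_temporal(text: str, reference: date):
        """Return (expression, normalized value) pairs found in `text`."""
        annotations = []
        # Explicit expressions such as "January 2016", normalized to YYYY-MM.
        for m in re.finditer(r"\b(" + "|".join(MONTHS) + r") (\d{4})\b", text):
            annotations.append((m.group(0), f"{m.group(2)}-{MONTHS[m.group(1)]:02d}"))
        # A relative expression, anchored to the reference (document) date.
        if "next week" in text:
            iso_year, iso_week, _ = (reference + timedelta(weeks=1)).isocalendar()
            annotations.append(("next week", f"{iso_year}-W{iso_week:02d}"))
        return annotations

    print(tag_temporal("The report from January 2016 is due next week.", date(2017, 1, 11)))
    # -> [('January 2016', '2016-01'), ('next week', '2017-W03')]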