Events 2018

Quantifying & Characterizing Information Diets of Social Media Users

Juhi Kulshrestha Max Planck Institute for Software Systems
20 Dec 2018, 4:00 pm - 5:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
An increasing number of people are relying on online social media platforms like Twitter and Facebook to consume news and information about the world around them. This change has led to a paradigm shift in the way news and information is exchanged in our society – from traditional mass media to online social media.

With the changing environment, it’s essential to study the information consumption of social media users and to audit how automated algorithms (like search and recommendation systems) are modifying the information that social media users consume. In this thesis, we fulfill this high-level goal with a two-fold approach. First, we propose the concept of information diets as the composition of information produced or consumed. Next, we quantify the diversity and bias in the information diets that social media users consume via the three main consumption channels on social media platforms: (a) word of mouth channels that users curate for themselves by creating social links, (b) recommendations that platform providers give to the users, and (c) search systems that users use to find interesting information on these platforms. We measure the information diets of social media users along three different dimensions of topics, geographic sources, and political perspectives.
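
To make the notion of an information diet concrete, here is a minimal sketch of the kind of measurement involved: a diet along one dimension (here, topics) represented as a normalized histogram and compared against a reference diet with a KL-divergence score. The topic labels, toy data, and the particular divergence measure are illustrative assumptions, not the methodology of the thesis.

```python
# Illustrative sketch (not the thesis's exact methodology): represent a user's
# consumed information diet along the topic dimension as a normalized histogram
# and quantify its skew against a reference (e.g., produced) diet.
from collections import Counter
from math import log2

def diet(items, dimension="topic"):
    """Normalized histogram of items along one diet dimension (topic, source, leaning)."""
    counts = Counter(item[dimension] for item in items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q): how far the consumed diet p deviates from the reference diet q."""
    keys = set(p) | set(q)
    return sum(p.get(k, eps) * log2(p.get(k, eps) / q.get(k, eps)) for k in keys)

# Toy data: topics of items a user consumed vs. a reference (e.g., produced) diet.
consumed = [{"topic": "politics"}] * 7 + [{"topic": "sports"}] * 2 + [{"topic": "science"}]
reference = [{"topic": "politics"}] * 4 + [{"topic": "sports"}] * 3 + [{"topic": "science"}] * 3

print(diet(consumed))                                  # {'politics': 0.7, ...}
print(kl_divergence(diet(consumed), diet(reference)))  # larger => more skewed diet
```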

Our work is aimed at making social media users aware of the potential biases in their consumed diets, and at encouraging the development of novel mechanisms for mitigating the effects of these biases.

Language dynamics in social media

Animesh Mukherjee Indian Institute of Technology, Kharagpur
13 Dec 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 113
simultaneous videocast to Saarbrücken building E1 5, room 105
SWS Colloquium
In this talk I shall summarize our five-year-long initiative studying the temporal dynamics of various human language-like entities on social media. Some of the topics that I plan to cover are (a) how opinion conflicts can be effectively used for incivility detection on Twitter [CSCW 2018], (b) how word borrowings can be automatically identified from social signals [EMNLP 2017], and (c) how hashtags on Twitter form compounds like natural language words (e.g., #Wikipedia+#Blackout=#WikipediaBlackout) that become far more popular than the individual constituent hashtags [CSCW 2016, Honorable Mention].

Post-quantum Challenges in Secure Computation

Nico Döttling CISPA
05 Dec 2018, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
In the early 1990s cryptography went into a foundational crisis when efficient quantum algorithms were discovered that could break almost all public key encryption schemes known at the time. Since then, an enormous research effort has been invested into basing public key cryptography, and secure computation in general, on problems which are conjectured to be hard even for quantum computers. While this research program has been resoundingly successful, even leading the way to cryptographic milestones such as fully homomorphic encryption, there are still important cryptographic primitives for which no post-quantum secure protocols are known. Until very recently, one such primitive was 2-message oblivious transfer, a fundamental primitive in the field of secure two- and multi-party computation. I will discuss a novel construction of this primitive from the Learning With Errors (LWE) assumption, a lattice-based problem which is known to be as hard as worst-case lattice problems and conjectured to be post-quantum secure. The security of our construction relies on a fundamental Fourier-analytic property of lattices, namely the transference principle: either a lattice or its dual must have short vectors.
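
For reference, a standard textbook statement of the decisional LWE problem mentioned above (not specific to the construction discussed in the talk):

```latex
% Decisional Learning With Errors (LWE), standard form:
% distinguish noisy random linear equations from uniform randomness.
\[
\text{Given } (A, b) \text{ with } A \in \mathbb{Z}_q^{m \times n} \text{ uniform, decide whether }
b = A s + e \bmod q
\]
\[
\text{for a secret } s \in \mathbb{Z}_q^{n} \text{ and a short error vector } e,
\text{ or whether } b \in \mathbb{Z}_q^{m} \text{ is uniformly random.}
\]
```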

Privacy-Compliant Mobile Computing

Paarijaat Aditya Max Planck Institute for Software Systems
03 Dec 2018, 4:30 pm - 5:30 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Sophisticated mobile computing, sensing and recording devices like smartphones, smartwatches, and wearable cameras are carried by their users virtually around the clock, blurring the distinction between the online and offline worlds. While these devices enable transformative new applications and services, they also introduce entirely new threats to users' privacy, because they can capture a complete record of the user's location, online and offline activities, and social encounters, including an audiovisual record. Such a record of users' personal information is highly sensitive and is subject to numerous privacy risks. In this thesis, we have investigated and built systems to mitigate two such privacy risks: (1) privacy risks due to ubiquitous digital capture, where bystanders may inadvertently be captured in photos and videos recorded by other nearby users, and (2) privacy risks to users' personal information introduced by a popular class of apps called 'mobile social apps'. We present two systems, called I-Pic and EnCore, built to mitigate these two privacy risks.

Both systems aim to put the users back in control of what personal information is being collected and shared, while still enabling innovative new applications. We built working prototypes of both systems and evaluated them through actual user deployments. Overall, we demonstrate that it is possible to achieve privacy-compliant digital capture and to build privacy-compliant mobile social apps, while preserving their intended functionality and ease of use. Furthermore, we also explore how the two solutions can be merged into a powerful combination, one which could enable novel workflows for specifying privacy preferences in image capture that do not currently exist.

Survey Equivalence: An Information-theoretic Measure of Classifier Accuracy When the Ground Truth is Subjective

Paul Resnick University of Michigan, School of Information
27 Nov 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Distinguished Lecture Series
Many classification tasks have no objective ground truth. Examples include: Which content or explanation is "better" according to some community? Is this comment toxic? What is the political leaning of this news article? The traditional modeling approach assumes each item has an objective true state that is perceived by humans with some random error. It fails to account for the fact that people have greater agreement on some items than others. I will describe an alternative model where the true state is a distribution over labels that raters from a specified population would assign to an item. This leads to information gain (mutual information) as a theoretically justified and computationally tractable measure of a classifier's quality, and to an intuitive interpretation of information gain in terms of the sample size for a survey that would yield the same expected error rate.
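
As a toy illustration of the proposed quality measure (not the paper's estimator, which works from finite samples of rater labels), the following computes the information gain between a classifier's output and a label drawn from each item's rater distribution; the items and distributions below are invented for illustration.

```python
# Toy computation of the information gain (mutual information) between a
# classifier's output and a label drawn from the rater population for the same
# item. Items, label distributions, and classifier outputs are invented.
from collections import defaultdict
from math import log2

# Per-item distribution over labels raters would assign, plus the classifier's label.
items = [
    ({"toxic": 0.9, "ok": 0.1}, "toxic"),
    ({"toxic": 0.2, "ok": 0.8}, "ok"),
    ({"toxic": 0.6, "ok": 0.4}, "ok"),
    ({"toxic": 0.1, "ok": 0.9}, "ok"),
]

joint = defaultdict(float)            # P(classifier = c, rater label = y)
for dist, c in items:
    for y, p in dist.items():
        joint[(c, y)] += p / len(items)

p_c, p_y = defaultdict(float), defaultdict(float)
for (c, y), p in joint.items():
    p_c[c] += p
    p_y[y] += p

info_gain = sum(p * log2(p / (p_c[c] * p_y[y])) for (c, y), p in joint.items() if p > 0)
print(f"information gain: {info_gain:.3f} bits")
```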

The Reachability Problem for Vector Addition Systems is Not Elementary

Wojciech Czerwiński University of Warsaw
22 Nov 2018, 4:00 pm - 5:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
I will present a recent non-elementary lower bound for the complexity of the reachability problem for Vector Addition Systems. I plan to show the main insights of the proof. In particular, I will present a surprising equation on fractions, which is the core of the new source of hardness found in VASes.

More Realistic Scheduling Models and Analyses for Advanced Real-Time Embedded Systems

Georg von der Brueggen TU Dortmund
22 Nov 2018, 2:30 pm - 3:30 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
In real-time embedded systems, compliance with timing constraints has to be guaranteed for each task, in addition to functional correctness. The first part of the talk considers the theoretical comparison of scheduling algorithms and schedulability tests by evaluating speedup factors for non-preemptive scheduling, which leads to a discussion about general problems of resource augmentation bounds. In addition, it is explained how utilization bounds can be parametrized, resulting in better bounds for specific scenarios, namely when analyzing non-preemptive Rate-Monotonic scheduling as well as task sets inspired by automotive applications.
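
As background for the utilization-bound discussion, here is a check of the classical Liu-and-Layland bound for preemptive rate-monotonic scheduling. The talk is about sharper, parametrized bounds and non-preemptive scheduling, so this sketch and its task set are only a baseline illustration.

```python
# Background sketch: the classical Liu & Layland utilization bound for
# preemptive rate-monotonic scheduling of n implicit-deadline periodic tasks.
# (The talk is about sharper, parametrized bounds; this is only the baseline.)

def rm_utilization_bound(n: int) -> float:
    return n * (2 ** (1.0 / n) - 1)

# Tasks given as (worst-case execution time, period); values are toy numbers.
tasks = [(1.0, 4.0), (2.0, 8.0), (1.0, 10.0)]
utilization = sum(c / t for c, t in tasks)
bound = rm_utilization_bound(len(tasks))

print(f"U = {utilization:.3f}, bound = {bound:.3f}")
if utilization <= bound:
    print("schedulable under rate-monotonic scheduling (sufficient test)")
else:
    print("bound inconclusive; a more precise test is needed")
```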

In the second part, a setting similar to mixed-criticality systems is considered and the criticism of previous work in this area is detailed. Hence, a new system model that is better applicable to realistic scenarios, namely Systems with Dynamic Real-Time Guarantees, is explained. This model is extended to a multiprocessor scenario, considering CPU overheating as a possible cause of mixed-criticality behaviour. Finally, a way to determine the deadline-miss probability for such systems is described that drastically reduces the runtime of these calculations.

The third part discusses tasks with self-suspension behaviour, explains a fixed-relative-deadline strategy for segmented self-suspension tasks with one suspension interval, and details how this approach can be exploited in resource-oriented partitioned scheduling. Furthermore, it is explained how the gap between the dynamic and the segmented self-suspension model can be bridged by hybrid models.

Verified Secure Routing

Peter Müller ETH Zurich
19 Nov 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Distinguished Lecture Series
SCION is a new Internet architecture that addresses many of the security vulnerabilities of today’s Internet. Its clean-slate design provides, among other properties, route control, failure isolation, and multi-path communication. The verifiedSCION project is an effort to formally verify the correctness and security of SCION. It aims to provide strong guarantees for the entire architecture, from the protocol design to its concrete implementation. The project uses stepwise refinement to prove that the protocol withstands increasingly strong attackers. The refinement proofs assume that all network components such as routers satisfy their specifications. This property is then verified separately using deductive program verification in separation logic. This talk will give an overview of the verifiedSCION project and explain, in particular, how we verify code-level properties such as memory safety, I/O behavior, and information flow security.

Feedback Control for Predictable Cloud Computing

Dr. Martina Maggio Lund University
14 Nov 2018, 10:30 am - 12:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Cloud computing gives the illusion of infinite computational capacity and allows for on-demand resource provisioning. As a result, over the last few years, the cloud computing model has experienced widespread industrial adoption, and companies like Netflix have offloaded their entire infrastructure to the cloud. However, with even the largest datacenters being of finite size, cloud infrastructures have experienced overload due to overbooking or transient failures. This is an excellent opportunity for the design of control solutions that tackle the problem of mitigating overload peaks using feedback from the computing infrastructure. Exploiting control-theoretical principles and taking advantage of the knowledge and the analysis capabilities of control tools, it is possible to provide formal guarantees on the predictability of the cloud platform. This talk introduces recent research advances on feedback control in the cloud computing domain, and discusses control solutions and future research for both cloud application development and infrastructure management. In particular, it covers application brownout, control-based load-balancing, and autoscaling.
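
As a flavor of what control-based resource management can look like, below is a minimal sketch of a proportional-integral autoscaler that adjusts a replica count from a measured response time. The gains, setpoint, and toy latency model are arbitrary assumptions, not the controllers from the talk.

```python
# Minimal sketch of control-based autoscaling: a proportional-integral (PI)
# controller adjusts the number of replicas from the measured response time.
# Gains, setpoint, and the toy "system" below are illustrative assumptions.

class PIAutoscaler:
    def __init__(self, setpoint_ms, kp=0.02, ki=0.005, min_replicas=1, max_replicas=50):
        self.setpoint = setpoint_ms
        self.kp, self.ki = kp, ki
        self.integral = 0.0
        self.min, self.max = min_replicas, max_replicas

    def step(self, measured_ms, current_replicas):
        error = measured_ms - self.setpoint        # positive => too slow => scale up
        self.integral += error
        delta = self.kp * error + self.ki * self.integral
        target = current_replicas + delta
        return int(min(self.max, max(self.min, round(target))))

# Toy closed loop: latency shrinks roughly inversely with the replica count.
def fake_latency(load_rps, replicas):
    return 20 + load_rps / replicas

scaler = PIAutoscaler(setpoint_ms=100)
replicas = 2
for load in [200, 400, 800, 800, 800, 400]:
    latency = fake_latency(load, replicas)
    replicas = scaler.step(latency, replicas)
    print(f"load={load} rps latency={latency:.0f} ms -> replicas={replicas}")
```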

Learning from the People: From Normative to Descriptive Solutions to Problems in Security, Privacy & Machine Learning

Elissa Redmiles University of Maryland
13 Nov 2018, 10:30 am - 11:30 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
A variety of experts -- computer scientists, policy makers, judges -- constantly make decisions about best practices for computational systems. They decide which features are fair to use in a machine learning classifier predicting whether someone will commit a crime, and which security behaviors to recommend and require from end-users. Yet, the best decision is not always clear. Studies have shown that experts often disagree with each other and, perhaps more importantly, with the people for whom they are making these decisions: the users.

This raises a question: Is it possible to learn best practices directly from the users? The field of moral philosophy suggests yes, through the process of descriptive decision-making, in which we observe people’s preferences and then infer best practice rather than using experts’ normative (prescriptive) determinations to define best practice. In this talk, I will explore the benefits and challenges of applying such a descriptive approach to making computationally relevant decisions regarding: (i) selecting security prompts for an online system; (ii) determining which features to include in a classifier for jail sentencing; (iii) defining standards for ethical virtual reality content.

Fairness for Sequential Decision Making Algorithms

Hoda Heidari ETH Zurich
12 Nov 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 112
SWS Colloquium
Fairness considerations in settings where decisions are made by supervised learning algorithms (e.g., criminal risk assessment) have received considerable attention recently. As the fairness literature continues to expand mostly around this canonical learning task, it is important to recognize that many real-world applications of ML fall outside the category of supervised, one-shot learning. In this presentation, I will talk about two scenarios in which algorithmic decisions are made in a sequential manner and over time. I will argue that in such settings, being fair---at a minimum---requires decisions to be "consistent" across individuals who arrive at different time steps, that is, similar individuals must be treated similarly. I will then talk about how such consistency constraints affect learning.

In the first part of the talk, I will introduce a generic sequential decision making framework, in which at each time step the learning algorithm receives data corresponding to a new individual (e.g. a new job application) and must make an irrevocable decision about him/her (e.g. whether to hire the applicant) based on observations it has made so far. I propose a general framework for post-processing predictions made by a black-box learning model, so that the resulting sequence of outcomes is guaranteed to be consistent. I show, both theoretically and via simulations, that imposing consistency constraints will not significantly slow down learning.
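
A minimal sketch of what such post-processing could look like, assuming a Lipschitz-style reading of "similar individuals must be treated similarly": each new score from a black-box model is clamped so it stays close to the scores given to nearby past individuals. The metric, constant, and clamping rule are illustrative assumptions, not the proposed framework.

```python
# Toy post-processing wrapper for sequential decisions: clamp each new score so
# that individuals close in feature space receive close outcomes (a Lipschitz-
# style consistency constraint). The metric, Lipschitz constant, and clamping
# rule are illustrative; feasibility of the constraints is assumed for simplicity.
import math

class ConsistentDecisions:
    def __init__(self, black_box, lipschitz=1.0):
        self.black_box = black_box      # any model returning a score in [0, 1]
        self.L = lipschitz
        self.history = []               # (features, final score) of past individuals

    def decide(self, x):
        score = self.black_box(x)
        lo, hi = 0.0, 1.0
        for past_x, past_s in self.history:
            d = math.dist(x, past_x)
            lo = max(lo, past_s - self.L * d)   # similar individuals must stay close
            hi = min(hi, past_s + self.L * d)
        score = min(max(score, lo), hi)
        self.history.append((x, score))
        return score

model = lambda x: 0.9 if x[0] > 0.5 else 0.1      # an inconsistent black box
cd = ConsistentDecisions(model, lipschitz=0.5)
print(cd.decide((0.49, 0.0)))   # 0.1
print(cd.decide((0.51, 0.0)))   # clamped near 0.1 despite a raw score of 0.9
```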

In the second part of the talk, I will focus on fairness considerations in a particular type of market---namely, combinatorial prediction markets---where traders can submit limit orders on various security bundles, and the market maker is tasked with executing these orders in a fair manner. The main challenge in running such a market is that executing one order can potentially change the price of every other order in the book. I define the notion of a "fair trading path", which at a high level guarantees that similar orders are executed similarly: no order executes at a price above its limit, and every order executes when its market price falls below its limit price. I present a market algorithm that respects these fairness conditions, and evaluate it using real combinatorial predictions made during the 2008 U.S. Presidential election.

I will conclude by comparing my work with previous papers on fairness for online learning, and by outlining directions for future work.

Generating Software Tests

Andreas Zeller Fachrichtung Informatik - Saarbrücken/CISPA
07 Nov 2018, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Software has bugs. What can we do to find as many of these as possible? In this talk, I show how to systematically test software by generating such tests automatically, starting with simple random "fuzzing" generators and then proceeding to more effective grammar-based and coverage-guided approaches. Being fully automatic and easy to deploy, such fuzzers run at little cost, yet are very effective in finding bugs: Our own Langfuzz grammar-based test generator for JavaScript runs around the clock for the Firefox, Chrome, and Edge web browsers and so far has found more than 2,600 confirmed bugs. Our latest test generator prototypes are even able to automatically learn the input language of a given program, which allows them to generate highly effective tests for arbitrary programs without any particular setup.
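
A minimal grammar-based fuzzer in the spirit of the approaches mentioned above; the toy arithmetic grammar and the depth cutoff are illustrative choices, and production tools such as Langfuzz add coverage guidance and full language grammars.

```python
# Minimal grammar-based fuzzer: repeatedly expand nonterminals with randomly
# chosen grammar rules. The toy arithmetic grammar and depth cutoff are
# illustrative only.
import random

GRAMMAR = {
    "<expr>": ["<expr> + <term>", "<expr> - <term>", "<term>"],
    "<term>": ["<term> * <factor>", "<factor>"],
    "<factor>": ["(<expr>)", "<digit>"],
    "<digit>": [str(d) for d in range(10)],
}

def tokenize(rule):
    """Split a rule into nonterminals like <expr> and literal characters."""
    out, i = [], 0
    while i < len(rule):
        if rule[i] == "<":
            j = rule.index(">", i) + 1
            out.append(rule[i:j]); i = j
        else:
            out.append(rule[i]); i += 1
    return out

def fuzz(symbol="<expr>", depth=0, max_depth=8):
    if symbol not in GRAMMAR:
        return symbol
    expansions = GRAMMAR[symbol]
    # Past the depth budget, prefer the shortest expansion to force termination.
    rule = min(expansions, key=len) if depth >= max_depth else random.choice(expansions)
    return "".join(fuzz(tok, depth + 1, max_depth) for tok in tokenize(rule))

for _ in range(5):
    print(fuzz())
```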

Compiling Dynamical Systems for Efficient Simulation on Reconfigurable Analog Computers

Sara Achour MIT
22 Oct 2018, 10:30 am - 12:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Programmable analog devices are a powerful new computing substrate that are especially appropriate for performing computationally intensive simulations of dynamical systems. Until recently, the de-facto standard for programming these devices required hardware specialists to manually program the analog device to model the dynamical system of interest. In this talk, I will present Arco, a compiler that automatically configures analog devices to perform dynamical system simulation, and Jaunt, a compilation technique that scales dynamical system parameters to change the speed of the simulation and render the resulting simulation physically realizable given the operating constraints of the analog hardware platform. These techniques capture the domain knowledge required to fully exploit the capabilities of reconfigurable analog devices, eliminating a key obstacle to the widespread adoption of these devices.

Justified representation in multiwinner voting: axioms and algorithms

Edith Elkind University of Oxford
19 Oct 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Distinguished Lecture Series
Suppose that a group of voters wants to select k ≥ 1 alternatives from a given set, and each voter indicates which of the alternatives are acceptable to her: the alternatives could be conference submissions, applicants for a scholarship or locations for a fast food chain. In this setting it is natural to require that the winning set represents the voters fairly, in the sense that large groups of voters with similar preferences have at least some of their approved alternatives in the winning set. We describe several ways to formalize this idea, and show how to use it to classify voting rules. For one of our axioms, the only known voting rule that satisfies it is not polynomial-time computable, and it was conjectured that no voting rule that satisfies this axiom can be polynomial-time computable; however, we will show that this is not the case.
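
One standard formalization from this line of work is justified representation: no group of at least n/k voters that jointly approves some alternative may be left entirely unrepresented by the winning set. A brute-force checker over approval ballots (toy data, written for clarity rather than efficiency) is sketched below.

```python
# Brute-force check of justified representation (JR) for approval-based
# multiwinner voting: a committee of size k violates JR if some group of at
# least n/k voters jointly approves a candidate yet none of them has any
# approved candidate in the committee. Ballots below are toy data.

def satisfies_jr(approvals, committee, k):
    """True iff the committee (of size k) gives justified representation."""
    n = len(approvals)
    unrepresented = [A for A in approvals if not (A & committee)]
    for c in set().union(*approvals):
        cohesive = [A for A in unrepresented if c in A]
        if len(cohesive) * k >= n:      # group size >= n/k, all unrepresented
            return False
    return True

approvals = [{"a"}, {"a"}, {"a"}, {"b"}, {"b"}, {"c"}]     # 6 approval ballots
print(satisfies_jr(approvals, committee={"a", "b"}, k=2))  # True
print(satisfies_jr(approvals, committee={"b", "c"}, k=2))  # False: the "a" group is ignored
```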

AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation

Dana Drachsler Cohen ETH Zurich
15 Oct 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
In this talk, I will present AI2, a sound and scalable analyzer for deep neural networks. Based on overapproximation, AI2 can automatically prove safety properties (e.g., robustness) of realistic neural networks (e.g., convolutional neural networks). The key insight behind AI2 is to phrase reasoning about safety and robustness of neural networks in terms of classic abstract interpretation, enabling us to leverage decades of advances in that area. To this end, I will introduce abstract transformers that capture the behavior of fully connected and convolutional neural network layers with rectified linear unit activations (ReLU), as well as max pooling layers. This makes it possible to handle real-world neural networks, which are often built out of these types of layers. I will also empirically demonstrate that (i) AI2 is precise enough to prove useful specifications (e.g., robustness), (ii) AI2 can be used to certify the effectiveness of state-of-the-art defenses for neural networks, and (iii) AI2 is significantly faster than existing analyzers based on symbolic analysis, which often take hours to verify simple fully connected networks.
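
The sketch below shows the overall shape of such an analysis using the simple interval domain: propagate lower and upper bounds for an input region through an affine layer and a ReLU to obtain sound output bounds. AI2 itself uses more precise domains such as zonotopes; the toy network and perturbation radius here are arbitrary.

```python
# Interval-domain sketch of abstract interpretation for one neural network layer.
# AI2 uses more precise domains (e.g., zonotopes); intervals just illustrate the
# idea: propagate sound lower/upper bounds through an affine layer and a ReLU.
import numpy as np

def affine_interval(lo, hi, W, b):
    """Sound bounds for W @ x + b when x lies anywhere in [lo, hi] (elementwise)."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def relu_interval(lo, hi):
    return np.maximum(lo, 0), np.maximum(hi, 0)

# Toy network: 2 inputs -> 3 hidden ReLU units; input region is an L-infinity ball.
rng = np.random.default_rng(0)
W, b = rng.standard_normal((3, 2)), rng.standard_normal(3)
x, eps = np.array([0.5, -0.2]), 0.1

lo, hi = affine_interval(x - eps, x + eps, W, b)
lo, hi = relu_interval(lo, hi)
print("output bounds per hidden unit:")
for i, (l, h) in enumerate(zip(lo, hi)):
    print(f"  unit {i}: [{l:.3f}, {h:.3f}]")
```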

Complexity in Question Answering

Rishiraj Saha Roy MPI-INF - D5
10 Oct 2018, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Answering natural-language questions is a remarkable ability of search engines. However, brittleness in the state-of-the-art is exposed when handling complexity in question formulation. Such complexity has two distinct dimensions: (i) diversity of expressions which convey the same information need, and, (ii) complexity in the information need itself. We propose initial solutions to these challenges: (i) syntactic differences in question formulation can be tackled with a continuous-learning framework that extends template-based answering with semantic similarity and user feedback; (ii) complexity in information needs can be addressed by stitching pieces of evidence from multiple documents to build a noisy graph, within which answers can be detected using optimal interconnections. The talk will discuss results for these proposals, and conclude with promising open directions in question answering.

Improving the energy efficiency of virtualized datacenters

Vlad Nitu Toulouse University
21 Sep 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
SWS Colloquium
Energy consumption is an important concern for cloud datacenters. Its cost represents about 80% of the total cost of ownership, and it is estimated that in 2020 the US datacenters alone will spend about $13 billion on energy bills. Generally, servers are manufactured in such a way that they achieve high energy efficiency at high utilizations. Therefore, for a low cost per computation, all datacenter servers should push their utilization as high as possible. In order to fight the historically low utilization, cloud computing adopted server virtualization. This technology enables a cloud provider to pack (consolidate) the entire set of virtual machines (VMs) onto a small set of physical servers and thereby reduce the number of active servers. Even so, datacenter servers rarely reach utilizations higher than 50%, which means that they operate with a set of long-term unused resources (called 'holes').

Our first contribution is a cloud management system that dynamically splits and fuses VMs such that they can better fill the holes. However, datacenter resource fragmentation has a more fundamental problem. Over time, cloud applications demand more and more memory, but the physical servers provide more and more CPU. In today's datacenters the two resources are strongly coupled, since they are bound to a physical server. Our second contribution is a practical way to decouple the CPU-memory tuple that can simply be applied to a commodity server.

The underutilization observed on physical servers is also true for virtual machines. It has been shown that VMs consume only a small fraction of the allocated resources because cloud customers are not able to correctly estimate the resource amount necessary for their applications. Our third contribution is a system that estimates the memory consumption (i.e., the working set size) of a VM with low overhead and high accuracy. Thereby, we can consolidate the VMs based on their working set size (not the booked memory). However, the drawback of this approach is the risk of memory starvation: if one or multiple VMs have a sharp increase in memory demand, the physical server may run out of memory. This event is undesirable because the cloud platform is then unable to provide the client with the memory they paid for. Our fourth contribution is a system that allows a VM to use remote memory provided by a different rack server. Thereby, in the case of a peak memory demand, our system allows the VM to allocate memory on a remote physical server.

Gradually Typed Symbolic Expressions: an Approach for Developing Embedded Domain-Specific Modeling Languages

David Broman KTH Royal Institute of Technology
13 Sep 2018, 2:30 pm - 3:30 pm
Kaiserslautern building G26, room 113
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Embedding a domain-specific language (DSL) in a general-purpose host language is an efficient way to develop a new DSL. Various kinds of languages and paradigms can be used as host languages, including object-oriented, functional, statically typed, and dynamically typed variants, all having their pros and cons. For deep embedding, statically typed languages enable early checking and potentially good DSL error messages, instead of reporting runtime errors. Dynamically typed languages, on the other hand, enable flexible transformations, thus avoiding extensive boilerplate code. In this talk, I will discuss the concept of gradually typed symbolic expressions that mix static and dynamic typing for symbolic data. The key idea is to combine the strengths of dynamic and static typing in the context of deep embedding of DSLs. Moreover, I will briefly outline an ongoing research effort to develop a new framework for developing heterogeneous domain-specific languages.

Timed C: An Extension to the C Programming Language for Real-Time Systems

Saranya Natarajan KTH Royal Institute of Technology
12 Sep 2018, 3:30 pm - 4:30 pm
Kaiserslautern building G26, room 113
simultaneous videocast to Saarbrücken building E1 5, room 422
SWS Colloquium
The design and implementation of real-time systems require that both the logical and the temporal behaviour are correct. There exist several specialized languages and tools that use the notion of logical time, as well as industrial-strength languages such as Ada and RTJS that incorporate direct handling of real time. Although these languages and tools have been shown to be good alternatives for safety-critical systems, most commodity real-time and embedded systems are today implemented in the standard C programming language. Such systems typically target proprietary bare-metal platforms, standard POSIX-compliant platforms, or open-source operating systems. It is, however, error-prone to develop large, reliable, and portable systems based on these APIs. In this talk, I will present an extension to the C programming language, called Timed C, with a minimal set of language primitives, and show how a retargetable source-to-source compiler can be used to compile and execute simple, expressive, and portable programs.

PROSA: A Foundation for Verified Schedulability Analysis

Felipe Augusto Queiroz de Cerqueira Max Planck Institute for Software Systems
10 Sep 2018, 4:00 pm - 5:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 422
SWS Student Defense Talks - Thesis Proposal
Real-time systems comprise the class of systems that are subject to physical timing constraints, i.e., in order to be considered sound, they must be checked for both functional and temporal correctness. Regarding the latter, apart from choosing appropriate scheduling policies to be applied at runtime, system designers must confirm that the temporal requirements of the system are met by using methods known as schedulability analyses.

As opposed to the more measurement-based, reactive resource provisioning in general-purpose systems, schedulability analyses rely on statically deriving properties that characterize the temporal worst-case behavior of the system, and thus rest heavily on mathematical foundations.

In recent years, however, the trust in those foundations has been called into question, since a considerable number of published analyses were found to be flawed due to incorrect pen-and-paper proofs. Instead of attributing those issues solely to human error, we argue that there is a fundamental problem to be addressed in real-time scheduling theory: system models and analyses have become increasingly complex and difficult to verify, placing a heavy burden on authors and reviewers. Since many real-time systems target safety-critical applications, it is crucial for such systems to be grounded on more robust mathematical foundations.

To address this lack of trust in state-of-the-art schedulability analyses, I will present Prosa, a long-term project for the formalization and verification of real-time schedulability analysis, which aims to provide the theoretical support for the next generation of certified real-time systems. To ensure full correctness of the proofs, Prosa is written in and mechanically verified with Coq, a powerful and mature proof assistant with support for higher-order logic and computations.

HAMS: Harnessing AutoMobiles for Safety

Venkat Padmanabhan Microsoft Research India
24 Aug 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Distinguished Lecture Series
Road safety is a major public health issue, with road accidents accounting for an estimated 1.25 million fatalities, and many more injuries, the world over, each year. The problem is particularly acute in India, with nearly a quarter of a million fatalities every year, i.e., 20% of the world’s total. Besides the heavy human cost, road accidents also impose a significant economic cost. The crux of the problem is that the major factors impacting safety — vehicles, roads, and people — have virtually no ongoing monitoring today. In the HAMS project at Microsoft Research India, we employ a dashboard-mounted smartphone, and the array of sensors it includes, as a virtual harness, with a view to monitoring drivers and their driving. We focus primarily on the camera sensors of the smartphone, both the front camera, which faces the driver, and the back camera, which looks out to the front of the vehicle. We address the challenges arising from our choice of low-cost generic sensors instead of more expensive specialized sensors, the need for efficient processing on a smartphone, and demands of robust operation in uncontrolled environments. HAMS has been piloted as part of a driver training program, with promising early results.

Not Your Typical Objects: Made from Raw Materials Augmented with Sensing and Computation

Phillip Stanley-Marbell University of Cambridge
21 Aug 2018, 2:00 pm - 3:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Materials and manufacturing processes can be made to adapt to the way in which their end-products are used, by imbuing raw materials with sensing and computation elements. By making the algorithms that run on these computing elements aware of the physics of the objects in which they are embedded, computation-and-sensing augmented materials could change the way we think about the border between inanimate objects and computing systems. One way to exploit knowledge about the physical world in the programs running on (or in) these objects is to exploit the fact that these programs can often tolerate noise and other deviations from correctness in their input data. This talk will highlight research that builds on these observations.

Computing with Simple Dynamics and Biological Applications

Emanuele Natale MPI-INF - D1
01 Aug 2018, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
In this talk, I will present my work on simple distributed protocols for fundamental coordination problems such as plurality and valid consensus, rumor spreading, and distributed clustering in stochastic models of interaction. I will discuss applications in the research area of Biological Distributed Algorithms, which aims to understand, through the algorithmic lens, the collective behavior of biological systems such as social insects. This setting has recently motivated the study of some interesting new problems, such as the investigation of systems where communication is affected by noise in a way that cannot be handled by error-correcting codes. More recently, as a fellow of the Simons Institute for the Theory of Computing, I have been working at the interface between Theoretical Computer Science and Computational Neuroscience. I will provide a brief overview of some research topics in this direction.
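
As an example of the kind of simple dynamics studied in this area, here is a toy simulation of the 3-majority rule, in which every agent repeatedly adopts the majority opinion among three uniformly sampled agents; the population size and initial opinion split are arbitrary choices, not parameters from the talk.

```python
# Toy simulation of the 3-majority dynamics, one of the simple stochastic
# protocols studied in this line of work: in each round every agent samples
# three random agents and adopts the majority opinion among them (ties broken
# by taking one of the sampled opinions at random).
import random
from collections import Counter

def three_majority_round(opinions):
    n = len(opinions)
    new = []
    for _ in range(n):
        sample = [opinions[random.randrange(n)] for _ in range(3)]
        counts = Counter(sample).most_common()
        if counts[0][1] >= 2:
            new.append(counts[0][0])          # clear majority within the sample
        else:
            new.append(random.choice(sample)) # all three differ: pick one
    return new

opinions = ["red"] * 450 + ["blue"] * 350 + ["green"] * 200
random.shuffle(opinions)
for round_no in range(30):
    if len(Counter(opinions)) == 1:
        print(f"consensus on {opinions[0]!r} after {round_no} rounds")
        break
    opinions = three_majority_round(opinions)
else:
    print("no consensus yet:", Counter(opinions))
```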

Practical Program Analysis

Maria Christakis Max Planck Institute for Software Systems
04 Jul 2018, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Sound static analysis over-approximates the set of all possible executions in a given program in order to prove the absence of errors in the program. Due to this over-approximation, sound static analysis may generate spurious warnings about executions that are not wrong or even possible in the program. To become more practical, many static analyzers give up soundness by design. This means that they do not check certain properties or that they check them under certain unsound assumptions, such as the absence of arithmetic overflow. At the other end of the soundness spectrum, we have automatic test-case generation, which typically under-approximates the set of possible program executions. The goal of test-case generation is not to prove the absence of errors in the program but, rather, their existence. In this talk, I will present an overview of my research on combining these program analysis techniques to improve their overall automation, performance, and accuracy on real programs.

On the expressive power of user-defined effects: Effect handlers, monadic reflection, delimited continuations

Sam Lindley University of Edinburgh
29 Jun 2018, 2:00 pm - 3:30 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
We compare the expressive power of three programming abstractions for user-defined computational effects: Plotkin and Pretnar’s effect handlers, Filinski’s monadic reflection, and delimited control. This comparison allows a precise discussion about the relative expressiveness of each programming abstraction. It also demonstrates the sensitivity of the relative expressiveness of user-defined effects to seemingly orthogonal language features.

We present each notion as an extension of a simply-typed core lambda-calculus with an effect type system. Using Felleisen’s notion of a macro translation, we show that these abstractions can macro-express each other, provided we disregard types. Alas, not all of the translations are type-preserving; moreover, no alternative type-preserving macro translations exist. We show that if we add suitable notions of polymorphism to the core calculus and its extensions then all of the translations can be adapted to preserve typing.

(based on joint work with Yannick Forster, Ohad Kammar, and Matija Pretnar)

Program Invariants

Joël Ouaknine Max Planck Institute for Software Systems
29 Jun 2018, 11:30 am - 12:30 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Automated invariant generation is a fundamental challenge in program analysis and verification, going back many decades, and remains a topic of active research. In this talk I'll present a select overview and survey of work on this problem, and discuss unexpected connections to other fields including algebraic geometry, group theory, and quantum computing. (No previous knowledge of these fields will be assumed.) This is joint work with Ehud Hrushovski, Amaury Pouly, and James Worrell.
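
To make the term concrete, here is a tiny example (mine, not from the talk) of a polynomial invariant of a simple loop, checked numerically; automated invariant generation asks how to discover such polynomials, and certify that no further independent ones exist, automatically.

```python
# A concrete (toy) example of what "invariant" means here: for the loop body
#     x, y = x + y, y + 1
# the polynomial 2*x - y**2 + y keeps its initial value forever. Automated
# invariant generation is about finding such polynomials mechanically.

def invariant(x, y):
    return 2 * x - y * y + y

x, y = 3, 5
initial = invariant(x, y)
for _ in range(10):
    x, y = x + y, y + 1
    assert invariant(x, y) == initial, "invariant violated"
print(f"2x - y^2 + y stayed equal to {initial} for 10 iterations")
```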

Designing a System for Heterogeneous Compute Units

Nils Asmussen TU Dresden
29 Jun 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
SWS Colloquium
The ongoing trend towards more heterogeneity forces us to rethink the design of systems. In this talk, I will present a new system design that considers heterogeneous compute units (general-purpose cores with different instruction sets, DSPs, FPGAs, fixed-function accelerators, etc.) from the beginning instead of as an afterthought. The goal is to treat all compute units (CUs) as first-class citizens, enabling 1) isolation and secure communication between all types of CUs, 2) a direct interaction of all CUs to remove the conventional CPU from the critical path, and 3) access to OS services such as file systems and network stacks for all CUs. To study this system design, I am using a hardware/software co-design based on two key ideas: 1) introduce a new hardware component next to each CU that is used by the OS as the CUs' common interface, and 2) let the OS kernel control applications remotely from a different CU. In my talk, I will show how this approach makes it possible to support arbitrary CUs as such first-class citizens, ranging from fixed-function accelerators to complex general-purpose cores.

Boosting human capabilities on perceptual categorization tasks

Michael Mozer University of Colorado, Boulder
26 Jun 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Distinguished Lecture Series
We are developing methods to improve human learning and performance on challenging perceptual categorization tasks, e.g., bird species identification and diagnostic dermatology. Our approach involves inferring psychological embeddings -- internal representations that individuals use to reason about a domain. Using predictive cognitive models that operate on an embedding, we perform surrogate-based optimization to determine efficient and effective means of training domain novices as well as amplifying an individual's capabilities at any stage of training. Our cognitive models leverage psychological theories of similarity judgement and generalization, of contextual and sequential effects in choice, and of attention shifts among embedding dimensions. Rather than searching over all possible training policies, we focus our search on policy spaces motivated by the training literature, including manipulation of exemplar difficulty and the sequencing of category labels. We show that our models predict human behavior not only in the aggregate but at the level of individual learners and individual exemplars, and preliminary experiments show the benefits of surrogate-based optimization on learning and performance.

This work was performed in close collaboration with Brett Roads at University College London.

Heap-based reasoning about asynchronous programs

Johannes Kloos Max Planck Institute for Software Systems
14 Jun 2018, 5:00 pm - 6:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
Asynchronous concurrency is a wide-spread way of writing programs that deal with many short tasks. It is the programming model behind event-driven concurrency, as exemplified by GUI applications, where the tasks correspond to event handlers, web applications based around JavaScript, the implementation of web browsers, but also of server-side software or operating systems.

This model is widely used because it provides the performance benefits of concurrency together with easier programming than multi-threading. While there is ample work on how to implement asynchronous programs, and significant work on testing and model checking, little research has been done on handling asynchronous programs that involve heap manipulation, or on how to automatically optimize code for asynchronous concurrency.

This thesis addresses the question of how we can reason about asynchronous programs while considering the heap, and how to use this to optimize programs. The work is organized around the following main questions:

(i) How can we reason about asynchronous programs, without ignoring the heap? (ii) How can we use such reasoning techniques to optimize programs involving asynchronous behavior? (iii) How can we transfer these reasoning and optimization techniques to other settings?

The unifying idea behind all the results in the thesis is the use of an appropriate model encompassing global state and a promise-based model of asynchronous concurrency. For the first question, we start from refinement type systems for sequential programs and extend them to perform precise resource-based reasoning in terms of heap contents, known outstanding tasks, and promises. This extended type system is known as Asynchronous Liquid Separation Types, or ALST for short. We implement ALST for OCaml programs using the Lwt library.

For the second question, we consider a family of possible program optimizations, described by a set of rewriting rules, the DWFM rules. The rewriting rules are type-driven: We only guarantee soundness for programs that are well-typed under ALST. We give a soundness proof based on a semantic interpretation of ALST that allows us to show behavior inclusion of pairs of programs.

For the third question, we address an optimization problem from industrial practice: normally, JavaScript files that are referenced in an HTML file are loaded synchronously, i.e., when a script tag is encountered, the browser must suspend parsing, load and execute the script, and only afterwards continue parsing the HTML. But in practice, there are numerous JavaScript files for which asynchronous loading would be perfectly sound. First, we sketch a hypothetical optimization using the DWFM rules and a static analysis.

To actually implement the analysis, we modify the approach to use a dynamic analysis. This analysis, known as JSDefer, enables us to analyze real-world web pages, and provide experimental evidence for the efficiency of this transformation.
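
As a language-agnostic illustration of what deferring a load buys (the thesis itself targets OCaml/Lwt and JavaScript, not Python), the sketch below contrasts awaiting a load immediately with starting it as a background task and awaiting it only at its first use.

```python
# Illustrative analogy only: deferring an independent load means starting it as a
# background task and awaiting the result where it is first needed, so unrelated
# work is not blocked in the meantime.
import asyncio

async def load_script(name):
    await asyncio.sleep(0.1)          # stand-in for network I/O
    return f"<contents of {name}>"

def do_unrelated_work():
    pass                               # stand-in for parsing the rest of the page

def use(script):
    pass                               # stand-in for executing the script

async def synchronous_style():
    script = await load_script("analytics.js")   # all other work blocked here
    do_unrelated_work()
    use(script)

async def deferred_style():
    task = asyncio.create_task(load_script("analytics.js"))  # start the load early
    do_unrelated_work()                                       # overlaps with the load
    use(await task)                                           # wait only at first use

asyncio.run(synchronous_style())
asyncio.run(deferred_style())
```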
Read more

Learning-Based Hardware/Software Power and Performance Prediction

Andreas Gerstlauer University of Texas at Austin
11 Jun 2018, 10:30 am - 11:30 am
Saarbrücken building E1 5, room 105
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Next to performance, early power and energy estimation is a key challenge in the design of computer systems. Traditional simulation-based methods are often too slow while existing analytical models are often not sufficiently accurate. In this talk, I will present our work on bridging this gap by providing fast yet accurate alternatives for power and performance modeling of software and hardware. In the past, we have pioneered so-called source-level and host-compiled simulation techniques that are based on back-annotation of source code with semi-analytically estimated target metrics. ...
Next to performance, early power and energy estimation is a key challenge in the design of computer systems. Traditional simulation-based methods are often too slow, while existing analytical models are often not sufficiently accurate. In this talk, I will present our work on bridging this gap by providing fast yet accurate alternatives for power and performance modeling of software and hardware. In the past, we have pioneered so-called source-level and host-compiled simulation techniques that are based on back-annotation of source code with semi-analytically estimated target metrics. More recently, we have studied alternative approaches in which we employ advanced machine learning techniques to synthesize analytical proxy models that can extract latent correlations and accurately predict time-varying power and performance of an application running on a target platform purely from data obtained while executing the application natively on a completely different host machine. We have developed such learning-based approaches for both hardware and software. On the hardware side, learning-based models for white-box and black-box hardware accelerators reach simulation speeds of 1 Mcycles/s at 97% accuracy. On the software side, depending on the granularity at which prediction is performed, cross-platform prediction can achieve more than 95% accuracy at more than 3 GIPS of equivalent simulation throughput.
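
As a rough illustration of the learning-based approach (a minimal sketch with synthetic data, not the models from the talk), the following fits a linear proxy that maps per-interval host-side counters to measured target-platform power.

```python
# Minimal sketch: learn a linear proxy mapping host-side features gathered per
# execution interval (e.g., instruction and cache-miss counts measured while running
# natively on the host) to the power measured on the target platform.
import numpy as np

rng = np.random.default_rng(0)
n_intervals, n_features = 500, 3
X = rng.random((n_intervals, n_features))                       # host counters per interval
true_w = np.array([1.2, 0.4, 2.0])
y = X @ true_w + 0.5 + 0.01 * rng.standard_normal(n_intervals)  # measured power (W)

A = np.hstack([X, np.ones((n_intervals, 1))])                   # add an intercept column
w, *_ = np.linalg.lstsq(A, y, rcond=None)                       # least-squares fit

predicted = A @ w
rel_error = np.mean(np.abs(predicted - y) / y)
print(f"mean relative error of the proxy model: {rel_error:.2%}")
```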
Read more

Compositional Compiler Correctness Via Parametric Simulations

Georg Neis Max Planck Institute for Software Systems
07 Jun 2018, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Compiler verification is essential for the construction of fully verified software, but most prior work (such as CompCert) has focused on verifying whole-program compilers. To support separate compilation and to enable linking of results from different verified compilers, it is important to develop a compositional notion of compiler correctness that is modular (preserved under linking), transitive (supports multi-pass compilation), and flexible (applicable to compilers that use different intermediate languages or employ non-standard program transformations). In this thesis, ...
Compiler verification is essential for the construction of fully verified software, but most prior work (such as CompCert) has focused on verifying whole-program compilers. To support separate compilation and to enable linking of results from different verified compilers, it is important to develop a compositional notion of compiler correctness that is modular (preserved under linking), transitive (supports multi-pass compilation), and flexible (applicable to compilers that use different intermediate languages or employ non-standard program transformations). In this thesis, we formalize such a notion of correctness based on parametric simulations, thus developing a novel approach to compositional compiler verification.
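
For intuition, a plain (non-parametric) simulation relation R between a compiled program and its source has the familiar shape below; the parametric simulations developed in the thesis refine this shape so that it also composes under linking and across multiple passes.

```latex
% A plain simulation: every step of the target t is matched by zero or more
% steps of the source s, re-establishing the relation R.
\[
  (t, s) \in R \;\wedge\; t \longrightarrow t'
  \;\Longrightarrow\;
  \exists\, s'.\; s \longrightarrow^{*} s' \;\wedge\; (t', s') \in R
\]
```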
Read more

The Internet: A Complex System at its Limits

Anja Feldmann MPI-INF - INET
06 Jun 2018, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
tbd

Accountability in the Governance of Machine Learning

Joshua Kroll School of Information, University of California, Berkeley
07 May 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
SWS Colloquium
As software systems, especially those based on machine learning and data analysis, become ever more deeply engrained in modern society and take on increasingly powerful roles in shaping people's lives, concerns have been raised about the fairness, equity, and other embedded values of these systems. Many definitions of "fairness" have been proposed, and the technical definitions capture a variety of desirable statistical invariants. However, such invariants may not address fairness for all stakeholders, may be in tension with each other or other desirable properties, ...
As software systems, especially those based on machine learning and data analysis, become ever more deeply engrained in modern society and take on increasingly powerful roles in shaping people's lives, concerns have been raised about the fairness, equity, and other embedded values of these systems. Many definitions of "fairness" have been proposed, and the technical definitions capture a variety of desirable statistical invariants. However, such invariants may not address fairness for all stakeholders, may be in tension with each other or other desirable properties, and may not be recognized by people as capturing the correct notion of fairness. In addition, requirements that serve fairness, in practice, often are enacted by prohibiting a set of practices considered unfair rather than fully modeling a particular definition of fairness.

For these reasons, we attack the goal of producing fair systems from a different starting point. We argue that a focus on accountability and transparency in the design of a computer system is a stronger basis for reasoning about fairness. We outline a research agenda in responsible system design based on this approach, attacking both technical and non-technical open questions. Technology can help realize human values - including fairness - in computer systems, but only if it is supported by appropriate organizational best practices and a new approach to the system design life cycle.

As a first step toward realizing this agenda, we present a cryptographic protocol for accountable algorithms, which uses a combination of commitments and zero-knowledge proofs to construct audit logs for automated decision-making systems that are publicly verifiable for integrity. Such logs comprise an integral record of the behavior of a computer system, providing evidence for future interrogation, oversight, and review while also providing immediate public assurance of important procedural regularity properties, such as the guarantee that all decisions were made under the same policy. Further, the existence of such evidence provides a strong incentive for the system's designers to capture the right human values by making deviations from those values apparent and undeniable. Finally, we describe how such logs can be extended to demonstrate the existence of key fairness and transparency properties in machine-learning settings. For example, we consider how to demonstrate that a model was trained on particular data, that it operates without considering particular sensitive inputs, or that it satisfies particular fairness invariants of the type considered in the machine-learning fairness literature. This approach leads to a better, more complete, and more flexible outcome from the perspective of preventing unfairness.
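
The commitment half of such an audit log can be sketched in a few lines (illustrative only; the actual protocol additionally uses zero-knowledge proofs, which are omitted here): the operator commits to the decision policy and to each decision, so any later deviation from the committed policy becomes detectable once the commitments are opened to an auditor.

```python
# Sketch of salted hash commitments for an accountable-decision log (hypothetical,
# not the speaker's protocol; zero-knowledge proofs are omitted).
import hashlib
import os

def commit(data: bytes):
    salt = os.urandom(16)
    return hashlib.sha256(salt + data).hexdigest(), salt

def verify(commitment: str, salt: bytes, data: bytes) -> bool:
    return hashlib.sha256(salt + data).hexdigest() == commitment

policy = b"score = 2*income - 3*debt; approve if score > 10"
policy_commitment, policy_salt = commit(policy)        # published before any decision

decision = b"applicant=42, approve=False"
decision_commitment, decision_salt = commit(decision)  # appended to the public log

# During an audit, the operator opens the commitments and the auditor checks them.
assert verify(policy_commitment, policy_salt, policy)
assert verify(decision_commitment, decision_salt, decision)
print("audit log entries verified")
```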
Read more

#DebateNight: The Role and Influence of Socialbots on Twitter During the 1st U.S. Presidential Debate

Marian-Andrei Rizoiu Australian National University
03 May 2018, 11:15 am - 12:15 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Serious concerns have been raised about the role of `socialbots' in manipulating public opinion and influencing the outcome of elections by retweeting partisan content to increase its reach. Here we analyze the role and influence of socialbots on Twitter by determining how they contribute to retweet diffusions. We collect a large dataset of tweets during the 1st U.S. presidential debate in 2016 and we analyze its 1.5 million users from three perspectives: user influence, political behavior (partisanship and engagement) and botness. ...
Serious concerns have been raised about the role of `socialbots' in manipulating public opinion and influencing the outcome of elections by retweeting partisan content to increase its reach. Here we analyze the role and influence of socialbots on Twitter by determining how they contribute to retweet diffusions. We collect a large dataset of tweets during the 1st U.S. presidential debate in 2016 and we analyze its 1.5 million users from three perspectives: user influence, political behavior (partisanship and engagement) and botness. First, we define a measure of user influence based on the user's active contributions to information diffusions, i.e. their tweets and retweets. Given that Twitter does not expose the retweet structure -- it associates all retweets with the original tweet -- we model the latent diffusion structure using only tweet time and user features, and we implement a novel, scalable approach to estimate influence over all possible unfoldings. Next, we use partisan hashtag analysis to quantify user political polarization and engagement. Finally, we use the BotOrNot API to measure user botness (the likelihood of being a bot). We build a two-dimensional "polarization map" that allows for a nuanced analysis of the interplay between botness, partisanship and influence. We find that not only are socialbots more active on Twitter -- starting more retweet cascades and retweeting more -- but they are 2.5 times more influential than humans, and more politically engaged. Moreover, pro-Republican bots are both more influential and more politically engaged than their pro-Democrat counterparts. However, we caution against blanket statements that software designed to appear human dominates politics-related activity on Twitter. Firstly, it is known that accounts controlled by teams of humans (e.g. organizational accounts) are often identified as bots. Secondly, we find that many highly influential Twitter users are in fact pro-Democrat and that most pro-Republican users are mid-influential and likely to be human (low botness).
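
A heavily simplified sketch of the attribution idea (not the estimator used in the work above): since Twitter links every retweet to the original tweet only, each retweet is attributed probabilistically to all earlier participants in the cascade, with an exponential decay in the time gap, and a user's influence is taken to be the expected number of retweets attributed to them.

```python
# Simplified, illustrative influence estimate over a single retweet cascade.
import math
from collections import defaultdict

def expected_influence(cascade, decay=1.0 / 600):
    """cascade: list of (user, timestamp) sorted by time; the first entry is the original tweet."""
    influence = defaultdict(float)
    for i in range(1, len(cascade)):
        _, t_i = cascade[i]
        weights = [math.exp(-decay * (t_i - t_j)) for _, t_j in cascade[:i]]
        total = sum(weights)
        for (user_j, _), w in zip(cascade[:i], weights):
            influence[user_j] += w / total     # probabilistic attribution of retweet i
    return dict(influence)

cascade = [("alice", 0), ("bot_1", 30), ("bob", 90), ("carol", 4000)]
print(expected_influence(cascade))
```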
Read more

Capturing and Learning Digital Humans

Gerard Pons-Moll MPI-INF - D2
02 May 2018, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
The world is shifting towards a digitization of everything -- music, books, movies and news in digital form are common in our everyday lives. Digitizing human beings would redefine the way we think and communicate (with other humans and with machines), and it is necessary for many applications -- for example, to transport people into virtual and augmented reality, for entertainment and special effects in movies, and for medicine and psychology. Currently, digital people models typically lack realism or require time-consuming manual editing of physical simulation parameters. ...
The world is shifting towards a digitization of everything -- music, books, movies and news in digital form are common in our everyday lives. Digitizing human beings would redefine the way we think and communicate (with other humans and with machines), and it is necessary for many applications -- for example, to transport people into virtual and augmented reality, for entertainment and special effects in movies, and for medicine and psychology. Currently, digital people models typically lack realism or require time-consuming manual editing of physical simulation parameters. Our hypothesis is that better and more realistic models of humans and clothing can be learned directly by capturing real people using 4D scans, images, and depth and inertial sensors. Combining statistical machine learning techniques and geometric optimization, we create realistic statistical models from the captured data. To be able to digitize people from low-cost ubiquitous sensors (RGB cameras, depth or small number of wearable inertial sensors), we leverage the learned statistical models -- which are robust to noise and missing data.
Read more

Probabilistic Program Equivalence for NetKAT

Alexandra Silva University College, London
02 May 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
SWS Colloquium
We tackle the problem of deciding whether two probabilistic programs are equivalent in the context of Probabilistic NetKAT, a formal language for reasoning about the behavior of packet-switched networks. The main challenge lies in reasoning about iteration, which we address by a reduction to finite-state absorbing Markov chains. Building on this approach, we develop an effective decision procedure based on stochastic matrices. Through an extended case study with a real-world data center network, we show how to use these techniques to automatically verify various properties of interest, ...
We tackle the problem of deciding whether two probabilistic programs are equivalent in the context of Probabilistic NetKAT, a formal language for reasoning about the behavior of packet-switched networks. The main challenge lies in reasoning about iteration, which we address by a reduction to finite-state absorbing Markov chains. Building on this approach, we develop an effective decision procedure based on stochastic matrices. Through an extended case study with a real-world data center network, we show how to use these techniques to automatically verify various properties of interest, including resilience in the presence of failures. This is joint work with Steffen Smolka, Praveen Kumar, Nate Foster, Justin Hsu, David Kahn, and Dexter Kozen.
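
The core absorbing-Markov-chain computation can be sketched as follows (illustrative; the real decision procedure works on ProbNetKAT programs rather than on explicit matrices): with transient-to-transient block Q and transient-to-absorbing block R, the absorption probabilities are B = (I - Q)^{-1} R, and two programs can be compared via these limiting distributions.

```python
# Absorption probabilities of an absorbing Markov chain via the fundamental matrix.
import numpy as np

# Transition matrix over states {s0, s1, delivered, dropped}; the last two are absorbing.
P = np.array([
    [0.2, 0.5, 0.3, 0.0],
    [0.0, 0.1, 0.6, 0.3],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
Q, R = P[:2, :2], P[:2, 2:]
N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix
B = N @ R                          # absorption probabilities per start state
print(B)  # row i: probability of ending in {delivered, dropped} from transient state i
```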
Read more

Measurements, predictions, and the puzzle of machine learning: what data from 10 million hosts can teach us about security

Tudor Dumitras University of Maryland
20 Apr 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
What are the odds that you will get hacked tomorrow? To answer this question, it is not enough to reason about the state of your host -- we must also understand how easy it is for adversaries to exploit software vulnerabilities and what helps them distribute malware around the world. Moreover, the machine learning techniques that drive the success of such prediction tasks in non-adversarial domains, like computer vision or autonomous driving, face new challenges in security.  ...
What are the odds that you will get hacked tomorrow? To answer this question, it is not enough to reason about the state of your host -- we must also understand how easy it is for adversaries to exploit software vulnerabilities and what helps them distribute malware around the world. Moreover, the machine learning techniques that drive the success of such prediction tasks in non-adversarial domains, like computer vision or autonomous driving, face new challenges in security. 

In this talk I will discuss my work, combining machine learning with global-scale measurements, that has exposed critical security threats and has guided industrial practices. First, I will present the Worldwide Intelligence Network Environment (WINE), an analytics platform that has enabled systematic security measurements across more than 10 million hosts from around the world. Second, I will use WINE as a vehicle for exploring open research questions, such as the duration and impact of zero-day attacks, the weaknesses in public key infrastructures (PKIs) that allow malware to masquerade as reputable software, and how we can use machine learning to predict certain security incidents. I will conclude by discussing the impact of these predictions on the emerging cyber insurance industry and the lessons we learned about using machine learning in the security domain.
Read more

Power to the People. Verified.

Holger Hermanns Fachrichtung Informatik - Saarbrücken
11 Apr 2018, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Twenty years ago we were able to repair cars at home. Nowadays customer services repair coffee machines by installing software updates. Soon you will no longer be able to repair your bike.

Embedded software innovations boost our society; they help us tremendously in our daily life. But we do not understand what the software embedded therein actually does, regardless of how well educated or smart we are. Proprietary embedded software has become an opaque layer between functionality and user. ...
Twenty years ago we were able to repair cars at home. Nowadays customer services repair coffee machines by installing software updates. Soon you will no longer be able to repair your bike.

Embedded software innovations boost our society; they help us tremendously in our daily life. But we do not understand what the software embedded therein actually does, regardless of how well educated or smart we are. Proprietary embedded software has become an opaque layer between functionality and user. That layer is thick enough to possibly induce malicious or unintended behaviour, as has happened massively in our diesel cars. From the outside, there is no understanding of how decisions are made inside and across these software-driven black boxes. The outcomes are often not designed to be accessible, verified or evaluated by humans, limiting our ability to identify if, when, where, and why the software produced harm, and, worse still, to redress this harm. Proprietary embedded software locks us out of the products we own.

The presentation of Holger Hermanns will sketch the main cornerstones of a research agenda which targets the foundations of open and hence customisable embedded software. A minor customisation might well have strong unexpected impact, for instance on the longevity of an embedded battery, or the safety of the battery charging process. Means to detect, quantify and prevent such implications are needed. Those are delivered by quantitative verification technology for system-level correctness, safety, dependability and performability. Hermanns will link foundational achievements to concrete results in the area of power management for electric mobility and for satellite operation. Electric power is intricate to handle by software, is safety-critical, but vital for mobile devices and their longevity. Since ever more tools, gadgets, and vehicles run on batteries and use power harvesting, electric power management is itself a pivot of the future.
Read more

Lovasz meets Weisfeiler-Leman

Prof. Martin Grohe RWTH Aachen University
04 Apr 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
SWS Colloquium
I will speak about an unexpected correspondence between a beautiful theory, due to Lovasz, about homomorphisms and graph limits, and a popular heuristic for the graph isomorphism problem known as the Weisfeiler-Leman algorithm. I will also relate this to graph kernels in machine learning. Indeed, the context of this work is to design and understand similarity measures between graphs and discrete structures.
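
For readers unfamiliar with it, the 1-dimensional Weisfeiler-Leman algorithm is simply iterated color refinement: each vertex is repeatedly recolored by the multiset of its neighbors' colors, and the resulting color histograms are the graph invariant that the talk relates to homomorphism counts and to graph kernels. A minimal sketch:

```python
# 1-dimensional Weisfeiler-Leman (color refinement) on an adjacency-list graph.
def wl_colors(adj, rounds=3):
    """adj: dict mapping vertex -> list of neighbors."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        # Signature of a vertex: its color plus the sorted multiset of neighbor colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj
        }
        # Compress signatures back into small integer colors.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(sorted(wl_colors(triangle).values()), sorted(wl_colors(path).values()))
```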

Failures-In-Time (FIT) Analysis for Fault-Tolerant Distributed Real-Time Systems

Arpan Gujarati Max Planck Institute for Software Systems
28 Mar 2018, 4:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
Distributed real-time (DiRT) systems are widely deployed in contemporary cyber-physical systems (CPS). Many of these systems are safety-critical, since their failure or malfunction can result in death or serious injury to the people involved and/or severe damage to the environment, e.g., in human spaceflight vehicles, surgical robots, air traffic and nuclear reactor control systems, drive-by-wire and fly-by-wire systems, railway signaling systems, etc.

Safety-certification standards mandate that the failure rate of safety-critical systems in the presence of any unpreventable and intolerable errors due to environmentally-induced transient faults (e.g., from electromagnetic, ...
Distributed real-time (DiRT) systems are widely deployed in contemporary cyber-physical systems (CPS). Many of these systems are safety-critical, since their failure or malfunction can result in death or serious injury to the people involved and/or severe damage to the environment, e.g., in human spaceflight vehicles, surgical robots, air traffic and nuclear reactor control systems, drive-by-wire and fly-by-wire systems, railway signaling systems, etc.

Safety-certification standards mandate that the failure rate of safety-critical systems in the presence of any unpreventable and intolerable errors due to environmentally-induced transient faults (e.g., from electromagnetic, thermal, or radiation sources) must be under a certain threshold.

In this regard, prior work on the reliability analysis of DiRTs in the presence of environmentally-induced transient faults does not target all possible error scenarios (such as Byzantine errors). This is mainly because the likelihood of a complex error scenario is extremely low and/or because the workloads for safety-critical systems have traditionally been simple, with sufficient slack to tolerate fault-induced failures and with mechanical backups to tolerate complete software failures.

However, a majority of CPS devices are expected to be fully autonomous in the future, thus requiring stronger reliability guarantees with fail-operational semantics. In addition, since the workloads used for safety-critical systems are becoming more and more complex (e.g., deep learning neural networks are being used in self-driving cars) and since there is a push towards the use of cheaper commodity hardware, the likelihood of complex Byzantine errors is going to increase. Therefore, it is imperative that we revisit the existing techniques for analyzing and building safety-critical DiRTs.

To address this issue, we propose analyses to derive a safe upper-bound on the failure rates of safety-critical DiRTs in the presence of Byzantine errors due to environmentally-induced transient faults. We focus on DiRTs based on Controller Area Network that are commonly used in today's cyber-physical systems, and on Ethernet-based DiRTs that are expected to be at the core of next-generation cyber-physical systems.
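
As a heavily simplified illustration of the kind of bound involved (not the proposed analyses, which account for the protocol and fault model in detail): if transient faults arrive as a Poisson process and the system provably tolerates up to f faults per hour, then the tail of the Poisson distribution gives a pessimistic upper bound on the hourly failure probability, which converts into a FIT value.

```python
# Toy upper bound on the hourly failure probability under Poisson fault arrivals.
# lam and f below are made-up illustration values, not results from the thesis.
import math

def failure_prob_upper_bound(lam, f):
    """P[X > f] for X ~ Poisson(lam): probability of more faults than are tolerated."""
    p_ok = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(f + 1))
    return 1.0 - p_ok

lam = 1e-3     # assumed expected transient faults per hour
f = 2          # assumed number of faults tolerated per hour
p_fail = failure_prob_upper_bound(lam, f)
print(f"hourly failure bound: {p_fail:.3e}, FIT (failures per 10^9 hours): {p_fail * 1e9:.3f}")
```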
Read more

High-Throughput and Predictable VM Scheduling for High-Density Workloads

Manohar Vanga Max Planck Institute for Software Systems
26 Mar 2018, 3:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
In the increasingly competitive public-cloud marketplace, improving the efficiency of data centers is a major concern. One way to improve efficiency is to consolidate as many virtual machines (VMs) onto as few physical cores as possible, provided that customers' performance expectations are not violated. However, as a prerequisite for supporting increased VM densities, the hypervisor’s VM scheduler must allocate processor time efficiently and in a timely fashion. Unfortunately, we show that contemporary VM schedulers leave substantial room for improvements in both regards when facing challenging high-VM-density workloads that frequently trigger the VM scheduler. ...
In the increasingly competitive public-cloud marketplace, improving the efficiency of data centers is a major concern. One way to improve efficiency is to consolidate as many virtual machines (VMs) onto as few physical cores as possible, provided that customers' performance expectations are not violated. However, as a prerequisite for supporting increased VM densities, the hypervisor’s VM scheduler must allocate processor time efficiently and in a timely fashion. Unfortunately, we show that contemporary VM schedulers leave substantial room for improvements in both regards when facing challenging high-VM-density workloads that frequently trigger the VM scheduler.

We identify the root causes of this inability to support high-density VM scenarios to be (i) high runtime overheads and (ii) unpredictable scheduling heuristics. To better support high VM densities, we propose Tableau, a VM scheduler that guarantees a minimum processor share and a maximum bound on scheduling delay for every VM in the system. Tableau achieves this by combining a low-overhead, core-local, table-driven dispatcher within the hypervisor with a fast on-demand table-generation procedure (triggered asynchronously upon VM creation and teardown) that employs scheduling techniques typically used in hard real-time systems.

In an evaluation comparing Tableau against three current Xen schedulers on a 16-core Intel Xeon machine, Tableau is shown to improve both tail latency (e.g., a 17x reduction in maximum ping latency compared to Credit, Xen's default scheduler) and throughput (e.g., 1.6x peak web server throughput compared to Xen's real-time RTDS scheduler when serving 1 KiB files with a 100 ms SLA). While Tableau solves one piece of the unpredictability puzzle, namely the VM scheduler, there are other sources of unpredictability that arise in a shared, high-density setting. We therefore propose extensions of Tableau to deal with two other major sources of unpredictability: LLC interference caused by other VMs co-located on the same CPU socket, and delays that arise due to I/O scheduling.
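
The table-driven idea can be illustrated with a toy table generator (not Tableau's algorithm): slots in a fixed dispatch table are assigned in proportion to each VM's processor share and spread across the table, so that the gap between consecutive slots of any VM stays bounded.

```python
# Toy generator for a per-core, table-driven dispatch table.
from fractions import Fraction

def build_table(shares, slots=20):
    """shares: dict VM -> fraction of the core (must sum to at most 1)."""
    assert sum(shares.values()) <= 1
    table = [None] * slots
    for vm, share in shares.items():
        count = int(Fraction(share).limit_denominator() * slots)  # slots owed to this VM
        step = slots / max(count, 1)
        for i in range(count):
            slot = int(i * step)                 # spread the VM's slots evenly
            while table[slot] is not None:       # linear probe to the next free slot
                slot = (slot + 1) % slots
            table[slot] = vm
    return table

print(build_table({"vm_a": 0.5, "vm_b": 0.25, "vm_c": 0.25}, slots=8))
```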
Read more

Data Science for Human Well-being

Tim Althoff Stanford University
26 Mar 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
The popularity of wearable and mobile devices, including smartphones and smartwatches, has generated an explosion of detailed behavioral data. These massive digital traces provide us with an unparalleled opportunity to realize new types of scientific approaches that provide novel insights about our lives, health, and happiness. However, gaining valuable insights from these data requires new computational approaches that turn observational, scientifically "weak" data into strong scientific results and can computationally test domain theories at scale. In this talk, ...
The popularity of wearable and mobile devices, including smartphones and smartwatches, has generated an explosion of detailed behavioral data. These massive digital traces provide us with an unparalleled opportunity to realize new types of scientific approaches that provide novel insights about our lives, health, and happiness. However, gaining valuable insights from these data requires new computational approaches that turn observational, scientifically "weak" data into strong scientific results and can computationally test domain theories at scale. In this talk, I will describe novel computational methods that leverage digital activity traces at the scale of billions of actions taken by millions of people. These methods combine insights from data mining, social network analysis, and natural language processing to generate actionable insights about our physical and mental well-being. Specifically, I will describe how massive digital activity traces reveal unknown health inequality around the world, and how personalized predictive models can target personalized interventions to combat this inequality. I will demonstrate that modelling how fast we are using search engines enables new types of insights into sleep and cognitive performance. Further, I will describe how natural language processing methods can help improve counseling services for millions of people in crisis. I will conclude the talk by sketching interesting future directions for computational approaches that leverage digital activity traces to better understand and improve human well-being.
Read more

CANCELLED: Human-Centric Tools for Software Maintenance

Austin Henley University of Memphis
22 Mar 2018, 1:30 pm - 2:30 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
All software failures are fundamentally the fault of humans--either the design was ill-formed or the implementation contained a bug. By designing more human-centric development tools, developers can produce higher quality code in less time with lower cognitive load. In this talk, I will present tools for supporting two common activities in software maintenance: code navigation and code reviewing. Since navigating code is known to be time consuming and problematic, I designed Patchworks, a novel code editor interface that enables efficient juxtaposition and rapid scanning of code. ...
All software failures are fundamentally the fault of humans--either the design was ill-formed or the implementation contained a bug. By designing more human-centric development tools, developers can produce higher quality code in less time with lower cognitive load. In this talk, I will present tools for supporting two common activities in software maintenance: code navigation and code reviewing. Since navigating code is known to be time consuming and problematic, I designed Patchworks, a novel code editor interface that enables efficient juxtaposition and rapid scanning of code. Two empirical studies found that developers using Patchworks navigated faster and made fewer navigation mistakes than with traditional code editors. Once developers have made changes to the code, other developers must review and discuss the changes to ensure the code is of sufficient quality. To aid developers in this tedious task of reviewing code, I designed CFar, an automated code reviewer that provides feedback using program analysis. A laboratory study and field deployment involving 98 developers at Microsoft found that using CFar increased communication, productivity, and review quality.
Read more

Characterizing the Space of Adversarial Examples in Machine Learning

Nicolas Papernot Pennsylvania State University
22 Mar 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
There is growing recognition that machine learning (ML) exposes new security and privacy vulnerabilities in software systems, yet the technical community's understanding of the nature and extent of these vulnerabilities remains limited but expanding. In this talk, I explore the threat model space of ML algorithms, and systematically explore the vulnerabilities resulting from the poor generalization of ML models when they are presented with inputs manipulated by adversaries. This characterization of the threat space prompts an investigation of defenses that exploit the lack of reliable confidence estimates for predictions made. ...
There is growing recognition that machine learning (ML) exposes new security and privacy vulnerabilities in software systems, yet the technical community's understanding of the nature and extent of these vulnerabilities remains limited but expanding. In this talk, I explore the threat model space of ML algorithms, and systematically explore the vulnerabilities resulting from the poor generalization of ML models when they are presented with inputs manipulated by adversaries. This characterization of the threat space prompts an investigation of defenses that exploit the lack of reliable confidence estimates for predictions made. In particular, we introduce a promising new approach to defensive measures tailored to the structure of deep learning. Through this research, we expose connections between the resilience of ML to adversaries, model interpretability, and training data privacy.
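
One standard example of such input manipulation is the fast gradient sign method (FGSM); the toy sketch below applies it to a plain logistic model (the talk's threat models and defenses go well beyond this setting).

```python
# FGSM-style perturbation of an input against a toy logistic classifier.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w, b = rng.standard_normal(10), 0.0   # a pretend "trained" linear classifier
x, y = rng.standard_normal(10), 1.0   # an input with true label 1

grad_x = (sigmoid(w @ x + b) - y) * w   # gradient of the logistic loss w.r.t. the input
eps = 0.25
x_adv = x + eps * np.sign(grad_x)       # small perturbation in the worst-case direction

print("clean score:", sigmoid(w @ x + b), " adversarial score:", sigmoid(w @ x_adv + b))
```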
Read more

Catch-22: Isolation and Efficiency in Datacenters

Mihir Nanavati UBC
19 Mar 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
The datacenters behind popular cloud services are extremely resource-dense. A typical deployment has thousands of cores, terabytes of memory, gigabits of bandwidth, and petabytes of storage available per-rack. Datacenter economics require that providers share these resources across multiple customers for efficient utilization and as a means of providing price-competitive offerings. Shared infrastructure, however, risks cross-client interference and can result in degraded performance or data leaks, leading to outages and breaches. My work explores this tension with systems that provide security and performance isolation on shared hardware, ...
The datacenters behind popular cloud services are extremely resource-dense. A typical deployment has thousands of cores, terabytes of memory, gigabits of bandwidth, and petabytes of storage available per-rack. Datacenter economics require that providers share these resources across multiple customers for efficient utilization and as a means of providing price-competitive offerings. Shared infrastructure, however, risks cross-client interference and can result in degraded performance or data leaks, leading to outages and breaches. My work explores this tension with systems that provide security and performance isolation on shared hardware, while enabling efficient utilization and preserving the underlying performance of the devices.

This talk describes two such systems dealing with different resources: the first, Plastic, transparently mitigates poor scalability on multi-core systems caused by insufficient cache line isolation, which results in unnecessary memory contention and wasted compute cycles. The second, Decibel, provides isolation in shared non-volatile storage and allows clients to remotely access high-speed devices at latencies comparable to local devices while guaranteeing throughput, even in the face of competing workloads.
Read more

Static Program Analysis for a Software-Driven Society

Dr. Caterina Urban ETH Zurich
15 Mar 2018, 10:30 am - 12:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
As we rely more and more on computer software for automating processes and making decisions, the range of software that is considered critical goes well beyond the avionics and nuclear power industries: nowadays software plays an increasingly important role in convicting criminals, trading on the financial markets, autonomously driving cars, and performing medical diagnoses, to name a few applications. It is now paramount to ensure the reliability and security of such software, and expectations about software fairness and transparency are rapidly rising. ...
As we rely more and more on computer software for automating processes and making decisions, the range of software that is considered critical goes well beyond the avionics and nuclear power industries: nowadays software plays an increasingly important role in convicting criminals, trading on the financial markets, autonomously driving cars, and performing medical diagnoses, to name a few applications. It is now paramount to ensure the reliability and security of such software, and expectations about software fairness and transparency are rapidly rising. To meet these needs, we need new mathematical models of software behavior that capture the aspects relevant for a particular dependability property, and new algorithmic approaches to effectively navigate this mathematical space and decide whether the software behaves as desired. This talk gives an overview of the steps I have taken towards addressing these challenges. Starting from a series of works on deciding software termination, I show that the insights from this domain are transferable to other formal methods and properties. These results pave the way for a unified framework for deciding increasingly advanced software dependability properties. I discuss the first results that I obtained in this more general direction, which in particular bring new conceptual clarity to the synergies with deciding security properties of software. Finally, I conclude with an outlook to the future and discuss the potential impact of this research on our personal, civic, and economic life.
Read more

Language Support for Distributed Systems in Scala

Heather Miller Northeastern University and EPFL
12 Mar 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Recent years have seen a rise in distributed systems for interactive, large-scale data processing. Cutting-edge systems focus on reducing latency and increasing expressiveness in order to provide an interactive and rich experience to more and varied users coming from emerging fields such as data science. Meanwhile, the languages and runtimes underlying such systems face numerous challenges in the context of the severely demanding needs of these new distributed systems; popular languages and runtimes like Scala and the JVM (a) limit the customizability of fundamental operations like serialization, ...
Recent years have seen a rise in distributed systems for interactive, large-scale data processing. Cutting-edge systems focus on reducing latency and increasing expressiveness in order to provide an interactive and rich experience to more and varied users coming from emerging fields such as data science. Meanwhile, the languages and runtimes underlying such systems face numerous challenges in the context of the severely demanding needs of these new distributed systems; popular languages and runtimes like Scala and the JVM (a) limit the customizability of fundamental operations like serialization, and (b) expose low-level distribution-related errors to application developers and end users when trying to distribute core language features, such as functions. This talk presents three systems that (a) give more control over these primitives to distributed systems builders thereby enabling important optimizations, and (b) increase the reliability of distributing functions and objects. Theoretical, experimental, and empirical results are used in the validation of our work.
Read more

Self-aware Computing: Combining Learning and Control to Manage Complex, Dynamic Systems

Dr. Hank Hoffmann University of Chicago
08 Mar 2018, 10:30 am - 12:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Modern computing systems must meet multiple---often conflicting---goals; e.g., high-performance and low energy consumption. The current state-of-practice involves ad hoc, heuristic solutions to such system management problems that offer no formally verifiable behavior and must be rewritten or redesigned wholesale as new computing platforms and constraints evolve. In this talk, I will discuss my research on building self-aware computing systems that address computing system goals and constraints in a fundamental way, starting with rigorous mathematical models and ending with real software and hardware implementations that have formally analyzable behavior and can be re-purposed to address new problems as they emerge. ...
Modern computing systems must meet multiple---often conflicting---goals; e.g., high-performance and low energy consumption. The current state-of-practice involves ad hoc, heuristic solutions to such system management problems that offer no formally verifiable behavior and must be rewritten or redesigned wholesale as new computing platforms and constraints evolve. In this talk, I will discuss my research on building self-aware computing systems that address computing system goals and constraints in a fundamental way, starting with rigorous mathematical models and ending with real software and hardware implementations that have formally analyzable behavior and can be re-purposed to address new problems as they emerge.

These self-aware systems are distinguished by awareness of user goals and operating environment; they continuously monitor themselves and adapt their behavior and foundational models to ensure the goals are met despite the challenges of complexity (diverse hardware resources to be managed) and dynamics (unpredictable changes in input workload or resource availability). In this talk, I will describe how to build self-aware systems through a combination of machine learning and control theoretic techniques. I will then show how this combination enables new capabilities, like increasing system robustness, reducing application energy, and meeting latency requirements even with no prior knowledge of the application.
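
A minimal illustration of the control-theoretic ingredient (not the systems from the talk): a simple integral controller adjusts a resource allocation until the measured latency tracks its target, with the application modeled here by a made-up latency curve.

```python
# Toy integral controller: add cores while the measured latency exceeds the target.
def run_controller(target_latency=100.0, steps=30, gain=0.02):
    cores = 1.0
    for _ in range(steps):
        latency = 400.0 / cores            # stand-in for the measured latency (ms)
        error = latency - target_latency
        cores += gain * error              # integral action on the allocation
        cores = max(cores, 1.0)
    return cores, 400.0 / cores

cores, latency = run_controller()
print(f"settled at {cores:.2f} cores, latency {latency:.1f} ms (target 100 ms)")
```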
Read more

Incorporating Positional and Contextual Information into a Neural IR Model

Andrew Yates MPI-INF - D5
07 Mar 2018, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Ad-hoc information retrieval models consider query-document interactions to produce a document relevance score for a given query and document. Traditionally, such interactions have been modeled using handcrafted statistics that generally count term occurrences within a document and across a collection. Recently, neural models have demonstrated that they provide the instruments necessary to consider query-document interactions directly, without the need for engineering such statistics.

In this talk, I will describe how positional term information can be represented and incorporated into a neural IR model. ...
Ad-hoc information retrieval models consider query-document interactions to produce a document relevance score for a given query and document. Traditionally, such interactions have been modeled using handcrafted statistics that generally count term occurrences within a document and across a collection. Recently, neural models have demonstrated that they provide the instruments necessary to consider query-document interactions directly, without the need for engineering such statistics.

In this talk, I will describe how positional term information can be represented and incorporated into a neural IR model. The resulting model, called PACRR, performs significantly better on standard TREC benchmarks than previous neural approaches. This improvement can be attributed to the fact that PACRR can learn to match both ordered and unordered sequences of query terms in addition to the single term matches considered by prior work. Using PACRR's approach to modeling query-document interactions as a foundation, I will describe how several well-known IR problems can be addressed by incorporating contextual information into the model; the resulting Co-PACRR model significantly outperforms the original PACRR model. Finally, I will provide a brief look inside the model to illustrate the interpretability of the learned weights and to investigate how match signals are combined by the model to produce a query-document relevance score.
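
The kind of input such a model operates on can be sketched directly (illustrative only, with random stand-in embeddings; this is not the PACRR architecture itself): a query-document matrix of cosine similarities between term embeddings, over which convolutions can pick up single-term matches as well as ordered and unordered n-gram matches.

```python
# Build a query-document similarity matrix from (random stand-in) term embeddings.
import numpy as np

rng = np.random.default_rng(0)
embed = {w: rng.standard_normal(50) for w in
         ["neural", "information", "retrieval", "models", "the", "of"]}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = ["neural", "information", "retrieval"]
document = ["the", "models", "of", "information", "retrieval"]
sim_matrix = np.array([[cosine(embed[q], embed[d]) for d in document] for q in query])
print(sim_matrix.round(2))   # shape: |query| x |document|
```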
Read more

Observing and Controlling Distributed Systems with Cross-Cutting Tools

Jonathan Mace Brown University
05 Mar 2018, 10:30 am - 12:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Distributed systems represent some of the most interesting and successful computing applications in use today, from modern web applications and social networks, to data analytics and large-scale machine learning.  However, it is notoriously difficult to understand, troubleshoot, and enforce distributed systems behaviors, because unlike standalone programs, they lack a central point of visibility and control.  This impacts important tasks like resource management, performance, security, accounting, and more.  In this talk I will outline techniques and abstractions for re-establishing cross-component visibility and control.  ...
Distributed systems represent some of the most interesting and successful computing applications in use today, from modern web applications and social networks, to data analytics and large-scale machine learning.  However, it is notoriously difficult to understand, troubleshoot, and enforce distributed systems behaviors, because unlike standalone programs, they lack a central point of visibility and control.  This impacts important tasks like resource management, performance, security, accounting, and more.  In this talk I will outline techniques and abstractions for re-establishing cross-component visibility and control.  I will demonstrate with two cross-cutting tools that I have developed in my research: Retro, which measures resource usage and co-ordinates scheduler parameters to achieve end-to-end performance goals; and Pivot Tracing, which dynamically monitors and correlates metrics across component boundaries.  Together, these tools illustrate some of the common challenges and potential solutions when developing and deploying tools for distributed systems.
Read more

Storage mechanisms and finite-state abstractions for software verification

Georg Zetzsche Université Paris-Diderot
01 Mar 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
A popular approach to automatic program verification is to come up with an abstraction that reflects pertinent properties of the program. This abstraction is drawn from a class of formal models that is amenable to analysis. In the plethora of existing formal models, the aspects of programs that can be represented faithfully are typically determined by the infinite dimension of its state space, its storage mechanism. A central theme of my recent research is to obtain general insights into how the structure of the storage mechanism affects the analysis of a formal model. ...
A popular approach to automatic program verification is to come up with an abstraction that reflects pertinent properties of the program. This abstraction is drawn from a class of formal models that is amenable to analysis. In the plethora of existing formal models, the aspects of programs that can be represented faithfully are typically determined by the infinite dimension of its state space, its storage mechanism. A central theme of my recent research is to obtain general insights into how the structure of the storage mechanism affects the analysis of a formal model. In the first part of my talk, I will survey results on an overarching framework of storage mechanisms developed in my doctoral work. It encompasses a range of infinite-state models and permits meaningful characterizations of when a particular method of analysis is applicable. Another current focus of my work concerns finite-state abstractions of infinite-state models. On one hand, these can be over- or under-approximations that are more efficient to analyze than infinite-state systems. On the other hand, they can serve as easy-to-check correctness certificates that are produced instead of yes-or-no answers to a verification task. Thus, the second part of my talk will be concerned with results on computing downward closures and related finite-state abstractions.
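
For reference, the downward closure mentioned at the end is taken with respect to the (scattered) subword ordering; it is regular for every language, which is what makes it attractive as a finite-state abstraction.

```latex
% Downward closure of a language L under the subword order:
% u \preceq v iff u can be obtained from v by deleting letters.
\[
  L{\downarrow} \;=\; \{\, u \in \Sigma^{*} \mid \exists v \in L.\; u \preceq v \,\}
\]
```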
Read more

Fighting Large-scale Internet Abuse

Kevin Borgolte University of California, Santa Barbara
26 Feb 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
The widespread access to the Internet and the ubiquity of web-based services make it easy to communicate and interact globally. Unfortunately, the software implementing the functionality of these services is often vulnerable to attacks. In turn, an attacker can exploit them to compromise and abuse the services for nefarious purposes. In my research, I aim to better understand, detect, and prevent these attacks.

In this talk, we first look at detecting website defacements, which can inflict significant harm on a website's owner or operator through the loss of sales, ...
The widespread access to the Internet and the ubiquity of web-based services make it easy to communicate and interact globally. Unfortunately, the software implementing the functionality of these services is often vulnerable to attacks. In turn, an attacker can exploit them to compromise and abuse the services for nefarious purposes. In my research, I aim to better understand, detect, and prevent these attacks.

In this talk, we first look at detecting website defacements, which can inflict significant harm on a website's owner or operator through the loss of sales, the loss in reputation, or because of legal ramifications. Then, we dive into how to automatically identify malware distribution campaigns, which has become a major challenge in today's Internet. Next, we look at how to mitigate the dangers of domain takeover attacks, which give attackers the same capabilities to spread misinformation or malware as vulnerabilities do, but without the actual need for a vulnerability in the affected service. Last, I will conclude by sketching interesting future directions on how to better understand, detect, and prevent Internet abuse.
Read more

High Performance Data Center TCP Packet Processing

Antoine Kaufmann University of Washington
19 Feb 2018, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
TCP is widely used for client-server communication in modern data centers. But TCP packet handling is notoriously CPU intensive, accounting for an increasing fraction of data center processing time. Techniques such as TCP segment offload, kernel bypass, and RDMA are of limited benefit for the typical small, frequent RPCs. These techniques can also compromise protocol agility, resource isolation, overall system reliability, and complicate multi-tenancy.

I propose a unique refactoring of TCP functionality that splits processing between a streamlined fast path for common operations, ...
TCP is widely used for client-server communication in modern data centers. But TCP packet handling is notoriously CPU intensive, accounting for an increasing fraction of data center processing time. Techniques such as TCP segment offload, kernel bypass, and RDMA are of limited benefit for the typical small, frequent RPCs. These techniques can also compromise protocol agility, resource isolation, overall system reliability, and complicate multi-tenancy.

I propose a unique refactoring of TCP functionality that splits processing between a streamlined fast path for common operations, and an out-of-band slow path. Protocol processing executes in the kernel on dedicated cores that enforce correctness and resource isolation. Applications asynchronously communicate with the kernel through event queues, improving parallelism and cache utilization. I show that my approach can increase RPC throughput by up to 4.1x compared to Linux. The fast-path can be offloaded to a programmable NIC to further improve performance and minimize CPU time for network processing. With hardware offload, data packets are delivered directly from application to application, while the NIC and kernel cooperate to enforce correctness and resource isolation. I show that hardware offload can increase per-core packet throughput by 10.7x compared to the Linux kernel TCP implementation.
Read more

Towards Latency Guarantees in Datacenters

Keon Jang Google
15 Feb 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
A large portion of computation is now carried out in datacenters. A single datacenter hosts hundreds or thousands of applications which share common compute and network resources. Isolating each application's performance, i.e., ensuring that its performance is predictable regardless of the behavior of other applications, is essential to developing and deploying datacenter applications, since otherwise developers need to account for co-located applications, which increases development and deployment complexity. Despite its importance, current approaches to performance isolation are incomplete, ...
A large portion of computation is now carried out in datacenters. A single datacenter hosts hundreds or thousands of applications which share common compute and network resources. Isolating each application's performance, i.e., ensuring that its performance is predictable regardless of the behavior of other applications, is essential to developing and deploying datacenter applications, since otherwise developers need to account for co-located applications, which increases development and deployment complexity. Despite its importance, current approaches to performance isolation are incomplete, and focus mostly on isolating computational resources. In this talk I present two schemes for isolating network performance. The first, Silo, takes a resource-allocation-based approach and implements mechanisms for guaranteeing an application's network latency and throughput. The second, Express Pass, takes a congestion-control-based approach and fairly partitions network resources across applications. Both approaches require no hardware (ASIC) changes, and can be deployed in today's datacenters. This shows that full application performance isolation is achievable today.
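
One standard way to reason about such latency guarantees is network calculus (a textbook bound, not necessarily the exact model used by Silo): if a flow's arrivals are constrained by a token bucket with burst \(\sigma\) and rate \(\rho\), and the flow is served at a guaranteed rate \(R \ge \rho\), then its queueing delay is bounded.

```latex
% Delay bound for a token-bucket-constrained flow served at a guaranteed rate R.
\[
  A(t) - A(s) \;\le\; \sigma + \rho\,(t - s)
  \quad\text{and}\quad R \ge \rho
  \;\;\Longrightarrow\;\;
  \text{delay} \;\le\; \frac{\sigma}{R}
\]
```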
Read more

Liquid Haskell: Usable Language-Based Program Verification

Niki Vazou University of Maryland
12 Feb 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Formal verification has been gaining the attention and resources of both the academic and the industrial world since it prevents critical software bugs that cost money, energy, time, and even lives. Yet, software development and formal verification are decoupled, requiring verification experts to prove properties of a template implementation – instead of the actual one – ported into verification-specific languages. My goal is to bridge formal verification and software development for the programming language Haskell. Haskell is a unique programming language in that it is a general-purpose, ...
Formal verification has been gaining the attention and resources of both the academic and the industrial world since it prevents critical software bugs that cost money, energy, time, and even lives. Yet, software development and formal verification are decoupled, requiring verification experts to prove properties of a template implementation – instead of the actual one – ported into verification-specific languages. My goal is to bridge formal verification and software development for the programming language Haskell. Haskell is a unique programming language in that it is a general-purpose, functional language used for industrial development, but simultaneously it stands at the leading edge of research and teaching, welcoming new, experimental, yet useful features.

In this talk I present Liquid Haskell, a refinement type checker in which formal specifications are expressed as a combination of Haskell’s types and expressions and are automatically checked against real Haskell code. This natural integration of specifications in the language, combined with automatic checking, established Liquid Haskell as a usable verifier, enthusiastically accepted by both industrial and academic Haskell users. Recently, I turned Liquid Haskell into a theorem prover, in which arbitrary theorems about Haskell functions can be proved within the language. As a consequence, Liquid Haskell can be used in Haskell courses to teach the principles of mechanized theorem proving.

Turning a general-purpose language into a theorem prover opens up new research questions – e.g., can theorems be used for runtime optimizations of existing real-world applications? – that I plan to explore in the future.
Read more

Scalable positioning of commodity mobile devices using audio signals

Viktor Erdélyi Max Planck Institute for Software Systems
09 Feb 2018, 4:00 pm - 4:45 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Proposal
This thesis explores the problem of computing a position map for co-located mobile devices without local infrastructure. The capability to obtain a position map can, for instance, aid spontaneous and private communication among users based on their relative position (e.g., in a given section of a room, at the same table, or in a given seat) at events like meetings, talks, or conferences. It can also facilitate interaction between speaker and audience in a lecture hall, ...
This thesis explores the problem of computing a position map for co-located mobile devices without local infrastructure. The capability to obtain a position map can, for instance, aid spontaneous and private communication among users based on their relative position (e.g., in a given section of a room, at the same table, or in a given seat) at events like meetings, talks, or conferences. It can also facilitate interaction between speaker and audience in a lecture hall, and enable the distribution of materials and the collection of feedback based on users' location. In this thesis, we first present Sonoloc, a mobile app and system that, by relying on acoustic signals, allows a set of co-located commodity smart devices to determine their relative positions without local infrastructure. Sonoloc can position any number of devices within acoustic range with a constant number of chirps emitted by a self-organized subset of devices. Our experimental evaluation shows that the system can locate up to hundreds of devices with an accuracy of tens of centimeters using no more than 15 acoustic signals emitted by dynamically selected devices, in realistic rooms and despite substantial background noise. Then, we also discuss two planned extensions to Sonoloc: (1) one that enables users to determine the correct orientation of the position map while also allowing per-user choices, and (2) one that enables the creation of a combined relative position map in meetings that span multiple locations over videoconference.
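
To see why centimeter-scale accuracy is plausible with audio, consider the basic time-of-flight arithmetic behind acoustic ranging. The sketch below is a generic illustration only, not Sonoloc's actual self-organizing protocol, and the numbers in the comments are back-of-the-envelope estimates.

    -- Distance travelled by sound in a given time interval.
    -- At roughly 343 m/s (air at about 20 degrees Celsius), a timing error
    -- of 1 ms maps to about 34 cm of distance error, and sample-level timing
    -- at 48 kHz (about 0.02 ms per sample) maps to millimeters, so accuracy
    -- of tens of centimeters is within reach even with imperfect timing.
    speedOfSound :: Double
    speedOfSound = 343  -- meters per second

    distanceFromDelay :: Double -> Double
    distanceFromDelay dtSeconds = speedOfSound * dtSeconds

    main :: IO ()
    main = print (distanceFromDelay 0.001)  -- 1 ms of delay, about 0.343 m
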
Read more

Caribou -- Intelligent Distributed Storage for the Datacenter

Zsolt Istvan ETH Zurich
08 Feb 2018, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
In the era of Big Data, datacenter and cloud architectures decouple compute and storage resources from each other for better scalability. While this design choice enables elastic scale-out, it also causes unnecessary data movement. One solution is to push parts of the computation down to storage, where data can be filtered more efficiently. Systems that do this are already in use and rely either on regular server machines as storage nodes or on network-attached storage devices. ...
In the era of Big Data, datacenter and cloud architectures decouple compute and storage resources from each other for better scalability. While this design choice enables elastic scale-out, it also causes unnecessary data movement. One solution is to push parts of the computation down to storage, where data can be filtered more efficiently. Systems that do this are already in use and rely either on regular server machines as storage nodes or on network-attached storage devices. Even though the former provide complex computation and rich functionality, since plenty of conventional cores are available to run the offloaded computation, this solution is quite inefficient because of the over-provisioning of computing capacity and the bandwidth mismatches between storage, CPU, and network. Networked storage devices, on the other hand, are better balanced in terms of bandwidth, but at the price of offering very limited options for offloading data processing.

With Caribou, we explore an alternative design that offers rich offloading functionality while matching the available line-rate processing performance of either storage or network. It also does this in a much more efficient package (size, energy consumption) than regular servers. Our FPGA-based prototype system has been designed such that the internal data management logic can saturate the network for most operation mixes, without being over-provisioned. As a result, it can extract and process data from storage at multi-GB/s rate before sending it to the computing nodes, while at the same time offering features such as replication for fault-tolerance.
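
As a rough, software-level illustration of the push-down idea (not Caribou's FPGA pipeline; the types and functions below are hypothetical), the difference between filtering at the compute node and filtering at the storage node looks like this:

    import qualified Data.Map.Strict as Map

    type Key   = Int
    type Value = String
    type Store = Map.Map Key Value

    -- Without push-down: the storage node returns every value and the
    -- compute node filters, so the full data set crosses the network.
    filterAtCompute :: (Value -> Bool) -> Store -> [Value]
    filterAtCompute p store = filter p (Map.elems store)

    -- With push-down: the storage node evaluates the predicate itself and
    -- only matching values cross the network.
    filterAtStorage :: (Value -> Bool) -> Store -> [Value]
    filterAtStorage p store = [ v | v <- Map.elems store, p v ]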

Caribou has been released as open source. Its modular design and extensible processing pipeline make it a convenient platform for exploring domain-specific processing inside storage nodes.
Read more

Relational Cost Analysis

Ezgi Çiçek Max Planck Institute for Software Systems
22 Jan 2018, 4:00 pm - 5:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Programming languages research has made great progress towards statically estimating the execution cost of a program. However, when one is interested in how the execution costs of two programs compare to each other (i.e., relational cost analysis), the use of unary techniques does not work well in many cases. In order to support a relational cost analysis, we must ultimately support reasoning about not only the executions of a single program, but also the executions of two programs, ...
Programming languages research has made great progress towards statically estimating the execution cost of a program. However, when one is interested in how the execution costs of two programs compare to each other (i.e., relational cost analysis), the use of unary techniques does not work well in many cases. In order to support a relational cost analysis, we must ultimately support reasoning about not only the executions of a single program, but also the executions of two programs, taking into account their similarities. This dissertation makes several contributions to the understanding and development of such a relational cost analysis. It shows how: 1) Refinement types and effect systems can express functional and relational quantitative properties of pairs of programs, including the difference in execution costs; 2) Relational cost analysis can be adapted to reason about dynamic stability, a measure of the update times of incremental programs as their inputs change; and 3) A sound and complete bidirectional type system can be developed (and implemented) for relational cost analysis.
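
As a toy illustration (my own, not from the dissertation) of what a relational bound buys over two unary bounds, consider two list traversals that differ by one extra operation per element.

    -- Both functions traverse the list once; sumAndCount does one extra
    -- addition per element. A relational cost analysis can bound the
    -- difference of their costs by the length of the input, which is a
    -- simpler and tighter statement than two separate unary bounds.
    sumList :: [Int] -> Int
    sumList = foldr (+) 0

    sumAndCount :: [Int] -> (Int, Int)
    sumAndCount = foldr (\x (s, n) -> (s + x, n + 1)) (0, 0)
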
Read more