Recent events

Knowledge and Information Dissemination: Models and Methods

Utkarsh Upadhyay Max Planck Institute for Software Systems
17 Oct 2019, 4:00 pm - 5:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029 / Meeting ID: 6312
SWS Student Defense Talks - Thesis Proposal
In the past, information and knowledge dissemination was confined to brick-and-mortar classrooms, newspapers, radio, and television. As these processes were simple and centralized, the models behind them were well understood and so were the empirical methods for optimizing them. In today's world, the internet and social media have become powerful tools for information and knowledge dissemination: Wikipedia gets more than 1 million edits per day, Stack Overflow has more than 17 million questions, 25% of the US population visits Yahoo! News for articles and discussions, Twitter has more than 60 million active monthly users, and Duolingo has 25 million users learning languages online.

These developments have introduced a paradigm shift in the process of dissemination. Not only has the nature of the task moved from being centralized to decentralized, but the developments have also blurred the boundary between the creator and the consumer of the content, i.e., information and knowledge. These changes have made it necessary to develop new models, which are better suited to understanding and analysing the dissemination, and to develop new methods to optimize them.

At a broad level, we can view the participation of users in the process of dissemination as falling in one of two settings: collaborative or competitive. In the collaborative setting, the participants work together in crafting knowledge online, e.g., by asking questions and contributing answers, or by discussing news or opinion pieces. In contrast, as competitors, they vie for the attention of their followers on social media. The first part of the thesis will propose models for the complexity of discussions and the evolution of expertise. The latter part of the thesis will explore the competitive setting where I will propose computational methods for measuring, and increasing, the attention received from followers on social media.

Non-Reformist Reform for Haskell Modularity

Scott Kilpatrick Max Planck Institute for Software Systems
15 Oct 2019, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6747
SWS Student Defense Talks - Thesis Defense
Module systems like that of Haskell permit only a weak form of modularity in which module implementations depend directly on other implementations and must be processed in dependency order. Module systems like that of ML, on the other hand, permit a stronger form of modularity in which explicit interfaces express assumptions about dependencies and each module can be typechecked and reasoned about independently.

In this thesis, I present Backpack, a new language for building separately-typecheckable packages on top of a weak module system like Haskell’s. The design of Backpack is the first to bring the rich world of type systems to the practical world of packages via mixin modules. It is inspired by the MixML module calculus of Rossberg and Dreyer, but by choosing practicality over expressivity, Backpack both simplifies that semantics and supports a flexible notion of applicative instantiation. Moreover, this design is motivated less by foundational concerns and more by the practical concern of integration into Haskell. The result is a new approach to writing modular software at the scale of packages.

The semantics of Backpack is defined via elaboration into sets of Haskell modules and binary interface files, thus showing how Backpack maintains interoperability with Haskell while retrofitting it with interfaces. In my formalization of Backpack I present a novel type system for Haskell modules and I prove a key soundness theorem to validate Backpack’s semantics.

Are We Susceptible to Rowhammer? An End-to-End Methodology for Cloud Providers

Stefan Saroiu Microsoft Research, Redmond
07 Oct 2019, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 113 / Meeting ID: 6312
SWS Colloquium
Cloud providers are nervous about recent research showing how Rowhammer attacks affect many types of DRAM including DDR4 and ECC-equipped DRAM.  Unfortunately, cloud providers lack a systematic way to test the DRAM present in their servers for the threat of a Rowhammer attack. Building such a methodology needs to overcome two difficult challenges: (1) devising a CPU instruction sequence that maximizes the rate of DRAM row activations on a given system, and (2) determining the adjacency of rows internal to DRAM. This talk will present an end-to-end methodology that overcomes these challenges to determine if cloud servers are susceptible to Rowhammer attacks. With our methodology, a cloud provider can construct worst-case testing conditions for DRAM.

We used our methodology to create worst-case DRAM testing conditions on the hardware used by a major cloud provider for a recent generation of its servers. Our findings show that none of the instruction sequences used in prior work to mount Rowhammer attacks create worst-case DRAM testing conditions. Instead, we construct an instruction sequence that issues non-explicit load and store instructions. Our new sequence leverages microarchitectural side-effects to "hammer" DRAM at a near-optimal rate on modern Skylake platforms. We also designed a DDR4 fault injector capable of reverse engineering row adjacency inside a DRAM device. When applied to our cloud provider's DIMMs, we find that rows inside DDR4 DRAM devices do not always follow a linear map.

Joint work with Lucian Cojocar (VU Amsterdam), Jeremie Kim, Minesh Patel, Onur Mutlu (ETH Zurich), Lily Tsai (MIT), and Alec Wolman (MSR)

Efficient Optimization for Very Large Combinatorial Problems in Computer Vision and Machine Learning

Paul Swoboda MPI-INF - D2
02 Oct 2019, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
In computer vision and machine learning, combinatorial optimization problems are widespread, typically NP-hard, and tend to pose unique challenges due to their very large scale and problem structure. Established techniques from the mathematical optimization community cannot cope with the encountered problem sizes and do not exploit special problem characteristics. In this talk I will present several new solution paradigms for solving large-scale combinatorial problems in computer vision efficiently and to high accuracy. I will discuss how these principles can be applied to classical problems of combinatorial optimization that have found wide use in computer vision, machine learning and computer graphics, namely inference in Markov Random Fields, the quadratic assignment problem and graph decomposition. Lastly, I will present empirical results demonstrating the practical performance of the presented techniques.

Toward Cognitive Security

Claude Castelluccia InRIA
02 Oct 2019, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Colloquium
Online services, devices or secret services are constantly collecting data and meta-data from users. This data collection is mostly used to target users or customise their services. However, as illustrated by the Cambridge Analytica case, data and technologies are increasingly used to manipulate, influence or shape people's opinions online, i.e. to "hack" our brains. In this context, it is urgent to develop the field of "Cognitive security" in order to better comprehend these attacks and provide counter-measures. This talk will introduce the concept of "Cognitive security". We will explore the different types of cognitive attacks and discuss possible research directions.

Human-Centered Design and Data Science for Good

Maria Rauschenberger Universitat Pompeu Fabra
30 Sep 2019, 10:30 am - 11:30 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 112 / Meeting ID: 9312
SWS Colloquium
How can we make better applications for social impact issues? For example, the combination of Human-Centered Design (HCD) and Data Science (DS) can be the answer to avoid biases in the collection of data with online experiments and the analysis of small data. This presentation shows how we combine HCD and DS to design applications and analyze the collected data for Good. We will focus mainly on the project: "Early screening of dyslexia using a language-independent content game and machine learning". With our two designed games (MusVis and DGames), we collected data sets (313 and 137 participants) in different languages (mainly Spanish and German) and evaluated them with machine learning classifiers. For MusVis, we mainly use content that refers to one single acoustic or visual indicator, while DGames content refers to generic content related to various indicators. Our results open the possibility of low-cost and early screening of dyslexia through the Web. In this talk, we will further address the techniques used from HCD and DS to reach these results.

Accelerating Network Applications with Stateful TCP Offloading

YoungGyoun Moon KAIST
24 Sep 2019, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Colloquium
The performance of modern key-value servers or layer-7 load balancers often heavily depends on the efficiency of the underlying TCP stack. Despite numerous optimizations such as kernel-bypassing and zero-copying, performance improvement for TCP applications is fundamentally limited due to the protocol conformance overhead for compatible TCP operations.

In this talk, I will introduce AccelTCP, a hardware-assisted TCP stack architecture that harnesses programmable network interface cards (NICs) as a TCP protocol accelerator. AccelTCP can offload complex TCP operations such as connection setup and teardown completely to the NIC, which frees a significant amount of host CPU cycles for application processing. In addition, for layer-7 proxies, it supports running connection splicing on the NIC so that the NIC relays all packets of the spliced connections with zero DMA overhead. We showcase the effectiveness of AccelTCP with two real-world applications: (1) Redis, a popular in-memory key-value store, and (2) HAProxy, a widely-used layer-7 load balancer. Our evaluation shows that AccelTCP improves their performance by 2.3x and 11.9x, respectively.

Synthesis from within: implementing automated synthesis inside an SMT solver

Cesare Tinelli University of Iowa
16 Sep 2019, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029 / Meeting ID: 6312
SWS Colloquium
Recent research in automated software synthesis from specifications or observations has leveraged the power of SMT solvers in order to explore the space of synthesis conjectures efficiently. In most of this work, synthesis techniques are built around a backend SMT solver which is used as a black-box reasoning engine. In this talk, I will describe a successful multiyear research effort by the developers of the SMT solver CVC4 that instead incorporates synthesis capabilities directly within the solver, and then discuss the advances in performance and scope made possible by this approach.

Computational Fabrication: 3D Printing and Beyond

Vahid Babaei MPI-INF - D4
04 Sep 2019, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
The objective of my talk is to introduce the audience to the exciting field of computational fabrication. The recent, wide availability of 3D printers has triggered considerable interest in academia and industry. Computer scientists could engage with hands-on 3D printing and very soon realize the immense but untapped potential of the manufacturing industry for computational methods. In this talk, I will explain the principles of 3D printing (also known as additive manufacturing) both from hardware and software viewpoints. I will then show examples of recent research addressing computational problems in both 3D printing and general manufacturing. I will also discuss my main research interest, i.e. computational fabrication of visual appearance. Appearance of objects is among their most important and most complicated properties that influence or in numerous cases define their function. I show that additive manufacturing provides unprecedented opportunities to create products with novel and useful appearance properties.

A type theory for amortized resource analysis

Vineet Rajani Max Planck Institute for Software Systems
27 Aug 2019, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Student Defense Talks - Thesis Proposal
Amortized analysis is a standard algorithmic technique for estimating upper bounds on the average costs of functions, specifically operations on data structures. This thesis intends to develop λ-amor, a type-theory for amortized analysis of higher-order functional programs. A typical amortized analysis works by storing a ghost resource called /potential/ with a data structure's internal state. Accordingly, the central idea in λ-amor is a type-theoretic construct to associate potential with an arbitrary type. Additionally, λ-amor relies on standard concepts from substructural and modal type systems: indexed monads, affine types and indexed exponential types. We show that λ-amor is not only sound (in a very elementary logical relations model), but also very expressive: it can be used to analyze both eager and lazy data structures, and it can embed existing resource analysis frameworks. In fact, λ-amor is /complete/ for the cost analysis of lazy PCF programs. Further, the basic principles behind λ-amor can be adapted (by dropping affineness and adding mutable state) to obtain an expressive type system for a completely unrelated application, namely, information flow control.
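
As an illustration of the potential idea mentioned above (this is not λ-amor itself, only the textbook potential method), the following Python sketch tracks a ghost potential for a queue built from two lists. The class name, the cost model (one unit per list push or pop) and the choice of potential (two units per element in the back list) are assumptions made for this example.

```python
class TwoStackQueue:
    """FIFO queue built from two Python lists used as stacks."""

    def __init__(self):
        self.front = []   # elements ready to be dequeued (top = next out)
        self.back = []    # newly enqueued elements
        self.cost = 0     # actual cost so far: one unit per list push or pop

    def potential(self):
        # Ghost resource: each element in `back` carries 2 units, which pay
        # for its later pop from `back` and push onto `front`.
        return 2 * len(self.back)

    def enqueue(self, x):
        self.back.append(x)
        self.cost += 1

    def dequeue(self):
        if not self.front:
            # Occasionally expensive: move everything from back to front.
            while self.back:
                self.front.append(self.back.pop())
                self.cost += 2
        self.cost += 1
        return self.front.pop()


def run(ops):
    """Apply ops (an int enqueues it, None dequeues).

    Returns the dequeued values, the amortized cost of each operation
    (actual cost plus change in potential), and the total actual cost.
    """
    q = TwoStackQueue()
    out, amortized = [], []
    for op in ops:
        cost_before, phi_before = q.cost, q.potential()
        if op is None:
            out.append(q.dequeue())
        else:
            q.enqueue(op)
        amortized.append((q.cost - cost_before) + (q.potential() - phi_before))
    return out, amortized, q.cost
```

Under these assumptions every operation has amortized cost at most 3, even though a single dequeue may touch every element; because the potential starts at zero and never goes negative, the sum of the amortized costs bounds the total actual cost.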

The proposal talk will cover the broad setting and the motivation of the work and a significant subset of λ-amor, but due to time constraints, it will not cover all of λ-amor or the adaptation to information flow control. Implementation of the two type theories is not in the scope of the thesis.

Modeling and Individualizing Learning in Computer-Based Environments

Tanja Käser Stanford University
21 Aug 2019, 10:30 am - 11:30 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 112 / Meeting ID: 6312
SWS Colloquium
Learning technologies are becoming increasingly important in today's education. This includes game-based learning and simulations, which produce high volume output, and MOOCs (massive open online courses), which reach a broad and diverse audience at scale. The users of such systems often are of very different backgrounds, for example in terms of age, prior knowledge, and learning speed. Adaptation to the specific needs of the individual user is therefore essential. In this talk, I will present two of my contributions on modeling and predicting student learning in computer-based environments with the goal to enable individualization. The first contribution introduces a new model and algorithm for representing and predicting student knowledge. The new approach is efficient and has been demonstrated to outperform previous work regarding prediction accuracy. The second contribution introduces models that take into account not only the accuracy of the user but also the user's inquiry strategies, improving prediction of future learning. Furthermore, students can be clustered into groups with different strategies and targeted interventions can be designed based on these strategies. Finally, I will also describe lines of future research.

Computer Science for Numerics

Martin Ziegler KAIST
19 Jul 2019, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Colloquium
Since the introduction of the IEEE 754 floating point standard in 1985, numerical methods have become ubiquitous and increasingly sophisticated. As the code complexity of numerical libraries grows, so does the need for rigorous Software Engineering methodology of the kind that Computer Science provides, and that is state of the art for the digital processing of discrete data but lacking in the continuous realm. We apply, adapt, and extend the classical concepts (specification, algorithmics, analysis, complexity, verification) from discrete bit strings, integers, graphs etc. to real numbers, converging sequences, smooth/integrable functions, bounded operators, and compact subsets: a new paradigm formalizes mathematical structures as continuous Abstract Data Types with rigorous Turing-computable semantics but without the hassle of actual Turing machines.
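
To give a flavour of what a real number with computable semantics can look like in code, here is a minimal Python sketch. It is not the speaker's framework: the representation (a real is a function that, given n, returns a rational within 2^-n of the true value) and the names const, sqrt2 and add are assumptions chosen for this illustration.

```python
from fractions import Fraction
from math import isqrt

# Assumed representation: a real number x is a function approx(n) returning
# a Fraction q with |x - q| <= 2**-n, for every non-negative integer n.

def const(q):
    """An exactly known rational, viewed as a real number."""
    q = Fraction(q)
    return lambda n: q

def sqrt2(n):
    """sqrt(2) to within 2**-n: floor(sqrt(2) * 2**n) / 2**n."""
    return Fraction(isqrt(2 << (2 * n)), 1 << n)

def add(x, y):
    """Sum of two reals: ask each argument for one extra bit of precision,
    so the two errors of at most 2**-(n+1) add up to at most 2**-n."""
    return lambda n: x(n + 1) + y(n + 1)
```

With this representation the caller chooses the precision and the error bound is part of the contract, in contrast to binary floating point, where for instance `0.1 + 0.2 == 0.3` evaluates to `False`.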

Correct Compilation of Relaxed Memory Concurrency

Soham Chakraborty Max Planck Institute for Software Systems
16 Jul 2019, 1:00 pm - 2:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 005 / Meeting ID: 6312
SWS Student Defense Talks - Thesis Defense
Shared memory concurrency is the pervasive programming model for multicore architectures such as X86, Power, and ARM. Depending on the memory organization, each architecture follows a somewhat different shared memory model. All these models, however, have one common feature: they allow certain outcomes for concurrent programs that cannot be explained by interleaving execution. In addition to the complexity due to architectures, compilers like GCC and LLVM perform various program transformations, which also affect the outcomes of concurrent programs.
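
The phrase "cannot be explained by interleaving execution" can be made concrete with the classic store-buffering litmus test. The Python sketch below is an illustration only and not part of the thesis: it enumerates every interleaving of two two-instruction threads and collects the register values each one produces. The instruction encoding and function names are assumptions made for this example.

```python
from itertools import combinations

# Store-buffering litmus test, with x and y initially 0:
#   Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
THREADS = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]

def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        slots = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one interleaving against a single shared memory."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for kind, loc, arg in schedule:
        if kind == "store":
            memory[loc] = arg
        else:
            regs[arg] = memory[loc]
    return regs["r0"], regs["r1"]

def interleaving_outcomes():
    """All (r0, r1) results reachable by some interleaving."""
    return {run(s) for s in interleavings(*THREADS)}
```

No interleaving yields r0 = r1 = 0, yet that outcome is observable on x86 hardware, where a store may wait in a store buffer while the following load reads memory. It is one example of the behaviors a language-level model has to account for.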

To be able to program these systems correctly and effectively, it is important to define a formal language-level concurrency model. For efficiency, it is important that the model is weak enough to allow various compiler optimizations on shared memory accesses as well as efficient mappings to the architectures. For programmability, the model should be strong enough to disallow bogus "out-of-thin-air" executions and provide strong guarantees for well-synchronized programs. Because of these conflicting requirements, defining such a formal model is very difficult. This is why, despite years of research, major programming languages such as C/C++ and Java do not yet have completely adequate formal models defining their concurrency semantics.

In this thesis, we address this challenge and develop a formal concurrency model that is very good both in terms of compilation efficiency and of programmability. Unlike most previous approaches, which were defined either operationally or axiomatically on single executions, our formal model is based on event structures, which represent multiple program executions and thus give us more structure to define the semantics of concurrency.

In more detail, our formalization has two variants: the weaker version, WEAKEST, and the stronger version, WEAKESTMO. The WEAKEST model simulates the promising semantics proposed by Kang et al., while WEAKESTMO is incomparable to the promising semantics. Moreover, WEAKESTMO discards certain questionable behaviors allowed by the promising semantics. We show that the proposed WEAKESTMO model resolves the out-of-thin-air problem, provides standard data-race-freedom (DRF) guarantees, allows the desirable optimizations, and can be mapped to architectures such as X86, PowerPC, and ARMv7. Additionally, our models are flexible enough to leverage existing results from the literature. To ensure the correctness of compilation by a major compiler, we also developed a translation validator targeting LLVM’s "opt" transformations of concurrent C/C++ programs. Using the validator, we identified a few subtle compilation bugs, which were reported and fixed. We further observe that LLVM's concurrency semantics differs from that of C11; there are transformations which are justified in C11 but not in LLVM and vice versa. Considering the subtle aspects of LLVM concurrency, we formalized a fragment of LLVM’s concurrency semantics and integrated it into our WEAKESTMO model.

Design Problems: Trustworthy Smart Devices and 3D Printed Lace

Mary Baker HP Labs in Palo Alto
15 Jul 2019, 10:30 am - 11:30 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Distinguished Lecture Series
A growing number of domestic spaces incorporate products that collect data from cameras, microphones and other sensors, leading to privacy concerns. In this talk I report on two user studies performed to learn about perceptions of privacy and trust for sensor-enabled, connected devices such as smart home assistants. The study results suggest that users are more likely to trust devices with materially representative privacy status indicators. This means that the indicators themselves are part of what determines what sensing can take place. I will describe how we have applied the study results to the design of current devices and what the implications are for the physical design of future smart devices.

Time permitting, I will also talk about my other current passion, design for additive manufacturing, and what researchers can do to ensure we reach the vastly exciting potential of this method of production. I will bring exotic 3D printed parts to help demonstrate my points.

Automated Program Repair

Abhik Roychoudhury National University of Singapore
08 Jul 2019, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029 / Meeting ID: 6312
SWS Distinguished Lecture Series
Automated program repair is an emerging and exciting field of research, which allows for automated rectification of errors and vulnerabilities. The uses of automated program repair are myriad, such as (a) improving programmer productivity, (b) automated fixing of security vulnerabilities as they are detected, (c) self-healing software for autonomous devices such as drones, as well as (d) use of repair in introductory programming education by grading and providing hints for programming assignments. One of the key technical challenges in achieving automated program repair is the lack of formal specifications of intended program behavior. In this talk, we will conceptualize the use of symbolic execution approaches and tools for extracting such specifications. This is done by analyzing a buggy program against selected tests, or against reference implementations. Such specification inference capability can be combined with program synthesis techniques to automatically repair programs. The capability of specification inference also serves as a novel use of symbolic execution beyond verification and navigation of large search spaces. Automated program repair via symbolic execution goes beyond search-based approaches which attempt to lift patches from elsewhere in the program. Such an approach can construct "imaginative" patches, serves as a test-bed for the grand challenge of automated programming, and contributes to the vision of trustworthy self-healing software. Towards the end of the talk, we will put the research on automated repair in light of the overall practice of software security, by sharing some experiences gained at the Singapore Cyber-security Consortium.

The Bright and Dark Sides of Computer Vision: Challenges and Opportunities for Privacy and Security

Mario Fritz CISPA
03 Jul 2019, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Today, vast amounts of visual information are collected and often also shared online. Such images and videos can contain various types of privacy-sensitive information that can nowadays be extracted automatically at a large scale, posing a steadily growing threat to users' privacy. I'll give an overview of our efforts towards understanding and controlling privacy in visual information as well as working towards our overall vision of a Visual Privacy Advisor. More generally speaking, we have seen a quick adoption of machine learning technology in a broad range of application scenarios. With such broad deployment, these approaches become part of the attack surface of modern IT infrastructures and therefore new privacy and security risks emerge. Hence, we research attack vectors and defenses of such intelligent systems built on AI and machine learning technology. In particular, I will talk about our latest work on membership inference and model stealing.

Fake News During the 2016 U.S. Presidential Elections: Prevalence, Agenda, and Stickiness.

Ceren Budak University of Michigan
10 Jun 2019, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 005
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Colloquium
The spread of fake news was one of the most discussed characteristics of the 2016 U.S. Presidential Election. The concerns regarding fake news have garnered significant attention in both media and policy circles, with some journalists even going as far as claiming that results of the 2016 election were a consequence of the spread of fake news. Yet, little is known about the prevalence and focus of such content, how its prevalence changed over time, and how this prevalence related to important election dynamics. In this talk, I will address these questions by examining social media, news media, and interview data. These datasets allow examining the interplay between news media production and consumption, social media behavior, and the information the electorate retained about the presidential candidates leading up to the election.

Bridging the Performance Gap in Digital Geometry Processing

Rhaleb Zayer MPI-INF - D4
05 Jun 2019, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
As the computing landscape is being reshaped by the dramatic shift towards ubiquitous parallelism, and by the sheer scale of data, extracting performance from existing applications gives rise to formidable challenges. In digital geometry processing, the problem gets amplified by data irregularity (e.g. meshes) and the predominantly serial nature of traditional algorithmic solutions. As a result, the gap between the high performance promise of modern hardware and the actual performance seems to grow wider.

In this talk, I will discuss the impact of data structures and problem abstraction on performance. In particular, I will outline how high performance can be gained through a lean data representation which allows channeling parallelism through linear algebra kernels regardless of the underlying granularity. I will illustrate the impact of problem abstraction on challenging and far reaching scenarios including Voronoi diagrams (VD)/centroidal Voronoi tessellations (CVT) on surface meshes, subdivision surfaces, as well as matrix assembly in finite element analysis.

Automated Test Generation: A Journey from Symbolic Execution to Smart Fuzzing and Beyond

Koushik Sen UC Berkeley
04 Jun 2019, 10:30 am - 11:45 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029 / Meeting ID: 6312
SWS Distinguished Lecture Series
In the last two decades, automation has had a significant impact on software testing and analysis. Automated testing techniques, such as symbolic execution, concolic testing, and feedback-directed fuzzing, have found numerous critical faults, security vulnerabilities, and performance bottlenecks in mature and well-tested software systems. The key strength of automated techniques is their ability to quickly search state spaces by performing repetitive and expensive computational tasks at a rate far beyond the human attention span and computation speed. In this talk, I will give a brief overview of our past and recent research contributions in automated test generation using symbolic execution, program analysis, constraint solving, and fuzzing. I will also describe a new technique, called constraint-directed fuzzing, where given a pre-condition on a program as a logical formula, we can efficiently generate millions of test inputs satisfying the pre-condition.

High Performance Operating Systems in the Data Center

Tom Anderson University of Washington
31 May 2019, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Colloquium
The ongoing shift of enterprise computing to the cloud provides an opportunity to rethink operating systems for this new setting. I will discuss two specific technologies, kernel bypass for high performance networking and low latency non-volatile storage, and their implications for operating system design. In each case, delivering the performance of the underlying hardware requires novel approaches to the division of labor between hardware, the operating system kernel, and the application library.

Systematic Approach to Managing Software Defined Networks

Theophilus Benson Brown University
23 May 2019, 10:30 am - 11:30 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Colloquium
Software-defined Networks (SDNs) and programmable data-planes represent a shift in the networking paradigm, which enables novel applications. Despite the growing interest in and adoption of SDNs, they remain plagued with availability and performance problems. In this talk, I discuss recent and ongoing work by my group to analyze these paradigms and create systematic abstractions that provide control over performance and availability. First, I will discuss Tardis, a system that improves fault tolerance by leveraging the novel programmability provided by SDNs to identify and transform the failure-inducing event(s). Second, I will discuss a pair of projects, Hermes and SCC, that revisit traditional storage principles and apply them to network updates. Through this work, I will demonstrate how the centralization and programmability offered by SDNs enable us to more systematically reason about traditional networking issues such as availability and performance.

On the Predictability of Heterogeneous SoC Multicore Platforms

Dr Giovani Gracioli Technical University Munich
20 May 2019, 10:30 am - 12:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029 / Meeting ID: 6312
SWS Colloquium
Multiprocessor Systems-on-Chip (MPSoC) integrating hard processing cores with programmable logic (PL) are becoming increasingly common. While these platforms were originally designed for high performance computing applications, their rich feature set can be exploited to efficiently implement mixed criticality domains serving both critical hard real-time tasks and soft real-time tasks.

In this talk, we show how one can tailor these MPSoCs to support a mixed criticality system, where cores are strictly isolated to avoid contention on shared resources such as Last-Level Cache (LLC) and main memory. We present and discuss a set of software and hardware techniques to improve the predictability using a modern MPSoC platform. We evaluate our techniques using an image processing application and show the maximum supported processing frequency.

Transparent Scaling of Deep Learning Systems through Dataflow Graph Analysis

Jinyang Li New York University
17 May 2019, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Distinguished Lecture Series
As deep learning research pushes towards using larger and more sophisticated models, system infrastructure must use many GPUs efficiently. Analyzing the dataflow graph that represents the DNN computation is a promising avenue for optimization. By specializing execution for a given dataflow graph, we can accelerate DNN computation in ways that are transparent to programmers. In this talk, I show the benefits of dataflow graph analysis by discussing two recent systems that we've built to support large model training and low-latency inference. To train very large DNN models, Tofu automatically re-writes a dataflow graph of tensor operators into an equivalent parallel graph in which each original operator can be executed in parallel across multiple GPUs. To achieve low-latency inference, Batchmaker discovers identical sub-graph computation among different requests to enable batched execution of requests arriving at different times.

Humans and Machines: From Data Elicitation to Helper-AI

Goran Radanovic Harvard University
16 May 2019, 2:00 pm - 3:30 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Colloquium
Recent AI advances have been driven by high-quality input data, often labeled by human annotators. A fundamental challenge in eliciting high-quality information from humans is that there is often no way to directly verify the quality of the information they provide. Consider, for example, product reviews and marketing surveys where data is inherently subjective, environmental community sensing where data is highly localized, and geopolitical forecasting where the ground truth is revealed in the distant future. In these settings, data elicitation has to rely on peer-consistency mechanisms, which incentivize high-quality reporting by examining the consistency between the reports of different data providers. In this talk, I will discuss some of the recent advances in peer-consistency designs. Furthermore, I will outline some thoughts on an agenda around the design of human-AI collaborative systems.
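
As a rough illustration of the peer-consistency idea, the sketch below scores each data provider by how many of its peers submitted the same report. This is the simplest "output agreement" style of scoring and is only an assumption about what such a mechanism could look like; the function name `peer_agreement_scores` and the report format are hypothetical and not taken from the talk.

```python
from collections import Counter


def peer_agreement_scores(reports):
    """Score each provider by the fraction of peers whose report matches theirs.

    `reports` maps a provider id to that provider's report (any hashable value).
    No ground truth is used: a score depends only on consistency with peers.
    """
    if len(reports) < 2:
        raise ValueError("need at least two providers to compare reports")
    counts = Counter(reports.values())
    n_peers = len(reports) - 1
    # counts[r] includes the provider itself, so subtract one.
    return {p: (counts[r] - 1) / n_peers for p, r in reports.items()}


if __name__ == "__main__":
    scores = peer_agreement_scores({"a": "good", "b": "good", "c": "bad"})
    print(scores)
```

Plain agreement scoring of this kind can reward providers for converging on a common answer rather than a truthful one, which is one reason more refined peer-consistency designs exist.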

Programming Abstractions for Verifiable Software

Damien Zufferey Max Planck Institute for Software Systems
15 May 2019, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
In this talk, I will show how we can harness the synergy between programming languages and verification methods to help programmers build reliable software. First, we will look at fault-tolerant distributed algorithms. These algorithms are central to any high-availability application but they are notoriously difficult to implement due to asynchronous communication and faults. A fault-tolerant consensus algorithm that can be described in ~50 lines of pseudocode can easily turn into a few thousand lines of actual code. To remedy this, I will introduce PSync, a domain-specific language for fault-tolerant distributed algorithms. The key insight is the use of communication-closure (logical boundaries in a program that messages should not cross) to structure the code. Communication-closure gives a syntactic scope to the communication, provides some form of logical time, and gives the illusion of synchrony. These elements greatly simplify the programming and verification of fault-tolerant algorithms. In the second part of the talk, we will discuss a new project exploring how advances in rapid prototyping (3D printers) may impact how we develop software for robots. These advances may soon enable adding computational elements as part of the internal structure of objects. The goal of this project is to rethink the software/hardware boundary and integrate the two together. I will present a system we are developing where components integrate geometry (hardware) and behavior (software). The system allows for bottom-up composition and top-down decomposition. The bottom-up composition connects components together to achieve more complex behaviors. The top-down decomposition projects a global specification onto the individual components and performs verification at the level of individual components.

Conclave: Secure Multi-Party Computation on Big Data

Nikolaj Volgushev Boston University
15 May 2019, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Colloquium
Secure Multi-Party Computation (MPC) allows mutually distrusting parties to run joint computations without revealing private data. Current MPC algorithms scale poorly with data size, which makes MPC on "big data" prohibitively slow and inhibits its practical use. Many relational analytics queries can maintain MPC’s end-to-end security guarantee without using cryptographic MPC techniques for all operations. Conclave is a query compiler that accelerates such queries by transforming them into a combination of data-parallel, local cleartext processing and small MPC steps. When parties trust others with specific subsets of the data, Conclave applies new hybrid MPC-cleartext protocols to run additional steps outside of MPC and improve scalability further. Our Conclave prototype generates code for cleartext processing in Python and Spark, and for secure MPC using the Sharemind and Obliv-C frameworks. Conclave scales to data sets between three and six orders of magnitude larger than state-of-the-art MPC frameworks support on their own. Thanks to its hybrid protocols and additional optimizations, Conclave also substantially outperforms SMCQL, the most similar existing system.
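
To make the split between local cleartext processing and small MPC steps concrete, here is a minimal sketch for a joint sum. Each party first aggregates its own rows in the clear, and only one value per party enters a toy additive secret-sharing protocol. This is an illustration under stated assumptions, not Conclave's compiler or API: the names `share`, `secure_sum`, and `hybrid_total`, the modulus, and the single-process simulation of the parties are all hypothetical, and inputs are assumed to be non-negative integers below the modulus.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical modulus; totals must stay below it


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the shared value."""
    return sum(shares) % MODULUS


def secure_sum(private_inputs):
    """Toy MPC step: sum one private value per party via additive sharing."""
    n = len(private_inputs)
    # Party i splits its input; share j is sent to party j.
    distributed = [share(x % MODULUS, n) for x in private_inputs]
    # Party j adds up the shares it received; a single partial sum reveals
    # nothing about any individual input.
    partial = [sum(distributed[i][j] for i in range(n)) % MODULUS
               for j in range(n)]
    return reconstruct(partial)


def hybrid_total(party_tables):
    """Local cleartext pre-aggregation followed by a small MPC step."""
    local_sums = [sum(rows) for rows in party_tables]  # cleartext, per party
    return secure_sum(local_sums)                      # one value per party


if __name__ == "__main__":
    print(hybrid_total([[1, 2, 3], [10, 20], [100]]))
```

The point of the sketch is the shape of the computation: the amount of data entering the secure step is one value per party, independent of how many rows each party holds.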

A constructive proof of dependent choice in classical arithmetic via memoization

Étienne Miquey Inria, Nantes
09 May 2019, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Colloquium
In 2012, Herbelin developed a calculus (dPAω) in which constructive proofs for the axioms of countable and dependent choices can be derived via the memoization of choice functions. However, the property of normalization (and therefore the one of soundness) was only conjectured. The difficulty for the proof of normalization is due to the simultaneous presence of dependent types (for the constructive part of the choice), of control operators (for classical logic), of coinductive objects (to encode functions of type ℕ→A into streams (a₀,a₁,…)) and of lazy evaluation with sharing (for these coinductive objects). Building on previous works, we introduce a variant of dPAω presented as a sequent calculus. On the one hand, we take advantage of a variant of Krivine classical realizability we developed to prove the normalization of classical call-by-need. On the other hand, we benefit from dLtp, a classical sequent calculus with dependent types in which type safety is ensured using delimited continuations together with a syntactic restriction. By combining the techniques developed in these papers, we manage to define a realizability interpretation à la Krivine of our calculus that allows us to prove normalization and soundness. This talk will go over the whole process, starting from Herbelin’s calculus dPAω and ending with the introduction of its sequent calculus counterpart dLPAω that we prove to be sound.

Combinatorial Constructions for Effective Testing

Filip Nikšic Max Planck Institute for Software Systems
03 May 2019, 3:30 pm - 4:45 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029 / Meeting ID: 6312
SWS Student Defense Talks - Thesis Defense
Large-scale distributed systems consist of a number of components, take a number of parameter values as input, and behave differently based on a number of non-deterministic events. All these features (components, parameter values, and events) interact in complicated ways, and unanticipated interactions may lead to bugs. Empirically, many bugs in these systems are caused by interactions of only a small number of features. In certain cases, it may be possible to test all interactions of k features for a small constant k by executing a family of tests that is exponentially or even doubly-exponentially smaller than the family of all tests. Thus, in such cases we can effectively uncover all bugs that require up to k-wise interactions of features.

In this thesis we study two occurrences of this phenomenon. First, many bugs in distributed systems are caused by network partition faults. In most cases these bugs occur due to two or three key nodes, such as leaders or replicas, not being able to communicate, or because the leading node finds itself in a block of the partition without quorum. Second, bugs may occur due to unexpected schedules (interleavings) of concurrent events: concurrent exchange of messages and concurrent access to shared resources. Again, many bugs depend only on the relative ordering of a small number of events. We call the smallest number of events whose ordering causes a bug the depth of the bug. We show that in both testing scenarios we can effectively uncover bugs involving a small number of nodes or bugs of small depth by executing small families of tests.

We phrase both testing scenarios in terms of an abstract framework of tests, testing goals, and goal coverage. Sets of tests that cover all testing goals are called covering families. We give a general construction that shows that whenever a random test covers a fixed goal with sufficiently high probability, a small randomly chosen set of tests is a covering family with high probability. We then introduce concrete coverage notions relating to network partition faults and bugs of small depth. In the case of network partition faults, we show that for the introduced coverage notions we can find a lower bound on the probability that a random test covers a given goal. Our general construction then yields a randomized testing procedure that achieves full coverage, and hence finds bugs, quickly.
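
The general construction can be illustrated on a toy setting that is an assumption of this sketch, not the thesis's own coverage notion: tests are assignments to n boolean features, and a goal is a fixed assignment to k of them. A uniformly random test covers a fixed goal with probability p = 2^-k, so by a union bound over all goals, m ≥ ln(#goals/δ)/p random tests form a covering family with probability at least 1 − δ. All names below are hypothetical.

```python
import itertools
import math
import random


def all_goals(n, k):
    """Yield every k-wise interaction: k feature indices plus a value for each."""
    for features in itertools.combinations(range(n), k):
        for values in itertools.product((False, True), repeat=k):
            yield features, values


def covers(test, goal):
    """A test covers a goal if it agrees with the goal on the goal's features."""
    features, values = goal
    return all(test[f] == v for f, v in zip(features, values))


def random_family(n, k, delta=0.01, seed=0):
    """Sample enough uniform random tests for the union bound to apply."""
    rng = random.Random(seed)
    num_goals = math.comb(n, k) * 2 ** k
    p = 2.0 ** -k  # probability that one random test covers one fixed goal
    # (1 - p)^m * num_goals <= exp(-p * m) * num_goals <= delta
    m = math.ceil(math.log(num_goals / delta) / p)
    return [tuple(rng.random() < 0.5 for _ in range(n)) for _ in range(m)]


def is_covering_family(tests, n, k):
    """Check exhaustively that every goal is covered by at least one test."""
    return all(any(covers(t, g) for t in tests) for g in all_goals(n, k))


if __name__ == "__main__":
    n, k = 10, 2
    family = random_family(n, k)
    print(len(family), "random tests instead of", 2 ** n)
    print("covering:", is_covering_family(family, n, k))
```

In this toy setting the family size grows with ln of the number of goals, that is, roughly k·ln(n) scaled by 2^k, rather than with the 2^n possible tests. If the exhaustive check fails for a particular sample, resampling with a different seed is the natural remedy.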

In the case of coverage notions related to bugs of small depth, if the events in the program form a non-trivial partial order, our general construction may give a suboptimal bound. Thus, we study other ways of constructing covering families. We show that if the events in a concurrent program are partially ordered as a tree, we can explicitly construct a covering family of small size: for balanced trees, our construction is polylogarithmic in the number of events. For the case when the partial order of events does not have a "nice" structure, and the events and their relation to previous events are revealed while the program is running, we give an online construction of covering families. Based on the construction, we develop a randomized scheduler called PCTCP that uniformly samples schedules from a covering family and has a rigorous guarantee of finding bugs of small depth. We experiment with an implementation of PCTCP on two real-world distributed systems, Zookeeper and Cassandra, and show that it can effectively find bugs.

Sharing-Aware Resource Management for Performance and Protection

Sandhya Dwarkadas Department of Computer Science, University of Rochester
02 May 2019, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Distinguished Lecture Series
Recognizing that applications (whether in mobile, desktop, or server environments) are rarely executed in isolation today, I will discuss some practical challenges in making best use of available hardware and our approach to addressing these challenges. I will describe two independent and complementary control mechanisms using low-overhead hardware performance counters that we have developed: a sharing- and resource-aware mapper (SAM) to effect task placement with the goal of localizing shared data communication and minimizing resource contention based on the offered load; and an application parallelism manager (MAPPER) that controls the offered load with the goal of improving system parallel efficiency. If time permits, I will also outline our work on streamlining instruction memory management and address translation to eliminate redundancy and improve efficiency, especially in mobile environments. Our results emphasize the need for low-overhead monitoring of application behavior under changing environmental conditions in order to adapt to environment and application behavior changes.

Edge Computing in the Extreme and its Applications

Suman Banerjee University of Wisconsin-Madison
30 Apr 2019, 1:00 pm - 2:30 pm
Saarbrücken building E1 5, room 105
simultaneous videocast to Kaiserslautern building G26, room 111 / Meeting ID: 6312
SWS Colloquium
The notion of edge computing introduces new computing functions away from centralized locations and closer to the network edge, thus facilitating new applications and services. This enhanced computing paradigm provides new opportunities to application developers that are not available otherwise. In this talk, I will discuss why placing computation functions at the extreme edge of our network infrastructure, i.e., in wireless Access Points and home set-top boxes, is particularly beneficial for a large class of emerging applications. I will discuss a specific approach, called ParaDrop, to implement such edge computing functionalities, and use examples from different domains -- smarter homes, sustainability, and intelligent transportation -- to illustrate the new opportunities around this concept.