Recent events

Fundamental Algorithmic Problems and Challenges in Dynamical and Cyber-Physical Systems

Joël Ouaknine
Max Planck Institute for Software Systems
Joint Lecture Series
06 Dec 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Dynamical and cyber-physical systems, whose continuous evolution is subject to differential equations, permeate vast areas of engineering, physics, mathematics, and computer science. In this talk, I consider a selection of fundamental algorithmic problems for such systems, including reachability, invariant synthesis, and controllability. Although the decidability and complexity of many of these problems are open, some partial and conditional results are known, occasionally resting on certain number-theoretic hypotheses such as Schanuel's conjecture. More generally, the study of algorithmic problems for dynamical and cyber-physical systems draws from an eclectic array of mathematical tools, ranging from Diophantine approximation to algebraic geometry. I will present a personal and select overview of the field and discuss areas of current active research.

Blocking Analysis of Spin Locks under Partitioned Fixed-Priority Scheduling

Alexander Wieder
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Defense
29 Nov 2017, 4:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
Partitioned fixed-priority scheduling is widely used in embedded multicore real-time systems. In multicore systems, spin locks are one well-known technique used to synchronize conflicting accesses from different processor cores to shared resources (e.g., data structures). The use of spin locks can cause blocking. Accounting for blocking is a crucial part of static analysis techniques to establish correct temporal behavior.
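
For readers unfamiliar with the mechanism, the following is a minimal test-and-set spin lock in C11 (an illustrative sketch only, not taken from the thesis; real-time systems typically use FIFO- or priority-ordered variants such as those analyzed here):

    #include <stdatomic.h>

    typedef struct { atomic_flag locked; } spinlock_t;
    #define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

    static void spin_lock(spinlock_t *l) {
        /* A core that finds the lock taken busy-waits ("spins"); this waiting
           time is precisely the blocking that the analysis must bound. */
        while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
            ;
    }

    static void spin_unlock(spinlock_t *l) {
        atomic_flag_clear_explicit(&l->locked, memory_order_release);
    }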

In this thesis, we consider two aspects inherent to the partitioned fixed-priority scheduling of tasks sharing resources protected by spin locks: (1) the assignment of tasks to processor cores to ensure correct timing, and (2) the blocking analysis required to derive bounds on the blocking.

Heuristics commonly used for task assignment fail to produce assignments that ensure correct timing when shared resources protected by spin locks are used. We present an optimal approach that is guaranteed to find such an assignment if it exists (under the original MSRP analysis). Further, we present a well-performing and inexpensive heuristic.

For most spin lock types, no blocking analysis is available in prior work, which renders them unusable in real-time systems. We present a blocking analysis approach that supports eight different types of spin locks and is less pessimistic than prior analyses, where available. Further, we show that allowing nested requests for FIFO- and priority-ordered locks renders the blocking analysis problem NP-hard.

Formal Proof of Polynomial-Time Complexity with Quasi-Interpretations

Hugo Férée
University of Kent
SWS Colloquium
22 Nov 2017, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
Formal reasoning about computational complexity turns out to be quite cumbersome, as it often requires manipulating explicit time bounds for a specific machine model. Implicit Computational Complexity is a research area that provides techniques and characterisations of complexity classes to avoid this. We have formally verified the soundness (and part of the completeness) of one such technique, called quasi-interpretations, using the Coq proof assistant. In particular, we turned this into a tool to help guarantee the polynomial complexity of programs (here, term rewriting systems).

Understanding & Controlling User Privacy in Social Media via Exposure

Mainack Mondal
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Defense
20 Nov 2017, 4:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
The recent popularity of Online Social Media sites (OSM) like Facebook and Twitter has led to a renewed discussion about user privacy. In fact, numerous recent news reports and research studies on user privacy stress OSM users' urgent need for better privacy control mechanisms. Thus, today, a key research question is: how do we provide improved privacy protection to OSM users for their social content? In my thesis, we propose a systematic approach to address this question.

We start with the access control model, the dominant privacy model in OSMs today. We show that, while useful, the access control model does not capture many theoretical and practical aspects of privacy. Thus, we propose a new model, which we term the exposure control model. We define exposure for a piece of content as the set of people who actually view the content. We demonstrate that our model is a significant improvement over access control in capturing users' privacy requirements. Next, we investigate the effectiveness of our model in protecting users' privacy in three real-world scenarios: (1) understanding and controlling exposure using social access control lists (SACLs), (2) controlling exposure by limiting large-scale social data aggregators, and (3) understanding and controlling longitudinal exposure in OSMs, i.e., how users control the exposure of their old OSM content. We show that, in each of these cases, the exposure control-based approach helps us to design improved privacy control mechanisms.

Program Logic for Weak Memory Concurrency

Marko Doko
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Proposal
17 Nov 2017, 3:00 pm
Kaiserslautern building G26, room 111
Most of the verification techniques for concurrency assume the sequentially consistent memory model, i.e., the executions of a concurrent program consist of all the possible interleavings of the instructions of its threads. On the other hand, in order to improve performance or conserve energy, modern hardware implementations give us what is known as weak memory models; that is, models of concurrency that allow some executions which cannot be explained by sequential consistency. Moreover, modern programming languages provide their own language-level memory models which strive to allow all the behaviors allowed by the various hardware-level memory models and, additionally, to allow some behaviors that can occur as a result of desired compiler optimizations.
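
As a concrete illustration (a standard litmus test, not an example from the thesis), the following C11 program exhibits a weak behavior: under sequential consistency at least one thread must see the other's store, yet with relaxed atomics the outcome r1 == 0 && r2 == 0 is allowed, and is observable on common hardware (possibly only after many runs).

    #include <stdatomic.h>
    #include <threads.h>
    #include <stdio.h>

    atomic_int x, y;
    int r1, r2;

    int t1(void *arg) {
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        r1 = atomic_load_explicit(&y, memory_order_relaxed);
        return 0;
    }

    int t2(void *arg) {
        atomic_store_explicit(&y, 1, memory_order_relaxed);
        r2 = atomic_load_explicit(&x, memory_order_relaxed);
        return 0;
    }

    int main(void) {
        thrd_t a, b;
        thrd_create(&a, t1, NULL);
        thrd_create(&b, t2, NULL);
        thrd_join(a, NULL);
        thrd_join(b, NULL);
        printf("r1=%d r2=%d\n", r1, r2);  /* "r1=0 r2=0" is impossible under SC */
        return 0;
    }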

These weak memory models are often quite intricate, and it can be quite difficult for programmers to be aware of all the possible behaviors of their programs. For this reason, it is very useful to have an abstraction layer over the model, so that we can make sure our programs are correct without understanding all the messy details of the underlying memory model. Program logics are one way of constructing such an abstraction--they can present us with clean and simple syntactical rules that we can use to reason about our programs, while not requiring any understanding of the memory model for which the logic has been proven sound.

In this thesis I develop FSL, a logic for reasoning about programs under the memory model introduced by the 2011 standards of the C and C++ programming languages. FSL provides a set of clean and simple rules that abstract the intricacies of the underlying model, while still being powerful enough to reason about algorithms that appear in practice.

The main novelty of FSL is the ability to reason about synchronization via memory fences, an advanced synchronization feature of the C11 memory model that was not supported by any prior logic.

Correct Compilation of Relaxed Memory Concurrency

Soham Chakraborty
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Proposal
16 Nov 2017, 2:00 pm - 3:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
Compilation of shared memory concurrent programs faces many challenges in the presence of relaxed memory models. On the one hand, relaxed memory models enable multiple transformations on shared memory accesses and fences. On the other hand, not all transformations which are correct for sequential programs are correct in the concurrent setting. Thus, compiler writers have to perform careful analysis to determine which transformations are correct.
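
A classic example of such an incorrect transformation (illustrative only, not drawn from the thesis): introducing a speculative store is invisible to sequential code but observable, and racy, under concurrency.

    int x;  /* shared variable */

    void original(int cond) {
        if (cond)
            x = 42;
    }

    /* A sequentially equivalent rewrite a compiler might attempt: */
    void transformed(int cond) {
        int tmp = x;
        x = 42;          /* store happens even when cond is false ... */
        if (!cond)
            x = tmp;     /* ... and is later undone                   */
    }

Sequentially, x ends up with the same final value either way; with other threads accessing x, the rewrite introduces a store (and a data race) that lets them observe x == 42 even though cond is false.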

In my thesis, I explore the correctness of the multiple steps during the compilation of concurrent C/C++ programs (termed C11). First, I consider the correctness of C11 source to source transformations. Next, I study the compilation of C11 programs by LLVM, a state-of-the-art optimizing compiler. LLVM performs a number of optimizing transformations before generating the target code for architectures such as X86 and PowerPC. To perform these transformations, LLVM follows an informal refinement of C11 concurrency semantics. I formalize the LLVM concurrency semantics and study at the abstract level the correctness of various LLVM transformations. Finally, I develop a validator to check if LLVM optimizations are indeed performing only transformations that are correct according to the LLVM and/or the C11 model.

Toward Data-Driven Education

Rakesh Agrawal
EPFL
SWS Distinguished Lecture Series
13 Nov 2017, 10:00 am - 11:30 am
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 111
An educational program of study can be viewed as a knowledge graph consisting of learning units and relationships between them. Such a knowledge graph provides the core data structure for organizing and navigating learning experiences. We address three issues in this talk. First, how can we synthesize the knowledge graph, given a set of concepts to be covered in the study program? Next, how can we use data mining to identify and correct deficiencies in a knowledge graph? Finally, how can we use data mining to form study groups with the goal of maximizing overall learning? We conclude by pointing out some open research problems.

Beyond Distributive Fairness in Algorithmic Decision Making: Feature Selection for Procedurally Fair Learning

Nina Grgic-Hlaca
Max Planck Institute for Software Systems
SWS Student Defense Talks - Qualifying Exam
10 Nov 2017, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
With the widespread use of machine learning methods in numerous domains involving human subjects, several studies have raised questions about the potential for unfairness towards certain individuals or groups.

A number of recent works have proposed methods to measure and eliminate unfairness from machine learning methods. However, most of this work has focused on only one dimension of fair decision making: distributive fairness, i.e., the fairness of the decision outcomes. In this work, we leverage the rich literature on organizational justice and focus on another dimension of fair decision making: procedural fairness, i.e., the fairness of the decision making process.

We propose measures for procedural fairness that consider the input features used in the decision process, and evaluate the moral judgments of humans regarding the use of these features. We operationalize these measures on two real world datasets using human surveys on the Amazon Mechanical Turk (AMT) platform, demonstrating that we capture important properties of procedurally fair decision making. We provide fast submodular mechanisms to optimize the tradeoff between procedural fairness and prediction accuracy. On our datasets, we observe empirically that procedural fairness may be achieved with little cost to outcome fairness, but that some loss of accuracy is unavoidable.

StaRVOOrS: Combined Static and Runtime Verification of Object-Oriented Software

Wolfgang Ahrendt
Chalmers University
SWS Colloquium
09 Nov 2017, 11:00 am - 12:30 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
Static verification techniques are used to analyse and prove properties about programs before they are deployed. In particular, deductive verification techniques often work directly on the source code and are used to verify data-oriented properties of all possible executions. In contrast, runtime verification techniques have been extensively used for control-oriented properties, and analyse concrete executions that occur in the deployed program. I present an approach in which data-oriented and control-oriented properties may be stated in a single formalism amenable to both static and dynamic verification techniques. The specification language enhances a control-oriented property language with data-oriented pre/postconditions. We show how such specifications can be analysed using a combination of the deductive verification system KeY and the runtime verification tool LARVA. Verification is performed in two steps: first, KeY performs fully automated proof attempts; then the resulting partial proofs are analysed and used to optimize the specification for efficient runtime checking.

Quantifying & Characterizing Information Diets of Social Media Users

Juhi Kulshrestha
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Proposal
08 Nov 2017, 4:00 pm
Saarbrücken building E1 5, room 029
With an increasing number of people around the world relying on online social media platforms like Twitter and Facebook to consume news and information about the world around them, there has been a paradigm shift in the way news and information is exchanged in our society -- from traditional mass media to online social media. This paradigm shift has led to three key changes in the way that people are exchanging information: (i) Unlike the subscribers of mass media, online social media users are not just passive consumers of information, but they are also active publishers of content on these platforms. (ii) Social media users curate personalized word of mouth channels for themselves by creating social links to their preferred sources, and therefore unlike broadcast mass media where all subscribers receive the same information, individual users on social media might receive information that is very different from what other users are consuming. (iii) Social media users often rely on automated retrieval algorithms like personalized recommendation and search systems deployed by social media platform providers to discover interesting information from the deluge of user-generated content shared on these platforms.

While the area of media studies has traditionally focused mostly on broadcast mass media, with the changing environment it's important to study the news and information consumption of social media users and to also audit how the automated algorithms are modifying what the social media users consume. In this thesis, we fulfill this high-level goal by following a two-fold approach. First, we propose the concept of information diets -- which is the composition of information being produced or consumed -- to measure and reason about the bias and diversity in information production and consumption on social media platforms. We then quantify the diversity and bias in the information diets that social media users consume via the three main consumption channels on social media platforms: (a) word of mouth channels that users select for themselves by creating social links, (b) recommendations that the social media platform providers give to the users, and (c) the search systems that users use to find interesting information on these platforms. We measure the information diets of social media users along three different dimensions of topics, geographic source diversity, and political perspectives.

Our work is aimed at making social media users more aware of the potential biases in their consumed diets, and at encouraging the development of novel mechanisms for mitigating the effect of these biases through better information discovery and exchange systems on social media.

Fine-Grained Complexity: Hardness for a Big Data World

Karl Bringmann
MPI-INF - D1
Joint Lecture Series
08 Nov 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
For many computational problems we know some polynomial time algorithm, say in quadratic time, but it is open whether faster algorithms exist. In a big data world, it is essential to close this gap: If the true complexity of the problem is indeed quadratic, then it is intractable on data arising in areas such as DNA sequencing or social networks. On such data essentially only near-linear time algorithms are feasible. Unfortunately, classic hardness assumptions such as P!=NP are too coarse to explain the gap between linear and quadratic time.

Fine-grained complexity comes to the rescue by providing conditional lower bounds via fine-grained reductions from certain hard core problems. For instance, it allowed us to rule out truly subquadratic algorithms for the Longest Common Subsequence problem (used e.g. in the diff file comparison tool), assuming a certain strengthening of P!=NP. In this talk we will further discuss applications to Edit Distance, Subset Sum, and RNA Folding.
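
To make the Longest Common Subsequence example concrete, here is the textbook quadratic-time dynamic program (my sketch, not part of the talk); the fine-grained results mentioned above show that, assuming the Strong Exponential Time Hypothesis (the strengthening of P!=NP alluded to here), no algorithm can improve this to truly subquadratic time.

    #include <string.h>

    /* Classic O(n*m) dynamic program for the length of a longest common
       subsequence of strings a and b. Uses a variable-length array, so it is
       only meant for modest input sizes. */
    int lcs_length(const char *a, const char *b) {
        size_t n = strlen(a), m = strlen(b);
        int dp[n + 1][m + 1];   /* dp[i][j] = LCS length of a[0..i) and b[0..j) */
        for (size_t i = 0; i <= n; i++) {
            for (size_t j = 0; j <= m; j++) {
                if (i == 0 || j == 0)
                    dp[i][j] = 0;
                else if (a[i - 1] == b[j - 1])
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                else
                    dp[i][j] = dp[i - 1][j] > dp[i][j - 1] ? dp[i - 1][j]
                                                           : dp[i][j - 1];
            }
        }
        return dp[n][m];
    }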

Discrimination in Algorithmic Decision Making: From Principles to Measures and Mechanisms

Muhammad Bilal Zafar
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Proposal
09 Oct 2017, 4:00 pm - 5:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
Algorithmic decision making systems are increasingly being used to assist or even replace human decision making in a large number of application domains. These systems rely on complex learning methods and vast amounts of training data to optimize prediction accuracy. The accuracy of these systems has been shown to even surpass human accuracy in some applications. However, there is a growing concern that algorithmic decision making systems can lead, even in the absence of intent, to discriminatory outcomes.

Incorporating nondiscrimination goals into the design of algorithmic decision making systems (or, classifiers) has proven to be quite challenging. These challenges arise mainly due to (i) the computational complexities involved in incorporating nondiscrimination goals, and (ii) inadequacy of existing measures to computationally capture discrimination in certain situations. The goal of this thesis is to tackle these two problems.

First, we aim to design mechanisms to incorporate two existing measures of discrimination, disparate treatment and disparate impact, into the design of well-known classifiers. To this end, we introduce a novel and intuitive mechanism of decision boundary covariance. This mechanism can be included into the formulation of any convex boundary-based classifier in the form of convex constraints without increasing the classifier's complexity. It also allows for fine-grained control over the degree of (non)discrimination, often at a small cost in terms of accuracy.
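
In symbols (my notation, not the thesis's), the mechanism augments the usual training objective with a bound on the empirical covariance between the sensitive attribute z_i and the signed distance of each training point x_i to the decision boundary:

    \min_{\theta}\; L(\theta)
    \quad\text{subject to}\quad
    \Big|\,\tfrac{1}{N}\textstyle\sum_{i=1}^{N} (z_i - \bar{z})\,\theta^{\top} x_i\,\Big| \;\le\; c,

where L is the classifier's convex loss and the threshold c controls the permitted degree of (non)discrimination; since the constraint is linear in theta, it keeps the training problem convex, consistent with the claim that the classifier's complexity does not increase.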

Second, we propose alternative measures of discrimination that can avoid shortcomings of existing measures in certain situations. Our first proposed measure, disparate mistreatment, is useful in situations when the validity of historical decisions in the training data can be ascertained. The remaining two measures, preferred treatment and preferred impact, are useful in situations when feature and class distributions of different groups subject to the decision making are significantly different, and can additionally help reduce the cost of nondiscrimination. We extend our decision boundary covariance mechanism and incorporate the newly proposed nondiscrimination measures into the formulations of convex boundary-based classifiers, this time as convex-concave constraints. The resulting formulations can be solved efficiently using recent advances in convex-concave programming.

What knowledge bases know (and what they don't)

Simon Razniewski
MPI-INF - D5
Joint Lecture Series
04 Oct 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Knowledge bases such as Wikidata, YAGO, or the Google Knowledge Graph are increasingly used in applications like entity resolution, structured search or question answering. The correctness of these knowledge bases is generally well understood and at the center of attention in automated construction techniques. About their completeness, in contrast, most knowledge is anecdotal: Completeness is usually correlated with popularity of topics, yet understanding whether a knowledge base has complete information on a specific topic is often difficult.

In this talk I will introduce the problem of understanding the completeness of knowledge bases. I will present three approaches: (i) data-inherent inferences about completeness using association rule mining, (ii) extraction of cardinality information from text sources, and (iii) various tools we developed to help knowledge base curators and consumers in mapping and understanding the completeness of knowledge bases.

Geometric Complexity Theory: An ambitious approach towards P versus NP

Christian Ikenmeyer
MPI-INF - D1
Joint Lecture Series
06 Sep 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Computational complexity theory is concerned with the study of the inherent complexity of computational problems. Its flagship conjecture is the famous P versus NP conjecture, which is one of the seven Millennium Problems of the Clay Mathematics Institute.

To this day, several thousand computational problems have been classified as NP-complete, i.e., they have a polynomial-time algorithm if and only if P equals NP. The practical importance of the P vs NP conjecture is at least twofold: First of all, many NP-complete problems are of high practical relevance and have to be solved every day in commercial and scientific applications. Secondly, if NP turned out to be P, then this would break all of today's cryptographic ciphers.

Geometric complexity theory is an ambitious program initiated in 2001 by Ketan Mulmuley and Milind Sohoni towards solving the P vs NP problem by using methods from algebraic geometry and representation theory. During the last few years there has been a significant amount of research activity related to this program. In this talk I explain some basic ideas and recent developments.

Combinatorial Constructions for Effective Testing

Filip Niksic
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Proposal
14 Aug 2017, 3:00 pm - 4:00 pm
Kaiserslautern building G26, room 113
simultaneous videocast to Saarbrücken building E1 5, room 029
Testing is the predominant approach to finding bugs and gaining assurance in correctness of concurrent and distributed systems with complex state spaces. In this thesis we use combinatorial methods to develop new techniques for systematic and randomized testing. We answer several theoretical questions about the size of test sets that achieve full coverage, and propose a practical tool for testing actor-based programs.

We start with a general construction, obtained using the probabilistic method from combinatorics, that shows that whenever a random test covers a fixed testing goal with sufficiently high probability, there is a small set of random tests that achieves full coverage with high probability. We use this construction in the context of testing distributed systems under network partition faults: our construction explains why random testing tools in this context achieve good coverage--and hence, find bugs--quickly.
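
The quantitative core of this argument is a simple union bound (sketched here in my own notation): if each of m coverage goals is covered by a single random test with probability at least p, then the probability that k independent random tests miss some goal satisfies

    \Pr[\text{some goal uncovered}] \;\le\; m\,(1-p)^k \;\le\; m\,e^{-pk} \;\le\; \delta
    \quad\text{whenever}\quad
    k \;\ge\; \frac{1}{p}\,\ln\frac{m}{\delta}.

So on the order of (1/p) log m random tests already achieve full coverage with high probability, consistent with the observation that random partition-fault testing finds bugs quickly.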

In the context of testing concurrent programs, we introduce a notion of hitting families of schedules. It suffices to test schedules from a hitting family if the goal is to expose bugs of "small depth"--bugs that only depend on the relative ordering of a small number of specific events in the program. We study the size of optimal hitting families, and give explicit constructions of small hitting families when the events in the program are partially ordered as a tree. In particular, for balanced trees, our constructions are polylogarithmic in the number of events.

Finally, we propose a testing tool for finding bugs of small depth in actor-based programs. The tool is based on an existing randomized scheduler for shared-memory multithreaded programs. Adapting the scheduler to actor-based programs poses an interesting underlying theoretical problem.

Techniques to enforce security policies on untrusted applications

Anjo Vahldiek-Oberwagner
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Proposal
07 Aug 2017, 5:00 pm - 6:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
As the dependence on ever-present computer systems increases, so does the potential harm when software or hardware deviates from user expectations: users lose data or find it illicitly leaked. Existing reference monitors, which are meant to prevent such behavior, fail to (1) protect the confidentiality and integrity of persistent data, and (2) efficiently and robustly mediate untrusted applications. In this proposal we present two reference monitors, Guardat and ERIM, that target these shortcomings, and we demonstrate their design, implementation, and evaluation.

The policies protecting persistent data and the mechanisms for their enforcement are spread over many software components and configuration files, increasing the risk of policy violations due to bugs, vulnerabilities, and misconfiguration. In Guardat, users, developers, and administrators specify file protection policies declaratively, concisely, and separately from code, and Guardat enforces these policies by mediating I/O in the storage layer. Policy enforcement relies only on the integrity of the Guardat controller and any external policy dependencies. We show experimentally that Guardat's overhead is low.

While Guardat enforces policies at the storage layer, it cannot enforce policies over the in-memory state of untrusted applications. In contrast to existing techniques, ERIM efficiently mediates an application's execution by isolating a reference monitor in the same address space. By using Intel Memory Protection Keys in combination with static binary rewriting, ERIM isolates the monitor's state from strong, malicious adversaries. We propose binary rewriting rules to harden existing executable files and detail use cases in which prior art relied on less robust protection at similar performance.
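
To give a flavor of the hardware primitive ERIM builds on, the following is a plain Linux MPK sketch (not ERIM itself, which additionally relies on binary rewriting to keep untrusted code from issuing its own permission switches): user-space code toggles access to a protection-key-tagged region around monitor entry and exit.

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <string.h>
    #include <stdio.h>

    /* Requires an MPK-capable x86 CPU, Linux >= 4.9, glibc >= 2.27;
       error handling omitted for brevity. */
    int main(void) {
        size_t len = 4096;
        char *monitor_state = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        int pkey = pkey_alloc(0, 0);                  /* fresh protection key  */
        pkey_mprotect(monitor_state, len, PROT_READ | PROT_WRITE, pkey);

        pkey_set(pkey, PKEY_DISABLE_ACCESS);   /* untrusted code: region sealed */
        /* ... application code runs here; any access to monitor_state faults ... */

        pkey_set(pkey, 0);                     /* monitor entry: unseal region  */
        strcpy(monitor_state, "monitor state");
        pkey_set(pkey, PKEY_DISABLE_ACCESS);   /* monitor exit: seal again      */

        printf("protected region at %p, pkey %d\n", (void *)monitor_state, pkey);
        return 0;
    }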

Towards Use-Ability in Private Data Analytics

Reinhard Munz
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Proposal
04 Aug 2017, 1:30 pm - 2:30 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 105
Private data analytics systems ideally provide the required analytic accuracy to analysts and the specified privacy to the individuals whose data is analyzed. Devising a general system that works for a broad range of datasets and analytic scenarios has proven to be difficult. Despite the advent of differentially private systems with proven formal privacy guarantees, industry still uses their inferior ad-hoc predecessors, which are able to provide better analytic accuracy. Differentially private mechanisms often need to add large amounts of noise to statistical results, which impairs their use-ability. In my thesis I follow two approaches to improve the use-ability of private data analytics systems in general and differentially private systems in particular. First, I revisit ad-hoc mechanisms and explore the possibilities of systems that do not provide Differential Privacy or provide only a weak version thereof. In my second approach I use the insights gained before to propose UniTraX, the first differentially private analytics system that allows analysts to analyze part of a protected dataset without affecting other parts and without giving up guaranteed accuracy.
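
As background on why this noise can be large (the standard Laplace mechanism of differential privacy, not a description of UniTraX itself): a numeric query f over a dataset D is typically answered as

    M(D) \;=\; f(D) + \mathrm{Lap}\!\left(\tfrac{\Delta f}{\varepsilon}\right),
    \qquad
    \Delta f \;=\; \max_{D,\,D' \text{ differing in one record}} \lvert f(D) - f(D') \rvert,

so a stronger privacy guarantee (smaller epsilon) or a more sensitive query (larger Delta f) forces proportionally more noise; this accuracy loss is the use-ability problem the thesis addresses.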

Fairer and more accurate, but for whom?

Alexandra Chouldechova
CMU
SWS Colloquium
25 Jul 2017, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 005
simultaneous videocast to Kaiserslautern building G26, room 111
Complex statistical models are increasingly being used or considered for use in high-stakes decision-making pipelines in domains such as financial services, health care, criminal justice and human services. These models are often investigated as possible improvements over more classical tools such as regression models or human judgement. While the modeling approach may be new, the practice of using some form of risk assessment to inform decisions is not. When determining whether a new model should be adopted, it is essential to be able to compare the proposed model to the existing approach across a range of task-relevant accuracy and fairness metrics. In this talk I will describe a subgroup analysis approach for characterizing how models differ in terms of fairness-related quantities such as racial or gender disparities. I will also talk about an ongoing collaboration with the Allegheny County Department of Health and Human Services on developing and implementing a risk assessment tool for use in child welfare referral screening.

https://arxiv.org/abs/1707.00046

http://www.post-gazette.com/local/region/2017/04/09/Allegheny-County-using-algorithm-to-assist-in-child-welfare-screening/stories/201701290002

Decision Making and The Value of Explanation

Kathy Strandburg
NYU Law
SWS Colloquium
07 Jul 2017, 10:00 am - 11:30 am
Saarbrücken building E1 5, room 005
Much of the policy and legal debate about algorithmic decision-making has focused on issues of accuracy and bias. Equally important, however, is the question of whether algorithmic decisions are understandable by human observers: whether the relationship between algorithmic inputs and outputs can be explained. Explanation has long been deemed a crucial aspect of accountability, particularly in legal contexts. By requiring that powerful actors explain the bases of their decisions -- the logic goes -- we reduce the risks of error, abuse, and arbitrariness, thus producing more socially desirable decisions. Decision-making processes employing machine learning algorithms complicate this equation. Such approaches promise to refine and improve the accuracy and efficiency of decision-making processes, but the logic and rationale behind each decision often remains opaque to human understanding. Indeed, at a technical level, it is not clear that all algorithms can be made explainable and, at a normative level, it is an open question when and if the costs of making algorithms explainable outweigh the benefits. This presentation will begin to map out some of the issues that must be addressed in determining in what contexts, and under what constraints, machine learning approaches to governmental decision-making are appropriate.

Your photos expose your social life - Social relation recognition in 5 social domains

Qianru Sun
MPI-INF - D4
Joint Lecture Series
05 Jul 2017, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Social relations are the foundation of human social life. Developing techniques to analyze such relations in visual data, such as photos, bears great potential to build machines that better understand people at a social level. Social domain-based theory from social psychology is a good starting point for systematically approaching social relation recognition: the theory covers all aspects of social relations and is concrete and predictive about the visual attributes and behaviors that define the relations in each social domain. We propose the first photo dataset built on this holistic conceptualization of social life, composed of a hierarchical label space of social domains and social relations, and contribute the first models to recognize such domains and relations, finding superior performance for attribute-based features. Beyond the encouraging performance, we identify interpretable features that are in accordance with predictions from the social psychology literature. Interleaving visual recognition with social psychology theory has the potential to complement the theoretical work in the area with empirical and data-driven models of social life.

Finding Fake News

Giovanni Luca Ciampaglia
Indiana University Network Science Institute
SWS Colloquium
26 Jun 2017, 10:30 am - 12:00 pm
Saarbrücken building E1 5, room 029
Two-thirds of all American adults access the news through social media. But social networks and social media recommendations lead to information bubbles, and personalization and recommendations, by maximizing the click-through rate, lead to ideological polarization. Consequently, rumors, false news, conspiracy theories, and now even fake news sites are an increasingly worrisome phenomenon. While media organizations (Snopes.com, PolitiFact, FactCheck.org, et al.) have stepped up their efforts to verify news, political scientists tell us that fact-checking efforts may be ineffective or even counterproductive. To address some of these challenges, researchers at Indiana University are working on an open platform for the automatic tracking of both online fake news and fact-checking on social media. The goal of the platform, named Hoaxy, is to reconstruct the diffusion networks induced by hoaxes and their corrections as they are shared online and spread from person to person.

Discrimination in Online Advertising

Till Speicher
Max Planck Institute for Software Systems
SWS Student Defense Talks - Qualifying Exam
14 Jun 2017, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
Online targeted advertising has been subject to many complaints for allowing advertisers to discriminate against users belonging to sensitive groups, which has led Facebook, for instance, to disallow the use of sensitive features such as ethnic affinity in certain types of ads. In this talk I argue that such measures are not sufficient and that the problem of discrimination in targeted advertising is much more pernicious. I show how a malicious advertiser can create discriminatory ads without using sensitive features. I argue that discrimination measures should be based on the targeted population and not on the attributes used for targeting, and I propose ways to mitigate discriminatory targeting.

Comprehensive and Practical Policy Compliance in Data Retrieval Systems

Eslam Elnikety
Max Planck Institute for Software Systems
SWS Student Defense Talks - Thesis Proposal
14 Jun 2017, 1:00 pm - 2:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
Data use policies govern how data retrieval systems process data items coming from many different sources, each subject to its own integrity and confidentiality requirements. Ensuring compliance with these requirements despite bugs, misconfigurations, and operator errors in a large, complex, and fast evolving system is a major challenge.

In this thesis, we present comprehensive and practical compliance systems to address this challenge. To be comprehensive, compliance systems must be able to enforce policies specific to individual data items or to a particular client's data items, the service provider's own policies, and policies that capture legal requirements. To be practical, compliance systems need to meet stringent requirements: runtime overhead must be low; existing applications must be able to run with few modifications; and bugs, misconfigurations, compromises in application code, or actions by unprivileged operators must not be able to violate policies.

We present the design and implementation of two comprehensive and practical compliance systems: Thoth and Shai. At a high-level, data use policies are stated in a declarative language separate from application code, and a small reference monitor ensures compliance with these policies. Thoth and Shai differ in enforcement techniques. Thoth tracks data flows through the system at runtime by intercepting I/O at processes' boundaries, and enforces the associated policies. Shai, on the other hand, combines static flow analysis and light-weight runtime monitoring (sandboxes and capabilities) to ensure compliance of data flows. We demonstrate the practicality of these systems using a prototype search engine based on the popular Apache Lucene.