Recent events

Shaping Experience and Expression by Designing Sensorimotor Contingencies

Paul Strohmeier MPI-INF - D4
06 Mar 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Our experience of the world is mediated by a continuous feedback loop between motor activity and sensory stimuli. Manipulating these sensorimotor loops offers a powerful means to influence user experience. Such loops, updating in milliseconds, demand devices with higher temporal resolution than those typically used in conventional human-computer interaction (HCI). In this talk, I will show how tapping into sensorimotor loops with tactile feedback allows us to alter our experience of the physical world and our own bodily sensations. Additionally, I will discuss how we can go beyond simply augmenting experience to enhancing human expression through sensorimotor augmentation.

Hardware and Software Codesign

Marcus Pirron Max Planck Institute for Software Systems
05 Mar 2024, 2:00 pm - 3:00 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Modern robotic applications consist of a variety of robotic systems that work together to achieve complex tasks. Programming these applications draws from multiple fields of knowledge and typically involves low-level imperative programming languages that provide little to no support for abstraction or reasoning. We present a unifying programming model, ranging from automated controller synthesis for individual robots to a compositional reasoning framework for inter-robot coordination. We provide novel methods on the topics of control and planning of modular robots, making contributions in three main areas: controller synthesis, concurrent systems, and verification. Our method synthesizes control code for serial and parallel manipulators and leverages physical properties to synthesize sensing abilities. This allows us to determine parts of the system's state that previously remained unmeasured. Our synthesized controllers are robust; we are able to detect and isolate faulty parts of the system, find alternatives, and ensure continued operation. On the concurrent systems side, we deal with dynamic controllers affecting the physical state, geometric constraints on components, and synchronization between processes. We provide a programming model for robotics applications that consists of assemblies of robotic components together with a run-time and a verifier. Our model combines message-passing concurrent processes with motion primitives and explicit modeling of geometric frame shifts, allowing us to create composite robotic systems for performing tasks that are unachievable for individual systems. We provide a verification algorithm based on model checking and SMT solvers that statically verifies concurrency-related properties (e.g. absence of deadlocks) and geometric invariants (e.g. collision-free motions). 
Our method ensures that jointly executed actions at end-points are communication-safe and deadlock-free, providing a compositional verification methodology for assemblies of robotic components with respect to concurrency and dynamic invariants. Our results indicate the efficiency of our novel approach and provide the foundation of compositional reasoning of robotic systems.
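The concurrency side of this verification methodology checks, among other properties, absence of deadlocks in assemblies of message-passing components. As a hedged illustration only (the actual verifier combines model checking with SMT solvers over motion primitives and geometric constraints), the toy Python sketch below explores the product state space of two synchronously communicating processes and reports stuck states. The process encoding and all names ("arm", "base", channel labels) are invented for this example.

```python
# Minimal sketch: detect deadlocks in a pair of message-passing processes by
# exhaustive state-space exploration (a toy stand-in for model checking).
# Each process is a list of (channel, kind) steps, kind in {"send", "recv"};
# communication is a synchronous rendezvous on a shared channel name.

def find_deadlocks(p, q):
    seen, stack, deadlocked = set(), [(0, 0)], []
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        i, j = state
        done = i == len(p) and j == len(q)
        successors = []
        if i < len(p) and j < len(q):
            ch_p, kind_p = p[i]
            ch_q, kind_q = q[j]
            # A rendezvous fires when both sides name the same channel
            # and one sends while the other receives.
            if ch_p == ch_q and {kind_p, kind_q} == {"send", "recv"}:
                successors.append((i + 1, j + 1))
        if not successors and not done:
            deadlocked.append(state)  # stuck before both processes finished
        stack.extend(successors)
    return deadlocked

# Both components try to send first: classic deadlock at the initial state.
arm = [("pose", "send"), ("grip", "recv")]
base = [("grip", "send"), ("pose", "recv")]
print(find_deadlocks(arm, base))   # [(0, 0)]

# Reordering one component removes the deadlock.
base_fixed = [("pose", "recv"), ("grip", "send")]
print(find_deadlocks(arm, base_fixed))  # []
```

Reordering one component's actions removes the deadlock, which is exactly the kind of coordination error a static verifier can surface before code ever runs on hardware.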

Beyond the Search Bar in Human-AI Interactions: Augmenting Discovery, Synthesis, and Creativity With User-Generated Context

Srishti Palani University of California, San Diego
05 Mar 2024, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
Searching and exploring information online is integral to everyday life, shaping how we learn, work, and create. As the Web paradigm evolves to include foundational AI models and beyond, we are experiencing a shift in how we search and work. With this transformation in human-AI interaction, it is important to investigate how we might present the user with the right information in the right context, the right representation, and at the right time. In this talk, I will share how I have explored these questions in the context of complex critical information work (such as knowledge discovery, synthesis, and creativity). I present insights about user behaviors and challenges from mixed-method studies observing how people conduct this work using today’s tools. Then, I present novel AI-powered tools and techniques that augment these cognitive processes by mining rich contextual signals from unstructured user-generated artifacts. By deepening our understanding of human cognition and behavior and building tools that understand user contexts more meaningfully, I envision a future where human-AI interactions are more personalized, context-aware, cognitively-convivial, and truly collaborative.

Data Privacy in the Decentralized Era

Amrita Roy Chowdhury University of California, San Diego
01 Mar 2024, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 002
CIS@MPG Colloquium
Data is today generated on smart devices at the edge, shaping a decentralized data ecosystem comprising multiple data owners (clients) and a service provider (server). Clients interact with the server with their personal data for specific services, while the server performs analysis on the joint dataset. However, the sensitive nature of the involved data, coupled with inherent misalignment of incentives between clients and the server, breeds mutual distrust. Consequently, a key question arises: How to facilitate private data analytics within a decentralized data ecosystem, comprising multiple distrusting parties?

My research shows a way forward by designing systems that offer strong and provable privacy guarantees while preserving data functionality. I accomplish this by systematically exploring the synergy between cryptography and differential privacy, exposing their rich interconnections in both theory and practice. In this talk, I will focus on two systems, CryptE and EIFFeL, which enable privacy-preserving query analytics and machine learning, respectively.
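CryptE and EIFFeL are the speaker's systems; as background only, the sketch below shows the standard Laplace mechanism, the basic differentially private primitive that such systems build on. The query, counts, and parameters are invented for illustration.

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5
    # Inverse-CDF sampling of Laplace(0, scale).
    noise = -scale * math.copysign(math.log(1 - 2 * abs(u)), u)
    return true_value + noise

rng = random.Random(0)
true_count = 120  # hypothetical: number of clients matching some query
# A counting query has sensitivity 1: one client's data shifts it by at most 1.
noisy_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5, rng=rng)
```

Smaller epsilon means stronger privacy and noisier answers. The cryptographic side of the problem, keeping individual inputs hidden from the server while it computes such aggregates, is what the synergy with differential privacy described in the abstract is about; this local sketch shows only the privacy half.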

Methods for Financial Stability Analysis

Christoph Siebenbrunner WU Vienna
28 Feb 2024, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 113
SWS Colloquium
We present a number of methods used by central banks and regulatory authorities to assess the stability of the financial system, including stress tests, network analysis of the interbank market, and models of interbank contagion and fire sales, aiming to capture dynamics similar to those observed during the 2008 financial crisis. We will discuss the key role of banks in the money creation process, how this relates to monetary aggregates, and what the introduction of central bank digital currencies and their different implementation options may mean in this context.
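The interbank contagion models mentioned in the abstract can be illustrated, in a heavily simplified form, by a threshold default cascade: a bank defaults once its losses from failed counterparties exceed its capital, which can trigger further defaults. This toy model is a sketch under invented numbers, not the speaker's methodology.

```python
def default_cascade(capital, exposures, initially_failed):
    """exposures[i][j]: amount bank i is owed by bank j (lost if j defaults)."""
    failed = set(initially_failed)
    losses = {b: 0.0 for b in capital}
    changed = True
    while changed:
        changed = False
        for b in capital:
            if b in failed:
                continue
            losses[b] = sum(exposures[b].get(f, 0.0) for f in failed)
            if losses[b] >= capital[b]:
                failed.add(b)  # losses wiped out the capital buffer
                changed = True
    return failed

# Hypothetical three-bank system: C's failure costs B 6 against capital of 5,
# so B fails too; A's exposure of 4 is covered by its capital of 10.
capital = {"A": 10, "B": 5, "C": 8}
exposures = {
    "A": {"B": 4},  # A is owed 4 by B
    "B": {"C": 6},  # B is owed 6 by C
    "C": {},
}
print(sorted(default_cascade(capital, exposures, {"C"})))  # ['B', 'C']
```

Real contagion models add clearing of partial recoveries and fire-sale price effects, but the fixed-point iteration structure is the same.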

--

Please contact the office team for link information

Computational Approaches to Narrative Analysis

Maria Antoniak Allen Institute
26 Feb 2024, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
People use storytelling to persuade, to entertain, to inform, and to make sense of their experiences as a community—and as a form of self-disclosure, personal storytelling can strengthen social bonds, build trust, and support the storyteller’s health. But computational analysis of narratives faces challenges, including difficulty in defining stories, lack of annotated datasets, and need to generalize across diverse settings. In this talk, I’ll present work addressing these challenges that uses methods from natural language processing (NLP) to measure storytelling both across online communities and within specific social contexts. This work has implications for NLP methods for story detection, the literary field of narratology, cooperative work in online communities, and better healthcare support. As one part of my research in NLP and cultural analytics, it also highlights how NLP methods can be used creatively and reliably to study human experiences.

Global Investigation of Network Connection Tampering

Ramakrishnan Sundara Raman University of Michigan
23 Feb 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
As the Internet's user base and criticality of online services continue to expand daily, powerful adversaries like Internet censors are increasingly monitoring and restricting Internet traffic. These adversaries, powered by advanced network technology, perform large-scale connection tampering attacks seeking to prevent users from accessing specific online content, compromising Internet availability and integrity. In recent years, we have witnessed recurring censorship events affecting Internet users globally, with far-reaching social, financial, and psychological consequences, making them important to study. However, characterizing tampering attacks at the global scale is an extremely challenging problem, given intentionally opaque practices by adversaries, varying tampering mechanisms and policies across networks, evolving environments, sparse ground truth, and safety risks in collecting data. In this talk, I will describe my research on building empirical methods to characterize connection tampering globally and investigate the network technology enabling tampering. First, I will introduce novel network measurement methods for locating and examining network devices that perform censorship. Next, I will describe a modular design for the Censored Planet Observatory that enables it to remotely and sustainably measure Internet censorship longitudinally in more than 200 countries. I will introduce time series analysis methods to detect key censorship events in longitudinal Censored Planet data, and reveal global censorship trends. Finally, I will describe exciting ongoing and future research directions, such as building intelligent measurement platforms.

High-stakes decisions from low-quality data: AI decision-making for planetary health

Lily Xu Harvard University
21 Feb 2024, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
Planetary health is an emerging field which recognizes the inextricable link between human health and the health of our planet. Our planet’s growing crises include biodiversity loss, with animal population sizes declining by an average of 70% since 1970, and maternal mortality, with 1 in 49 girls in low-income countries dying from complications in pregnancy or birth. Underlying these global challenges is the urgent need to effectively allocate scarce resources. My research develops data-driven AI decision-making methods to do so, overcoming the messy data ubiquitous in these settings. Here, I’ll present technical advances in stochastic bandits, robust reinforcement learning, and restless bandits, addressing research questions that emerge from my close collaboration with the public sector. I’ll also discuss bridging the gap from research and practice, including anti-poaching field tests in Cambodia, field visits in Belize and Uganda, and large-scale deployment with SMART conservation software.
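The stochastic bandits mentioned in the abstract formalize allocating scarce resources (e.g. patrol effort) among options with unknown payoffs. As background only, here is the textbook UCB1 algorithm on Bernoulli arms, not the speaker's methods; the arm means and horizon are invented.

```python
import math
import random

def ucb1(arm_means, horizon, rng):
    """UCB1 on Bernoulli arms: pull each arm once, then always pull the arm
    maximizing empirical mean plus an optimism bonus."""
    k = len(arm_means)
    counts = [0] * k
    totals = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            arm = max(range(k), key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

rng = random.Random(42)
pulls = ucb1([0.2, 0.5, 0.8], horizon=2000, rng=rng)
# Pull counts concentrate on the best arm while still exploring the others.
```

Restless bandits, the setting the talk advances, are harder: arm states evolve even when not pulled, which is what makes them a better fit for conservation and health interventions.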

Paths to AI Accountability

Sarah Cen Massachusetts Institute of Technology
19 Feb 2024, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
In the past decade, we have begun grappling with difficult questions related to the rise of AI, including: What rights do individuals have in the age of AI? When should we regulate AI and when should we abstain? What degree of transparency is needed to monitor AI systems? These questions are all concerned with AI accountability: determining who owes responsibility and to whom in the age of AI. In this talk, I will discuss the two main components of AI accountability, then illustrate them through a case study on social media. Within the context of social media, I will focus on how social media platforms filter (or curate) the content that users see. I will review several methods for auditing social media, drawing from concepts and tools in hypothesis testing, causal inference, and LLMs.
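One of the hypothesis-testing tools such audits can draw on is a two-proportion z-test: do two user groups see a given item at statistically different rates? This is a minimal sketch with invented counts, not the speaker's auditing methodology.

```python
import math

def two_proportion_ztest(shown_a, n_a, shown_b, n_b):
    """z statistic for H0: both groups are shown an item at the same rate."""
    p_a, p_b = shown_a / n_a, shown_b / n_b
    pooled = (shown_a + shown_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical audit: item shown to 450 of 1000 users in group A
# versus 380 of 1000 users in group B.
z = two_proportion_ztest(450, 1000, 380, 1000)
```

A |z| above roughly 1.96 rejects equal exposure at the 5% level. A real audit must also rule out confounders, such as the groups following different accounts, which is where the causal-inference tools mentioned in the abstract come in.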

Programming Theory in Security Analysis: A Tripartite Framework for Vulnerability Specification

Yinxi Liu Chinese University of Hong Kong
15 Feb 2024, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 002
CIS@MPG Colloquium
Living in a computer-reliant era, we are balancing the power of computer systems with the challenges of ensuring their functional correctness and security. Program analysis has proven successful in addressing these issues by predicting the behavior of a system when executed. However, the complexity of program analysis rises significantly as modern applications employ advanced, high-level programming languages and increasingly take the form of composites of independent modules that interact in sophisticated ways. In this talk, I will detail how to apply programming language theory to construct refined vulnerability specifications and reduce the complexity of program analysis across computational, conformational, and compositional aspects:
- My primary focus will be on introducing formal specifications that I have developed for modeling the exponential worst-case computational complexity inherent in modern programming languages. These specifications have guided the first worst-case polynomial solution for detecting performance bugs in regexes.
- I will also briefly discuss why generating inputs with complex conformation to target deep-seated bugs is a significant obstacle for existing techniques, and how I devised strategies to generate sounder inputs by intentionally satisfying previously unrecognized forms of dependencies.
- Finally, as part of a vision to enhance security analysis in modern distributed systems, where operations can be composed in complex ways and may interleave with each other, I will briefly discuss my efforts to establish new security notions that identify non-atomic operations in smart contracts and deter potential attacks that might exploit their interactions.
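The exponential worst-case behavior behind regex performance bugs (often called ReDoS) can be demonstrated with a toy pattern. The sketch below, invented for illustration and unrelated to the speaker's detection algorithm, counts the calls a naive backtracking matcher makes for the ambiguous pattern (a|aa)* against an almost-matching input.

```python
def backtracking_steps(s):
    """Count calls made by a naive backtracking matcher for (a|aa)* against s."""
    steps = 0

    def match(i):
        nonlocal steps
        steps += 1
        if i == len(s):
            return True
        if s[i] == "a" and match(i + 1):
            return True
        if s[i:i + 2] == "aa" and match(i + 2):
            return True
        return False

    match(0)
    return steps

# A run of a's followed by one mismatching character forces the matcher to try
# every way of splitting the a's into "a" and "aa" chunks: Fibonacci-many.
for n in (10, 15, 20):
    print(n, backtracking_steps("a" * n + "b"))
```

Each extra character roughly multiplies the work by the golden ratio, so moderate inputs already take astronomically many steps; a matcher that tracks sets of states instead of backtracking runs in linear time on the same input.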

Formal Reasoning about Relational Properties in Large-Scale Systems

Jana Hofmann Azure Research
14 Feb 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Establishing strong guarantees for security-critical systems has never been more challenging. On the one hand, systems become increasingly complex and intricate. On the other hand, many security requirements are relational, i.e., they compare several execution traces. Examples are noninterference properties, program indistinguishability, and even algorithmic fairness. Due to their relational nature, formal reasoning about such properties quickly blows up in complexity. In this talk, I discuss the necessity to scale relational reasoning to larger software and black-box systems and present two of our recent contributions to tackle this challenge. In the first part, I focus on formal algorithms for white-box systems and show how to combine relational reasoning with non-relational specifications to enable the synthesis of smart contract control flows. In the second part, I focus on relational testing of black-box systems and illustrate its use in modeling and detecting microarchitectural information leakage in modern CPUs.
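Relational testing of a black-box system can be illustrated with noninterference, one of the properties named in the abstract: running the system twice on the same public input but different secrets must yield the same observable output. The harness and example programs below are invented for illustration.

```python
def check_noninterference(program, public_inputs, secret_pairs):
    """Relational test: for each public input, differing secrets must give the
    same observable output; any mismatch is evidence of a leak."""
    leaks = []
    for pub in public_inputs:
        for s1, s2 in secret_pairs:
            if program(pub, s1) != program(pub, s2):
                leaks.append((pub, s1, s2))
    return leaks

def leaky(pub, secret):
    # Observable output depends on the secret's parity: a leak.
    return pub + (secret % 2)

def safe(pub, secret):
    return pub * 2  # the secret never influences the output

pairs = [(0, 1), (10, 13)]
print(check_noninterference(leaky, [1, 2], pairs))  # non-empty: leak found
print(check_noninterference(safe, [1, 2], pairs))   # []
```

For microarchitectural leakage the "observable output" is replaced by hardware measurements such as timing or cache state, but the two-run comparison at the heart of the method is the same.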

Algorithmic Verification of Linear Dynamical Systems

Toghrul Karimov MPI-SWS
08 Feb 2024, 4:00 pm - 5:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
Linear dynamical systems (LDS) are mathematical models widely used in engineering and science to describe systems that evolve over time. In this thesis, we study algorithms for various decision problems of discrete-time linear dynamical systems. Our main focus is the Model-Checking Problem, which is to decide, given a linear dynamical system and an ω-regular specification, whether the trajectory of the LDS satisfies the specification. Using tools from various mathematical disciplines, most notably algebraic number theory, Diophantine approximation, automata theory, and combinatorics on words, we prove decidability of the Model-Checking Problem for large classes of linear dynamical systems and ω-regular properties. We further exploit deep connections between linear dynamical systems and contemporary number theory to show that improving any of our decidability results would amount to major mathematical breakthroughs. Our results delineate the boundaries of decision problems of linear dynamical systems that, at the present time, can be solved algorithmically.
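To fix intuition for the objects involved: a discrete-time LDS is just repeated multiplication by a matrix, and the easy (bounded) half of checking a specification is simulating the orbit. The sketch below is only that bounded half, with an invented rotation example; the thesis's decidability results concern the genuinely hard infinite-horizon questions.

```python
def bounded_check(A, x0, predicate, horizon):
    """Semi-decision sketch: simulate the orbit x0, A·x0, A²·x0, ... and return
    the first step at which the predicate holds, or None within the horizon."""
    x = x0
    for step in range(horizon + 1):
        if predicate(x):
            return step
        x = [sum(a * v for a, v in zip(row, x)) for row in A]
    return None

# Rotation by 90 degrees: the orbit of (1, 0) is (1,0), (0,1), (-1,0), (0,-1), ...
A = [[0, -1], [1, 0]]
print(bounded_check(A, [1, 0], lambda x: x[0] < 0, horizon=10))  # 2
```

Bounded simulation can confirm that an "eventually" property holds within the horizon but can never refute it; deciding such properties over the infinite orbit is exactly where the number-theoretic machinery in the thesis becomes necessary.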

Reliable Measurement for Machine Learning at Scale

A. Feder Cooper Cornell University
08 Feb 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
We need reliable measurement in order to develop rigorous knowledge about ML models and the systems in which they are embedded. But reliable measurement is a really hard problem, touching on issues of reproducibility, scalability, uncertainty quantification, epistemology, and more. In this talk, I will discuss the criteria needed to take reliability seriously — criteria for designing meaningful metrics, and for methodologies that ensure that we can dependably and efficiently measure these metrics at scale and in practice. I will give two examples of my research that put these criteria into practice: (1) large-scale evaluation of training-data memorization in large language models, and (2) evaluating latent arbitrariness in algorithmic fairness binary classification contexts. Throughout this discussion, I will emphasize how important it is to make metrics understandable for other stakeholders in order to facilitate public governance. For this reason, my work aims to design metrics that are legally cognizable — a goal that frames both my ML and legal scholarship. I will draw on important connections that I have uncovered between ML and law: connections between (1) the generative-AI supply chain and US copyright law, and (2) ML arbitrariness and arbitrariness in legal rules. This talk reflects joint work with collaborators at The GenLaw Center, Cornell CS, Cornell Law School, Google DeepMind, and Microsoft Research.

New Tools for Text Indexing and Beyond: Substring Complexity and String Synchronizing Sets

Tomasz Kociumaka MPI-INF - D1
07 Feb 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
In recent decades, the volume of sequential data processed in genomics and other disciplines has exploded. This trend escalates the challenge of designing text indexes – concise data structures that allow answering various queries efficiently. For example, pattern-matching queries ask if the text (a long string modeling the dataset) contains an occurrence of a given pattern (a short string provided at query time). The suffix array is a classic index for pattern matching and many other queries, but modern applications demand space-efficient alternatives like the FM index. Still, even these data structures struggle with massive highly repetitive datasets, such as human genome collections. This scenario requires indexes whose size is comparable to the best text compression methods.

The talk will introduce two tools originating from my text indexing research. Substring complexity is a well-behaved measure of text repetitiveness that aids in comparing the performance of compression algorithms. String synchronizing sets enable consistent selection of small subsets of text substrings. Recently, these tools became the core of the asymptotically smallest compressed text index with suffix-array functionality. Furthermore, they find applications beyond text indexing: in combinatorics on words, quantum string algorithms, and more.

Causal Inference for Robust, Reliable, and Responsible NLP

Zhijing Jin MPI-IS & ETH
06 Feb 2024, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 002
CIS@MPG Colloquium
Despite the remarkable progress in large language models (LLMs), it is well known that natural language processing (NLP) models tend to fit spurious correlations, which can lead to unstable behavior under domain shifts or adversarial attacks. In my research, I develop a causal framework for robust and fair NLP, which investigates the alignment between the causal structure of human decision-making and model decision-making mechanisms. Under this framework, I develop a suite of stress tests for NLP models across various tasks, such as text classification, natural language inference, and math reasoning; and I propose to enhance robustness by aligning the model's learning direction with the underlying data-generating direction. Using this causal inference framework, I also test the validity of causal and logical reasoning in models, with implications for fighting misinformation, and extend the impact of NLP by applying it to analyze the causality behind social phenomena important to our society, such as causal analysis of policies and measurement of gender bias. Together, this work develops a roadmap towards socially responsible NLP by ensuring the reliability of models and broadening their impact across social applications.

Towards Ethical and Democratic Design of AI

Tanusree Sharma University of Illinois at Urbana Champaign
05 Feb 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Advancements in Artificial Intelligence (AI) are impacting our lives, raising concerns that range from data collection and social alignment to the resilience of AI models. A major criticism of AI development is the lack of transparency in design and decision-making about AI behavior, potentially leading to adverse outcomes such as discrimination, lack of inclusivity and representation, breaches of legal rules, and privacy and security risks. Underserved populations, in particular, can be disproportionately affected by these design decisions. Conventional approaches to soliciting people’s input, such as interviews, surveys, and focus groups, have limitations: they often lack consensus, coordination, and regular engagement. In this talk, I will present two examples of sociotechnical interventions for democratic and ethical AI. First, to address the need for ethical dataset creation for AI development, I will present a novel method, "BivPriv," which draws ideas from accessible computing and computer vision to create an inclusive private visual dataset with blind users as contributors. Then I will discuss my more recent work on "Inclusive.AI," funded by OpenAI, which addresses concerns of social alignment by providing a democratic platform with decentralized governance mechanisms for scalable user interaction and integrity in AI-related decision-making processes.

Theoretically Sound Cryptography for Key Exchange and Advanced Applications

Doreen Riepel UC San Diego
01 Feb 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Today, nearly all internet connections are established using a cryptographic key exchange protocol. For these protocols, we want to guarantee security even if an adversary can control the protocol’s execution or secrets are leaked. A security analysis takes this into account and provides a mathematical proof relying on computational problems that are believed to be hard to solve. In this talk, I will first give an overview of the security properties of authenticated key exchange protocols and how to achieve them using cryptographic building blocks. I will talk about tight security and the role of idealized models such as the generic group model. In addition to classical Diffie-Hellman based key exchange, I will also present recent results on isogeny-based key exchange, a promising candidate for post-quantum secure cryptography. Finally, I will touch upon examples of advanced cryptographic primitives like ratcheted key exchange for secure messaging.
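The classical Diffie-Hellman based key exchange mentioned in the abstract can be sketched in a few lines. The sketch below is purely illustrative (the tiny modulus and generator are my own choices, and there is no authentication, so this is nothing like a deployable protocol):

```python
# Toy Diffie-Hellman key exchange -- illustrative only.
import secrets

p = 0xFFFFFFFB  # a small prime (2**32 - 5); far too small for real use
g = 5           # assumed generator; not validated here

def keypair():
    sk = secrets.randbelow(p - 2) + 1   # private exponent in [1, p-2]
    pk = pow(g, sk, p)                  # public value g^sk mod p
    return sk, pk

a_sk, a_pk = keypair()   # Alice
b_sk, b_pk = keypair()   # Bob

# Each side combines its own secret with the other's public value.
k_alice = pow(b_pk, a_sk, p)
k_bob = pow(a_pk, b_sk, p)
assert k_alice == k_bob  # both sides derive the same shared secret
```

An active adversary can trivially man-in-the-middle this unauthenticated exchange, which is exactly why the talk's authenticated key exchange protocols add cryptographic building blocks on top of the raw Diffie-Hellman step.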

New Algorithmic Tools for Rigorous Machine Learning Security Analysis

Teodora Baluta National University of Singapore
30 Jan 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Machine learning security is an emerging area with many open questions lacking systematic analysis. In this talk, I will present three new algorithmic tools to address this gap: (1) algebraic proofs; (2) causal reasoning; and (3) sound statistical verification. Algebraic proofs provide the first conceptual mechanism to resolve intellectual property disputes over training data. I show that stochastic gradient descent, the de facto training procedure for modern neural networks, is a collision-resistant computation under precise definitions. These results open up connections to lattices, mathematical tools currently used in cryptography. I will also briefly mention my efforts to analyze the causes of empirical privacy attacks and defenses using causal models, and to devise statistical verification procedures with ‘probably approximately correct’ (PAC)-style soundness guarantees.

Oblivious Algorithms for Privacy-Preserving Computations

Sajin Sasy University of Waterloo
25 Jan 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
People around the world use data-driven online services every day. However, data center attacks and data breaches have become a regular and rising phenomenon. How, then, can one reap the benefits of data-driven statistical insights without compromising the privacy of individuals' data? In this talk, I will first present an overview of three disparate approaches towards privacy-preserving computations today, namely homomorphic cryptography, distributed trust, and secure hardware. These ostensibly unconnected approaches have one unifying element: oblivious algorithms. I will discuss the relevance and pervasiveness of oblivious algorithms in all the different models for privacy-preserving computations. Finally, I highlight the performance and security challenges in deploying such privacy-preserving solutions, and present three of my works that mitigate these obstacles through the design of novel efficient oblivious algorithms.
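An algorithm is oblivious when its control flow and memory access pattern do not depend on secret data. A branchless compare-and-swap wired into a fixed sorting network is a classic miniature example (a sketch of the general idea, not any construction from the talk; Python here only mimics what constant-time code does at a lower level):

```python
def oblivious_cswap(a, b):
    """Return (min, max) without a data-dependent branch.

    The same sequence of operations runs for every input, which is the
    defining property of an oblivious computation.
    """
    swap = int(a > b)                 # 0 or 1, consumed arithmetically
    lo = swap * b + (1 - swap) * a
    hi = swap * a + (1 - swap) * b
    return lo, hi

def oblivious_sort4(xs):
    """Sort 4 elements with a fixed comparator network.

    The wiring below is a standard 5-comparator network; which pairs are
    compared never depends on the data, so an observer watching memory
    accesses learns nothing about the values.
    """
    x = list(xs)
    for i, j in [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]:
        x[i], x[j] = oblivious_cswap(x[i], x[j])
    return x
```

The same access-pattern-independence is what makes such algorithms deployable across homomorphic cryptography, multi-party computation, and secure hardware alike.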

A comprehensive analysis of PII leakage-based web tracking

Ha Dao MPI-INF - INET
10 Jan 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Third-party web tracking has been used to collect and analyze user browsing behavior, which has raised concerns about privacy among Internet users. In this talk, I will describe the mechanisms behind third-party web tracking and offer suggestions for user protection. My discussion will extend to our recent work on Personally Identifiable Information (PII) leakage-based tracking across various scenarios. I will conclude the talk by suggesting potential research directions to enhance transparency on the World Wide Web.

Implementability of Asynchronous Communication Protocols - The Power of Choice

Felix Stutz Max Planck Institute for Software Systems
15 Dec 2023, 3:30 pm - 4:30 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
Distributed message-passing systems are both widespread and challenging to design and implement. Combining concurrency, asynchrony, and message buffering makes the verification problem algorithmically undecidable. We consider a top-down approach to this problem: one starts with a global communication protocol and automatically generates local specifications for its participants. The problem remains challenging because participants only obtain partial information about an execution through the messages they receive, which can cause confusion and undesired behaviours. The implementability problem asks if a (global) protocol can be implemented locally. It has been studied from two perspectives. On the one hand, multiparty session types (MSTs), with global types to specify protocols, provide an incomplete type-theoretic approach that is very efficient but restricts the expressiveness of protocols. On the other hand, high-level message sequence charts (HMSCs) do not impose any restrictions on the protocols, yielding undecidability of the implementability problem for HMSCs.   The work in this dissertation is the first to formally build a bridge between the world of MSTs and HMSCs. It presents a generalised MST projection operator for sender-driven choice. This allows a sender to send to different receivers when branching, which is crucial to handle common communication patterns from distributed computing. Nevertheless, the classical MST projection approach is inherently incomplete. Using our formal encoding from global types to HMSCs, we prove decidability of the implementability problem for global types with sender-driven choice. Furthermore, we develop the first direct and complete projection operator for global types with sender-driven choice, using automata-theoretic techniques, and show its effectiveness with a prototype implementation. 
Last, we introduce protocol state machines (PSMs), an automata-based protocol specification formalism that subsumes both global types from MSTs and HMSCs with regard to expressivity. We use transformations on PSMs to show that many of the syntactic restrictions of global types are not restrictive in terms of protocol expressivity. We prove that the implementability problem for PSMs with mixed choice, which requires no dedicated sender for a branch but only that all labels be distinct, is undecidable in general. Together with our results on expressivity, this shows undecidability of the implementability problem for mixed-choice global types.

Cocon: A Type-Theoretic Framework for Certified Meta-programming

Brigitte Pientka McGill University
15 Dec 2023, 9:00 am - 10:00 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Meta-programming is the art of writing programs that produce or manipulate other programs. This allows programmers to automate error-prone or repetitive tasks, and exploit domain-specific knowledge to customize the generated code. Hence, meta-programming is widely used in a range of technologies: from cryptographic message authentication in secure network protocols to supporting reflection in proof environments such as Lean or Coq.   Unfortunately, writing safe meta-programs remains very challenging and sometimes frustrating, as traditionally errors in the generated code are only detected when running it, but not at the time when code is generated. To make it easier to write and maintain meta-programs, tools that allow us to detect errors during code generation -- instead of when running the generated  code — are essential.   This talk revisits Cocon, a Martin-Loef dependent type theory for defining logics and proofs, as a framework for certified meta-programming. Cocon allows us to represent domain-specific languages (DSL) within the logical framework LF and in addition write recursive meta-programs and proofs about those DSLs. In particular, we can embed into LF STLC or System F, or even MLTT itself and then write programs about those encodings using Cocon itself. This means Cocon can be viewed as a type-theoretic framework for certified meta-programming.   This work revisits the LICS'19 paper "A Type Theory for Defining Logics" by Brigitte Pientka, David Thibodeau, Andreas Abel, Francisco Ferreira, Rébecca Zucchini and reframes it as a foundation for meta-programming. It highlights what is necessary to use Cocon as a type-theoretic foundation for certified meta-programming and how to build such a certified meta-programming system from the ground up.

This is joint work with Jason Z. Hu and Junyoung Jang.

---

Please contact the Office team for Zoom link information.

Automated Reasoning under Weak Memory Consistency

Michalis Kokologiannakis Max Planck Institute for Software Systems
14 Dec 2023, 4:00 pm - 5:00 pm
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
Weak memory consistency models capture the outcomes of concurrent programs that appear in practice and yet cannot be explained by thread interleavings. Such outcomes pose two major challenges to formal methods.  First, establishing that a memory model satisfies its intended properties (e.g., supports a certain compilation scheme) is extremely error-prone: most proposed language models were initially broken and required multiple iterations to achieve soundness.  Second, weak memory models make verification of concurrent programs much harder, as a result of which there are no scalable verification techniques beyond a few that target very simple models.

In this talk, I present solutions to both of these problems. First, I show that the relevant meta-theory of weak memory models can be effectively decided (sparing years of manual proof efforts), and present Kater, a tool that can answer such queries in a matter of seconds.  Second, I present GenMC, the first  scalable stateless model checker that is parametric in the choice of the memory model, often improving the prior state of the art by orders of magnitude.
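The gap between interleaving semantics and weak memory models is visible already on the classic store-buffering litmus test: enumerating all thread interleavings never yields r1 = r2 = 0, yet weak models such as x86-TSO permit exactly that outcome. A toy enumerator (my own illustration, unrelated to the internals of Kater or GenMC):

```python
# Store-buffering litmus test: x = y = 0 initially.
# T1: x := 1; r1 := y        T2: y := 1; r2 := x
T1 = [("write", "x", 1), ("read", "y", "r1")]
T2 = [("write", "y", 1), ("read", "x", "r2")]

def interleavings(a, b):
    """Yield every interleaving of two per-thread operation sequences."""
    if not a:
        yield tuple(b); return
    if not b:
        yield tuple(a); return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest

def run(schedule):
    """Execute one interleaving under sequential consistency."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op, loc, arg in schedule:
        if op == "write":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["r1"], regs["r2"]

sc_outcomes = {run(s) for s in interleavings(T1, T2)}
# (0, 0) is absent under sequential consistency, but store buffers on
# real hardware (e.g., x86-TSO) can produce it.
```

This is why verification under weak memory cannot simply enumerate interleavings, motivating memory-model-parametric tools like GenMC.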

Improving Decision Making with Machine Learning, Provably

Manuel Gomez Rodriguez Max Planck Institute for Software Systems
06 Dec 2023, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Decision support systems for classification tasks are predominantly designed to predict the value of the ground truth labels. However, these systems also need to make human experts understand when and how to use these predictions to update their own predictions. Unfortunately, this has proven challenging. In this talk, I will introduce an alternative type of decision support system that circumvents this challenge by design. Rather than providing a single label prediction, these systems provide a set of label prediction values, namely a prediction set, and ask experts to predict a label value from within the prediction set. Moreover, I will discuss how to use conformal prediction, online learning and counterfactual inference to efficiently construct prediction sets that provably optimize experts’ performance. Further, I will present the results of a large-scale human subject study, which show that, for decision support systems based on prediction sets, limiting experts’ level of agency leads to greater performance than allowing experts to always exercise their own agency.
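As a rough illustration of how conformal prediction turns a classifier's scores into a prediction set, here is a textbook split-conformal sketch (the toy numbers and names are mine, not the construction from the talk):

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile over calibration data.

    cal_scores are nonconformity scores s_i = 1 - model_prob(true label)
    on held-out calibration examples; the threshold is the
    ceil((n+1)(1-alpha))/n empirical quantile.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))   # rank of the quantile
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(probs, qhat):
    """All labels whose nonconformity 1 - p(y) is within the threshold."""
    return {y for y, p in probs.items() if 1 - p <= qhat}

# Toy usage: 4 calibration scores, target coverage 80%.
qhat = conformal_threshold([0.1, 0.2, 0.3, 0.4], alpha=0.2)
labels = prediction_set({"cat": 0.7, "dog": 0.25, "bird": 0.05}, qhat)
```

The expert is then asked to pick a label from `labels` rather than from all classes, which is the restricted-agency regime the study in the talk evaluates.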

Automated and Foundational Verification of Low-Level Programs

Michael Sammler MPI-SWS
04 Dec 2023, 5:00 pm - 6:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
SWS Student Defense Talks - Thesis Defense
Formal verification is a promising technique to ensure the reliability of low-level programs like operating systems and hypervisors, since it can show the absence of whole classes of bugs and prevent critical vulnerabilities. To realize the full potential of formal verification for real-world low-level programs, however, one has to overcome several challenges, including: (1) dealing with the complexities of realistic models of real-world programming languages; (2) ensuring the trustworthiness of the verification, ideally by providing foundational proofs (i.e., proofs that can be checked by a general-purpose proof assistant); and (3) minimizing the manual effort required for verification by providing a high degree of automation.

This dissertation presents multiple projects that advance formal verification along these three axes: RefinedC provides the first approach for verifying C code that combines foundational proofs with a high degree of automation via a novel refinement and ownership type system. Islaris shows how to scale verification of assembly code to realistic models of modern instruction set architectures, in particular Armv8-A and RISC-V. DimSum develops a decentralized approach for reasoning about programs that consist of components written in multiple different languages (e.g., assembly and C), as is common for low-level programs. RefinedC and Islaris rest on Lithium, a novel proof engine for separation logic that combines automation with foundational proofs.

The Never-Ending Trace: An Under-Approximate Approach to Divergence Bugs

Caroline Cronjäger Vrije Universiteit Amsterdam
04 Dec 2023, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 607 / Meeting ID: -
SWS Colloquium
The goal of under-approximating logics, such as incorrectness logic, is to reason about the existence of bugs  by specifying a subset of possible program behaviour. While this approach allows for the specification of a broad range of bugs, it does not account for divergence bugs (such as infinite loops), since the nature of the triple itself does not allow statements about infinite execution traces.     To fill this gap, I will present a divergent variant of the forward under-approximating (FUA) triple. This new separation logic reasons functionally compositionally about divergence bugs and is under-approximating in the sense that all specified bugs are true bugs, that is, reachable. It can detect divergence originating from loops, recursive function calls and goto-cycles.     I will motivate the talk by defining the type of divergent programs we are aiming to reason about, and show how previously introduced logics fall short of specifying their divergent behaviour. After introducing the FUA triple, I will outline how this triple differs from traditional over-approximating  and under-approximating approaches such as (partial) Hoare logic and incorrectness logic. Finally, I will discuss the mechanisms within the FUX framework that enable reasoning about divergence, with a focus on how to prove divergence arising from goto-cycles.

--

Please contact the office team for the Zoom link details.

Rigorous and General Response-Time Analysis for Uniprocessor Real-Time Systems

Sergey Bozhko Max Planck Institute for Software Systems
21 Nov 2023, 2:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 113
SWS Student Defense Talks - Thesis Proposal
The Prosa project demonstrated that mechanized human-readable real-time systems (RTS) theory was a viable alternative to the mainstream pen-and-paper approach. Unfortunately, this mechanization effort faced two obstacles: (1) the mechanized proofs of real-time systems theory, when designed naively, contain an excessive amount of duplication, and (2) the stochastic approach to analyzing RTSs, which is growing in popularity, seems beyond the reach of Prosa. The proposed thesis will address these fundamental challenges in the mechanization of RTS theory.

With regard to the issue of extensible non-duplicate RTA proofs, this work proposes the first abstract response-time analysis (RTA) of uniprocessor RTSs mechanized in Coq. To illustrate the flexibility, we propose instantiations of the abstract RTA to all combinations of three different scheduling policies, four preemption policies, and four processor-supply models as well as an instantiation of the abstract RTA to account for the priority ceiling protocol (PCP) for fully-preemptive fixed-priority scheduling and the four processor-supply models.

With regard to adapting the Prosa approach to stochastic RTS theory, this work proposes the first mechanized specification of stochastic RTS as well as the definition of probabilistic worst-case execution time (pWCET) accompanied by mechanized proof of its adequacy. In short, the proven adequacy theorem states that actual execution costs represented via dependent random variables can be replaced with independent and identically distributed (IID) random variables derived from pWCET. The reduction from dependent random variables to IID random variables opens the door to a wide variety of techniques from probability theory. To demonstrate that the mechanization of stochastic RTA is feasible with reasonable effort, we propose building on the adequacy theorem to mechanize a state-of-the-art stochastic RTA, thus creating the first mechanized stochastic RTA.

The complexity of Presburger arithmetic with power or powers

Dmitry Chistikov University of Warwick
14 Nov 2023, 11:00 am - 12:00 pm
Kaiserslautern building G26, room 111
SWS Colloquium
Presburger arithmetic, or linear integer arithmetic, is known to have decision procedures that work in triply exponential time.

Jointly with M. Benedikt (Oxford) and A. Mansutti (IMDEA Software Institute), we have recently considered two decidable extensions of Presburger arithmetic: with the power function and with the predicate for the set of powers of 2. No elementary decision procedures were known for these two theories.

In this talk, I will introduce this work and outline ideas behind our results. Namely, we have shown that the existence of solutions over N to systems of linear equations and constraints of the form $y = 2^x$ can be decided in nondeterministic exponential time. Also, linear integer arithmetic extended with a predicate for powers of 2 can be decided in triply exponential time.
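To make the shape of such systems concrete, here is a naive bounded search for a solution to one linear equation combined with a $y = 2^x$ constraint (illustration only; the decision procedure from the talk is a nondeterministic exponential-time algorithm, not enumeration, and handles arbitrary systems):

```python
def solve_exp_linear(a, b, bound=64):
    """Search N for a solution to  y = 2**x  and  y = a*x + b.

    A brute-force sketch over a fixed range, just to show what an
    instance of the decision problem looks like; it is not complete
    and bears no relation to the actual procedure.
    """
    for x in range(bound):
        y = 2 ** x
        if y == a * x + b:
            return x, y
    return None

# y = 2**x together with y = 4*x has the solution x = 4, y = 16.
sol = solve_exp_linear(4, 0)
```

Deciding such constraints in general is hard precisely because the exponential $2^x$ escapes the linear reasoning that makes plain Presburger arithmetic tractable.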

(Based on a paper in ICALP'23.)

Digitizing the Human Face

Mohamed Elgharib MPI-INF - D6
08 Nov 2023, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Digitization is the process of creating digital visual twins of real objects and enabling interactions with them in a seamless manner. The human face is a central element of our visual communication and hence its digitization is of utmost importance in visual computing. This has several applications in videoconferencing, cinema production, video games, virtual environments, and many more. In this talk, we will discuss the digitization of the human face. To this end, we will highlight the main stages of the digitization pipeline, and we will outline the main pillars of digitizing the human face. We will show how to achieve these pillars by utilizing the most recent advances in learnable neural implicit scene representations. We will discuss specific face digitization problems in more detail and show important features that learnable neural implicit scene representations can bring to the digitization process. This talk highlights our efforts towards the full digitization of our world, with an ultimate goal of allowing all of us to connect and interact with our friends, family, and loved ones, over a distance.

Exposing Concurrency Bugs from their Hiding Places

Umang Mathur National University of Singapore
26 Oct 2023, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Concurrent programs are notoriously hard to write correctly, as scheduling nondeterminism introduces subtle errors that are both hard to detect and to reproduce.

Despite rigorous testing, concurrency bugs such as race conditions often find their way into production software and manifest as critical security issues. Consequently, considerable effort has been made towards developing efficient techniques for detecting them automatically.

The preferred approach to detect data races is through dynamic analysis, where one executes the software with some test inputs, and checks for the presence of bugs in the execution observed.

Traditional bug detectors however are often unable to discover simple bugs present in the underlying software, even after executing the program several times, because these bugs are sensitive to thread scheduling.

In this talk, I will discuss how runtime predictive analysis can help. Runtime predictive analyses aim to expose concurrency bugs that can otherwise be missed by traditional dynamic analysis techniques (such as the race detector TSan) by inferring the presence of these bugs in alternate executions of the underlying software, without explicitly re-executing the program.

I will talk about the fundamentals of and recent algorithmic advances for building highly scalable and sound predictive analysis techniques.
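As a miniature of this dynamic-analysis setting, a toy lockset check over a recorded trace flags pairs of conflicting accesses protected by no common lock, i.e., candidate races that may only surface under a different schedule (a drastic simplification of the sound predictive analyses discussed in the talk):

```python
from collections import defaultdict

def lockset_races(trace):
    """Toy lockset analysis over a recorded execution trace.

    trace: list of (thread, op, target) events, with op one of
    "acq", "rel", "read", "write".  Reports a race candidate whenever
    two threads access the same variable, at least one access is a
    write, and the two accesses share no common lock.
    """
    held = defaultdict(set)       # thread -> locks currently held
    accesses = defaultdict(list)  # var -> [(thread, op, locks-at-access)]
    races = []
    for tid, op, tgt in trace:
        if op == "acq":
            held[tid].add(tgt)
        elif op == "rel":
            held[tid].discard(tgt)
        else:
            for t2, op2, ls2 in accesses[tgt]:
                if t2 != tid and "write" in (op, op2) and not (held[tid] & ls2):
                    races.append((tgt, t2, tid))
            accesses[tgt].append((tid, op, set(held[tid])))
    return races
```

Note how the unprotected write by the second thread is flagged even if the observed schedule happened to serialize the two accesses harmlessly; this is the "predictive" flavour, though real analyses must also reason about happens-before ordering to remain sound.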

Naturalness & Bimodality of Code

Prem Devanbu University of California, Davis
18 Oct 2023, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Distinguished Lecture Series
After discovering, back in 2011, that language models are useful for modeling repetitive patterns in source code (cf. "On the Naturalness of Software"), and exploring some applications thereof, our group at UC Davis has more recently (since about 2019) focused on the observation that software, as usually written, is bimodal, admitting both the well-known formal, deterministic semantics (mostly for machines) and a probabilistic, noisy semantics (for humans). This bimodality property affords both new approaches to software tool construction (using machine learning) and new ways of studying human code reading. In this talk, I'll give an overview of the Naturalness/Bimodality program and some recent work we have done on calibrating the quality of code produced by large language models, as well as on "bimodal prompting".

--

Please contact the office team for link information

Algorithms for Plurality

Smitha Milli Cornell Tech
17 Oct 2023, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
SWS Colloquium
Machine learning algorithms curate much of the content we encounter online. However, there is concern that these algorithms may unintentionally amplify hostile discourse and perpetuate divisive 'us versus them' mentalities. How can we re-engineer algorithms to bridge diverse perspectives and facilitate constructive conflict? First, I will discuss results from our randomized experiment measuring effects of Twitter’s engagement-based ranking algorithm on downstream sociopolitical outcomes like the amplification of divisive content and users’ perceptions of their in-group and out-group. Crucially, we found that an alternative ranking, based on users’ stated preferences rather than their engagement, reduced amplification of negative, partisan, and out-group hostile content. Second, I will discuss how we applied these insights in practice to design an objective function for algorithmic ranking at Twitter. The core idea to the approach is to interpret users' actions in a way that is consistent with their stated, reflective preferences. Finally, I will discuss lessons learned and open questions for designing algorithms that support a plurality of viewpoints, with an emphasis on the emerging paradigm of bridging-based ranking.

Causethical ML: Promises & Challenges

Isabel Valera Fachrichtung Informatik - Saarbrücken
11 Oct 2023, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
In this talk I will give an overview of the role of causality in ethical machine learning, in particular in fair and explainable ML. I will first detail how to use causal reasoning to study fairness and interpretability problems in algorithmic decision making, stressing the main limitations that we encounter when aiming to address these problems in practice. Then, I will provide some hints about how to overcome some of these limitations.

Software Engineering for Data Intensive Scalable Computing and Heterogeneous Computing

Miryung Kim UCLA
28 Sep 2023, 10:30 am - 11:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Distinguished Lecture Series
With the development of big data, machine learning, and AI, existing software engineering techniques must be re-imagined to provide the productivity gains that developers desire. Furthermore, specialized hardware accelerators such as GPUs and FPGAs have become a prominent part of the current computing landscape. However, developing heterogeneous applications is limited to a small subset of programmers with specialized hardware knowledge. To improve productivity and performance for data-intensive and compute-intensive development, it is time for the software engineering community to design new waves of refactoring, testing, and debugging tools for big data analytics and heterogeneous application development.

In this talk, we overview software development challenges in this new data-intensive scalable computing and heterogeneous computing domain. We describe examples of automated software engineering (debugging, testing, and refactoring) techniques that target this new domain and share lessons learned from building these techniques.

Robust and Equitable Uncertainty Estimation

Aaron Roth University of Pennsylvania
27 Sep 2023, 2:00 pm - 3:00 pm
Virtual talk
SWS Distinguished Lecture Series
Machine learning provides us with an amazing set of tools to make predictions, but how much should we trust particular predictions? To answer this, we need a way of estimating the confidence we should have in particular predictions of black-box models. Standard tools for doing this give guarantees that are averages over predictions. For instance, in a medical application, such tools might paper over poor performance on one medically relevant demographic group if it is made up for by higher performance on another group. Standard methods also assume that the data distribution is static; in other words, that the future will look like the past.

In this talk, I will describe new techniques that address both of these problems: methods for producing prediction sets around arbitrary black-box prediction algorithms that achieve correct empirical coverage even when the data distribution changes in arbitrary, unanticipated ways, and that maintain correct coverage even when we zoom in on demographic groups that may be arbitrary and intersecting. When we want only correct group-wise coverage and are willing to assume that the future will look like the past, our algorithms become especially simple.
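To make the group-wise coverage idea concrete, here is a minimal split-conformal-style sketch, not the speaker's actual method: instead of one average-case threshold, each group is calibrated to its own residual quantile, so a harder-to-predict group simply gets a wider prediction interval. The groups, residuals, and coverage level below are invented for illustration.

```python
# Illustrative group-wise calibration: for each group, pick the
# ceil((n+1)*(1-alpha))-th smallest calibration residual as that
# group's interval half-width, giving ~(1-alpha) coverage per group.
import math

def groupwise_thresholds(residuals_by_group, alpha=0.1):
    """Return a per-group residual threshold at the (1 - alpha) level."""
    thresholds = {}
    for group, residuals in residuals_by_group.items():
        n = len(residuals)
        k = min(n, math.ceil((n + 1) * (1 - alpha)))
        thresholds[group] = sorted(residuals)[k - 1]
    return thresholds

# Calibration residuals |y - f(x)| for two hypothetical groups;
# group "B" is harder to predict, so it receives a wider interval.
calib = {
    "A": [0.10, 0.20, 0.15, 0.12, 0.18, 0.11, 0.16, 0.14, 0.13, 0.17],
    "B": [0.50, 0.90, 0.70, 0.60, 0.80, 0.55, 0.65, 0.75, 0.85, 0.95],
}
t = groupwise_thresholds(calib, alpha=0.1)
# Prediction set for a point x in group g: [f(x) - t[g], f(x) + t[g]]
print(t)
```

The talk's contribution goes further: correct coverage for arbitrary, intersecting groups and under distribution shift, which this static per-group split does not handle.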

This talk is based on two papers that are joint work with Osbert Bastani, Varun Gupta, Chris Jung, Georgy Noarov, and Ramya Ramalingam.

Please contact the office team for link information

AI as a resource: strategy, uncertainty, and societal welfare

Kate Donahue Cornell University
27 Sep 2023, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
In recent years, humanity has been faced with a new resource: artificial intelligence. AI can be a boon to society, but it can also have negative impacts, especially when used inappropriately. My research agenda studies the societal impact of AI, particularly focusing on AI as a resource and on the strategic decisions that agents make in deciding how to use it. In this talk, I will consider some of the key strategic questions that arise in this framework: the decisions that agents make in jointly constructing and sharing AI models, and the decisions that they make in dividing tasks between their own expertise and the expertise of a model. The first of these questions has motivated my work on "model-sharing games", which models scenarios such as federated learning or data cooperatives. In this setting, we view agents with data as game-theoretic players and analyze questions of stability, optimality, and fairness (https://arxiv.org/abs/2010.00753, https://arxiv.org/abs/2106.09580, https://arxiv.org/abs/2112.00818). Second, I will describe some of my ongoing work on modeling human-algorithm collaboration. In particular, I will describe work on best-item recovery in categorical prediction, showing how differential accuracy rates and anchoring on algorithmic suggestions can influence overall performance (https://arxiv.org/abs/2308.11721).
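The model-sharing setting can be pictured with a toy federated-style estimator, a hypothetical sketch rather than anything from the papers above: each agent fits a local model on its own data, and a shared model averages the local models weighted by sample size. The strategic question is then whether an agent's error is lower under its local model or the shared one.

```python
# Toy model-sharing sketch: "models" are just sample means, and the
# shared model is a data-size-weighted average of local models
# (the aggregation rule used in federated averaging).

def local_model(data):
    """Each agent's local model: the mean of its own data."""
    return sum(data) / len(data)

def shared_model(datasets):
    """Shared model: average of local models, weighted by sample size."""
    total = sum(len(d) for d in datasets)
    return sum(len(d) * local_model(d) for d in datasets) / total

# Two hypothetical agents with different amounts of data.
agents = [[1.0, 2.0, 3.0], [2.0, 4.0]]
print(local_model(agents[0]))  # 2.0
print(shared_model(agents))    # (3*2.0 + 2*3.0) / 5 = 2.4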

Privacy Auditing and Protection in Large Language Models

Fatemehsadat Mireshghallah University of Washington
18 Sep 2023, 10:00 am - 11:30 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 112
SWS Colloquium
Large language models (LLMs, e.g., GPT-3, OPT, TNLG, …) have shown remarkably high performance on standard benchmarks, owing to their high parameter counts, extremely large training datasets, and significant compute. Although the high parameter count in these models leads to more expressiveness, it can also lead to higher memorization, which, coupled with large, unvetted, web-scraped datasets, can cause negative societal and ethical impacts such as leakage of private, sensitive information and generation of harmful text. In this talk, we will go over how these issues affect the trustworthiness of LLMs, and zoom in on how we can measure the leakage and memorization of these models and mitigate it through differentially private training. Finally, we will discuss what it would actually mean for LLMs to be privacy-preserving, and what future research directions exist for making large models trustworthy.
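One common way such leakage audits are framed, sketched here as an illustration and not as the speaker's specific method, is a loss-based membership check: a candidate string is flagged as likely memorized when the target model assigns it much lower loss than a reference model does. The loss values and threshold below are hypothetical stand-ins for per-sequence language-model losses.

```python
# Illustrative loss-based memorization check: a sequence the target
# model has memorized gets an unusually low loss relative to a
# reference model trained on different data.

def likely_memorized(target_loss, reference_loss, ratio_threshold=0.5):
    """Flag sequences whose target/reference loss ratio is unusually low."""
    return target_loss / reference_loss < ratio_threshold

# A training-set secret tends to get a very low target loss...
print(likely_memorized(target_loss=0.3, reference_loss=2.0))  # True
# ...while ordinary text scores similarly under both models.
print(likely_memorized(target_loss=1.8, reference_loss=2.0))  # False
```

Mitigations such as differentially private training (e.g., DP-SGD) aim to bound exactly this kind of per-example influence, so audited leakage should shrink under them.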