Events 2026

Pushing the Boundary on Automated Modular Floating-Point Verification

Rosa Abbasi Max Planck Institute for Software Systems
27 Mar 2026, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Floating-point numbers often represent real numbers in computer systems. They are applicable in many domains, including embedded systems, machine learning, and scientific computing. Despite their widespread use, they pose some difficulties. Floating-point numbers and operations typically suffer from roundoff errors, making computations over floating-point values inaccurate with respect to a real-valued specification. Moreover, the IEEE 754 floating-point standard, a fundamental element in formalizing floating-point arithmetic for today’s computers, presents additional challenges due to special values and the unintuitive behaviors that result from them.

This thesis makes three main contributions that address existing gaps in automated reasoning about floating-point arithmetic, making it easier for developers and researchers to understand, verify, and trust the floating-point computations in their programs.

First, we introduce the first floating-point support in a deductive verifier for the Java programming language. Our support in the KeY verifier automatically handles floating-point arithmetic and transcendental functions. We achieve this with a combination of delegation to external SMT solvers on the one hand, and rule-based reasoning within KeY on the other, exploiting the complementary strengths of both approaches. As a result, this approach can prove functional floating-point properties for realistic programs.

Second, inspired by KeY’s treatment of method calls and the need for a scalable roundoff error analysis, we present the first modular optimization-based roundoff error analysis for non-recursive procedural floating-point programs. Our key idea is to achieve modularity while maintaining reasonable accuracy by automatically computing procedure summaries that are a function of the input parameters. Technically, we extend an existing optimization-based roundoff error analysis and show how to effectively use first-order Taylor approximations to compute precise procedure summaries, and how to integrate them to obtain end-to-end roundoff error bounds.

Third, our experience using SMT solvers to discharge KeY’s floating-point verification conditions revealed unexpected performance behavior, motivating a systematic study of floating-point reasoning in SMT solvers. We propose a metamorphic testing approach that uses semantics-preserving rewrite rules, focusing on floating-point special values, to uncover unexpected performance behavior in SMT solvers’ handling of floating-point formulas, such as an increase in solving time when the SMT queries are simplified. Using real-world test inputs, our approach can identify such performance bugs for every SMT solver tested.
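The roundoff errors and special-value behaviors that motivate this work are easy to reproduce. A minimal illustration (the examples are assumed for exposition, not taken from the thesis):

```python
import math

# Roundoff: decimal 0.1 and 0.2 have no exact binary64 representation, so the
# computed sum deviates from the real-valued specification 0.3.
assert 0.1 + 0.2 != 0.3
assert abs((0.1 + 0.2) - 0.3) < 1e-15      # tiny, but enough to break equality

# IEEE 754 special values behave unintuitively:
nan = float("nan")
assert nan != nan                          # NaN is not equal to itself
assert -0.0 == 0.0                         # signed zeros compare equal...
assert math.copysign(1.0, -0.0) == -1.0    # ...yet remain distinguishable
assert 1e308 * 10 == float("inf")          # overflow silently yields infinity
```

Exactly these gaps between floating-point semantics and real-valued intuition are what make automated verification of such programs hard.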

Permissive Assumptions in Logical Controller Synthesis for Cyber-Physical Systems

Satya Prakash Nayak Max Planck Institute for Software Systems
26 Mar 2026, 3:00 pm - 4:00 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
The synthesis of logical controllers that guarantee desired specifications is a central problem in the design of cyber-physical systems (CPS). In practice, such guarantees rely on assumptions about how the controller interacts with its environment. These assumptions restrict environment behavior to make synthesis feasible, but in existing approaches they are often overly restrictive, leading to conservative designs and limiting the range of behaviors that systems can safely accommodate.

This thesis rethinks the role of assumptions in logical controller synthesis by emphasizing their \emph{permissiveness}---the ability to capture a wide range of admissible environment behaviors. We study permissive assumptions in two key settings: (a) interactions among multiple discrete components in distributed systems, and (b) interactions between high-level logical controllers and low-level physical dynamics in hybrid systems. In both settings, we develop theoretical and algorithmic foundations for computing and exploiting permissive assumptions to enable new design paradigms for logical controller synthesis.

For distributed systems, we define permissiveness as capturing all cooperative behaviors of other components that enable a controller to satisfy its specification. We present an algorithm for computing such assumptions in monolithic systems and extend it to distributed systems via a negotiation-based framework that iteratively constructs permissive assume-guarantee contracts for each component. These contracts enable decentralized synthesis and are applied to human-robot interaction, allowing robots to cooperate with humans whenever possible and request cooperation only when necessary.

For hybrid systems, we utilize permissive assumptions on the plant model---the abstract representation of physical dynamics---to address three key challenges. To enable seamless adaptation of controllers to changing logical contexts, i.e., changes in high-level goals or tasks, we introduce a novel synthesis framework that utilizes \emph{persistent live groups}, a class of assumptions capturing liveness properties of continuous dynamics. To improve scalability to large or uncertain plant models, we develop \emph{universal controllers} where decisions are conditioned on branching-time assumptions called \emph{prophecies}, which are learned from representative models and efficiently verified at runtime on unseen plant models. Finally, to enhance robustness under uncertainty or partial violations of assumptions on the plant model, we introduce a robust semantics for branching-time temporal logics, enabling formal reasoning about controller behavior under such violations.

Overall, this work enables correctness-by-construction synthesis while avoiding unnecessary conservatism, resulting in CPS that are more robust, scalable, and responsive.

Strong Program Logics for Weak Memory and Even Stronger Types for Tactic Programming

Jan-Oliver Kaiser Max Planck Institute for Software Systems
19 Mar 2026, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
Computers have become ubiquitous in everyday life and so have bugs in programs running on those computers. Research in the field of programming languages and verification has produced countless ways to attack the problem of software defects. This thesis concerns itself with two established techniques being applied to an unconventional setting.

Firstly, in the category of extending the applicability of program verification to more realistic settings, we demonstrate how to verify high-performance algorithms and data structures making use of highly efficient memory access patterns that are not automatically synchronized with main memory or other processors’ caches. Hardware and programming languages that expose these weakly or un-synchronized memory accesses are said to have weak memory models. The lack of synchronization in weak memory models goes directly against the assumption of sequential consistency which still sits at the heart of most verification works. Concretely, we show how to perform verification in weak memory models using an existing program logic framework, Iris, that was traditionally limited to sequential consistency. In building on Iris, we inherit its mechanized proof of soundness of the core logic as well as the ability to perform mechanized program verification.

Secondly, we propose to bring the benefits of dependent types to the process of writing and automating proofs in the Rocq proof assistant. This aims to address the limitations of Rocq's oldest — and, for a long time, only — tactic language: Ltac. Ltac's pitfalls are numerous and it is arguably unable to fulfill the requirements of large verification projects. Our contribution is a new tactic language called Mtac2. Mtac2 is based on Mtac, a principled metaprogramming language for Rocq offering strongly typed primitives based on Rocq’s own dependent type system. Mtac’s primitives could already be used to implement some tactics, but it lacks the ability to directly interact with Rocq’s proof state and to perform backwards reasoning on it. Mtac2 extends Mtac with support for backwards reasoning and keeps in line with Mtac’s tradition of strong types by introducing the concept of typed tactics. Typed tactics statically track the expected type of the current goal(s) and can rule out entire classes of mistakes that often plague Ltac tactics.

From Interventions to Assistants: Toward Intelligent Security Support

Daniele Lain ETH Zurich
(hosted by Carmela Troncoso)
12 Mar 2026, 10:00 am - 10:45 am
Bochum building MPI-SP, room tba
CIS@MPG Colloquium
As users are exposed to an unprecedented range of security threats, a rich ecosystem of support mechanisms has emerged to assist the "last line of defense". Yet as these mechanisms become widely adopted in industry, a question arises: do they provide the support users actually need? In the first part of this talk, I will show that this is not always the case. Using phishing (one of the most prevalent and damaging cybercrimes) as a case study, I will present results from large-scale, real-world measurement studies that challenge common assumptions: that widely deployed mechanisms such as training and password managers are inherently effective, and that users primarily lack knowledge about this threat and how to detect it. Instead, I will show that phishing susceptibility is often an attention problem, due to limited and poorly surfaced indicators and cues. In the second part of the talk, I will discuss how we translate these insights into the design of novel systems that better support secure behavior. I will present a tailored countermeasure that assists users at critical decision points, and discuss its limitations in terms of increased user burden, which motivates recent work on automating security decisions through AI-driven assistance. I will conclude by outlining key research challenges in designing security systems that are adaptive, context-aware, and robust.

Testing AI's Implicit World Models

Keyon Vafa Harvard University
(hosted by Krishna Gummadi)
05 Mar 2026, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
Real-world AI systems must be robust across a wide range of conditions. One path to such robustness is for a model to recover a coherent structural understanding of its domain. But it is unclear how to measure, or even define, structural understanding. This talk will present theoretically grounded definitions and metrics that test the structural recovery — or implicit "world models" — of generative models. We will propose different ways to formalize the concept of a world model, develop tests based on these notions, and apply them across domains. In applications ranging from testing whether LLMs apply logic to whether foundation models acquire Newtonian mechanics, we will see that models can make highly accurate predictions with incoherent world models. We will also connect these tests to a broader agenda of building generative models that are robust across downstream uses, incorporating ideas from statistics and the behavioral sciences. Developing reliable inferences about model behavior across tasks offers new ways to assess and improve the efficacy of generative models.

Auditing Personalized Content Recommendations: Data Collection Practices and Transparency Efforts on Online Platforms

Sepehr Mousavi Max Planck Institute for Software Systems
03 Mar 2026, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 005
SWS Student Defense Talks - Thesis Proposal
Online platforms increasingly rely on opaque content recommendation systems to curate personalized content. The black-box nature of these systems has raised significant societal concerns, such as the formation of filter bubbles or the promotion of extreme content. In response, regulatory bodies such as the European Commission have enacted digital regulations aimed at strengthening platform accountability and transparency. Achieving these goals requires audits of content recommendations and the mechanisms that govern them. To systematically audit personalized content recommendation systems deployed by online platforms, in this thesis we investigate their data collection practices and transparency efforts. To this end, this thesis makes the following three contributions: First, it investigates personal data collection practices of online platforms, as these practices directly enable personalized content recommendations. Second, by employing sockpuppet accounts, it conducts an audit of transparency efforts implemented by online platforms through studying explanations provided for organic content recommendations. Finally, this thesis contributes to improving transparency in the procedures and objectives of recommendation systems deployed by online platforms.

Infinite Groups: An Interplay of Automata, Logic, and Algebra

Ruiwen Dong University of Oxford
(hosted by Joël Ouaknine)
03 Mar 2026, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
CIS@MPG Colloquium
Algorithmic problems in infinite groups arise naturally in the analysis of computational models, and they forge a fundamental link between algebra, logic, and computation. The central aim of my research is to understand the boundary between decidability and undecidability in computational algebra and group theory. Despite significant advances in the past 80 years, essential questions on the nature of these boundaries remain largely unanswered. The decision problems that we consider (the Diophantine Problem, membership problems, intersection problems) are particularly relevant to algorithmic verification, automata theory, and program analysis. Our methodology relates algorithms in groups to other central problems in infinite state systems, first-order arithmetic theories, and computer algebra. In this talk, I will illustrate a few recent results that expand the known decidability frontier, as well as some immediate and impactful open problems.

Security: A Next Frontier in AI Coding

Jingxuan He UC Berkeley
(hosted by Thorsten Holz)
02 Mar 2026, 10:00 am - 11:00 am
Bochum building MPI-SP, room MB/1-84/90
CIS@MPG Colloquium
AI is reshaping software development, yet this rapid adoption risks introducing a new generation of security debt. In this talk, I will present my research program aimed at transforming AI from a source of vulnerabilities to a security enabler. I will begin by introducing a systematic framework for quantifying AI-induced cybersecurity risks through two benchmarks: CyberGym, which evaluates AI agents’ offensive capabilities in vulnerability reproduction and discovery, and BaxBench, which measures LLMs’ propensity to introduce security flaws when generating code. Building on these findings, I will present a secure-by-design approach for AI-generated code. This includes security-centric fine-tuning that embeds secure coding practices directly into models, as well as a decoding-time constraining mechanism based on type systems to enforce safety guarantees. Finally, I will conclude by discussing my future research on building broader security and trust in AI-driven software ecosystems.

Designing for Society: AI in Networks, Markets, and Platforms

Ana-Andreea Stoica Max Planck Institute for Intelligent Systems, Tübingen
(hosted by Manuel Gomez Rodriguez)
26 Feb 2026, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
AI systems increasingly reshape our networks, markets, and platforms. When deployed in social environments—online platforms, labor markets, and information ecosystems—AI interacts with complex human behavior, strategic incentives, and structural inequality. This talk focuses on foundational challenges and opportunities for AI systems: how to design and evaluate algorithmic interventions in complex social environments. I will present recent work on causal inference under competing treatments, which formalizes how competition for user attention and strategic behavior among experimenters distort experimental data and invalidate naïve estimates of algorithmic impact. By modeling experimentation as a strategic data acquisition problem, we show how evaluation itself becomes an optimization problem, and we derive mechanisms that recover meaningful estimates despite interference and competition. I connect this problem to deriving foundational properties of systems that enable reliable experimentation. Beyond this case study, the talk highlights broader implications for the design and evaluation of AI systems in networks, markets, and platforms. I argue that responsible deployment requires rethinking evaluation methodologies to account for incentives, feedback loops, and system-level effects, and I outline directions for how algorithmic and statistical tools can tackle these challenges.

Kugelblitz & Flamma sōlis — Executable, Cost-Aware Design-Space Exploration for Programmable Packet Pipelines

Artemio Ageev Max Planck Institute for Software Systems
25 Feb 2026, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
Programmable hardware accelerator design is often driven by expert intuition and proxy metrics, with ground-truth feedback arriving only after costly RTL development, synthesis, and system integration. This weak feedback loop makes it difficult to compare alternatives, to understand which design choices matter, and to reason about feasibility, cost, and end-to-end performance in a principled way. This thesis argues for an executable, feasibility-first design methodology: designers should be able to (i) rapidly rule out designs that cannot support the workloads of interest, (ii) obtain synthesis-backed cost metrics, and (iii) evaluate realistic end-to-end performance in a full system context, enabling systematic exploration beyond proxy-driven reasoning.

As a concrete and demanding case study, we focus on reconfigurable programmable packet-processing pipelines, whose many architectural knobs strongly affect both capability (which programs can execute) and performance under strict line-rate and latency constraints. The thesis first contributes Kugelblitz, a framework that decouples programs from architectures, performs fast feasibility checking, automatically generates synthesizable RTL for feasible designs, and evaluates cost (area/timing) and system-level performance via cycle-accurate full-system simulation (SimBricks). Building on this foundation, we propose Flamma sōlis, an automated exploration layer that searches the ConfigIR design space to optimize a user-chosen objective under user-chosen constraints. Flamma sōlis will automate design-space exploration by combining multi-stage evaluation with practical systems optimizations: it progressively applies increasingly expensive checks for feasibility, cost, and end-to-end performance, while using early termination, caching, and parallel execution to keep exploration tractable under realistic time budgets.
We expect this approach to make architectural trade-offs explicit, reduce reliance on misleading proxies, and automate the discovery of well-suited designs.
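The staged evaluation described above can be sketched in a few lines. All names and cost models below are hypothetical placeholders for illustration, not Kugelblitz or Flamma sōlis internals:

```python
# Multi-stage design-space exploration: cheap checks run first, expensive
# evaluation only for survivors, with caching of repeated evaluations.

def feasible(design):
    """Stage 1: fast feasibility check (cheapest) — can the pipeline even
    fit the program's operations? (toy model)"""
    return design["stages"] * design["alus"] >= design["program_ops"]

def cost(design):
    """Stage 2: stand-in for a synthesis-backed area/timing cost metric."""
    return design["stages"] * 2.0 + design["alus"] * 1.5

_perf_cache = {}

def end_to_end_perf(design):
    """Stage 3: stand-in for cycle-accurate full-system simulation (most
    expensive); results are cached so repeated designs are not re-simulated."""
    key = tuple(sorted(design.items()))
    if key not in _perf_cache:
        _perf_cache[key] = design["program_ops"] / (design["stages"] * design["alus"])
    return _perf_cache[key]

def explore(designs, area_budget):
    """Return the best-performing feasible design within the cost budget."""
    best, best_perf = None, float("inf")
    for d in designs:
        if not feasible(d):        # early termination: infeasible designs
            continue               # never reach the expensive stages
        if cost(d) > area_budget:  # over-budget designs are pruned next
            continue
        perf = end_to_end_perf(d)  # only survivors pay for "simulation"
        if perf < best_perf:
            best, best_perf = d, perf
    return best
```

The real systems replace each stage with much heavier machinery, but the pruning structure — feasibility, then cost, then full-system performance — is the same.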

Foundations for Applied Cryptography: Post-quantum Authentication in Modern Applications

Michael Reichle ETH Zurich
(hosted by Catalin Hritcu)
24 Feb 2026, 10:00 am - 11:00 am
Bochum building MPI-SP, room MB/1-84/90
CIS@MPG Colloquium
Modern cryptography must navigate the transition to post-quantum security while meeting the demands of real-world systems. Due to the tension between post-quantum security and performance, this transition entails complex challenges on several fronts. In this talk, I will discuss the intricacies of provable security and how to establish solid theory for cryptographic building blocks that underpin our digital society. First, I will briefly cover my past work on searchable encryption and zero-knowledge proofs. In the main part of the talk, I will highlight my research on tools for advanced authentication in a post-quantum world. In particular, I will present my recent works on threshold signatures and blind signatures and discuss how to achieve stronger security guarantees under weak assumptions. In the end, I will highlight some open problems that are urgent to solve.

Workload-Aware Networks for Machine Learning

Weiyang Frank Wang MIT
(hosted by Antoine Kaufmann)
23 Feb 2026, 3:00 pm - 4:00 pm
Virtual talk
CIS@MPG Colloquium
Today's ML workloads require networks that connect tens to hundreds of thousands of GPUs. Existing GPU clusters rely on network designs offering any-to-any connectivity while remaining agnostic to the data they carry. These traits are carried over from legacy CPU datacenters, limiting scalability and hindering GPU utilization. I will present workload-aware networking, a systematic approach that exploits structures inherent to machine learning traffic to co-design networks with ML workloads. I start by showing that the network traffic of large language models (LLMs) exhibits a surprising property: it stays within the bottom layer of a switched network. This insight enables rail-only network designs that dramatically reduce cost and complexity. I then discuss TopoOpt, which uses reconfigurable networks to adapt to the repetitive, predictable traffic patterns of ML training, delivering performance improvements over today's network designs. Finally, I show that understanding traffic content in the network unlocks new functionalities. I introduce Checkmate, a system that embeds checkpointing into the network through gradient replication, enabling per-iteration checkpointing with zero GPU overhead. I conclude with future directions for extending these principles to emerging workloads like agentic AI, and building orchestration frameworks that automate network-workload co-design.
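The checkpointing-by-replication idea can be illustrated in miniature. This is a hypothetical host-side analogy with invented names; the abstract states only that Checkmate itself embeds this mechanism into the network fabric:

```python
# Sketch: during gradient exchange, each worker also retains a copy of the
# shard it receives from its ring neighbor. The union of retained shards is
# a free per-iteration checkpoint — no extra GPU work is spent on it.

def exchange_with_replication(shards):
    """shards: one gradient shard per worker, arranged in a ring.
    Returns, for each worker, the neighbor shards it retained."""
    n = len(shards)
    replicas = {}
    for i, shard in enumerate(shards):
        peer = (i + 1) % n                       # ring neighbor receives shard i
        replicas.setdefault(peer, {})[i] = shard  # ...and keeps a replica of it
    return replicas

def recover(replicas, failed_worker):
    """A failed worker's shard survives on the peer that replicated it."""
    for store in replicas.values():
        if failed_worker in store:
            return store[failed_worker]
    return None
```

Because the replicas ride along with traffic that must be sent anyway, the checkpoint costs no additional GPU cycles — the property the abstract highlights.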

Uncovering the Mechanics of Vision AI Model Failures: Textures and Beyond

Blaine Hoak University of Wisconsin-Madison
(hosted by Christof Paar)
16 Feb 2026, 10:00 am - 11:00 am
Bochum building MPI-SP, room MB/1-84/90
CIS@MPG Colloquium
Artificial Intelligence (AI) models now serve as core components of a range of mature applications but remain vulnerable to a wide spectrum of attacks. Yet, the research community has yet to develop a systematic understanding of model vulnerability. In this talk, I approach uncovering the mechanics of model failure from two complementary perspectives: the design of attack techniques and the features models exploit. First, I introduce The Space of Adversarial Strategies, a robustness evaluation framework constructed through a decomposition and reformulation of current attacks. With this, I isolate the components that drive attack success and provide insights for future defenses. Motivated by the widespread failure observed, I then turn to the feature space, where I uncover differences between visual processing in AI models and in the human visual system that explain failures in AI systems. My work reveals that textures, or repeated patterns, are a core mechanism for driving model generalization, yet are also a primary source of vulnerability. I present new methodologies to quantify a model’s bias toward texture, uncover learned associations between textures and objects, and identify textures in images. With this, I find that up to 90% of failures can be explained by mismatches in texture information, highlighting texture as an important, yet overlooked, influence in model robustness. I conclude by outlining future work for addressing trustworthiness issues in both classification and generative settings, with particular attention to (mis)alignment between biological and artificial intelligence.

Building Private, Secure and Transparent Digital Identity at Scale

Harjasleen Malvai University of Illinois Urbana–Champaign
(hosted by Peter Schwabe)
12 Feb 2026, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Digital identity controls access to many everyday essentials, from getting paid to accessing banking and benefits, and sits in the critical path of modern security. Even end-to-end encrypted messaging depends on authentic cryptographic identity, i.e., a trustworthy way to learn the cryptographic keys needed to encrypt messages to the right person. In practice, centralized identity providers become both a single point of failure and a single point of control: they decide what assertions are supported, and their compromise or coercion enables targeted attacks that are hard to detect. Yet, many strong proposals assume new ecosystems or significant user effort – assumptions that don’t hold for the systems and users we have today.

My thesis is that it is possible to make identity infrastructure more private, secure, and transparent at scale while designing for existing user and ecosystem constraints. I’ll present two case studies across the identity lifecycle.

First, I’ll discuss key transparency for end-to-end encrypted messaging: how to make centralized key directories auditable so that even a compromised server cannot quietly swap keys for targeted users. I’ll show how this line of work evolved from formal privacy and history guarantees (SEEMless) to a billion-user architecture (Parakeet) built for real operational constraints (such as distributed storage and long time horizons), which now underpins the key transparency deployments in WhatsApp and Facebook Messenger.
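As a toy illustration of the auditability idea behind key transparency (not the actual SEEMless/Parakeet constructions, which use Merkle-tree commitments with privacy guarantees; all names below are illustrative), a minimal hash-chained key directory might look like this:

```python
import hashlib
import json

# Toy append-only commitment chain for a key directory. Each epoch's head
# commits to the previous head plus the full directory, so a server cannot
# retroactively swap a user's key without changing the chain head that
# auditors track. Deployed designs use Merkle-tree structures with privacy
# protections; this sketch only conveys the audit idea.
def commit(prev_head, directory):
    payload = json.dumps([prev_head, sorted(directory.items())])
    return hashlib.sha256(payload.encode()).hexdigest()

history = []          # published log: (head, directory snapshot) per epoch
head = "genesis"
for epoch in [{"alice": "pk1"}, {"alice": "pk1", "bob": "pk2"}]:
    head = commit(head, epoch)
    history.append((head, dict(epoch)))

def audit(log):
    """Replay the log and check every published head."""
    h = "genesis"
    for head_i, directory in log:
        h = commit(h, directory)
        if h != head_i:
            return False
    return True

print(audit(history))               # True: log is consistent
history[0][1]["alice"] = "evil_pk"  # server silently rewrites an old key
print(audit(history))               # False: auditors detect the swap
```

The design choice this sketches is the one the talk emphasizes: the server stays centralized, but its behavior becomes publicly verifiable.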

Second, I’ll briefly describe credential bootstrapping with accountability (CanDID): a path to establishing privacy-preserving credentials from existing web sources without assuming a fully mature verifiable-credential ecosystem, while supporting practical requirements like revocation, recovery, and compliance checks.

I’ll close by highlighting ongoing work and open problems motivated by these systems and sketch a research agenda for building auditable and privacy-preserving infrastructure at internet scale, for identity and beyond.

Beyond Static Alignment: Advancing Trustworthy and Socially Intelligent AI Assistant

Jieyu Zhao University of Southern California
(hosted by Abhilasha Ravichander, Asia Biega)
11 Feb 2026, 3:00 pm - 4:00 pm
Virtual talk
CIS@MPG Colloquium
Large language models have transformed how we interact with technology, but most deployed systems remain reactive and rely on static, one-size-fits-all alignment, limiting trust in real-world, high-stakes settings. This talk explores a path toward personalized, trustworthy AI assistants that can reason, continually adapt, and align with user values while remaining safe and socially appropriate. I will introduce Computer-Using Agents that combine GUI operations and code generation to efficiently complete real-world tasks, and present CoAct-1, a multi-agent system that coordinates planning and execution. I will then discuss SEA, a black-box auditing algorithm for uncovering LLM knowledge deficiencies and probing failure modes such as hallucination under limited query budgets. Next, I will present WildFeedback, a framework that learns in-situ user preferences from natural, multi-turn interactions, enabling continual personalization beyond lab-style preference data. Finally, I will highlight ongoing work on proactive social intelligence and culturally grounded evaluation, spanning intention understanding, reasoning consistency, and value-aligned collaboration. Together, these advances move us closer to AI systems that don’t just respond, but adapt responsibly and assist people in ways that are reliable, equitable, and context-aware.

Towards More Trustworthy and Efficient Systems

Hugo Lefeuvre University of British Columbia
(hosted by Gilles Barthe, Deepak Garg)
10 Feb 2026, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Software usage grows much faster than computing hardware. Paradoxically, modern software systems make inefficient use of computing resources: decades of feature creep have made them too generic to perform well on any specific task. These decades of growth have also made systems fragile -- and frankly insecure, glued together from countless components of diverse origins, critical or confidential, buggy, risky, AI-generated, or otherwise untrustworthy. This talk will take the audience on a journey at the intersection of systems and security. I will give an overview of my past and present work applying isolation and specialization techniques to make systems more trustworthy and more efficient (Unikraft, FlexOS, CHERIoT), demonstrating the shortcomings of these techniques and how to address them (CIVs, SoK), and getting these advances deployed to better the real world. I will conclude with a forward-looking perspective on my research and impact plans towards achieving this vision of more robust and efficient software systems.

Modern Challenges in Learning Theory

Nataly Brukhim Institute for Advanced Study and the Center for Discrete Mathematics and Theoretical Computer Science
(hosted by Derek Dreyer)
09 Feb 2026, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
CIS@MPG Colloquium
Machine learning relies on its ability to generalize from limited data, yet a principled theoretical understanding of generalization remains incomplete. While binary classification is well understood in the classical PAC framework, even its natural extension to multiclass learning is substantially more challenging. In this talk, I will present recent progress in multiclass learning that characterizes when generalization is possible and how much data is required, resolving a long-standing open problem on extending the Vapnik–Chervonenkis (VC) dimension beyond the binary setting. I will then turn to complementary results on efficient learning via boosting. We extend boosting theory to multiclass classification, while maintaining computational and statistical efficiency even for unbounded label spaces. Lastly, I will discuss generalization in sequential learning settings, where a learner interacts with an environment over time. We introduce a new framework that subsumes classically studied settings (bandits and statistical queries) together with a combinatorial parameter that bounds the number of interactions required for learning.
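As background for the VC-dimension discussion, a small sketch (assuming the classical binary setting only; the talk's multiclass extensions are deliberately not captured here) can check shattering for one-dimensional threshold classifiers:

```python
# Shattering check for 1-D threshold classifiers h_t(x) = [x >= t]: a point
# set is shattered if every +/- labeling is realized by some threshold.
# Candidate thresholds: one below all points and one between each pair of
# consecutive points (plus one above all), which covers every distinct
# labeling thresholds can produce.
def shatters(points):
    pts = sorted(points)
    cands = ([pts[0] - 1]
             + [(pts[i] + pts[i + 1]) / 2 for i in range(len(pts) - 1)]
             + [pts[-1] + 1])
    realized = {tuple(int(x >= t) for x in pts) for t in cands}
    return len(realized) == 2 ** len(pts)

print(shatters([3.0]))        # True:  thresholds shatter any single point...
print(shatters([1.0, 2.0]))   # False: ...but no pair: labeling (1, 0) is unrealizable
```

This recovers the textbook fact that thresholds have VC dimension exactly 1; the open problem the talk resolves concerns the right analogue of this dimension beyond binary labels.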

Symmetry in Neural Network Parameter Spaces

Bo Zhao University of California, San Diego
(hosted by Bernt Schiele)
05 Feb 2026, 3:00 pm - 4:00 pm
Virtual talk
CIS@MPG Colloquium
In many neural networks, different parameter values can yield the same loss, often due to underlying symmetries in the parameter space. We introduce a general framework for continuous symmetries based on equivariance in activation functions, revealing a new set of nonlinear, data-dependent symmetries. Using these symmetries, we derive topological properties of the minima and identify conserved quantities in gradient flows. As a practical application of parameter space symmetries, we present an algorithm called symmetry teleportation, which leverages parameter symmetries to search loss level sets for points with desired properties. This approach leads to improvements in both convergence and generalization. We also discuss future work on foundational understanding of deep learning that reveals, characterizes, and leverages hidden structures to shape optimization and model behavior, enabling more efficient and interpretable AI systems.
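One concrete instance of such a parameter-space symmetry (the standard ReLU rescaling symmetry, much simpler than the talk's general data-dependent framework) can be checked numerically:

```python
import numpy as np

# Loss-preserving symmetry of ReLU networks: relu(a*z) = a*relu(z) for a > 0,
# so scaling hidden unit i's incoming weights by a_i and its outgoing weights
# by 1/a_i leaves the network function -- and hence the loss -- unchanged.
# Moving along such symmetry directions stays on the same loss level set,
# which is what symmetry teleportation exploits.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 3))            # first layer, 3 -> 5
W2 = rng.normal(size=(2, 5))            # second layer, 5 -> 2
x = rng.normal(size=3)

relu = lambda z: np.maximum(z, 0.0)
f = lambda A, B: B @ relu(A @ x)

a = np.ones(5)
a[2] = 3.7                              # rescale hidden unit 2 only
W1s, W2s = W1 * a[:, None], W2 / a[None, :]

print(np.allclose(f(W1, W2), f(W1s, W2s)))   # True: same function, new parameters
```

Different points on this level set can have very different gradients, which is why searching along symmetry orbits can improve optimization.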

The Skolem Problem: a century-old enigma at the heart of computation

Joël Ouaknine Max Planck Institute for Software Systems
04 Feb 2026, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
It has been described as the most important problem whose decidability is still open: the Skolem Problem asks how to determine algorithmically whether a given integer linear recurrence sequence (such as the Fibonacci numbers) has a zero term. This deceptively simple question arises across a wide range of topics in computer science and mathematics, from program verification and automata theory to number theory and logic. This talk traces the history of the Skolem Problem: from the early 1930s to the current frontier of one of the most enduring open questions in computer science.
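A minimal sketch of the problem statement: searching a bounded prefix of a linear recurrence for a zero term is easy, but no algorithm is known that decides the question in general (the bound below is illustrative):

```python
# Bounded search for zero terms of an integer linear recurrence
# u(n+k) = coeffs[0]*u(n+k-1) + ... + coeffs[k-1]*u(n), seeded by init.
# Finding a zero proves one exists; exhausting the bound proves nothing --
# deciding the general question is precisely the open Skolem Problem.
def has_zero_term(coeffs, init, bound=1000):
    terms = list(init)
    if 0 in terms:
        return True
    for _ in range(bound - len(terms)):
        nxt = sum(c * t for c, t in zip(coeffs, reversed(terms)))
        if nxt == 0:
            return True
        terms = terms[1:] + [nxt]
    return False

# Fibonacci (u(n+2) = u(n+1) + u(n), starting 1, 1): no zero in the window.
print(has_zero_term([1, 1], [1, 1]))      # False
# u(n+2) = u(n+1) - u(n), starting 1, 1: the third term is 0.
print(has_zero_term([1, -1], [1, 1]))     # True
```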

Toward Practical and Scalable Systems Evaluation for Post-Moore Datacenters

Hejing Li Max Planck Institute for Software Systems
02 Feb 2026, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
Having a solid system evaluation under realistic workloads and environments is essential for datacenter network research. Modern datacenter systems are shaped by a wide range of factors - including hardware behavior, software components, and complex interactions between them - whose combined effects on end-to-end performance are often difficult to predict. However, evaluating such systems on physical testbeds is frequently infeasible due to scale, cost, limited experimental control, and the increasing reliance on specialized hardware that may be unavailable or still under development. As a result, researchers often turn to simulation. Existing simulators, however, typically focus on isolated components, such as network protocols, host architectures, or hardware RTL, making it challenging to conduct faithful end-to-end evaluations or to scale experiments to realistic datacenter sizes. The goal of this thesis is to provide researchers with practical and scalable tools and methodologies for conducting faithful end-to-end system evaluation targeting modern datacenters. To this end, this thesis is structured around three components. First, it introduces SimBricks, an end-to-end simulation framework that enables the modular composition of best-of-breed simulators, allowing unmodified hardware and software system implementations to be evaluated together within a single virtual testbed. Second, it presents SplitSim, a simulation framework designed to make large-scale end-to-end evaluation practical by supporting mixed-fidelity simulation, controlled decomposition, and efficient resource utilization. Finally, the thesis will include a set of case studies that apply SplitSim to the evaluation of large-scale networked systems, demonstrating a concrete evaluation workflow and distilling lessons on navigating trade-offs between fidelity, scalability, and simulation cost.

Towards a Privacy-First Future for Wearable AI

Shwetha Rajaram University of Michigan
(hosted by Carmela Troncoso)
02 Feb 2026, 10:00 am - 11:00 am
Bochum building MPI-SP, room MB/1-84/90
CIS@MPG Colloquium
Wearable AI systems, such as display-free smartglasses and augmented reality (AR) devices, are emerging as a new paradigm of personal computing. However, they introduce novel privacy harms for both users and bystanders, relying on multimodal data that reveals people's identities, activities, and surroundings. In this talk, I demonstrate that by building wearable AI systems for adaptation – enabling users and systems to dynamically adjust interactions across contexts – we can mitigate privacy risks while supporting meaningful use. First, through co-design studies with AR and security & privacy researchers, I establish a design space of alternatives to traditional interactions that reduce data exposure while maintaining functionality. To embed this privacy mindset in designers’ workflows, I develop interactive tools for critically analyzing risks in their own applications. Finally, I investigate how to automatically balance user experience and privacy for wearable AI users and bystanders, proposing an optimization framework for system-driven "negotiations" of sensing capabilities. I conclude by discussing open challenges and future research towards this vision of privacy-adaptive wearable AI, from technical mechanisms to implications for policy and regulation.

Pushing the Boundaries in Stateless Model Checking

Iason Marmanis Max Planck Institute for Software Systems
28 Jan 2026, 3:30 pm - 4:30 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Stateless model checking (SMC) verifies a concurrent program by systematically exploring its state space. To combat the state-space explosion problem, SMC is frequently combined with Dynamic Partial Order Reduction (DPOR), a technique that avoids exploring executions that are deemed equivalent to one another. Still, DPOR’s scalability is limited by the size of the input program.

This thesis improves scalability by (i) providing direct support for common coding patterns that would otherwise have to be handled inefficiently, and (ii) combining DPOR with other state-space reduction techniques. Key to our contributions is a DPOR framework that generalizes an existing state-of-the-art algorithm.
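The equivalence that DPOR exploits can be sketched with a brute-force toy (this is an illustration of the underlying idea, not the thesis's algorithm): interleavings that only reorder independent, non-conflicting events reach the same final state, so an exploration needs just one representative per class.

```python
# Enumerate all interleavings of two threads' event lists, run each, and
# group them by final state. Independent events commute, collapsing many
# interleavings into one equivalence class -- the redundancy DPOR avoids.
def interleavings(a, b):
    if not a:
        yield list(b); return
    if not b:
        yield list(a); return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def final_state(schedule):
    state = {}
    for loc, val in schedule:        # each event is a write: loc := val
        state[loc] = val
    return tuple(sorted(state.items()))

# Writes to distinct locations: 2 interleavings, 1 equivalence class.
print(len({final_state(s) for s in interleavings([("x", 1)], [("y", 2)])}))  # 1
# Conflicting writes to the same location: 2 interleavings, 2 classes.
print(len({final_state(s) for s in interleavings([("x", 1)], [("x", 2)])}))  # 2
```

With longer threads the number of interleavings grows combinatorially while the number of equivalence classes often stays small, which is why the reduction matters for scalability.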

Building Robotics Foundation Models with Reasoning in the Loop

Jiafei Duan University of Washington
(hosted by Christian Theobalt)
28 Jan 2026, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
CIS@MPG Colloquium
Recent advances in generative AI have demonstrated the power of scaling: large language and vision models trained on internet-scale data now exhibit remarkable capabilities in perception, generation, and reasoning. These successes have inspired growing interest in bringing foundation-model paradigms to robotics, with the goal of moving beyond task-specific autonomy in constrained environments toward general-purpose robots that can operate robustly in open-world settings. However, robotics fundamentally differs from language and vision. Robot learning cannot rely on passive internet data at scale, and collecting large-scale, high-quality embodied interaction data remains expensive and slow. As a result, simply scaling data and model parameters is insufficient. To build general-purpose and robust robotics foundation models, we must instead ask: how can robots learn more from less data—and continue to improve over time? In this talk, I argue that reasoning in the loop offers a promising path forward. Rather than treating reasoning as a downstream capability applied after learning, I show how reasoning can be integrated directly into the learning process itself. This enables robots to learn from structured feedback, temporal context, and failure, thereby compensating for data scarcity and improving generalization. I will present a unified research agenda along three axes. First, I introduce approaches for spatial reasoning, enabling robots to ground language in 3D space and reason about object relationships for precise manipulation. Second, I discuss temporal reasoning, focusing on memory-centric models that retain, query, and reason over past observations to support long-horizon, high-precision control. Third, I show how reasoning over failures allows robots to understand why actions fail and use that understanding to self-improve, increasing robustness without additional supervision. 
Together, these results reframe robotics foundation models as systems that learn through reasoning, closing the loop between perception, action, and structured inference to enable self-improving autonomy.

On Predictive Accuracy and Fairness in Human-AI Teams

Nastaran Okati Max Planck Institute for Software Systems
26 Jan 2026, 1:30 pm - 2:30 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Human-AI teams are increasingly deployed across high-stakes domains such as medical diagnosis, content moderation, and recruitment. These teams strive to leverage the strengths of both human decision-makers and AI models, while mitigating their respective weaknesses. For instance, such hybrid systems can combine the efficiency and quality of AI-generated decisions with human experience and domain knowledge. Moreover, they can be free of statistical biases in AI models and cognitive biases inherent to human decision-making. The combined system's performance can hence surpass that of the human or the AI model in isolation, achieving human-AI complementarity.

Despite the prevalence of such hybrid teams, most existing approaches remain heuristic-driven, ignore human users and their error models, and hence fail to optimize the human-AI team performance. My thesis strives to close these gaps and fully unlock this potential for complementarity by (i) optimizing for human-AI collaboration, rather than model-centric performance—ensuring these models best support and complement human decision-makers who utilize them—and (ii) designing efficient post-processing algorithms that ensure fairness of high-stakes decisions made by these teams—thereby supporting people who are affected by such decisions.

Where are all the fixed points?

Benjamin Kaminski Fachrichtung Informatik - Saarbrücken
14 Jan 2026, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Fixed points are a recurring theme in computer science and related fields: shortest paths, game equilibria, semantics and verification, social choice theory, or dynamical systems are only a few of many instances. Various fixed point theorems - e.g. the famous Kleene fixed point theorem - state that fixed points emerge as limits of suitably seeded fixed point iterations. 
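The Kleene picture can be made concrete with a textbook special case (independent of the talk's algebraic framework): on the finite powerset lattice, iterating a monotone function from a seed converges to a fixed point, and with a successor-adding function that fixed point is graph reachability.

```python
# Kleene-style fixed point iteration on the powerset lattice of graph nodes,
# ordered by inclusion. F is monotone (it only ever adds successors), so the
# iteration seeded at {"a"} climbs a finite chain and stabilizes at a fixed
# point: the set of nodes reachable from "a".
edges = {"a": {"b"}, "b": {"c"}, "c": {"b"}, "d": {"a"}}

def F(s):
    return s | {v for u in s for v in edges.get(u, set())}

def kleene_iterate(f, seed):
    x = seed
    while f(x) != x:       # iterate until the chain stabilizes
        x = f(x)
    return x

print(sorted(kleene_iterate(F, frozenset({"a"}))))   # ['a', 'b', 'c']
```

Note that "d" is excluded: it can reach "a", but nothing in the iteration from {"a"} ever adds it, illustrating how the seed determines which fixed point the iteration lands on.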

I will showcase in this talk a purely algebraic way of reasoning about fixed points which we call the algebra of iterative constructions (AIC). It led us to discover novel fixed point theorems as well as to prove well-established ones (e.g. Kleene, Tarski-Kantorovich, and k-induction) fully algebraically. As for the novel fixed point theorems, we will (given a suitable setting) obtain a method that maps any point to two canonical corresponding fixed points of a function by way of a limit of some fixed point iteration.

Our algebra is mechanized in Isabelle/HOL. With our algebra in place, Isabelle's sledgehammer tool is able to find proofs of fixed point theorems fully automatically, whereas it cannot find such proofs relying only on Isabelle's standard libraries.

From the audience I would like to learn: Do you encounter fixed points or fixed point problems in your work? Do you perhaps work on algorithms for solving fixed point problems?

I look forward to talking to you!