Events

Upcoming events

Testing AI's Implicit World Models

Keyon Vafa Harvard University
(hosted by Krishna Gummadi)
05 Mar 2026, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
Real-world AI systems must be robust across a wide range of conditions. One path to such robustness is if a model recovers a coherent structural understanding of its domain. But it is unclear how to measure, or even define, structural understanding. This talk will present theoretically-grounded definitions and metrics that test the structural recovery — or implicit "world models" — of generative models. We will propose different ways to formalize the concept of a world model, develop tests based on these notions, and apply them across domains. In applications ranging from testing whether LLMs apply logic to whether foundation models acquire Newtonian mechanics, we will see that models can make highly accurate predictions with incoherent world models. We will also connect these tests to a broader agenda of building generative models that are robust across downstream uses, incorporating ideas from statistics and the behavioral sciences. Developing reliable inferences about model behavior across tasks offers new ways to assess and improve the efficacy of generative models.

From Interventions to Assistants: Toward Intelligent Security Support

Daniele Lain ETH Zurich
(hosted by Carmela Troncoso)
12 Mar 2026, 10:00 am - 10:45 am
Bochum building MPI-SP, room tba
CIS@MPG Colloquium
As users are exposed to an unprecedented range of security threats, a rich ecosystem of support mechanisms has emerged to assist the "last line of defense". Yet as these mechanisms become widely adopted in industry, a question arises: do they provide the support users actually need? In the first part of this talk, I will show that this is not always the case. Using phishing (one of the most prevalent and damaging cybercrimes) as a case study, I will present results from large-scale, real-world measurement studies that challenge common assumptions: that widely deployed mechanisms such as training and password managers are inherently effective, and that users primarily lack knowledge about this threat and how to detect it. Instead, I will show that phishing susceptibility is often an attention problem, due to limited and poorly surfaced indicators and cues. In the second part of the talk, I will discuss how we translate these insights into the design of novel systems that better support secure behavior. I will present a tailored countermeasure that assists users at critical decision points, and discuss its limitations in terms of increased user burden, which motivates recent work on automating security decisions through AI-driven assistance. I will conclude by outlining key research challenges in designing security systems that are adaptive, context-aware, and robust.

Strong Program Logics for Weak Memory and Even Stronger Types for Tactic Programming

Jan-Oliver Kaiser Max Planck Institute for Software Systems
19 Mar 2026, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
Computers have become ubiquitous in everyday life and so have bugs in programs running on those computers. Research in the field of programming languages and verification has produced countless ways to attack the problem of software defects. This thesis concerns itself with two established techniques applied in unconventional settings.

Firstly, in the category of extending the applicability of program verification to more realistic settings, we demonstrate how to verify high-performance algorithms and data structures making use of highly efficient memory access patterns that are not automatically synchronized with main memory or other processors’ caches. Hardware and programming languages that expose these weakly or un-synchronized memory accesses are said to have weak memory models. The lack of synchronization in weak memory models goes directly against the assumption of sequential consistency which still sits at the heart of most verification works. Concretely, we show how to perform verification in weak memory models using an existing program logic framework, Iris, that was traditionally limited to sequential consistency. In building on Iris, we inherit its mechanized proof of soundness of the core logic as well as the ability to perform mechanized program verification.

Secondly, we propose to bring the benefits of dependent types to the process of writing and automating proofs in the Rocq proof assistant. This aims to address the limitations of Rocq's oldest — and, for a long time, only — tactic language: Ltac. Ltac's pitfalls are numerous and it is arguably unable to fulfill the requirements of large verification projects. Our contribution is a new tactic language called Mtac2. Mtac2 is based on Mtac, a principled metaprogramming language for Rocq offering strongly typed primitives based on Rocq’s own dependent type system. Mtac’s primitives could already be used to implement some tactics, but Mtac lacks the ability to directly interact with Rocq’s proof state and to perform backwards reasoning on it. Mtac2 extends Mtac with support for backwards reasoning and keeps in line with Mtac’s tradition of strong types by introducing the concept of typed tactics. Typed tactics statically track the expected type of the current goal(s) and can rule out entire classes of mistakes that often plague Ltac tactics.

Recent events

Auditing Personalized Content Recommendations: Data Collection Practices and Transparency Efforts on Online Platforms

Sepehr Mousavi Max Planck Institute for Software Systems
03 Mar 2026, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 005
SWS Student Defense Talks - Thesis Proposal
Online platforms increasingly rely on opaque content recommendation systems to curate personalized content. The black-box nature of these systems has raised significant societal concerns, such as the formation of filter bubbles or the promotion of extreme content. In response, regulatory bodies such as the European Commission have enacted digital regulations aimed at strengthening platform accountability and transparency. Achieving these goals requires audits of content recommendations and the mechanisms that govern them. To systematically audit personalized content recommendation systems deployed by online platforms, in this thesis we investigate their data collection practices and transparency efforts. To this end, this thesis makes the following three contributions: First, it investigates personal data collection practices of online platforms, as these practices directly enable personalized content recommendations. Second, by employing sockpuppet accounts, it conducts an audit of transparency efforts implemented by online platforms through studying explanations provided for organic content recommendations. Finally, this thesis contributes to improving transparency in the procedures and objectives of recommendation systems deployed by online platforms.

Infinite Groups: An Interplay of Automata, Logic, and Algebra

Ruiwen Dong University of Oxford
(hosted by Joël Ouaknine)
03 Mar 2026, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
CIS@MPG Colloquium
Algorithmic problems in infinite groups arise naturally in the analysis of computational models, and they forge a fundamental link between algebra, logic, and computation. The central aim of my research is to understand the boundary between decidability and undecidability in computational algebra and group theory. Despite significant advances in the past 80 years, essential questions on the nature of these boundaries remain largely unanswered. The decision problems that we consider (the Diophantine Problem, membership problems, intersection problems) are particularly relevant to algorithmic verification, automata theory, and program analysis. Our methodology relates algorithms in groups to other central problems in infinite state systems, first-order arithmetic theories, and computer algebra. In this talk, I will illustrate a few recent results that expand the known decidability frontier, as well as some immediate and impactful open problems.

Security: A Next Frontier in AI Coding

Jingxuan He UC Berkeley
(hosted by Thorsten Holz)
02 Mar 2026, 10:00 am - 11:00 am
Bochum building MPI-SP, room MB/1-84/90
CIS@MPG Colloquium
AI is reshaping software development, yet this rapid adoption risks introducing a new generation of security debt. In this talk, I will present my research program aimed at transforming AI from a source of vulnerabilities to a security enabler. I will begin by introducing a systematic framework for quantifying AI-induced cybersecurity risks through two benchmarks: CyberGym, which evaluates AI agents’ offensive capabilities in vulnerability reproduction and discovery, and BaxBench, which measures LLMs’ propensity to introduce security flaws when generating code. Building on these findings, I will present a secure-by-design approach for AI-generated code. This includes security-centric fine-tuning that embeds secure coding practices directly into models, as well as a decoding-time constraining mechanism based on type systems to enforce safety guarantees. Finally, I will conclude by discussing my future research on building broader security and trust in AI-driven software ecosystems.
