Events 2023

Why do large language models align with human brains: insights, opportunities, and challenges

Mariya Toneva Max Planck Institute for Software Systems
07 Jun 2023, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Language models that have been trained to predict the next word over billions of text documents have been shown to also significantly predict brain recordings of people comprehending language. Understanding the reasons behind the observed similarities between language in machines and language in the brain can lead to more insight into both systems. In this talk, we will discuss a series of recent works that make progress towards this question along different dimensions. The unifying principle among these works that allows us to make scientific claims about why one black box (language model) aligns with another black box (the human brain) is our ability to make specific perturbations in the language model and observe their effect on the alignment with the brain. Building on this approach, these works reveal that the observed alignment is due to more than next-word prediction and word-level semantics and is partially related to joint processing of select linguistic information in both systems. Furthermore, we find that the brain alignment can be improved by training a language model to summarize narratives. Taken together, these works make progress towards determining the sufficient and necessary conditions under which language in machines aligns with language in the brain.

Digital Humans: From Sensor Measurements to Deeper Understanding and Synthesis

Marc Habermann MPI-INF - D6
03 May 2023, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
The earliest paintings depicting a human date back to the Stone Age. Since then, sensing devices have improved, and nowadays digital sensing can be found in everyone’s home. Nonetheless, the main element in many image compositions is still the human, i.e., most of the images one finds in the media, such as on the Internet or in textbooks and magazines, contain humans as the main point of attention. Since sensing of humans has become more accurate, automated, affordable, and most importantly digital, it brings many opportunities in downstream applications such as telepresence, AR/VR, and health care, to name only a few. At the same time, this development also introduces major challenges, since raw sensing measurements cannot be immediately processed by those applications. Instead, they require algorithms that are capable of automatically analyzing human-related information and even synthesizing new content. In this talk, I will present some of our recent methods on the analysis and synthesis (rendering) of humans from digital measurements on the basis of Graphics, Vision, and Machine Learning concepts.

2vyper: Contracts for Smart Contracts

Alexander J. Summers University of British Columbia
27 Apr 2023, 4:00 pm - 5:00 pm
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Smart contract languages are increasingly popular and numerous, and their programming models and challenges are somewhat unusual. The ubiquitous presence of untrusted external code in such a system makes classical contracts unsuitable for safety verification, while the intentional presence of (potentially-mutating) callbacks via unknown code makes standard static analysis techniques imprecise in general. On the other hand, smart contract languages such as Vyper (for Ethereum) tightly encapsulate direct access to the program's state. In this talk I'll present a methodology for expressing contracts for this language, in a way that supports sound verification of safety properties, with deductive verification tooling (converting Vyper to Viper) to automate the corresponding proofs.

Based on joint work with Christian Bräm, Marco Eilers, Peter Müller and Robin Sierra; see also the accompanying paper at OOPSLA 2021. --- Please contact Office for Zoom link information.

Automating cryptographic code generation

Yuval Yarom Ruhr University Bochum
24 Apr 2023, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 029
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Cryptography provides the data protection mechanisms that underlie security and privacy in the modern connected world. Given this pivotal role, implementations of cryptographic code must not only be correct, but also meet stringent performance and security requirements. Achieving these aims is often difficult and requires significant investment in software development and manual tuning.

This talk presents two approaches for automating the task of generating correct, secure, and efficient cryptographic code. The first, Rosita, uses a power consumption emulator to detect unintended leaky interactions between values in the microarchitecture. It then rewrites the code to eliminate these interactions and produce code that is resistant to power analysis. The second, CryptOpt, uses evolutionary computation to search for the most efficient constant-time implementation of a cryptographic function. It then formally verifies that the produced implementation is semantically equivalent to the original code.

Rosita is joint work with Lejla Batina, Łukasz Chmielewski, Francesco Regazzoni, Niels Samwel, Madura A. Shelton, and Markus Wagner. CryptOpt is joint work with Adam Chlipala, Chitchanok Chuengsatiansup, Owen Conoly, Andres Erbsen, Daniel Genkin, Jason Gross, Joel Kuepper, Chuyue Sun, Samuel Tian, Markus Wagner, and David Wu.

Please contact office for zoom link information.

Faster Approximate String Matching: Now with up to 500 Errors

Philip Wellnitz MPI-INF - D1
05 Apr 2023, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
String matching algorithms are everywhere. You use them every day, perhaps even without noticing. Whenever your mail client filters out spam, whenever your spell checker flags a spelling mistake, whenever your search engine of choice presents you with results it deems useful, to list just some examples. For many applications, we need more than finding exact occurrences of a pattern string in a text. Hence, in this talk, I consider the classical problem of finding parts of a text that are close to a given pattern string.

In particular, I consider the problem of finding all parts of a given text that can be transformed to a given pattern string by inserting, deleting, or substituting at most k characters: the Approximate String Matching problem. The first algorithms for this problem were developed in the 1970s; some of them remain relevant today. After a long line of research, around the year 2000, the progress in obtaining faster algorithms for Approximate String Matching came to a halt, even though interest did not fade.

In a recent series of papers, my coauthors and I improved the more than 20-year-old state-of-the-art algorithms. In this talk, I highlight the new tools and techniques that we developed for Approximate String Matching. Further, I give an overview of additional applications of our techniques to settings where text and pattern are given in compressed form or where we allow text and pattern to change dynamically.
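
To make the problem statement above concrete, here is a minimal sketch of the classical textbook dynamic program for Approximate String Matching. It is not one of the faster algorithms from the talk; it runs in time proportional to the product of text length and pattern length. The function name `approx_match_ends` and its return convention (exclusive end positions) are assumptions chosen for illustration.

```python
def approx_match_ends(text: str, pattern: str, k: int) -> list[int]:
    """Return every end index j (exclusive) such that some substring
    text[i:j] can be turned into pattern with at most k insertions,
    deletions, or substitutions."""
    m = len(pattern)
    # prev[i]: smallest edit distance between pattern[:i] and any
    # substring of the text that ends at the previous text position.
    prev = list(range(m + 1))
    ends = []
    if prev[m] <= k:
        ends.append(0)
    for j, c in enumerate(text, start=1):
        curr = [0] * (m + 1)  # curr[0] == 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # text character c has no partner
                curr[i - 1] + 1,     # pattern character has no partner
            )
        if curr[m] <= k:
            ends.append(j)
        prev = curr
    return ends


if __name__ == "__main__":
    # "cde" differs from "cxe" by a single substitution.
    print(approx_match_ends("abcdefg", "cxe", 1))
```

With k = 0 the same routine degenerates to exact matching, which is a convenient sanity check.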

Statistical inference with privacy and computational constraints

Maryam Aliakbarpour Boston University and Northeastern University
02 Mar 2023, 9:30 am - 10:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 002
SWS Colloquium
The vast amount of digital data we create and collect has revolutionized many scientific fields and industrial sectors. Yet, despite our success in harnessing this transformative power of data, computational and societal trends emerging from the current practices of data science necessitate upgrading our toolkit for data analysis. In this talk, we discuss how practical considerations such as privacy and memory limits affect statistical inference tasks. In particular, we focus on two examples: First, we consider hypothesis testing with privacy constraints. More specifically, we consider how one can design an algorithm that tests whether two data features are independent or correlated with a nearly-optimal number of data points while preserving the privacy of the individuals participating in the data set. Second, we study the problem of entropy estimation of a distribution by streaming over i.i.d. samples from it. We determine how bounded memory affects the number of samples we need to solve this problem. Please contact the office for Zoom link information.

Societal Computing

Ingmar Weber Fachrichtung Informatik - Saarbrücken
01 Mar 2023, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Technology and new communication tools have permeated all aspects of our lives. From finding a romantic partner to booking the next holiday - our choices and transactions are increasingly mediated by digital tools and online platforms, leaving behind digital traces. Simultaneously, earth observation satellites provide a real-time window into global changes, letting us observe anything from urbanization to garden parties during COVID-19 lockdowns. This "digitization of everything" creates a wide range of opportunities as well as challenges by letting us monitor and interact with society at scales and speeds never seen before. Societal Computing tries to make use of these afforded opportunities by advancing computational research on both (i) computing _of_ society, i.e., measuring, describing, and understanding societal phenomena, and (ii) computing _for_ society, i.e., working with partners on digitally assisted interventions to address societal challenges.

In this talk, I’ll present examples of Societal Computing work done by myself and interdisciplinary collaborators in the domain of international development and humanitarian response. Apart from highlighting advances in data innovation, I’ll describe real-world ethical challenges related to fairness, such as "leaving behind" the most vulnerable during crisis response, and risks of abuse, such as mapping undocumented migrants. I’ll close by outlining plans for an Interdisciplinary Institute for Societal Computing (IDISC) with the goal of strengthening research collaborations between computer science and both the social sciences and humanities.

The Power of Feedback in a Cyber-Physical World

Dr. Anne-Kathrin Schmuck Max Planck Institute for Software Systems
28 Feb 2023, 9:30 am - 10:30 am
Kaiserslautern building G26, room 111
simultaneous videocast to Saarbrücken building E1 5, room 029
SWS Colloquium
Feedback allows systems to seamlessly and instantaneously adapt their behavior to their environment and is thereby the fundamental principle of life and technology -- it lets animals breathe, it stabilizes the climate, it allows airplanes to fly, and the energy grid to operate. During the last century, control technology excelled at using this power of feedback to engineer extremely stable, robust, and reliable technological systems.

With the ubiquity of computing devices in modern technological systems, feedback loops become cyber-physical -- the laws of physics governing technological, social or biological processes interact with (cyber) computing systems in a highly nontrivial manner, pushing towards higher and higher levels of autonomy and self-regulation. While reliability of these systems remains of utmost importance, a fundamental understanding of cyber-physical feedback loops for large-scale CPS is lagging far behind.

In this talk I will discuss how a control-inspired view on formal methods for reliable software design enables us to utilize the power of feedback for robust and reliable self-adaptation in cyber-physical system design.

Please contact office team for link information.

Fusing AI and Formal Methods for Automated Synthesis

Priyanka Golia NUS, Singapore and IIT Kanpur
23 Feb 2023, 9:30 am - 10:30 am
Kaiserslautern building G26, room 111
SWS Colloquium
We entrust large parts of our daily lives to computer systems, which are becoming increasingly complex. Developing scalable yet trustworthy techniques for designing and verifying such systems is an important problem. In this talk, our focus will be on automated synthesis, a technique that uses formal specifications to automatically generate systems (such as functions, programs, or circuits) that provably satisfy the requirements of the specification. I will introduce a state-of-the-art synthesis algorithm that leverages artificial intelligence to provide an initial guess for the system, and then uses formal methods to repair and verify the guess to synthesize a provably correct system. I will conclude by exploring the potential for combining AI and formal methods to address real-world scenarios. Please contact the office team for link information.

Learning for Decision Making: A Tale of Complex Human Preferences

Leqi Liu Carnegie Mellon University
14 Feb 2023, 2:00 pm - 3:00 pm
Virtual talk
SWS Colloquium
Machine learning systems are deployed in diverse decision-making settings in service of stakeholders characterized by complex preferences. For example, in healthcare and finance, we ought to account for various levels of risk tolerance; and in personalized recommender systems, we face users whose preferences evolve dynamically over time. Building systems better aligned with stakeholder needs requires that we take the rich nature of human preferences into account. In this talk, I will give an overview of my research on the statistical and algorithmic foundations for building such human-centered machine learning systems. First, I will present a line of work that draws inspiration from the economics literature to develop learning algorithms that account for the risk preferences of stakeholders. Subsequently, I will discuss a line of work that draws insights from the psychology literature to develop online learning algorithms for personalized recommender systems that account for users’ evolving preferences. Please contact the office team for link information.

Programmatic Interfaces for Design and Simulation

Aman Mathur Max Planck Institute for Software Systems
09 Feb 2023, 1:30 pm - 2:30 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Although Computer Aided Design (CAD) and simulation tools have been around for quite some time, modern design and prototyping pipelines are challenging the limits of these tools. Advances in 3D printing have brought manufacturing capability to the general public. Moreover, advancements in machine learning and sensor technology are enabling enthusiasts and small companies to develop their own autonomous vehicles and machines. This means that many more users are designing (or customizing) 3D objects in CAD, and many are testing autonomous machines in simulation. Though Graphical User Interfaces (GUIs) are the de facto standard for these tools, we find that these interfaces are neither robust nor flexible. For example, designs made using GUIs often break when customized, and setting up large simulations is quite tedious in a GUI. Though programmatic interfaces do not suffer from these limitations, they are generally quite difficult to use, and often do not provide appropriate abstractions and language constructs.

In this thesis, we present our work on combining the ease of use of GUIs with the robustness and flexibility of programming. For CAD, we propose an interactive framework that automatically synthesizes robust programs from GUI-based design operations. Additionally, we apply program analysis to ensure customizations do not lead to invalid objects. Finally, for simulation, we propose a novel programmatic framework that simplifies the building of complex test environments, and a test generation mechanism that guarantees good coverage over test parameters.

Toward Deep Semantic Understanding: Event-Centric Multimodal Knowledge Acquisition

Manling Li University of Illinois Urbana Champaign
01 Feb 2023, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 111
SWS Colloquium
Please note that this is a virtual talk which will be videocast to Saarbrücken and Kaiserslautern.

Traditionally, multimodal information consumption has been entity-centric with a focus on concrete concepts (such as objects, object types, physical relations, e.g., a person in a car), but lacks the ability to understand abstract semantics (such as events and semantic roles of objects, e.g., driver, passenger, mechanic). However, such event-centric semantics are the core knowledge communicated, regardless of whether in the form of text, images, videos, or other data modalities.

The core of my research in Multimodal Information Extraction (IE) is to bring such deep semantic understanding ability to the multimodal world. My work opens up a new research direction, Event-Centric Multimodal Knowledge Acquisition, to transform traditional entity-centric single-modal knowledge into event-centric multi-modal knowledge. Such a transformation poses two significant challenges: (1) understanding multimodal semantic structures that are abstract (such as events and semantic roles of objects): I will present my solution of zero-shot cross-modal transfer (CLIP-Event), which is the first to model event semantic structures for vision-language pretraining, and supports zero-shot multimodal event extraction for the first time; (2) understanding long-horizon temporal dynamics: I will introduce the Event Graph Model, which empowers machines to capture complex timelines, intertwined relations and multiple alternative outcomes. I will also show its positive results on long-standing open problems, such as timeline generation, meeting summarization, and question answering. Such Event-Centric Multimodal Knowledge starts the next generation of information access, which allows us to effectively access historical scenarios and reason about the future. I will lay out how I plan to grow a deep semantic understanding of the language world and the vision world, moving from concrete to abstract, from static to dynamic, and ultimately from perception to cognition. Please contact the Office team for Zoom link information.

Terabyte-Scale Genome Analysis for Underfunded Labs

Sven Rahmann MMCI; CISPA Helmholtz Center for Information Security
01 Feb 2023, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
In 2023, you can get your personal genome sequenced for under 1000 Euros. If you do, you will obtain the data (50G to 100G) on a USB stick or hard drive. The data consists of many short strings (length 100 or 150) over the famous DNA alphabet {A,C,G,T}, for a total of 50 to 100 billion letters.

With this personal data, you may want to learn about your ancestry, or judge your personal risk of getting various conditions, such as myocardial infarction, stroke, breast cancer, and others. So you must look for certain (known) DNA variations in your individual genome that are known to be associated with certain population groups or diseases. As the raw data ("reads") consists of essentially random sub-strings of the genome, it is necessary to find the place of origin of each read in the genome, an error-tolerant pattern search task. In a medium-scale research study (say, for heart disease), we have similar tasks for a few hundred individual patients and healthy control persons, for a total of roughly 30-50 TB of data, delivered on a few USB hard drives. After primary analysis, the task is to find genetic variants (or, more coarsely, genes) related to the disease, i.e., we have a pattern mining problem with millions of features and a few hundred samples. The full workflow for such a study consists of more than 100,000 single steps, including simple per-sample steps (e.g., removing low-quality reads), and complex ones, involving statistical models across all samples for variant calling. Particularly in a medical setting, each step needs to be fully reproducible; we need to trace data provenance and maintain a chain of accountability. In the past ten years, we have worked on and contributed to many aspects of variant-calling workflows and realized that the strategy to attack the ever-growing data with ever-growing compute clusters and storage systems will not scale well in the near future. Thus, our current work focuses on so-called alignment-free methods, which have the potential to yield the same answers as current state-of-the-art methods with 10 to 100 times less CPU work.
I will present our recent advances in laying better foundations for alignment-free methods: engineered and optimized parallel hash tables for short DNA pieces (k-mers), and the design of masks for gapped k-mers with optimal error tolerance. These new methods will enable even small labs to analyze large genomics datasets on a "good gaming PC", while investing less than 5000 Euros into computational hardware. I will also advertise our workflow language and execution engine "Snakemake", a combination of Make and Python that is now one of the most frequently used Bioinformatics workflow management tools, but is not restricted to Bioinformatics research.
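
As a rough illustration of the gapped k-mers mentioned above, the sketch below extracts gapped k-mers from reads and counts them. It is a toy under stated assumptions, not the speaker's implementation: a plain `collections.Counter` stands in for the engineered parallel hash tables, the mask notation (`#` for a position that is kept, `_` for a position that is ignored) is hypothetical, and the function names are invented for this example.

```python
from collections import Counter
from typing import Iterable, Iterator


def gapped_kmers(read: str, mask: str) -> Iterator[str]:
    """Yield the gapped k-mer at every window position of the read.

    mask is a string of '#' (keep this position) and '_' (ignore it);
    this notation is an assumption made for the example.
    """
    keep = [i for i, ch in enumerate(mask) if ch == "#"]
    for start in range(len(read) - len(mask) + 1):
        window = read[start:start + len(mask)]
        yield "".join(window[i] for i in keep)


def count_gapped_kmers(reads: Iterable[str], mask: str) -> Counter:
    """Count gapped k-mers over all reads with an ordinary hash table."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(gapped_kmers(read, mask))
    return counts


if __name__ == "__main__":
    # The two reads differ only at the ignored position, so they
    # contribute the same gapped k-mer.
    print(count_gapped_kmers(["ACGT", "ACTT"], "##_#"))
```

Because the ignored position absorbs a single substitution, both reads in the usage example map to the same key, which is the intuition behind the error tolerance of a mask.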