Events

Upcoming events

Terabyte-Scale Genome Analysis for Underfunded Labs

Sven Rahmann MMCI; CISPA Helmholtz Center for Information Security;
01 Feb 2023, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
In 2023, you can get your personal genome sequenced for under 1000 Euros. If you do, you will obtain the data (50G to 100G) on a USB stick or hard drive. The data consists of many short strings (length 100 or 150) over the famous DNA alphabet {A,C,G,T}, for a total of 50 to 100 billion letters.

With this personal data, you may want to learn about your ancestry, or judge your personal risk of getting various conditions, ...
In 2023, you can get your personal genome sequenced for under 1000 Euros. If you do, you will obtain the data (50G to 100G) on a USB stick or hard drive. The data consists of many short strings (length 100 or 150) over the famous DNA alphabet {A,C,G,T}, for a total of 50 to 100 billion letters.

With this personal data, you may want to learn about your ancestry, or judge your personal risk of getting various conditions, such as myocardial infarction, stroke, breast cancer, and others. So you must look for certain (known) DNA variations in your individual genome that are known to be associated to certain population groups or diseases. As the raw data ("reads") consists of essentially random sub-strings of the genome, it is necessary to find the place of origin of each read in the genome, an error-tolerant pattern search task. In a medium-scale research study (say, for heart disease), we have similar tasks for a few hundred individual patients and healthy control persons, for a total of roughly 30-50 TB of data, delivered on a few USB hard drives. After primary analysis, the task is to find genetic variants (or, more coarsely, genes) related to the disease, i.e., we have a pattern mining problem with millions of features and a few hundred samples. The full workflow for such a study consists of more than 100_000 single steps, including simple per-sample steps (e.g., removing low-quality reads), and complex ones, involving statistical models across all samples for variant calling. Particularly in a medical setting, each step needs to be fully reproducible, we need to trace data provenance and maintain a chain of accountability. In the past ten years, we have worked and contributed to many aspects of variant-calling workflows and realized that the strategy to attack the ever-growing data with ever-growing compute clusters and storage systems will not scale well in the near future. Thus, our current work focuses on so-called alignment-free methods, which have the potential to yield the same answers as current state-of-the-art methods with 10 to 100 times less CPU work. I will present our recent advances in laying better foundations for alignment-free methods: engineered and optimized parallel hash tables for short DNA pieces (k-mers), and the design masks for gapped k-mers with optimal error tolerance. These new methods will enable even small labs to analyze large genomics datasets on a "good gaming PC", while investing less than 5000 Euros into computational hardware. I will also advertise our workflow language and execution engine "Snakemake", a combination of Make and Python that is now one of the most frequently used Bioinformatics workflow management tools, but actually not restricted to Bioinformatics research.
Read more

Recent events

Formal Controller Synthesis for Dynamical Systems: Decidability and Scalability

Mahmoud Salamati Max Planck Institute for Software Systems
14 Dec 2022, 2:00 pm - 3:00 pm
Kaiserslautern building G26, room 113
SWS Student Defense Talks - Thesis Proposal
In cyber-physical systems research an important challenge is the synthesis of reliable controllers with respect to a general temporal specification. The synthesized controller must provide formal guarantees against different sources of uncertainty, such as measurement noise and mismatch between the dynamics of the physical system and its model, etc. By synthesizing correct-by-construction controllers for complex dynamical systems, we can enable a large number of exciting applications in various domains, including autonomous vehicle industry, energy systems and healthcare. ...
In cyber-physical systems research an important challenge is the synthesis of reliable controllers with respect to a general temporal specification. The synthesized controller must provide formal guarantees against different sources of uncertainty, such as measurement noise and mismatch between the dynamics of the physical system and its model, etc. By synthesizing correct-by-construction controllers for complex dynamical systems, we can enable a large number of exciting applications in various domains, including autonomous vehicle industry, energy systems and healthcare. In this thesis, we plan to study controller synthesis for several different classes of dynamical systems. If it is not possible to give a complete and sound synthesis algorithm, we will try to propose a sound but scalable technique. Also, for some special classes of dynamics, we provide sound and complete decision procedures. First, we consider continuous dynamical systems with bounded disturbance. The underlying dynamics for every continuous dynamical system can—in the bounded adversarial setting—be modeled by a (non-linear) differential inclusion system, provided that a bound over the range of model uncertainties is known. A common approach to tackle the continuous nature of the state space is to use abstraction-based controller design (ABCD) schemes. The controller designed by the ABCD scheme is described as being formal due to the guarantees on satisfaction of the specification by the original system in closed loop with the designed controller. In the first part of the thesis, we present methods to improve applicability of ABCD, by proposing (1) a data-driven scheme for relaxing the requirement of having analytical model, (2) a neural abstraction method to lower memory requirements of both synthesis and deployment, and (3) a scalable method for solving reach-avoid problems for multi-agent systems. Second, we study continuous-time Markov decision processes (CTMDPs), which are a widely used model for continuous-time stochastic systems. A fundamental problem in the analysis of CTMDPs is time-bounded reachability, which asks whether we can synthesize a control policy with which the probability of reaching a target state within a finite horizon is greater than a given threshold. Time-bounded reachability is the core technical problem for model checking stochastic temporal logics such as Continuous Stochastic Logic, and having efficient implementations of time-bounded reachability is crucial to scaling formal analysis of CTMDPs. Existing work either considers time-abstract policies or focuses on numerical approximation. Despite the importance of this problem, its decidability is yet open. Moreover, the existing discretization-based approximation methods are not scalable for CTMDPs with a large number of states. In the second part of the thesis, we study the time-bounded reachability problem for CTMDPs, and propose (1) a conditional decidability result, and (2) a systematic method for improving scalability of numerical approximation methods. Finally, we study the class of linear dynamical systems, which are fundamental models in many different domains of science and engineering. In the third part of this thesis proposal, we consider several reachability-related problems for linear dynamical systems, and propose (1) a hardness result for point-to-point reachability of linear dynamical systems with hyper-rectangular control sets, (2) decidability of pseudo-reachability for hyperplane target sets, and (3) decidability of pseudo-reachability for semi-algebraic target sets
Read more

Let's play! - Solving controller synthesis games for cyber-physical system design

Anne Schmuck Max Planck Institute for Software Systems
07 Dec 2022, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Cyber-Physical Systems (CPS) are technical systems where a large software stack orchestrates the interaction of physical and digital components. Such systems are omnipresent in our daily life and their correct behavior is crucial. However, developing safe, reliable and performant CPS is challenging. A promising research direction towards this goal is the combination of formal methods from computer science and controller synthesis techniques from automation.

In my talk I will discuss how infinite two-player games over finite graphs, ...
Cyber-Physical Systems (CPS) are technical systems where a large software stack orchestrates the interaction of physical and digital components. Such systems are omnipresent in our daily life and their correct behavior is crucial. However, developing safe, reliable and performant CPS is challenging. A promising research direction towards this goal is the combination of formal methods from computer science and controller synthesis techniques from automation.

In my talk I will discuss how infinite two-player games over finite graphs, originating from the formal methods community, can be utilized and enhanced for higher layer control of CPS. In particular, I will discuss how the use of environment assumptions - used to model particularities of the system under control within these games - has to be rethought in order to effectively solve controller synthesis tasks for CPS.
Read more

Enforcing Stack Safety on a Capability Machine

Aïna Linn Georges Aarhus University
24 Nov 2022, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 005
simultaneous videocast to Kaiserslautern building G26, room 207 / Meeting ID: -
SWS Colloquium
Memory safety is a major source of vulnerabilities in computer systems. Capability machines are a type of CPUs that support fine-grained privilege separation using capabilities, machine words that include forms of authority. Over the last decade, CHERI, a family of capability machines, has matured into an extensive design featuring, among other, a full UNIX-style operating system, CheriBSD. Building on ideas from CHERI, capability machines are even becoming a reality in industry; the Arm Morello program is a research program led by Arm to create a prototype system on chip with capabilities. ...
Memory safety is a major source of vulnerabilities in computer systems. Capability machines are a type of CPUs that support fine-grained privilege separation using capabilities, machine words that include forms of authority. Over the last decade, CHERI, a family of capability machines, has matured into an extensive design featuring, among other, a full UNIX-style operating system, CheriBSD. Building on ideas from CHERI, capability machines are even becoming a reality in industry; the Arm Morello program is a research program led by Arm to create a prototype system on chip with capabilities. One of the promises of capability machines is that they can enforce security properties that we expect from high-level languages, in particular stack safety, even when machine code is linked with other untrusted and possibly adversarial machine code. In this talk, I will discuss what it takes to realise that promise in practice. Since stack safety properties can be quite subtle, it is crucial to formally reason about the enforcement mechanisms enabled by capabilities. This is a complex task that involves reasoning about the interaction between known code, and unknown untrusted code. We use Iris to formally reason about the deep semantic properties of capability machines.
Read more

Archive