News: Research Spotlight

Research Spotlight: Tracing the Behavior of Cloud Applications

April 25, 2019

Consider the everyday websites and apps that we use: online shops, news websites, search engines, social networks, navigation apps, instant messaging apps, and many more.  Most of these programs don't just run in isolation on our laptops or phones, but instead connect over the internet to backends and databases running in datacenters across the world.  These backends perform a wide range of tasks, including constructing your personalized social network feed, storing and retrieving comments on message boards, and calculating results for your search query.  From our perspective as users, the actions we perform are simple, such as opening the app and loading our personalized profile.  But under the hood, each action usually results in complex processing across many processes and machines in a datacenter.

It has never been easier to write and deploy complex programs like these.  Cloud computing companies that own datacenters (such as Google, Amazon, and Microsoft) will gladly rent out computing resources at the touch of a button, on demand.  Using designs like microservices, programmers can construct complex programs out of smaller, simpler building blocks.  There are frameworks and open-source software packages to help developers construct big applications out of small pieces, to spread those pieces out over multiple machines in a datacenter, and to have the pieces communicate and interact with each other over the network.

Problems show up when software goes live.  Compared to developing and deploying the software, it is much harder to make sure everything goes smoothly once the software is up and running.  Distributed programs have many moving pieces, and there are many opportunities for things to go wrong.  For example, if one machine in the datacenter has a hardware problem, runs buggy code, or is overwhelmed by too many simultaneous requests, the effects can be wide-ranging.  A single fault can set off a butterfly effect of problems, which we term a cascading failure, that can leave the app or website as a whole slow, or down entirely.  It's hard for programmers to get to the bottom of these kinds of problems, because there's no single machine or process doing all the work.  A problem that occurs on one machine might manifest as strange symptoms on a different machine later on.  Figuring out the root cause of a problem is challenging, as is anticipating problems in the first place.  Even big internet companies like Facebook and Google experience problems like this today.

These kinds of problems motivate the research of the Cloud Software Systems Research Group at the Max Planck Institute for Software Systems.  We research ways for operators to understand what's going on in their live distributed system, to troubleshoot problems when they occur at runtime, and to design systems that proactively avoid problems.  One approach we take is to design distributed tracing tools that can be used by the system operators.  The goal of distributed tracing is to record information about what a program does while it's running.  The tools record events, metrics, and performance counters, which together expose the current state and performance of the system, and how it changes over time.  A key additional step taken by distributed tracing tools is to record the causal ordering of events happening in the system — that is, the interactions and dependencies between machines and processes.  Causal ordering is very useful for diagnosing problems that span multiple processes and machines, especially when there might be lots of concurrent, unrelated activity going on at the same time.  It lets us reconstruct the end-to-end execution paths of requests, across all components and machines, and then reason about the sequence of conditions and events that led up to a problem.  Without causal ordering, this information is missing, and pinpointing the root cause of a problem would be like searching for a needle in a haystack.
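
To make the idea of causal ordering concrete, here is a minimal sketch (in Python, with invented names; it is not one of our tools) of how a tracing library might propagate a trace context along with a request, so that events recorded by different services can later be stitched back together into an end-to-end path:

    # Minimal sketch of causal metadata propagation in distributed tracing.
    # Names and structure are hypothetical; real tracing systems are far
    # more elaborate.
    import time
    import uuid

    class TraceContext:
        def __init__(self, trace_id=None, parent_event_id=None):
            self.trace_id = trace_id or uuid.uuid4().hex  # identifies the end-to-end request
            self.parent_event_id = parent_event_id        # the causally preceding event

    def record_event(ctx, service, message, log):
        """Record an event together with its causal parent, and return a new
        context whose parent is this event, so later events point back to it."""
        event_id = uuid.uuid4().hex
        log.append({
            "trace_id": ctx.trace_id,
            "event_id": event_id,
            "parent": ctx.parent_event_id,
            "service": service,
            "message": message,
            "timestamp": time.time(),
        })
        return TraceContext(ctx.trace_id, event_id)

    # Simulate a request flowing frontend -> backend -> database.
    log = []
    ctx = record_event(TraceContext(), "frontend", "received user request", log)
    ctx = record_event(ctx, "backend", "computed personalized feed", log)
    ctx = record_event(ctx, "database", "fetched comments", log)
    # Following the parent pointers in `log` reconstructs the execution path.

In a real deployment the context would be serialized into RPC or HTTP headers, so that the causal chain survives crossing process and machine boundaries.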

The Cloud Software Systems Research Group has looked at a number of challenges in making distributed tracing tools efficient, scalable, and more widely deployable.  In our recent work, we have investigated how to efficiently insert instrumentation that records entirely new information into an already-running system, without having to rebuild or restart it [1].  We have looked at the problems of dealing with the large volume of data generated by distributed tracing tools, and of deciding which data is most valuable to keep when there isn't enough room to keep it all [2].  We have also considered the implications of distributed tracing at extremely large scale, and how to efficiently collect, aggregate, and process tracing data in real time [3].
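
To give a flavor of the data-volume problem addressed in [2], the following sketch shows one generic strategy, weighted reservoir sampling, that keeps a bounded number of traces while favoring the ones deemed most valuable, for example traces containing errors or unusually high latency.  This is an illustrative sketch of the general idea, not the algorithm from the paper:

    # Illustrative weighted reservoir sampling (Efraimidis-Spirakis style):
    # keep at most k traces, preferring those with higher weight.
    # This is a generic technique, not the specific method of [2].
    import heapq
    import random

    def sample_traces(traces, k, weight):
        """traces: iterable of trace records; weight: function mapping a trace
        to a positive number (e.g., higher for traces with errors)."""
        heap = []  # min-heap of (key, index, trace); the smallest key is evicted first
        for i, t in enumerate(traces):
            w = weight(t)
            if w <= 0:
                continue
            key = random.random() ** (1.0 / w)
            entry = (key, i, t)
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif key > heap[0][0]:
                heapq.heapreplace(heap, entry)
        return [t for _, _, t in heap]

    # Example: weight erroneous or slow traces 10x more than ordinary ones.
    traces = [{"id": i, "error": i % 50 == 0, "latency_ms": random.expovariate(1 / 100)}
              for i in range(10000)]
    kept = sample_traces(traces, 100,
                         lambda t: 10.0 if t["error"] or t["latency_ms"] > 500 else 1.0)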

In our ongoing work, we are investigating ways for the data recorded by tracing tools to feed back into decisions made by datacenter infrastructure, such as resource management, scheduling, and load balancing.  We are also considering new challenges that arise in scalable data analysis: how do you analyze large datasets of traces and derive insights about aggregate system behavior?  One approach we are exploring uses techniques from representation learning to transform richly annotated tracing data into a more tractable form for interactive analysis.  More broadly, our group investigates a variety of approaches besides distributed tracing tools, including ways to better design and develop distributed systems in the first place.  Ultimately, our goal is to make modern cloud systems easier to operate, understand, and diagnose.

References

[1] Jonathan Mace, Ryan Roelke, and Rodrigo Fonseca.  Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems.  In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP '15), 2015.

[2] Pedro Las-Casas, Jonathan Mace, Dorgival Guedes, and Rodrigo Fonseca.  Weighted Sampling of Execution Traces: Capturing More Needles and Less Hay.  In Proceedings of the 9th ACM Symposium on Cloud Computing (SoCC '18), 2018.

[3] Jonathan Kaldor, Jonathan Mace, Michał Bejda, Edison Gao, Wiktor Kuropatwa, Joe O'Neill, Kian Win Ong, Bill Schaller, Pingjia Shan, Brendan Viscomi, Vinod Venkataraman, Kaushik Veeraraghavan, and Yee Jiun Song.  Canopy: An End-to-End Performance Tracing and Analysis System.  In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP '17), 2017.

Research Spotlight: A General Data Anonymity Measure

January 30, 2019

A long-standing problem both within research and in society generally is that of how to analyze data about people without risking the privacy of those people. There is an ever-growing amount of data about people: medical, financial, social, government, geo-location, etc. This data is very valuable in terms of better understanding ourselves. Unfortunately, analyzing the data in its raw form carries the risk of exposing private information about people.

The problem of how to analyze data while protecting privacy has been around for more than 40 years---ever since the first data processing systems were developed. Most workable solutions are ad hoc: practitioners try things like removing personally identifying information (e.g. names and addresses), aggregating data, removing outlying data, and even swapping some data between individuals. This process can work reasonably well, but it is time-consuming, requires substantial expertise to get right, and invariably limits the accuracy of the analysis or the types of analysis that can be done.
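
As a deliberately simplistic illustration of these ad hoc steps, the sketch below (with invented column names) drops the direct identifier, coarsens two quasi-identifiers, and suppresses records that end up in very small groups:

    # Simplistic illustration of ad hoc anonymization: drop direct identifiers,
    # coarsen quasi-identifiers, suppress rare groups. Column names are made up.
    from collections import Counter

    records = [
        {"name": "Alice", "zip": "66123", "age": 34, "diagnosis": "flu"},
        {"name": "Bob",   "zip": "66125", "age": 37, "diagnosis": "flu"},
        {"name": "Carol", "zip": "10115", "age": 82, "diagnosis": "rare disease"},
    ]

    def pseudo_anonymize(rows, min_group_size=2):
        out = []
        for r in rows:
            out.append({
                "zip3": r["zip"][:3],                     # coarsen ZIP code
                "age_band": f"{(r['age'] // 10) * 10}s",  # coarsen age into decades
                "diagnosis": r["diagnosis"],              # the name is dropped entirely
            })
        # Suppress rows whose (zip3, age_band) group is too small to hide in.
        groups = Counter((r["zip3"], r["age_band"]) for r in out)
        return [r for r in out if groups[(r["zip3"], r["age_band"])] >= min_group_size]

    print(pseudo_anonymize(records))

Even this toy example shows the trade-off described above: the outlying record is simply suppressed and lost to the analysis.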

A holy grail within computer science is to come up with an anonymization system that has formal guarantees of anonymity and provides good analytics. Most effort in this direction has focused on two ideas, K-anonymity and Differential Privacy. Both can provide strong anonymity, but except in rare cases neither can do so while still providing adequate analytics. As a result, common practice is still to use informal ad hoc techniques with weak anonymization, and to mitigate risk by, for instance, sharing data only with trusted partners.
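
For readers unfamiliar with Differential Privacy, the textbook Laplace mechanism is easy to sketch (this is the standard construction, not anything specific to our work): a counting query is answered with noise whose scale is set by the privacy parameter epsilon, so stronger privacy means noisier answers.

    # Textbook Laplace mechanism for a counting query. A count has sensitivity 1,
    # so the noise scale is 1/epsilon; the difference of two exponential samples
    # yields a Laplace-distributed sample.
    import random

    def noisy_count(true_count, epsilon):
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return true_count + noise

    print(noisy_count(1234, epsilon=0.1))   # strong privacy, noisy answer
    print(noisy_count(1234, epsilon=10.0))  # weaker privacy, more accurate answer

The practical difficulty alluded to above is that every additional query consumes privacy budget, forcing either more noise or fewer questions, which is one reason formal approaches often struggle to support rich analytics.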

The European Union has raised the stakes with the General Data Protection Regulation (GDPR). The GDPR has strict rules on how personal data may be used, and threatens large fines to organizations that do not follow the rules. However, GDPR says that if data is anonymous, then it is not considered personal and does not fall under the rules. Unfortunately, there are no precise guidelines on how to determine if data is anonymous or not. Member states are expected to come up with certification programs for anonymity, but do not know how to do so.

This is where we come in. Paul Francis' group, in research partnership with the startup Aircloak, has been developing an anonymizing technology called Diffix over the last five years. Diffix is an empirical technology, not a formal one, and so the question remains: how anonymous is Diffix? While it may not be possible to answer that question precisely, one way we try to answer it is through a bounty program: we pay attackers who can demonstrate how to compromise anonymity in our system. Last year we ran the first (and still only) bounty program for anonymity. The program was successful in that some attacks were found, and in the process of designing defensive measures, Diffix has improved.

In order to run the bounty program, we naturally needed a measure of anonymity so that we could decide how much to pay attackers. We designed a measure based on how much confidence an attacker has in a prediction of an individual's data values, among other things. At some point, we realized that our measure applies not just to attacks on Diffix, but to any anonymization system. We also realized that our measure might be useful in the design of certification programs for anonymity.
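
Purely as a hypothetical illustration of what a confidence-based measure could look like (this is not the actual GDA Score definition, and the function below is invented for the example), one might compare the attacker's prediction accuracy against that of a baseline that never sees the anonymized data:

    # Hypothetical confidence-based measure, NOT the actual GDA Score:
    # how much more often is the attacker right than a baseline that
    # guesses without access to the anonymized data?
    def confidence_improvement(attacker_predictions, baseline_predictions, truth):
        """Each argument maps an individual to a predicted (or true) value.
        Returns a number between -1 and 1."""
        people = truth.keys()
        attacker_acc = sum(attacker_predictions.get(p) == truth[p] for p in people) / len(truth)
        baseline_acc = sum(baseline_predictions.get(p) == truth[p] for p in people) / len(truth)
        return attacker_acc - baseline_acc

    truth = {"u1": "red", "u2": "blue", "u3": "red"}
    print(confidence_improvement({"u1": "red", "u2": "blue", "u3": "blue"},
                                 {"u1": "red", "u2": "red", "u3": "red"},
                                 truth))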

We decided to develop a general score for anonymity, and to build tools that would allow anyone to apply the measure to any anonymization technology. The score is called the GDA Score, for General Data Anonymity Score.

The primary strength of the GDA Score is that it can be applied to any anonymization method, and therefore allows apples-to-apples comparisons. The primary weakness is that it is based on empirical attacks (real attacks against real systems), and therefore the score is only as good as the attacks themselves. If there are unknown attacks on a system, then the score won't reflect them and may therefore make a system look more anonymous than it is.

Our hope is that over time enough attacks will be developed that we can have high confidence in the GDA Score. Towards that end, we've started the Open GDA Score Project. This is a community effort to provide software and databases in support of developing new attacks, and a repository where the scores can be viewed. We recently launched the project in the form of a website, www.gda-score.org, and some initial tools and simple attacks. We will continue to develop tools and new attacks, but our goal is to attract broad participation from the community.

For more information, visit www.gda-score.org.

Research Spotlight: Learning to interact with learning agents

September 26, 2018

Many real-world systems involve repeatedly making decisions under uncertainty—for instance, choosing one of several products to recommend to a user in an online recommendation service, or dynamically allocating resources among available stock options in a financial market. Machine learning (ML) algorithms driving these systems typically operate under the assumption that they are interacting with static components, e.g., users' preferences are fixed, trading tools providing stock recommendations are static, and data distributions are stationary. This assumption is often violated in modern systems, as these algorithms are increasingly interacting with and seeking information from learning agents, including people, robots, and adaptive adversaries. Consequently, many well-studied ML frameworks and algorithmic techniques fail to provide desirable theoretical guarantees—for instance, algorithms might converge to a sub-optimal solution or fail arbitrarily badly in these settings.

Researchers in the Machine Teaching Group at MPI-SWS are designing novel ML algorithms that have to interact with agents that are adaptive or learning over time, especially in situations where the algorithm's decisions directly affect the state dynamics of these agents. In recent work [1], they have studied the above-mentioned problem in the context of two fundamental machine learning frameworks: (i) online learning using experts' advice and (ii) active learning using labeling oracles. In particular, they consider a setting where the experts/oracles are themselves learning agents. For instance, active learning algorithms typically query labels from an oracle, e.g., a (possibly noisy) domain expert; however, in emerging crowd-powered systems, these experts are being replaced by inexpert participants who may themselves be learning over time (e.g., volunteers in citizen science projects). They have shown that when these experts/oracles are themselves learning agents, well-studied algorithms (like the EXP3 algorithm) fail to converge to the optimal solution and can have arbitrarily bad performance in this new problem setting. Furthermore, they provide an impossibility result showing that, without sharing any information across experts, it is impossible to achieve convergence guarantees. This calls for developing novel algorithms with practical ways of coordinating between the central algorithm and the learning agents to achieve the desired guarantees.
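
For concreteness, here is a compact sketch of the standard EXP3 algorithm mentioned above (the generic textbook version; the point of [1] is precisely that its guarantees can break down once the experts are themselves learning agents):

    # Standard EXP3 for adversarial multi-armed bandits (textbook version).
    import math
    import random

    def exp3(num_arms, reward_fn, horizon, gamma=0.1):
        """reward_fn(t, arm) returns a reward in [0, 1]; gamma is the
        exploration rate. Returns the total reward collected."""
        weights = [1.0] * num_arms
        total = 0.0
        for t in range(horizon):
            total_w = sum(weights)
            probs = [(1 - gamma) * w / total_w + gamma / num_arms for w in weights]
            arm = random.choices(range(num_arms), weights=probs)[0]
            reward = reward_fn(t, arm)
            total += reward
            estimate = reward / probs[arm]  # importance-weighted reward estimate
            weights[arm] *= math.exp(gamma * estimate / num_arms)
            # Rescale weights for numerical stability; probabilities are unchanged.
            m = max(weights)
            weights = [w / m for w in weights]
        return total

    # Toy run: arm 1 is slightly better on average.
    print(exp3(3, lambda t, a: random.random() * (1.2 if a == 1 else 1.0) / 1.2, 1000))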

Currently, researchers at the Machine Teaching Group are studying these challenges in the context of designing next-generation human-AI collaborative systems. As a concrete application setting, consider a car driving scenario where the goal is to develop an assistive AI agent that drives the car in an auto-pilot mode but gives control back to the human driver in safety-critical situations. They study this setting by casting it as a multi-agent reinforcement learning problem. When the human agent has a stationary policy (i.e., the actions taken by the human driver in different states/scenarios are fixed), it is straightforward to learn an optimal policy for the AI agent that maximizes the overall performance of this collaborative system. However, in real-life settings where a human driver adapts their behavior in response to the presence of an auto-pilot mode, they show that the problem of learning an optimal policy for the AI agent becomes computationally intractable. This work is one of the recent additions to an expanding set of results and algorithmic techniques developed by MPI-SWS researchers in the nascent area of Machine Teaching [2, 3].

References

[1] Adish Singla, Hamed Hassani, and Andreas Krause. Learning to Interact with Learning Agents. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI'18), 2018.

[2] Xiaojin Zhu, Adish Singla, Sandra Zilles, and Anna N. Rafferty. An Overview of Machine Teaching. arXiv 1801.05927, 2018.

[3] Maya Cakmak, Anna N. Rafferty, Adish Singla, Xiaojin Zhu, and Sandra Zilles. Workshop on Teaching Machines, Robots, and Humans. NIPS 2017.

Research Spotlight: From Newton to Turing to cyber-physical systems

February 13, 2018

In 1937, a young Englishman by the name of Alan M. Turing published a paper with the obscure title "On computable numbers, with an application to the Entscheidungsproblem" in the Proceedings of the London Mathematical Society. In doing so, he arguably laid the mathematical foundations of modern computer science. Turing's seminal contribution was to show that the famous Entscheidungsproblem, formulated by the great German mathematician David Hilbert several years earlier, could not be solved: more precisely, Turing proved (in modern parlance) that the problem of determining whether a given computer program halts could not be solved algorithmically---in other words, that the famous Halting Problem is undecidable.
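
The heart of Turing's argument can be retold in a few lines of modern code (the function names are illustrative): if a program halts(program, argument) that always answered the Halting Problem correctly existed, one could build a program that contradicts it when run on itself.

    # The classic diagonalization argument, retold in Python. If halts()
    # existed and always answered correctly, then contrary(contrary) would
    # halt exactly when it does not halt: a contradiction, so no correct
    # implementation of halts() can exist.
    def halts(program, argument):
        raise NotImplementedError("no correct implementation can exist")

    def contrary(program):
        if halts(program, program):
            while True:   # loop forever precisely when halts() predicts halting
                pass
        return            # halt precisely when halts() predicts looping

    # contrary(contrary) halts if and only if it does not halt.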

Although it seemed a rather esoteric concern at the time, the Halting Problem (and related questions) has dramatically gained in importance and relevance in more contemporary times. Fast forward to the 21st century: nowadays, it is widely acknowledged that enabling engineers, programmers, and researchers to automatically verify and certify the correctness of the computer systems that they design is one of the Grand Challenges of computer science. In increasingly many instances, it is absolutely critical that the software governing various aspects of our daily lives (such as the software running on an aircraft controller) behave exactly as intended, lest catastrophic consequences ensue.

What classes of infinite-state programs can be analyzed algorithmically?

Researchers in the Foundations of Algorithmic Verification group are investigating what classes of infinite-state programs can, at least in principle, be fully handled and analyzed algorithmically. They do so by viewing computer programs abstractly as dynamical systems, and they seek to design exact algorithms enabling one to fully analyze the behavior of such systems. In particular, they are presently tackling a range of central algorithmic problems from verification, synthesis, performance, and control for linear dynamical systems, drawing among others on tools from number theory, Diophantine geometry, and algebraic geometry, with the overarching goal of offering a systematic exact computational treatment of various important classes of dynamical systems and other fundamental models used in mathematics, computer science, and the quantitative sciences. Some of their achievements include several decidability and hardness results for linear recurrence sequences, which can be used to model simple loops in computer programs, answering a number of longstanding open questions in the mathematics and computer science literature.
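
To see how a linear recurrence sequence models a simple loop, consider the toy example below (an invented illustration, not taken from the cited papers): the loop terminates exactly if the sequence defined by x_{n+1} = 3*x_n - 2*x_{n-1} ever takes the value zero, a Skolem-type question.

    # A toy "simple loop". Successive values of the loop variable satisfy the
    # linear recurrence x_{n+1} = 3*x_n - 2*x_{n-1}, so asking whether the loop
    # terminates is asking whether that recurrence ever hits zero.
    def hits_zero(x0, x1, max_steps=1000):
        """Simulate for at most max_steps iterations and report whether the value
        ever becomes zero. Deciding this without a bound, for arbitrary linear
        recurrences, is the kind of hard question discussed above."""
        prev, cur = x0, x1
        for step in range(max_steps):
            if cur == 0:
                return True, step
            prev, cur = cur, 3 * cur - 2 * prev
        return False, max_steps

    print(hits_zero(1, 2))   # (False, 1000): the values are 2, 4, 8, ... and never reach zero
    print(hits_zero(3, 2))   # (True, 1): zero is reached after one further step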

In a series of recent papers [1, 2], they have attacked the so-called Zero Problem for linear differential equations, i.e., the question of determining algorithmically whether the unique solution to a given linear differential equation has a zero or not. Such equations, which go back as far as Newton, are ubiquitous in mathematics, physics, and engineering; they are also particularly useful to model cyber-physical systems, i.e., digital systems that evolve in and interact with a continuous environment. In their work, they obtained several important partial results: if one is interested in the existence of a zero over a bounded time interval, then it is possible to determine this algorithmically, provided that a certain hypothesis from the mathematical field of number theory, known as Schanuel's Conjecture, is true. They were also able to partially account for the fact that the Zero Problem has hitherto remained open in full generality: indeed, if one were able to solve it in dimension 9 (or higher), then in turn this would enable one to solve various longstanding hard open problems from a field of mathematics known as Diophantine approximation. In doing so, they therefore exhibited surprising and unexpected connections between the modeling and analysis of cyber-physical systems and seemingly completely unrelated deep mathematical theories dealing with questions about whole numbers.
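
Stated a little more precisely (a paraphrase, assuming rational coefficients and initial values as is standard in this line of work), the Zero Problem asks the following:

    Given a linear differential equation
    \[
      y^{(n)}(t) + a_{n-1}\, y^{(n-1)}(t) + \cdots + a_1\, y'(t) + a_0\, y(t) = 0
    \]
    with rational coefficients $a_0, \ldots, a_{n-1}$ and rational initial values
    $y(0), y'(0), \ldots, y^{(n-1)}(0)$, decide whether there exists a time
    $t \geq 0$ (or a time $t$ in a given bounded interval, for the bounded variant)
    such that $y(t) = 0$.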

References

[1] Ventsislav Chonev, Joel Ouaknine, and James Worrell. On recurrent reachability for continuous linear dynamical systems. In Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), 2016.

[2] Ventsislav Chonev, Joel Ouaknine, and James Worrell. On the Skolem Problem for continuous linear dynamical systems. In Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP), 2016.

Research Spotlight: Teaching machine learning algorithms to be fair

December 7, 2017

Machine learning algorithms are increasingly being used to automate decision making in several domains such as hiring, lending and crime-risk prediction. These algorithms have shown significant promise in leveraging large or “big” training datasets to achieve high prediction accuracy, sometimes surpassing even human accuracy.

Unfortunately, some recent investigations have shown that machine learning algorithms can also lead to unfair outcomes. For example, a recent ProPublica study found that COMPAS, a tool used in US courtrooms for assisting judges with crime risk prediction, was unfair towards black defendants. In fact, several studies from governments, regulatory authorities, researchers as well as civil rights groups have raised concerns about machine learning potentially acting as a tool for perpetuating existing unfair practices in society, and worse, introducing new kinds of unfairness in prediction tasks. As a consequence, a flurry of recent research has focused on defining and implementing appropriate computational notions of fairness for machine learning algorithms.

Parity-based fairness

Existing computational notions of fairness in the machine learning literature are largely inspired by the concept of discrimination in social sciences and law. These notions require the decision outcomes to ensure parity (i.e., equality) in treatment and in impact.

Notions based on parity in treatment require that the decision algorithm should not take into account the sensitive feature information (e.g., gender, race) of a user. Notions based on parity in impact require that the decision algorithm should give beneficial decision outcomes (e.g., granting a loan) to similar percentages of people from all sensitive feature groups (e.g., men, women).
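
As a small illustration of the parity-in-impact notion (a generic sketch with made-up data, not the researchers' code), one can simply compare the rates of beneficial outcomes across sensitive groups:

    # Generic check of parity in impact: beneficial outcome rates should be
    # (nearly) equal across sensitive feature groups.
    from collections import defaultdict

    def impact_parity_gap(decisions):
        """decisions: list of (group, outcome) pairs, where outcome 1 = beneficial.
        Returns the largest gap in beneficial-outcome rate between groups."""
        totals, positives = defaultdict(int), defaultdict(int)
        for group, outcome in decisions:
            totals[group] += 1
            positives[group] += outcome
        rates = {g: positives[g] / totals[g] for g in totals}
        return max(rates.values()) - min(rates.values()), rates

    gap, rates = impact_parity_gap([("men", 1), ("men", 1), ("men", 0),
                                    ("women", 1), ("women", 0), ("women", 0)])
    print(rates, gap)   # men: 0.67, women: 0.33 -> a large parity gap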

However, in many cases, these existing notions are too stringent and can lead to unexpected side effects. For example, ensuring parity has been shown to lead to significant reductions in prediction accuracy. Parity may also lead to scenarios where none of the groups involved in decision making (e.g., neither men nor women) gets beneficial outcomes. In other words, these scenarios might be preferred neither by the decision maker using the algorithm (due to diminished accuracy) nor by the groups involved (due to the meager benefits they receive).

User preferences and fairness

In recent work, to appear at NIPS 2017, researchers at MPI-SWS have introduced two new computational notions of algorithmic fairness: preferred treatment and preferred impact. These notions are inspired by ideas related to envy-freeness and the bargaining problem in economics and game theory. Preferred treatment and preferred impact leverage these ideas to build more accurate solutions that are preferable for both the decision maker and the user groups.

The new notion of preferred treatment allows basing the decisions on sensitive feature information (thereby relaxing the parity treatment criterion) as long as the decision outcomes do not lead to envy. That is, each group of users prefers its own group membership over other groups, and does not feel that presenting itself to the algorithm as another group would have led to better outcomes for the group.

The new notion of preferred impact allows differences in beneficial outcome rates for different groups (thereby relaxing the parity impact criterion) as long as all the groups get more beneficial outcomes than what they would have received under the parity impact criterion.
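
The sketch below makes the two preference-based notions concrete for a two-group example (an illustrative check of the definitions described above, with made-up numbers; it is not the training technique from the paper): preferred treatment means no group would have been better off presenting itself as another group, and preferred impact means every group does at least as well as it would under the parity-constrained solution.

    # Illustrative check of the preference-based notions for two groups.
    # benefit[g][h] = beneficial-outcome rate group g would receive if its members
    # were treated by the decision rule used for group h;
    # parity_benefit[g] = rate group g receives under the parity-constrained rule.
    def preferred_treatment(benefit):
        return all(benefit[g][g] >= benefit[g][h]
                   for g in benefit for h in benefit)

    def preferred_impact(benefit, parity_benefit):
        return all(benefit[g][g] >= parity_benefit[g] for g in benefit)

    benefit = {"men":   {"men": 0.62, "women": 0.55},
               "women": {"women": 0.48, "men": 0.40}}
    parity_benefit = {"men": 0.45, "women": 0.45}

    print(preferred_treatment(benefit))               # True: no group envies the other
    print(preferred_impact(benefit, parity_benefit))  # True: both groups beat the parity outcome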

In their work, MPI-SWS researchers have developed a technique to ensure that machine learning algorithms satisfy preferred treatment and/or preferred impact. They also tested their technique by designing crime-predicting machine-learning algorithms that satisfy the above-mentioned notions. In their experiments, they show that preference-based fairness notions can provide significant gains in overall decision-making accuracy as compared to parity-based fairness, while simultaneously increasing the beneficial outcomes for the groups involved.

This work is one of the most recent additions to an expanding set of techniques developed by MPI-SWS researchers to enable fairness, accountability and interpretability of machine learning algorithms.

References

Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, Krishna Gummadi, and Adrian Weller. From Parity to Preference: Learning with Cost-effective Notions of Fairness. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS '17), 2017.
