News 2020

Social & Information Systems

Research Spotlight: Steering Policies in Multi-Agent Collaboration

Ever since the birth of Artificial Intelligence (AI) at the Dartmouth workshop in 1956, researchers have debated the exact role that AI will, and should, play in society. While some have envisioned a romanticized version of AI, incorporated into the narratives of 20th-century movies, successful AI developments are often closer to J. C. R. Licklider’s vision of AI, which emphasizes a collaborative relationship between humans and AI and focuses on hybrid human-AI decision making.

In the Multi-Agent Systems group at MPI-SWS, we study multi-agent sequential decision making using formal frameworks that can capture nuances often present in human-AI collaborative settings. Specifically, we study different aspects of agent-to-agent interaction in settings where agents share a common goal but can have different perceptions of reality. The overall goal is to design a more effective AI decision maker that accounts for the behavior of its collaborators and compensates for their imperfections. To achieve this goal, the AI decision maker can use steering policies to nudge its collaborators toward better policies, i.e., policies that lead to an improved joint outcome. In what follows, we summarize some of our recent results related to this agenda.

Accounting for misaligned world-views. An effective way to model behavioral differences between humans and modern AI tools (based on machine learning) is through a model that captures the misalignment in how the agents perceive their environment. Using this approach, we have proposed a new computational model, called Multi-View Decision Process, suitable for modeling two-agent cooperative scenarios in which agents agree on their goals, but disagree on how their actions affect the state of the world [1]. This framework enables us to formally analyze the utility of accounting for the misalignment in agents’ world-views when only one of the agents has a correct model of the world. Our results show that modeling such a misalignment is not only beneficial, but critical. The main takeaway is that to facilitate a more successful collaboration among agents, it is not sufficient to make one agent (more) accurate in its world-view: naively improving the accuracy of one agent can degrade the joint performance unless one explicitly accounts for the imperfections of the other agent. To this end, we have developed an algorithm for finding an approximately optimal steering policy for the agent with the correct world-view.
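As a toy illustration of the kind of misalignment such a model captures, the following sketch (with illustrative names and numbers, not the model or algorithm from [1]) evaluates policies under two different transition models: an agent that trusts its misaligned world-view may pick a policy that looks good under its own model but performs worse under the true dynamics.

```python
import numpy as np

# Toy two-state, two-action world. Both agents share the reward function,
# but hold different beliefs about the transition dynamics.
n_states, n_actions = 2, 2
reward = np.array([[0.0, 1.0],
                   [1.0, 0.0]])  # reward[s, a]

# True dynamics P_true[s, a, s'] (known only to the agent with the correct view).
P_true = np.zeros((n_states, n_actions, n_states))
P_true[0, 0] = [0.9, 0.1]; P_true[0, 1] = [0.2, 0.8]
P_true[1, 0] = [0.8, 0.2]; P_true[1, 1] = [0.1, 0.9]

# The other agent's misaligned view: it believes actions have weaker effects.
P_view = 0.5 * P_true + 0.5 * np.full((n_states, n_actions, n_states), 0.5)

def policy_value(P, policy, gamma=0.9):
    """Discounted value of a deterministic policy under dynamics P."""
    r = np.array([reward[s, policy[s]] for s in range(n_states)])
    T = np.array([P[s, policy[s]] for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * T, r)

# The misaligned agent picks the policy that looks best under its own model...
candidates = [(a0, a1) for a0 in range(n_actions) for a1 in range(n_actions)]
pi_view = max(candidates, key=lambda pi: policy_value(P_view, pi).mean())

# ...but its real performance is measured under the true dynamics.
print("value believed by the misaligned agent:", policy_value(P_view, pi_view).mean())
print("actual value under the true dynamics:  ", policy_value(P_true, pi_view).mean())
```

The gap between the believed and actual values is what a steering policy, computed by the agent with the correct world-view, aims to reduce.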

Adapting to a non-stationary collaborator. In addition to accounting for a misalignment in world-views, decision makers must also account for the effects of their behavior on other agents. Because decision makers respond to each other's behavior, their behavior becomes non-stationary, i.e., it changes over time. In the context of human-AI collaboration, this might happen if the human agent changes their behavior over time, for example, as they learn to interact with the AI agent. Such non-stationary behavior can lead to substantially worse joint performance unless the AI agent adapts to the changing behavior of the human agent. We can model this situation with a two-agent setting similar to the one presented above, but one that allows agents to change their behavior as they interact over time [2]. The agent with the correct world-view now has to adapt to the non-stationary behavior of its collaborator. We have proposed a learning procedure with provable guarantees on the joint performance, under the assumption that the behavior of the other agent does not change abruptly over time. We have also shown that this assumption is not trivial to relax: obtaining the same guarantees without it would require solving a computationally intractable problem.
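To convey the flavor of tracking a slowly drifting collaborator (this is an illustrative sketch, not the learning procedure from [2]), the snippet below has an AI agent best-respond to a running estimate of its partner's strategy in a repeated common-payoff game; the partner's strategy drifts slowly, in the spirit of the no-abrupt-changes assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared payoff: both agents are rewarded when their actions match.
payoff = np.array([[1.0, 0.0],
                   [0.0, 1.0]])  # payoff[ai_action, human_action]

estimate = np.array([0.5, 0.5])  # AI's running estimate of the partner
human = np.array([0.9, 0.1])     # partner's true (slowly drifting) strategy
total, T = 0.0, 2000

for t in range(T):
    # The partner drifts slowly toward preferring the other action.
    human = 0.999 * human + 0.001 * np.array([0.1, 0.9])
    h_action = rng.choice(2, p=human)
    # The AI best-responds to its current estimate of the partner.
    a_action = int(np.argmax(payoff @ estimate))
    total += payoff[a_action, h_action]
    # Track the partner with an exponential moving average over observations.
    estimate = 0.95 * estimate + 0.05 * np.eye(2)[h_action]

print("average joint payoff:", total / T)
```

Because the drift is gradual, the moving-average estimate lags only slightly, and the pair coordinates well above chance; an abrupt switch in the partner's strategy would break this simple tracker, which mirrors why the assumption matters.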

Steering via environment design. The previous two cases consider indirect steering policies, in which the agent with the correct model implicitly influences the behavior of its collaborator by acting in the world. A more explicit influence is obtained if the actions of this agent directly change the world-view of its collaborator. In the context of human-AI collaboration, the AI agent could shape the environment to nudge the human agent to adopt a more efficient decision policy. This can be done through reward shaping, i.e., by making some actions more costly for humans in terms of effort, or through dynamics shaping, i.e., by changing the perceived influence that the human’s actions have on the world. In machine learning terminology, such a steering strategy is a form of adversarial attack by the AI agent (the attacker) on the human agent. In our recent work [3], we have characterized how to perform these types of attacks optimally and how costly they are from the attacker’s point of view.
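The notion of attack cost can be illustrated in a deliberately simplified one-state setting (the work in [3] treats sequential environments and poisoning of rewards or dynamics; the function below is only a hypothetical sketch): the attacker perturbs per-action rewards by as little as possible so that a chosen target action becomes optimal by a margin eps.

```python
import numpy as np

def poison_rewards(rewards, target, eps=0.1):
    """Lower the rewards of competing actions just enough that `target`
    beats every other action by at least `eps`; return the poisoned
    rewards and the total (L1) modification cost."""
    r = np.array(rewards, dtype=float)
    cost = 0.0
    for a in range(len(r)):
        if a != target and r[a] > r[target] - eps:
            cost += r[a] - (r[target] - eps)
            r[a] = r[target] - eps
    return r, cost

# Action 1 is originally best; the attacker wants action 0 to be adopted.
r_hat, cost = poison_rewards([1.0, 2.0, 0.5], target=0, eps=0.1)
print("poisoned rewards:", r_hat)   # action 0 now dominates by eps
print("attack cost:", cost)
```

Here only the reward of the originally optimal action needs to be lowered, so the cost equals its advantage over the target plus the margin; this is the one-state analogue of trading off how strongly the environment is reshaped against how much the collaborator's policy is steered.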

 

References: 

[1] Dimitrakakis, C., Parkes, D.C., Radanovic, G. and Tylkin, P., 2017. Multi-view Decision Processes: The Helper-AI Problem. In Advances in Neural Information Processing Systems.

[2] Radanovic, G., Devidze, R., Parkes, D. and Singla, A., 2019. Learning to Collaborate in Markov Decision Processes. In International Conference on Machine Learning.

[3] Rakhsha, A., Radanovic, G., Devidze, R., Zhu, X. and Singla, A., 2020. Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning. In International Conference on Machine Learning.

Manuel Gomez-Rodriguez awarded ERC Starting Grant

September 2020
Manuel Gomez-Rodriguez, head of the MPI-SWS Human-Centric Machine Learning group, has been awarded an ERC Starting Grant. Over the next five years, his project "Human-Centric Machine Learning" will receive 1.49 million euros, which will allow the group to develop the foundations of human-centric machine learning.


In the most recent round for Starting Grants, over 3300 research proposals were submitted to the ERC. The sole selection criterion is scientific excellence. This year, less than 14% of all ERC Starting Grant applicants across all scientific disciplines received the award, with only 20 awardees in Computer Science across all of Europe!

Summary of the HumanML project proposal


With the advent of mass-scale digitization of information and virtually limitless computational power, an increasing number of social, information and cyber-physical systems evaluate, support or even replace human decisions using machine learning models and algorithms. Machine learning models and algorithms have been traditionally designed to take decisions autonomously, without human intervention, on the basis of passively collected data. However, in most social, information and cyber-physical systems, algorithmic and human decisions feed on and influence each other. As these decisions become more consequential to individuals and society, machine learning models and algorithms have been blamed for playing a major role in an increasing number of missteps, from discriminating against minorities, causing car accidents and increasing polarization to misleading people in social media.

In this project, we will develop human-centric machine learning models and algorithms for evaluating, supporting and enhancing decision-making processes where algorithmic and human decisions feed on and influence each other. These models and algorithms will account for the feedback loop between algorithmic and human decisions, which currently perpetuates or even amplifies biases and inequalities, and they will learn to operate under different automation levels. Moreover, they will anticipate how individuals will react to their algorithmic decisions, often strategically, to receive beneficial decisions and they will provide actionable insights about their algorithmic decisions. Finally, we will perform observational and interventional experiments as well as realistic simulations to evaluate their effectiveness in a wide range of applications, from content moderation, recidivism prediction, and credit scoring to medical diagnosis and autonomous driving.

Two MPI-SWS papers accepted at ICML 2020

July 2020
The following two MPI-SWS papers have been accepted to ICML 2020, one of the flagship conferences in machine learning:

  • Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning by Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, Adish Singla.

  • Adaptive Reward-Poisoning Attacks against Reinforcement Learning by Xuezhou Zhang, Yuzhe Ma, Adish Singla, Xiaojin Zhu.

Redmiles' research on ethical adoption of COVID-19 apps gains international media attention

Research by MPI-SWS faculty member Elissa Redmiles and collaborators at Microsoft Research, the University of Zurich, the University of Maryland and Johns Hopkins University was featured in the New York Times, Scientific American (article 1, article 2), Wired (article 1, article 2), STAT News, and other venues.



The articles cover two papers: (1) Redmiles' paper in ACM Digital Government: Research and Practice proposing a framework, with empirical validation through a large-scale survey, of the attributes of COVID-19 apps that may compel users to adopt them, such as the benefits of the apps both to individual users and to their community, the accuracy with which they detect exposures, potential privacy leaks, and the costs of using the apps; and (2) a preprint paper by Redmiles and her collaborators that develops predictive models of COVID-19 app adoption based on an app's level of accuracy and privacy protection.


These works are part of a larger project Redmiles leads on ethical adoption of COVID-19 apps: https://covidadoptionproject.mpi-sws.org/.

Isabel Valera becomes full professor at Saarland University

April 2020
Isabel Valera, a postdoctoral alumna of the Human-Centric Machine Learning group, has become a full professor in the Department of Computer Science at Saarland University. Congratulations, Isabel!

Isabel's research focuses on developing machine learning methods that are flexible, robust, interpretable and fair. Her research can be applied in a broad range of fields, from medicine and psychiatry to social and communication systems. You can find out more about her work at https://ivaleram.github.io/.

Three MPI-SWS papers accepted at AAAI 2020

February 2020
The following three MPI-SWS papers have been accepted to AAAI 2020, one of the flagship conferences in artificial intelligence:

  • Incremental Fairness in Two-Sided Market Platforms: On Smoothly Updating Recommendations by Gourab K. Patro, Abhijnan Chakraborty, Niloy Ganguly, Krishna P. Gummadi.

  • Regression Under Human Assistance by Abir De, Paramita Koley, Niloy Ganguly, Manuel Gomez-Rodriguez.

  • The Effectiveness of Peer Prediction in Long-Term Forecasting by Debmalya Mandal, Goran Radanovic, David C. Parkes.