I'm a lecturer at the Department of Computer Science, University of Oxford. I work primarily in computational game theory, multi-agent systems, and AI. My research focuses on understanding and shaping interactions among intelligent agents in complex real-world scenarios. We explore the power of computing to address these challenges and solve the underlying algorithmic 'puzzles'. Through this, we develop algorithmic tools to guide agents toward broader organisational and societal goals. My work bridges theory and practice, addressing issues such as fairness, sustainability and safety in multi-agent interactions.
I'm looking for motivated PhD students. Feel free to get in touch if you're interested in my research areas! (See more about the PhD programme at Oxford here.) For any application-related inquiries, please contact me (jiarui.gan@cs.ox.ac.uk) or the Department of Computer Science directly.
Lecturer @Oxford ◄ Postdoc @MPI-SWS (host: Rupak Majumdar) ◄ PhD @Oxford (supervisors: Edith Elkind & Michael Wooldridge)
Insights from well-established frameworks, such as principal-agency and dynamic mechanism design, provide blueprints for architecting and optimising multi-agent systems. By coupling these frameworks with computational models, we create automated solutions that ensure advanced AI agents, whether in transportation, digital platforms, or broader societal ecosystems, remain reliably aligned with organisational objectives. Our work focuses on establishing the computational foundations necessary for these missions, developing systematic approaches to guide multi-agent systems toward beneficial and equitable outcomes.
We study how to design incentive mechanisms that effectively coordinate autonomous agents towards organisational objectives, drawing on principal-agency ideas from economics. By framing these design tasks as computational problems, we aim to develop algorithms capable of computing optimal mechanisms. This approach allows advanced computing machinery to be leveraged to generate tangible solutions, especially in complex or large-scale settings where manually crafted solutions are infeasible or insufficient. Our recent work focuses on two key directions: 1) developing universal computational frameworks for principal-agency to illuminate the shared algorithmic structures underlying diverse principal-agent scenarios, and 2) extending these frameworks to address the more intricate and dynamic settings where interactions between agents unfold over multiple stages. Our overarching goal is to discover fundamental computational theories and algorithmic tools that are broadly applicable to principal-agent problems across a wide range of domains.
-
Generalized principal-agency: contracts, information, games and beyond. WINE '24.
J. Gan, M. Han, J. Wu, H. Xu. [arXiv]In the principal-agent problem formulated by Myerson '82, agents have private information (type) and make private decisions (action), both of which are unobservable to the principal. Myerson pointed out an elegant linear programming solution that relies on the revelation principle. This paper extends Myerson's results to a more general setting where the principal's action space can be infinite and subject to additional design constraints. Our generalized principal-agent model unifies several important design problems including contract design, information design, and Bayesian Stackelberg games, and encompasses them as special cases. We first extend the revelation principle to this general model, based on which a polynomial-time algorithm is then derived for computing the optimal mechanism for the principal. This algorithm not only implies new efficient solutions simultaneously for all the aforementioned special cases but also significantly simplifies previously known algorithms designed for special cases. Inspired by the recent interest in the algorithmic design of a single contract and a menu of contracts, we study such constrained design problems in our general principal-agent model. In contrast to the above unification, our results here illustrate the other facet of diversity among different principal-agent design problems and demonstrate how their different structures can lead to different complexities: some are tractable whereas others are APX-hard. Finally, we reveal an interesting connection of our model to the problem of information acquisition for decision making and study its algorithmic properties in general.
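For a flavour of the LP approach discussed above, here is a minimal sketch (toy data, not code from the paper) for the information-design special case: the agent has no private type, and by the revelation principle the principal only needs to choose obedient action recommendations, which yields a linear program.

```python
import numpy as np
from scipy.optimize import linprog

# Toy information-design instance: states, receiver actions, a common prior,
# and made-up payoff matrices for the principal and the agent.
S, A = 3, 2
prior = np.array([0.5, 0.3, 0.2])
u_principal = np.array([[1.0, 0.0],
                        [1.0, 0.0],
                        [0.0, 1.0]])   # u_principal[s, a]
u_agent = np.array([[ 1.0, -1.0],
                    [-1.0,  1.0],
                    [ 0.5,  1.0]])     # u_agent[s, a]

# Variables x[s, a] = P(recommend action a | state s); by the revelation
# principle an optimal scheme can be taken to be such a direct scheme.
idx = lambda s, a: s * A + a
n_vars = S * A

# Objective: maximize sum_{s,a} prior[s] * x[s,a] * u_principal[s,a].
c = np.zeros(n_vars)
for s in range(S):
    for a in range(A):
        c[idx(s, a)] = -prior[s] * u_principal[s, a]    # linprog minimizes

# Obedience constraints: for every recommended a and deviation a2,
# sum_s prior[s] * x[s,a] * (u_agent[s,a] - u_agent[s,a2]) >= 0.
A_ub, b_ub = [], []
for a in range(A):
    for a2 in range(A):
        if a2 != a:
            row = np.zeros(n_vars)
            for s in range(S):
                row[idx(s, a)] = -prior[s] * (u_agent[s, a] - u_agent[s, a2])
            A_ub.append(row)
            b_ub.append(0.0)

# Each row x[s, :] must be a probability distribution.
A_eq = [np.zeros(n_vars) for _ in range(S)]
for s in range(S):
    A_eq[s][s * A:(s + 1) * A] = 1.0
b_eq = [1.0] * S

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, 1))
print("optimal signaling scheme:\n", res.x.reshape(S, A).round(3))
print("principal's expected utility:", round(-res.fun, 3))
```

Roughly speaking, the other special cases change what the principal commits to and which incentive constraints appear, while the overall LP structure stays similar.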
-
Stochastic principal-agent problems: efficient computation and learning.
J. Gan, R. Majumdar, D. Mandal, G. Radanovic. [arXiv]We introduce a stochastic principal-agent model. A principal and an agent interact in a stochastic environment, each privy to observations about the state not available to the other. The principal has the power of commitment, both to elicit information from the agent and to provide signals about her own information. The players communicate with each other and then select actions independently. Each of them receives a payoff based on the state and their joint action, and the environment transitions to a new state. The interaction continues over a finite time horizon. Both players are far-sighted, aiming to maximize their total payoffs over the time horizon. The model encompasses as special cases extensive-form games (EFGs) and stochastic games of incomplete information, partially observable Markov decision processes (POMDPs), as well as other forms of sequential principal-agent interactions, including Bayesian persuasion and automated mechanism design problems.
We consider both the computation and learning of the principal's optimal policy. Since the general problem, which subsumes POMDPs, is intractable, we explore algorithmic solutions under hindsight observability, where the state and the interaction history are revealed at the end of each step. Though the problem becomes more amenable under this condition, the number of possible histories remains exponential in the length of the time horizon, making approaches for EFG-based models infeasible. We present an efficient algorithm based on the inducible value sets. The algorithm computes an $\epsilon$-approximate optimal policy in time polynomial in $1/\epsilon$. Additionally, we show an efficient learning algorithm for an episodic reinforcement learning setting where the transition probabilities are unknown. The algorithm guarantees sublinear regret $\tilde{O}(T^{2/3})$ for both players over $T$ episodes.
-
Sequential decision making with information asymmetry. CONCUR '22.
J. Gan, R. Majumdar, G. Radanovic, A. Singla. [link]
We survey some recent results in sequential decision making under uncertainty, where there is an information asymmetry among the decision-makers. We consider two versions of the problem: persuasion and mechanism design. In persuasion, a more-informed principal influences the actions of a less-informed agent by signaling information. In mechanism design, a less-informed principal incentivizes a more-informed agent to reveal information by committing to a mechanism, so that the principal can make more informed decisions. We define Markov persuasion processes and Markov mechanism processes, which cast persuasion and mechanism design as dynamic models. Then we survey results on optimal persuasion and optimal mechanism design with myopic and far-sighted agents. These problems are solvable in polynomial time for myopic agents but hard for far-sighted agents.
-
Bayesian persuasion in sequential decision-making. AAAI '22.
J. Gan, R. Majumdar, G. Radanovic, A. Singla. [pdf | arXiv | link]
We study a dynamic model of Bayesian persuasion in sequential decision-making settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal's advice/signal, and beliefs about the external parameter. The action of the agent updates the state according to a stochastic process. The model arises naturally in many applications, e.g., an app (the principal) can advise the user (the agent) on possible choices between actions based on additional real-time information the app has. We study the problem of designing a signaling strategy from the principal's point of view. We show that the principal has an optimal strategy against a myopic agent, who only optimizes their rewards locally, and the optimal strategy can be computed in polynomial time. In contrast, it is NP-hard to approximate an optimal policy against a far-sighted agent. Further, we show that if the principal has the power to threaten the agent by not providing future signals, then we can efficiently design a threat-based strategy. This strategy guarantees the principal's payoff as if playing against an agent who is far-sighted but myopic to future signals.
Also see: our tutorial at WINE '24.
As autonomous agents grow increasingly capable, we face the spectre of manipulative behaviours, ranging from hidden collusion to deception, that lead to unintended outcomes. We investigate how information misreporting can evade basic auditing by synchronising behaviour with fabricated claims, a form of deception we term imitative deception. Such deceptive strategies pose a significant challenge to learning-based mechanism design, undermining its reliability, particularly in high-stakes contexts like security. If agents consistently mask their motives through this approach, what implications does this have for the safety and efficacy of mechanisms built on behavioural data? We initiate algorithmic studies around this question, investigating the vulnerabilities as well as countermeasures associated with deceptive tactics.
-
Persuading a credible agent.
J. Gan, A. Ghosh, N. Teh.
[arXiv]
How to optimally persuade an agent who has a private type? When elicitation is feasible, this amounts to a fairly standard principal-agent-style mechanism design problem, where the persuader employs a mechanism to first elicit the agent's type and then plays the corresponding persuasion strategy based on the agent's report. The optimal mechanism design problem in this setting is relatively well-understood in the literature, with incentive compatible (IC) mechanisms known to be optimal and computationally tractable. In this paper, we study this problem given a credible agent, i.e., if the agent claims they are of a certain type in response to the mechanism's elicitation, then they will act optimally with respect to the claimed type, even if they are actually not of that type.
We present several interesting findings in this new setting that differ significantly from results in the non-credible setting. In terms of the structure of optimal mechanisms, we show that not only may IC mechanisms fail to be optimal, but all mechanisms following the standard 'eliciting-then-persuading' mechanism design structure may be suboptimal. Achieving optimality requires two additional instruments, pre-signaling and non-binding elicitation, which naturally result in multi-stage mechanisms. We characterize optimal mechanisms under these design choices. Based on our characterization, we provide a polynomial-time algorithm for computing optimal multi-stage mechanisms. We also discover that in scenarios that allow for it, partial information elicitation can be employed to improve the principal's payoff even further. Though, surprisingly, an unbounded number of rounds of information exchange between the principal and the agent may be necessary to achieve optimality.
-
Learning to manipulate a commitment optimizer.
Y. Chen, X. Deng, J. Gan, Y. Li.
[arXiv]
It is shown in recent studies that in a Stackelberg game the follower can manipulate the leader by deviating from their true best-response behavior. Such manipulations are computationally tractable and can be highly beneficial for the follower. Meanwhile, they may result in significant payoff losses for the leader, sometimes completely defeating their first-mover advantage. While these findings are a warning to commitment optimizers, the risk they indicate appears to be alleviated to some extent by a strict information advantage that the manipulations rely on. That is, the follower knows the full information about both players' payoffs whereas the leader only knows their own payoffs. In this paper, we study the manipulation problem with this information advantage relaxed. We consider the scenario where the follower is not given any information about the leader's payoffs to begin with but has to learn to manipulate by interacting with the leader. The follower can gather necessary information by querying the leader's optimal commitments against contrived best-response behaviors. Our results indicate that the information advantage is not entirely indispensable to the follower's manipulations: the follower can learn the optimal way to manipulate in polynomial time with polynomially many queries of the leader's optimal commitment.
-
Optimally deceiving a learning leader in Stackelberg games.
JAIR & NeurIPS '20.
G. Birmpas, J. Gan, A. Hollender, F. Marmolejo-Cossío, N. Rajgopal, A. Voudouris. [pdf]Recent results have shown that algorithms for learning the optimal commitment in a Stackelberg game are susceptible to manipulation by the follower. These learning algorithms operate by querying the best responses of the follower, who consequently can deceive the algorithm by using fake best responses, typically by responding according to fake payoffs that are different from the actual ones. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the fake payoffs that would make the learning algorithm output a commitment that benefits them the most. While this problem has been considered before, the related literature has only focused on a simple setting where the follower can only choose from a finite set of payoff matrices, thus leaving the general version of the problem unanswered. In this paper, we fill this gap by showing that it is always possible for the follower to efficiently compute (near-)optimal fake payoffs, for various scenarios of learning interaction between the leader and the follower. Our results also establish an interesting connection between the follower's deception and the leader's maximin utility: through deception, the follower can induce almost any (fake) Stackelberg equilibrium if and only if the leader obtains at least their maximin utility in this equilibrium.
-
Manipulating a learning defender and ways to counteract. NeurIPS '19.
J. Gan, Q. Guo, L. Tran-Thanh, B. An, M. Wooldridge. [pdf | arXiv | poster]In Stackelberg security games, when information about the attacker's payoffs is uncertain, algorithms have been proposed to learn the optimal defender commitment by interacting with the attacker and observing their best responses. In this paper, however, we show that these algorithms can be easily manipulated if the attacker responds untruthfully. As a key finding, attacker manipulation normally leads to the defender learning a maximin strategy, which effectively renders the learning attempt meaningless, as computing a maximin strategy requires no additional information about the other player at all. We then apply a game-theoretic framework at a higher level to counteract such manipulation, in which the defender commits to a policy that specifies her strategy commitment according to the learned information. We provide a polynomial-time algorithm to compute the optimal such policy, and in addition, a heuristic approach that applies even when the attacker's payoff space is infinite or completely unknown. Empirical evaluation shows that our approaches can improve the defender's utility significantly as compared to the situation when attacker manipulation is ignored.
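For context on the maximin benchmark that the manipulation drives the learner towards: a maximin strategy guards against arbitrary responses and needs no information about the other player's payoffs. A minimal sketch with a made-up leader payoff matrix (not code from the paper), assuming the standard single-LP formulation:

```python
import numpy as np
from scipy.optimize import linprog

# Toy leader payoff matrix: U[i, j] = leader's utility when she plays row i
# and the other player responds with column j.
U = np.array([[3.0, 0.0],
              [1.0, 2.0],
              [0.0, 4.0]])
m, n = U.shape

# Variables: mixed strategy x over rows, plus the guaranteed value v.
# maximize v  subject to  x^T U[:, j] >= v for every column j,  x in the simplex.
c = np.zeros(m + 1)
c[-1] = -1.0                                  # linprog minimizes, so minimize -v
A_ub = np.hstack([-U.T, np.ones((n, 1))])     # v - x^T U[:, j] <= 0
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, 1)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("maximin strategy:", res.x[:m].round(3))
print("guaranteed utility:", round(res.x[-1], 3))
```

Note that the LP uses only the leader's own payoffs, which is exactly why learning that collapses to this benchmark wastes the interaction.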
-
Imitative follower deception in Stackelberg games. EC '19.
J. Gan, H. Xu, Q. Guo, L. Tran-Thanh, Z. Rabinovich, M. Wooldridge. [pdf | arXiv | poster]Information uncertainty is one of the major challenges facing applications of game theory. In the context of Stackelberg games, various approaches have been proposed to deal with the leader's incomplete knowledge about the follower's payoffs, typically by gathering information from the leader's interaction with the follower. Unfortunately, these approaches rely crucially on the assumption that the follower will not strategically exploit this information asymmetry, i.e., the follower behaves truthfully during the interaction according to their actual payoffs. As we show in this paper, the follower may have strong incentives to deceitfully imitate the behavior of a different follower type and, in doing this, benefit significantly from inducing the leader into choosing a highly suboptimal strategy. This raises a fundamental question: how to design a leader strategy in the presence of a deceitful follower? To answer this question, we put forward a basic model of Stackelberg games with (imitative) follower deception and show that the leader is indeed able to reduce the loss due to follower deception with carefully designed policies. We then provide a systematic study of the problem of computing the optimal leader policy and draw a relatively complete picture of the complexity landscape; essentially matching positive and negative complexity results are provided for natural variants of the model. Our intractability results are in sharp contrast to the situation with no deception, where the leader's optimal strategy can be computed in polynomial time, and thus illustrate the intrinsic difficulty of handling follower deception. Through simulations we also examine the benefit of considering follower deception in randomly generated games.
Standard models often assume neat and predictable agent behaviour to simplify theoretical analysis. While these assumptions have led to foundational insights, they frequently fail to account for the complexities and nuances of real-world scenarios. For example, agents may ignore, or be unable to discern, small payoff differences, thereby making suboptimal decisions. Similarly, in sequential decision-making, standard models assume that agents value future rewards at a constant discount rate, whereas empirical evidence suggests that this rate may vary over time. Strategies produced by such idealised models can be surprisingly unreliable once put into practice. We identify these gaps and propose refined solution concepts that more accurately reflect realistic agent behaviours. More importantly, we investigate the computational challenges introduced by these refinements, discovering effective computational approaches to more robust and practical decision-making when interacting with autonomous agents.
-
Robust Stackelberg equilibrium. EC '23.
J. Gan, M. Han, J. Wu, H. Xu.
[link | arXiv]
This paper provides a systematic study of the robust Stackelberg equilibrium (RSE), which naturally generalizes the widely adopted solution concept of the strong Stackelberg equilibrium (SSE). The RSE accounts for any possible up-to-δ suboptimal follower responses in Stackelberg games and is adopted to improve the robustness of the leader's strategy. While a few variants of robust Stackelberg equilibrium have been considered in previous literature, the RSE solution concept we consider is importantly different: in some sense, it relaxes previously studied robust Stackelberg strategies and is applicable to much broader sources of uncertainties.
We provide a thorough investigation of several fundamental properties of RSE, including its utility guarantees, algorithmics, and learnability. We first show that the RSE we defined always exists and thus is well-defined. Then we characterize how the leader's utility in RSE changes with the robustness level considered. On the algorithmic side, we show that, in sharp contrast to the tractability of computing an SSE, it is NP-hard to obtain a fully polynomial approximation scheme (FPTAS) for any constant robustness level. Nevertheless, we develop a quasi-polynomial approximation scheme (QPTAS) for RSE. Finally, we examine the learnability of the RSE in a natural learning scenario, where both players' utilities are not known in advance, and provide almost tight sample complexity results on learning the RSE. As a corollary of this result, we also obtain an algorithm for learning SSE, which strictly improves a key result of Bai et al. in terms of both utility guarantee and computational efficiency.
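To see why the robustness matters, here is a toy sketch (made-up 2x2 payoffs, not code from the paper) contrasting an SSE-style strategy, which relies on favourable tie-breaking, with a strategy that concedes a small margin so that the desired response remains strictly better by more than δ:

```python
import numpy as np

# Toy Stackelberg game: the leader mixes over rows {A, B}, the follower picks a column.
u_leader = np.array([[1.0, 0.0],
                     [3.0, 0.0]])
u_follower = np.array([[1.0, 0.0],
                       [0.0, 1.0]])

def leader_payoff(p_A, delta):
    """Leader's utility under favourable tie-breaking vs. under the worst
    follower response that is within delta of the follower's optimum."""
    x = np.array([p_A, 1.0 - p_A])
    f = x @ u_follower        # follower's expected utility of each column
    l = x @ u_leader          # leader's expected utility of each column
    optimistic = l[f >= f.max() - 1e-12].max()
    robust = l[f >= f.max() - delta].min()
    return round(optimistic, 3), round(robust, 3)

delta = 0.05
print("p_A = 0.50:", leader_payoff(0.50, delta))   # (2.0, 0.0): indifference backfires
print("p_A = 0.55:", leader_payoff(0.55, delta))   # (1.9, 1.9): small sacrifice, solid guarantee
```

Here the SSE-style choice p_A = 0.5 promises utility 2 only if the indifferent follower breaks ties favourably, while a δ-suboptimal deviation drops the leader to 0; mixing slightly more towards A secures about 1.9 regardless.
-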
Markov decision processes with time-varying geometric discounting. AAAI '23.
J. Gan, A. Hennes, R. Majumdar, D. Mandal, G. Radanovic. [link | pdf]Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective, whereby each time step is treated as an independent decision maker with their own (fixed) discount factor, and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of ε-SPE and show that an ε-SPE exists under milder assumptions. We present an algorithm to compute an ε-SPE, together with an upper bound on its time complexity as a function of the convergence property of the time-varying discount factor.
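As a much simplified illustration of the entry above (finite horizon, toy numbers, and plain backward induction rather than the paper's equilibrium analysis), the only change to the standard Bellman backup is that each step t applies its own discount factor:

```python
import numpy as np

# Toy MDP with 2 states and 2 actions (all numbers made up).
# P[a, s, s2] = transition probability, R[s, a] = immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gammas = [0.9, 0.8, 0.7, 0.6, 0.5]   # a different discount factor per time step

V = np.zeros(2)                       # terminal values
for t in reversed(range(len(gammas))):
    # Q[s, a] = R[s, a] + gamma_t * E[V(next state)]
    Q = R + gammas[t] * np.einsum('ase,e->sa', P, V)
    V = Q.max(axis=1)
print("time-0 values:", V.round(3))
```

In the infinite-horizon setting studied in the paper, different time steps may disagree about what is optimal, which is what motivates the game-theoretic (SPE) treatment instead of this naive backup.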
We explore how to embed fairness, safety, and sustainability into multi-agent systems to improve social good. Our work spans developing fair algorithms for resource allocation and decision-making, analysing societal dynamics like demographic segregation, and leveraging game theory to optimise security, transportation, and energy systems. Together, our efforts highlight how computational game theory can address societal challenges while driving innovation in algorithmic research and multi-agent systems.
-
Socially fair reinforcement learning.
D. Mandal, J. Gan.
[arXiv]
We consider the problem of episodic reinforcement learning where there are multiple stakeholders with different reward functions. Our goal is to output a policy that is socially fair with respect to different reward functions. Prior works have proposed different objectives that a fair policy must optimize, including minimum welfare and generalized Gini welfare. We first take an axiomatic view of the problem, and propose four axioms that any such fair objective must satisfy. We show that Nash social welfare is the unique objective that satisfies all four axioms, whereas prior objectives fail to do so. We then consider the learning version of the problem, where the underlying model, i.e., the Markov decision process, is unknown. We consider the problem of minimizing regret with respect to the fair policies maximizing three different fair objectives -- minimum welfare, generalized Gini welfare, and Nash social welfare. Based on optimistic planning, we propose a generic learning algorithm and derive its regret bound with respect to the three different policies. For the objective of Nash social welfare, we also derive a lower bound on the regret that grows exponentially with $n$, the number of agents. Finally, we show that for the objective of minimum welfare, one can improve the regret by a factor of $O(H)$ under a weaker notion of regret.
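For intuition about the objectives compared in the entry above, a tiny sketch (made-up value vector, standard textbook definitions, not code from the paper) evaluating the three welfare functions on per-stakeholder policy values:

```python
import numpy as np

values = np.array([4.0, 1.0, 2.5, 3.0])   # made-up expected returns, one per stakeholder

def minimum_welfare(v):
    return float(v.min())

def nash_social_welfare(v):
    # Geometric-mean form; assumes strictly positive values.
    return float(np.exp(np.mean(np.log(v))))

def generalized_gini_welfare(v, weights=None):
    # Weighted sum of the values sorted from worst-off to best-off,
    # with non-increasing weights (here 1, 1/2, 1/3, ...).
    v = np.sort(v)
    if weights is None:
        weights = 1.0 / np.arange(1, len(v) + 1)
    return float(weights @ v)

print("minimum welfare:         ", minimum_welfare(values))
print("Nash social welfare:     ", round(nash_social_welfare(values), 3))
print("generalized Gini welfare:", round(generalized_gini_welfare(values), 3))
```

The axiomatic results above concern which of these objectives a fair policy should optimize; the learning results then bound the regret of optimistic planning with respect to each.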
-
Envy-free policy teaching to multiple agents. NeurIPS '22.
J. Gan, R. Majumdar, G. Radanovic, A. Singla. [pdf | link]We study envy-free policy teaching. A number of agents independently explore a common Markov decision process (MDP), but each with their own reward function and discounting rate. A teacher wants to teach a target policy to this diverse group of agents, by means of modifying the agents' reward functions: providing additional bonuses to certain actions, or penalizing them. When personalized reward modification programs are used, an important question is how to design the programs so that the agents think they are treated fairly. We adopt the notion of envy-freeness (EF) from the literature on fair division to formalize this problem and investigate several fundamental questions about the existence of EF solutions in our setting, the computation of cost-minimizing solutions, as well as the price of fairness (PoF), which measures the increase of cost due to the consideration of fairness. We show that 1) an EF solution may not exist if penalties are not allowed in the modifications, but otherwise always exists. 2) Computing a cost-minimizing EF solution can be formulated as convex optimization and hence solved efficiently. 3) The PoF increases but at most quadratically with the geometric sum of the discount factor, and at most linearly with the size of the MDP and the number of agents involved; we present tight asymptotic bounds on the PoF. These results indicate that fairness can be incorporated in multi-agent teaching without significant computational or PoF burdens.
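A minimal sketch of the envy-freeness condition used in the entry above (toy valuation matrix; in the paper, an agent's value for a modification program would be its discounted return under the modified rewards): agent i is envy-free if it weakly prefers its own program to everyone else's.

```python
import numpy as np

# V[i, j] = how much agent i values the reward-modification program designed for agent j.
V = np.array([[5.0, 4.0, 3.0],
              [2.0, 6.0, 1.0],
              [0.5, 0.5, 2.0]])

def is_envy_free(V):
    # Envy-freeness: V[i, i] >= V[i, j] for every pair of agents i, j.
    own = np.diag(V)[:, None]
    return bool(np.all(own >= V))

print("envy-free:", is_envy_free(V))   # True for this toy matrix
```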
-
Defense coordination in security games: equilibrium analysis and mechanism design. Artificial Intelligence.
J. Gan, E. Elkind, S. Kraus, M. Wooldridge. [link]Real-world security scenarios sometimes involve multiple defenders: security agencies of two or more countries might patrol the same border areas, and domestic security agencies might also operate in the same locations when their areas of jurisdiction overlap. Motivated by these scenarios and the observation that uncoordinated movements of the defenders may lead to an inefficient defense, we introduce a model of multi-defender security games and explore the possibility of improving efficiency by coordinating the defenders; specifically, by pooling the defenders' resources and allocating them jointly. The model generalizes the standard model of Stackelberg security games, where a defender (now a group of defenders) allocates security resources to protect a set of targets, and an attacker picks the best target to attack. In particular, we are interested in the situation with heterogeneous defenders, who may value the same target differently. Our task is twofold. First, we need to develop a good understanding of the uncoordinated situation, as the baseline to be improved. To this end we formulate a new equilibrium concept, and prove that an equilibrium under this concept always exists and can be computed efficiently. Second, to coordinate the heterogeneous defenders we take a mechanism design perspective and aim to find a mechanism to generate joint resource allocation strategies. We seek a mechanism that improves the defenders' utilities upon the uncoordinated baseline, achieves Pareto efficiency, and incentivizes the defenders to report their true incentives and execute the recommended strategies. Our analysis establishes several impossibility results, indicating the intrinsic difficulties of defense coordination. Specifically, we show that even the basic properties listed above are in conflict with each other: no mechanism can simultaneously satisfy them all, or even some proper subsets of them. In terms of positive results, we present mechanisms that satisfy all combinations of the properties that are not ruled out by our impossibility results, thereby providing a comprehensive profile of the mechanism design problem with respect to the properties considered.
(Preliminary versions appeared in AAMAS '18 and '20.)
-
Schelling games on graphs. Artificial Intelligence.
A. Agarwal, E. Elkind, J. Gan, A. Igarashi, W. Suksompong, A. Voudouris. [link]We study strategic games inspired by Schelling's seminal model of residential segregation. These games are played on undirected graphs, with the set of agents partitioned into multiple types; each agent either aims to maximize the fraction of her neighbors who are of her own type, or occupies a node of the graph and never moves away. We consider two natural variants of this model: in jump games agents can jump to empty nodes of the graph to increase their utility, while in swap games they can swap positions with other agents. We investigate the existence, computational complexity, and quality of equilibrium assignments in these games, both from a social welfare perspective and from a diversity perspective. Some of our results extend to a more general setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.
(Preliminary versions appeared in IJCAI '19 and AAAI '20.)
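Related to the Schelling games entry above, a small sketch (toy graph and assignment, not code from the papers) of the equilibrium check in jump games: no strategic agent should be able to strictly increase the fraction of same-type neighbours by jumping to an empty node.

```python
import networkx as nx

# Toy instance: a 6-cycle, two types of strategic agents, one empty node.
G = nx.cycle_graph(6)
placement = {0: 'red', 1: 'red', 2: 'blue', 3: 'blue', 4: 'red', 5: None}  # None = empty

def utility(G, placement, node, agent_type):
    """Fraction of occupied neighbours of `node` sharing `agent_type` (0 if none are occupied)."""
    nbrs = [placement[v] for v in G.neighbors(node) if placement[v] is not None]
    return sum(t == agent_type for t in nbrs) / len(nbrs) if nbrs else 0.0

def is_jump_equilibrium(G, placement):
    empties = [v for v, t in placement.items() if t is None]
    for v, t in placement.items():
        if t is None:
            continue
        current = utility(G, placement, v, t)
        for e in empties:
            deviation = dict(placement)
            deviation[v], deviation[e] = None, t     # agent at v jumps to e
            if utility(G, deviation, e, t) > current:
                return False
    return True

# The red agent at node 4 (no red neighbours) can improve by jumping to node 5,
# so this particular assignment is not an equilibrium and the check prints False.
print("jump equilibrium:", is_jump_equilibrium(G, placement))
```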
(See more in full list.)
[αβ] denotes alphabetical ordering of authors. (Why?)
-
J. Gan, R. Majumdar, D. Mandal, G. Radanovic.
Stochastic principal-agent problems: efficient computation and learning. [arXiv]We introduce a stochastic principal-agent model. A principal and an agent interact in a stochastic environment, each privy to observations about the state not available to the other. The principal has the power of commitment, both to elicit information from the agent and to provide signals about her own information. The players communicate with each other and then select actions independently. Each of them receives a payoff based on the state and their joint action, and the environment transitions to a new state. The interaction continues over a finite time horizon. Both players are far-sighted, aiming to maximize their total payoffs over the time horizon. The model encompasses as special cases extensive-form games (EFGs) and stochastic games of incomplete information, partially observable Markov decision processes (POMDPs), as well as other forms of sequential principal-agent interactions, including Bayesian persuasion and automated mechanism design problems.
We consider both the computation and learning of the principal's optimal policy. Since the general problem, which subsumes POMDPs, is intractable, we explore algorithmic solutions under hindsight observability, where the state and the interaction history are revealed at the end of each step. Though the problem becomes more amenable under this condition, the number of possible histories remains exponential in the length of the time horizon, making approaches for EFG-based models infeasible. We present an efficient algorithm based on the inducible value sets. The algorithm computes an $\epsilon$-approximate optimal policy in time polynomial in $1/\epsilon$. Additionally, we show an efficient learning algorithm for an episodic reinforcement learning setting where the transition probabilities are unknown. The algorithm guarantees sublinear regret $\tilde{O}(T^{2/3})$ for both players over $T$ episodes.
-
J. Gan, A. Ghosh, N. Teh.
Persuading a credible agent. [arXiv]How to optimally persuade an agent who has a private type? When elicitation is feasible, this amounts to a fairly standard principal-agent-style mechanism design problem, where the persuader employs a mechanism to first elicit the agent's type and then plays the corresponding persuasion strategy based on the agent's report. The optimal mechanism design problem in this setting is relatively well-understood in the literature, with incentive compatible (IC) mechanisms known to be optimal and computationally tractable. In this paper, we study this problem given a credible agent, i.e., if the agent claims they are of a certain type in response to the mechanism's elicitation, then they will act optimally with respect to the claimed type, even if they are actually not of that type.
We present several interesting findings in this new setting that differ significantly from results in the non-credible setting. In terms of the structure of optimal mechanisms, we show that not only may IC mechanisms fail to be optimal, but all mechanisms following the standard 'eliciting-then-persuading' mechanism design structure may be suboptimal. Achieving optimality requires two additional instruments, pre-signaling and non-binding elicitation, which naturally result in multi-stage mechanisms. We characterize optimal mechanisms under these design choices. Based on our characterization, we provide a polynomial-time algorithm for computing optimal multi-stage mechanisms. We also discover that in scenarios that allow for it, partial information elicitation can be employed to improve the principal's payoff even further. Though, surprisingly, an unbounded number of rounds of information exchange between the principal and the agent may be necessary to achieve optimality.
-
J. Gan, M. Han, J. Wu, H. Xu.
Robust Stackelberg equilibrium. EC '23. [link | arXiv]This paper provides a systematic study of the robust Stackelberg equilibrium (RSE), which naturally generalizes the widely adopted solution concept of the strong Stackelberg equilibrium (SSE). The RSE accounts for any possible up-to-δ suboptimal follower responses in Stackelberg games and is adopted to improve the robustness of the leader's strategy. While a few variants of robust Stackelberg equilibrium have been considered in previous literature, the RSE solution concept we consider is importantly different: in some sense, it relaxes previously studied robust Stackelberg strategies and is applicable to much broader sources of uncertainties.
We provide a thorough investigation of several fundamental properties of RSE, including its utility guarantees, algorithmics, and learnability. We first show that the RSE we defined always exists and thus is well-defined. Then we characterize how the leader's utility in RSE changes with the robustness level considered. On the algorithmic side, we show that, in sharp contrast to the tractability of computing an SSE, it is NP-hard to obtain a fully polynomial approximation scheme (FPTAS) for any constant robustness level. Nevertheless, we develop a quasi-polynomial approximation scheme (QPTAS) for RSE. Finally, we examine the learnability of the RSE in a natural learning scenario, where both players' utilities are not known in advance, and provide almost tight sample complexity results on learning the RSE. As a corollary of this result, we also obtain an algorithm for learning SSE, which strictly improves a key result of Bai et al. in terms of both utility guarantee and computational efficiency.
-
J. Gan, A. Hennes, R. Majumdar, D. Mandal, G. Radanovic.
Markov decision processes with time-varying geometric discounting. AAAI '23. [link | pdf]Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective, whereby each time step is treated as an independent decision maker with their own (fixed) discount factor, and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of ε-SPE and show that an ε-SPE exists under milder assumptions. We present an algorithm to compute an ε-SPE, together with an upper bound on its time complexity as a function of the convergence property of the time-varying discount factor.
-
J. Gan, R. Majumdar, G. Radanovic, A. Singla.
Envy-free policy teaching to multiple agents. NeurIPS '22. [pdf | link]We study envy-free policy teaching. A number of agents independently explore a common Markov decision process (MDP), but each with their own reward function and discounting rate. A teacher wants to teach a target policy to this diverse group of agents, by means of modifying the agents' reward functions: providing additional bonuses to certain actions, or penalizing them. When personalized reward modification programs are used, an important question is how to design the programs so that the agents think they are treated fairly. We adopt the notion of envy-freeness (EF) from the literature on fair division to formalize this problem and investigate several fundamental questions about the existence of EF solutions in our setting, the computation of cost-minimizing solutions, as well as the price of fairness (PoF), which measures the increase of cost due to the consideration of fairness. We show that 1) an EF solution may not exist if penalties are not allowed in the modifications, but otherwise always exists. 2) Computing a cost-minimizing EF solution can be formulated as convex optimization and hence solved efficiently. 3) The PoF increases but at most quadratically with the geometric sum of the discount factor, and at most linearly with the size of the MDP and the number of agents involved; we present tight asymptotic bounds on the PoF. These results indicate that fairness can be incorporated in multi-agent teaching without significant computational or PoF burdens.
-
J. Gan, E. Elkind, S. Kraus, M. Wooldridge.
Defense coordination in security games: equilibrium analysis and mechanism design. Artificial Intelligence. [link]Real-world security scenarios sometimes involve multiple defenders: security agencies of two or more countries might patrol the same border areas, and domestic security agencies might also operate in the same locations when their areas of jurisdiction overlap. Motivated by these scenarios and the observation that uncoordinated movements of the defenders may lead to an inefficient defense, we introduce a model of multi-defender security games and explore the possibility of improving efficiency by coordinating the defenders; specifically, by pooling the defenders' resources and allocating them jointly. The model generalizes the standard model of Stackelberg security games, where a defender (now a group of defenders) allocates security resources to protect a set of targets, and an attacker picks the best target to attack. In particular, we are interested in the situation with heterogeneous defenders, who may value the same target differently. Our task is twofold. First, we need to develop a good understanding of the uncoordinated situation, as the baseline to be improved. To this end we formulate a new equilibrium concept, and prove that an equilibrium under this concept always exists and can be computed efficiently. Second, to coordinate the heterogeneous defenders we take a mechanism design perspective and aim to find a mechanism to generate joint resource allocation strategies. We seek a mechanism that improves the defenders' utilities upon the uncoordinated baseline, achieves Pareto efficiency, and incentivizes the defenders to report their true incentives and execute the recommended strategies. Our analysis establishes several impossibility results, indicating the intrinsic difficulties of defense coordination. Specifically, we show that even the basic properties listed above are in conflict with each other: no mechanism can simultaneously satisfy them all, or even some proper subsets of them. In terms of positive results, we present mechanisms that satisfy all combinations of the properties that are not ruled out by our impossibility results, thereby providing a comprehensive profile of the mechanism design problem with respect to the properties considered.
(Preliminary versions appeared in AAMAS '18 and '20.)
-
J. Gan, R. Majumdar, G. Radanovic, A. Singla.
Bayesian persuasion in sequential decision-making. AAAI '22. [pdf | arXiv | link]
(Outstanding Paper Honorable Mention @ AAAI '22) We study a dynamic model of Bayesian persuasion in sequential decision-making settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal's advice/signal, and beliefs about the external parameter. The action of the agent updates the state according to a stochastic process. The model arises naturally in many applications, e.g., an app (the principal) can advise the user (the agent) on possible choices between actions based on additional real-time information the app has. We study the problem of designing a signaling strategy from the principal's point of view. We show that the principal has an optimal strategy against a myopic agent, who only optimizes their rewards locally, and the optimal strategy can be computed in polynomial time. In contrast, it is NP-hard to approximate an optimal policy against a far-sighted agent. Further, we show that if the principal has the power to threaten the agent by not providing future signals, then we can efficiently design a threat-based strategy. This strategy guarantees the principal's payoff as if playing against an agent who is far-sighted but myopic to future signals.
-
A. Agarwal, E. Elkind, J. Gan, A. Igarashi, W. Suksompong, A. Voudouris.
Schelling games on graphs. Artificial Intelligence. [link]We study strategic games inspired by Schelling's seminal model of residential segregation. These games are played on undirected graphs, with the set of agents partitioned into multiple types; each agent either aims to maximize the fraction of her neighbors who are of her own type, or occupies a node of the graph and never moves away. We consider two natural variants of this model: in jump games agents can jump to empty nodes of the graph to increase their utility, while in swap games they can swap positions with other agents. We investigate the existence, computational complexity, and quality of equilibrium assignments in these games, both from a social welfare perspective and from a diversity perspective. Some of our results extend to a more general setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.
(Preliminary versions appeared in IJCAI '19 and AAAI '20.)
-
G. Birmpas, J. Gan, A. Hollender, F. Marmolejo-Cossío, N. Rajgopal, A. Voudouris.
Optimally deceiving a learning leader in Stackelberg games. Journal of Artificial Intelligence Research (JAIR). [pdf]Recent results have shown that algorithms for learning the optimal commitment in a Stackelberg game are susceptible to manipulation by the follower. These learning algorithms operate by querying the best responses of the follower, who consequently can deceive the algorithm by using fake best responses, typically by responding according to fake payoffs that are different from the actual ones. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the fake payoffs that would make the learning algorithm output a commitment that benefits them the most. While this problem has been considered before, the related literature has only focused on a simple setting where the follower can only choose from a finite set of payoff matrices, thus leaving the general version of the problem unanswered. In this paper, we fill this gap by showing that it is always possible for the follower to efficiently compute (near-)optimal fake payoffs, for various scenarios of learning interaction between the leader and the follower. Our results also establish an interesting connection between the follower's deception and the leader's maximin utility: through deception, the follower can induce almost any (fake) Stackelberg equilibrium if and only if the leader obtains at least their maximin utility in this equilibrium.
(A preliminary version appeared in NeurIPS '20.)
-
E. Elkind, J. Gan, S. Obraztsova, Z. Rabinovich, A. Voudouris.
Protecting elections by recounting ballots. Artificial Intelligence. [link]Complexity of voting manipulation is a prominent topic in computational social choice. In this work, we consider a two-stage voting manipulation scenario. First, a malicious party (an attacker) attempts to manipulate the election outcome in favor of a preferred candidate by changing the vote counts in some of the voting districts. Afterwards, another party (a defender), which cares about the voters' wishes, demands a recount in a subset of the manipulated districts, restoring their vote counts to their original values. We investigate the resulting Stackelberg game for the case where votes are aggregated using two variants of the Plurality rule, and obtain an almost complete picture of the complexity landscape, both from the attacker's and from the defender's perspective.
(A preliminary version appeared in IJCAI '19.)
-
J. Gan, Q. Guo, L. Tran-Thanh, B. An, M. Wooldridge.
Manipulating a learning defender and ways to counteract. NeurIPS '19. [pdf | arXiv | poster]In Stackelberg security games, when information about the attacker's payoffs is uncertain, algorithms have been proposed to learn the optimal defender commitment by interacting with the attacker and observing their best responses. In this paper, however, we show that these algorithms can be easily manipulated if the attacker responds untruthfully. As a key finding, attacker manipulation normally leads to the defender learning a maximin strategy, which effectively renders the learning attempt meaningless, as computing a maximin strategy requires no additional information about the other player at all. We then apply a game-theoretic framework at a higher level to counteract such manipulation, in which the defender commits to a policy that specifies her strategy commitment according to the learned information. We provide a polynomial-time algorithm to compute the optimal such policy, and in addition, a heuristic approach that applies even when the attacker's payoff space is infinite or completely unknown. Empirical evaluation shows that our approaches can improve the defender's utility significantly as compared to the situation when attacker manipulation is ignored.
-
J. Gan, H. Xu, Q. Guo, L. Tran-Thanh, Z. Rabinovich, M. Wooldridge.
Imitative follower deception in Stackelberg games. EC '19. [pdf | arXiv | poster]Information uncertainty is one of the major challenges facing applications of game theory. In the context of Stackelberg games, various approaches have been proposed to deal with the leader's incomplete knowledge about the follower's payoffs, typically by gathering information from the leader's interaction with the follower. Unfortunately, these approaches rely crucially on the assumption that the follower will not strategically exploit this information asymmetry, i.e., the follower behaves truthfully during the interaction according to their actual payoffs. As we show in this paper, the follower may have strong incentives to deceitfully imitate the behavior of a different follower type and, in doing this, benefit significantly from inducing the leader into choosing a highly suboptimal strategy. This raises a fundamental question: how to design a leader strategy in the presence of a deceitful follower? To answer this question, we put forward a basic model of Stackelberg games with (imitative) follower deception and show that the leader is indeed able to reduce the loss due to follower deception with carefully designed policies. We then provide a systematic study of the problem of computing the optimal leader policy and draw a relatively complete picture of the complexity landscape; essentially matching positive and negative complexity results are provided for natural variants of the model. Our intractability results are in sharp contrast to the situation with no deception, where the leader's optimal strategy can be computed in polynomial time, and thus illustrate the intrinsic difficulty of handling follower deception. Through simulations we also examine the benefit of considering follower deception in randomly generated games.
-
AAAI 19-25, ICLR 22-25, SAGT 22, ICML 21-24, IJCAI 19-24, AAMAS 20-24, NeurIPS 20-25, EC 22, ECAI 20&23, EAMO 21, MFCS 19, GameSec 19.
- Artificial Intelligence (AIJ), Journal of Artificial Intelligence Research (JAIR), Theoretical Computer Science (TCS), IEEE Trans. on AI (TAI), IEEE Trans. on Information Forensics and Security (TIFS), Autonomous Agents and Multi-agent Systems (JAAMAS).
- Artificial Intelligence (2022/23/24), lecturer, Oxford
- Advanced Topics in Machine Learning (2022), lecturer, Oxford
https://www.cs.ox.ac.uk/people/jiarui.gan/
Department of Computer Science
University of Oxford
Parks Rd, Oxford OX1 3QD, UK
© Jiarui Gan 2025