I'm a lecturer at the Department of Computer Science, the University of Oxford. I work primarily in computational game theory, multi-agent systems, and AI. My research focuses on understanding and shaping interactions among intelligent agents in complex real-world scenarios. We explore the power of computing to address these challenges and solve the underlying algorithmic “puzzles”. Through this, we develop algorithmic tools to guide agents toward broader organisational and societal goals. My work bridges theory and practice, addressing issues such as fairness, sustainability and safety in multi-agent interactions.

🔍 I'm looking for motivated PhD students. Feel free to get in touch if you're interested in my research areas! (See more about PhD at Oxford here.) For any application-related inquiries, please directly contact me (jiarui.gan@cs.ox.ac.uk) or the Department of Computer Science.

BIO 🛾

Lecturer @Oxford ◄ Postdoc @MPI-SWS (host: Rupak Majumdar) ◄ PhD @Oxford (supervisors: Edith Elkind & Michael Wooldridge)

RESEARCH đŸ§Ș

Insights from well-established frameworks—such as principal-agency and dynamic mechanism design—provide blueprints for architecting and optimising multi-agent systems. By coupling these frameworks with computational models, we create automated solutions that ensure advanced AI agents—whether in transportation, digital platforms, or broader societal ecosystems—remain reliably aligned with organisational objectives. Our work focuses on establishing the computational foundations necessary for these missions, developing systematic approaches to guide multi-agent systems toward beneficial and equitable outcomes.


Algorithmic Principal-Agency

We study how to design incentive mechanisms that effectively coordinate autonomous agents towards organisational objectives, drawing on principal-agency ideas from economics. By framing these design tasks as computational problems, we aim to develop algorithms capable of computing optimal mechanisms. This approach allows advanced computing machineries to be leveraged to generate tangible solutions, especially in complex or large-scale settings where manually crafted solutions are infeasible or insufficient. Our recent work focuses on two key directions: 1) developing universal computational frameworks for principal-agency to illuminate the shared algorithmic structures underlying diverse principal-agent scenarios, and 2) extending these frameworks to address the more intricate and dynamic settings where interactions between agents unfold over multiple stages. Our overarching goal is to discover fundamental computational theories and algorithmic tools that are broadly applicable to principal-agent problems across a wide range of domains.

  • Generalized principal-agency: contracts, information, games and beyond. WINE '24.
    J. Gan, M. Han, J. Wu, H. Xu. [arXiv]

    In the principal-agent problem formulated by Myerson'82, agents have private information (type) and make private decisions (action), both of which are unobservable to the principal. Myerson pointed out an elegant linear programming solution that relies on the revelation principle. This paper extends Myerson's results to a more general setting where the principal's action space can be infinite and subject to additional design constraints. Our generalized principal-agent model unifies several important design problems including contract design, information design, and Bayesian Stackelberg games, and encompasses them as special cases. We first extend the revelation principle to this general model, based on which a polynomial-time algorithm is then derived for computing the optimal mechanism for the principal. This algorithm not only implies new efficient solutions simultaneously for all the aforementioned special cases but also significantly simplifies previously known algorithms designed for special cases. Inspired by the recent interest in the algorithmic design of a single contract and menu of contracts, we study such constrained design problems to our general principal-agent model. In contrast to the above unification, our results here illustrate the other facet of diversity among different principal-agent design problems and demonstrate how their different structures can lead to different complexities: some are tractable whereas others are APX-hard. Finally, we reveal an interesting connection of our model to the problem of information acquisition for decision making and study its algorithmic properties in general.

  • Stochastic principal-agent problems: efficient computation and learning.
    J. Gan, R. Majumdar, D. Mandal, G. Radanovic. [arXiv]

    We introduce a stochastic principal-agent model. A principal and an agent interact in a stochastic environment, each privy to observations about the state not available to the other. The principal has the power of commitment, both to elicit information from the agent and to provide signals about her own information. The players communicate with each other and then select actions independently. Each of them receives a payoff based on the state and their joint action, and the environment transitions to a new state. The interaction continues over a finite time horizon. Both players are far-sighted, aiming to maximize their total payoffs over the time horizon. The model encompasses as special cases extensive-form games (EFGs) and stochastic games of incomplete information, partially observable Markov decision processes (POMDPs), as well as other forms of sequential principal-agent interactions, including Bayesian persuasion and automated mechanism design problems.
    We consider both the computation and learning of the principal's optimal policy. Since the general problem, which subsumes POMDPs, is intractable, we explore algorithmic solutions under hindsight observability, where the state and the interaction history are revealed at the end of each step. Though the problem becomes more amenable under this condition, the number of possible histories remains exponential in the length of the time horizon, making approaches for EFG-based models infeasible. We present an efficient algorithm based on the inducible value sets. The algorithm computes an $\epsilon$-approximate optimal policy in time polynomial in $1/\epsilon$. Additionally, we show an efficient learning algorithm for an episodic reinforcement learning setting where the transition probabilities are unknown. The algorithm guarantees sublinear regret $\tilde{O}(T^{2/3})$ for both players over $T$ episodes.

  • Sequential decision making with information asymmetry. CONCUR '22.
    J. Gan, R. Majumdar, G. Radanovic, A. Singla. [link]

    We survey some recent results in sequential decision making under uncertainty, where there is an information asymmetry among the decision-makers. We consider two versions of the problem: persuasion and mechanism design. In persuasion, a more-informed principal influences the actions of a less-informed agent by signaling information. In mechanism design, a less-informed principal incentivizes a more-informed agent to reveal information by committing to a mechanism, so that the principal can make more informed decisions. We define Markov persuasion processes and Markov mechanism processes that model persuasion and mechanism design into dynamic models. Then we survey results on optimal persuasion and optimal mechanism design on myopic and far-sighted agents. These problems are solvable in polynomial time for myopic agents but hard for far-sighted agents.

  • Bayesian persuasion in sequential decision-making. AAAI '22.
    J. Gan, R. Majumdar, G. Radanovic, A. Singla. [pdf | arXiv | link]

    We study a dynamic model of Bayesian persuasion in sequential decision-making settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal's advice/signal, and beliefs about the external parameter. The action of the agent updates the state according to a stochastic process. The model arises naturally in many applications, e.g., an app (the principal) can advise the user (the agent) on possible choices between actions based on additional real-time information the app has. We study the problem of designing a signaling strategy from the principal's point of view. We show that the principal has an optimal strategy against a myopic agent, who only optimizes their rewards locally, and the optimal strategy can be computed in polynomial time. In contrast, it is NP-hard to approximate an optimal policy against a far-sighted agent. Further, we show that if the principal has the power to threaten the agent by not providing future signals, then we can efficiently design a threat-based strategy. This strategy guarantees the principal's payoff as if playing against an agent who is far-sighted but myopic to future signals.

đŸ’„ Also see: our tutorial at WINE '24.


Manipulative Agents: Risks & Countermeasures

As autonomous agents grow increasingly capable, we face the spectre of manipulative behaviours, ranging from hidden collusions to deceptions, that lead to unintended outcomes. We investigate how information misreporting can evade basic auditing by synchronising behaviour with fabricated claims—a form of deception we term imitative deception. Such deceptive strategies pose a significant challenge to learning-based mechanism design, undermining its reliability—particularly in high-stakes contexts like security. If agents consistently mask their motives through this approach, what implications does this have for the safety and efficacy of mechanisms built on behavioural data? We initiate algorithmic studies around this question, investigating the vulnerabilities as well as countermeasures associated with deceptive tactics.

  • Persuading a credible agent. J. Gan, A. Ghosh, N. Teh. [arXiv]

    How to optimally persuade an agent who has a private type? When elicitation is feasible, this amounts to a fairly standard principal-agent-style mechanism design problem, where the persuader employs a mechanism to first elicit the agent's type and then plays the corresponding persuasion strategy based on the agent's report. The optimal mechanism design problem in this setting is relatively well-understood in the literature, with incentive compatible (IC) mechanisms known to be optimal and computationally tractable. In this paper, we study this problem given a credible agent, i.e., if the agent claims they are of a certain type in response to the mechanism's elicitation, then they will act optimally with respect to the claimed type, even if they are actually not of that type.
    We present several interesting findings in this new setting that differ significantly from results in the non-credible setting. In terms of the structure of optimal mechanisms, we show that not only may IC mechanisms fail to be optimal, but all mechanisms following the standard 'eliciting-then-persuading' mechanism design structure may be suboptimal. To achieve optimality requires two additional instruments — pre-signaling and non-binding elicitation — which naturally result in multi-stage mechanisms. We characterize optimal mechanisms under these design choices. Based on our characterization, we provide a polynomial-time algorithm for computing optimal multi-stage mechanisms. We also discover that in scenarios that allow for it, partial information elicitation can be employed to improve the principal's payoff even further. Though, surprisingly, an unbounded number of rounds of information exchange between the principal and the agent may be necessary to achieve optimality.

  • Learning to manipulate a commitment optimizer. Y. Chen, X. Deng, J. Gan, Y. Li. [arXiv]

    It is shown in recent studies that in a Stackelberg game the follower can manipulate the leader by deviating from their true best-response behavior. Such manipulations are computationally tractable and can be highly beneficial for the follower. Meanwhile, they may result in significant payoff losses for the leader, sometimes completely defeating their first-mover advantage. A warning to commitment optimizers, the risk these findings indicate appears to be alleviated to some extent by a strict information advantage the manipulations rely on. That is, the follower knows the full information about both players' payoffs whereas the leader only knows their own payoffs. In this paper, we study the manipulation problem with this information advantage relaxed. We consider the scenario where the follower is not given any information about the leader's payoffs to begin with but has to learn to manipulate by interacting with the leader. The follower can gather necessary information by querying the leader's optimal commitments against contrived best-response behaviors. Our results indicate that the information advantage is not entirely indispensable to the follower's manipulations: the follower can learn the optimal way to manipulate in polynomial time with polynomially many queries of the leader's optimal commitment.

  • Optimally deceiving a learning leader in Stackelberg games. JAIR & NeurIPS '20.
    G. Birmpas, J. Gan, A. Hollender, F. Marmolejo-CossĂ­o, N. Rajgopal, A. Voudouris. [pdf]

    Recent results have shown that algorithms for learning the optimal commitment in a Stackelberg game are susceptible to manipulation by the follower. These learning algorithms operate by querying the best responses of the follower, who consequently can deceive the algorithm by using fake best responses, typically by responding according to fake payoffs that are different from the actual ones. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the fake payoffs that would make the learning algorithm output a commitment that benefits them the most. While this problem has been considered before, the related literature has only focused on a simple setting where the follower can only choose from a finite set of payoff matrices, thus leaving the general version of the problem unanswered. In this paper, we fill this gap by showing that it is always possible for the follower to efficiently compute (near-)optimal fake payoffs, for various scenarios of learning interaction between the leader and the follower. Our results also establish an interesting connection between the follower’s deception and the leader’s maximin utility: through deception, the follower can induce almost any (fake) Stackelberg equilibrium if and only if the leader obtains at least their maximin utility in this equilibrium.

  • Manipulating a learning defender and ways to counteract. NeurIPS '19.
    J. Gan, Q. Guo, L. Tran-Thanh, B. An, M. Wooldridge. [pdf | arXiv | poster]

    In Stackelberg security games when information about the attacker's payoffs is uncertain, algorithms have been proposed to learn the optimal defender commitment by interacting with the attacker and observing their best responses. In this paper, we show that, however, these algorithms can be easily manipulated if the attacker responds untruthfully. As a key finding, attacker manipulation normally leads to the defender learning a maximin strategy, which effectively renders the learning attempt meaningless as to compute a maximin strategy requires no additional information about the other player at all. We then apply a game-theoretic framework at a higher level to counteract such manipulation, in which the defender commits to a policy that specifies her strategy commitment according to the learned information. We provide a polynomial-time algorithm to compute the optimal such policy, and in addition, a heuristic approach that applies even when the attacker's payoff space is infinite or completely unknown. Empirical evaluation shows that our approaches can improve the defender's utility significantly as compared to the situation when attacker manipulation is ignored.

  • Imitative follower deception in Stackelberg games. EC '19.
    J. Gan, H. Xu, Q. Guo, L. Tran-Thanh, Z. Rabinovich, M. Wooldridge. [pdf | arXiv | poster]

    Information uncertainty is one of the major challenges facing applications of game theory. In the context of Stackelberg games, various approaches have been proposed to deal with the leader's incomplete knowledge about the follower's payoffs, typically by gathering information from the leader's interaction with the follower. Unfortunately, these approaches rely crucially on the assumption that the follower will not strategically exploit this information asymmetry, i.e., the follower behaves truthfully during the interaction according to their actual payoffs. As we show in this paper, the follower may have strong incentives to deceitfully imitate the behavior of a different follower type and, in doing this, benefit significantly from inducing the leader into choosing a highly suboptimal strategy. This raises a fundamental question: how to design a leader strategy in the presence of a deceitful follower? To answer this question, we put forward a basic model of Stackelberg games with (imitative) follower deception and show that the leader is indeed able to reduce the loss due to follower deception with carefully designed policies. We then provide a systematic study of the problem of computing the optimal leader policy and draw a relatively complete picture of the complexity landscape; essentially matching positive and negative complexity results are provided for natural variants of the model. Our intractability results are in sharp contrast to the situation with no deception, where the leader's optimal strategy can be computed in polynomial time, and thus illustrate the intrinsic difficulty of handling follower deception. Through simulations we also examine the benefit of considering follower deception in randomly generated games.


Realistic Agent Behaviour Modelling

Standard models often assume neat and predictable agent behaviour to simplify theoretical analysis. While these assumptions have led to foundational insights, they frequently fail to account for the complexities and nuances of real-world scenarios. For example, agents may ignore, or be unable to discern, small payoff differences, thereby making suboptimal decisions. Similarly, in sequential decision-making, while standard models assume that agents value future rewards at a constant discount rate, this may vary over time according to real-world empirical evidences. Strategies produced by such idealised models can be surprisingly unreliable once put in practice. We identify these gaps and propose refined solution concepts that more accurately reflect realistic agent behaviours. More importantly, we investigate the computational challenges introduced by these refinements, discovering effective computational approaches to more robust and practical decision-making when interacting with autonomous agents.

  • Robust Stackelberg equilibrium. EC '23. J. Gan, M. Han, J. Wu, H. Xu. [link | arXiv]

    This paper provides a systematic study of the robust Stackelberg equilibrium (RSE), which naturally generalizes the widely adopted solution concept of the strong Stackelberg equilibrium (SSE). The RSE accounts for any possible up-to-ή suboptimal follower responses in Stackelberg games and is adopted to improve the robustness of the leader's strategy. While a few variants of robust Stackelberg equilibrium have been considered in previous literature, the RSE solution concept we consider is importantly different — in some sense, it relaxes previously studied robust Stackelberg strategies and is applicable to much broader sources of uncertainties.
    We provide a thorough investigation of several fundamental properties of RSE, including its utility guarantees, algorithmics, and learnability. We first show that the RSE we defined always exists and thus is well-defined. Then we characterize how the leader's utility in RSE changes with the robustness level considered. On the algorithmic side, we show that, in sharp contrast to the tractability of computing an SSE, it is NP-hard to obtain a fully polynomial approximation scheme (FPTAS) for any constant robustness level. Nevertheless, we develop a quasi-polynomial approximation scheme (QPTAS) for RSE. Finally, we examine the learnability of the RSE in a natural learning scenario, where both players' utilities are not known in advance, and provide almost tight sample complexity results on learning the RSE. As a corollary of this result, we also obtain an algorithm for learning SSE, which strictly improves a key result of Bai et al. in terms of both utility guarantee and computational efficiency.

  • Markov decision processes with time-varying geometric discounting. AAAI '23.
    J. Gan, A. Hennes, R. Majumdar, D. Mandal, G. Radanovic. [link | pdf]

    Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective—whereby each time step is treated as an independent decision maker with their own (fixed) discount factor—and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of Ï”-SPE and show that an Ï”-SPE exists under milder assumptions. An algorithm is presented to compute an Ï”-SPE, of which an upper bound of the time complexity, as a function of the convergence property of the time-varying discount factor, is provided.


AI for the Greater Good: Fairness, Security and Sustainability

We explore how to embed fairness, safety, and sustainability into multi-agent systems to improve social good. Our work spans developing fair algorithms for resource allocation and decision-making, analysing societal dynamics like demographic segregation, and leveraging game theory to optimise security, transportation, and energy systems. Together, our efforts highlight how computational game theory can address societal challenges while driving innovation in algorithmic research and multi-agent systems.

  • Socially fair reinforcement learning. D. Mandal, J. Gan. [arXiv]

    We consider the problem of episodic reinforcement learning where there are multiple stakeholders with different reward functions. Our goal is to output a policy that is socially fair with respect to different reward functions. Prior works have proposed different objectives that a fair policy must optimize including minimum welfare, and generalized Gini welfare. We first take an axiomatic view of the problem, and propose four axioms that any such fair objective must satisfy. We show that the Nash social welfare is the unique objective that uniquely satisfies all four objectives, whereas prior objectives fail to satisfy all four axioms. We then consider the learning version of the problem where the underlying model i.e. Markov decision process is unknown. We consider the problem of minimizing regret with respect to the fair policies maximizing three different fair objectives -- minimum welfare, generalized Gini welfare, and Nash social welfare. Based on optimistic planning, we propose a generic learning algorithm and derive its regret bound with respect to the three different policies. For the objective of Nash social welfare, we also derive a lower bound in regret that grows exponentially with n, the number of agents. Finally, we show that for the objective of minimum welfare, one can improve regret by a factor of $O(H)$ for a weaker notion of regret.

  • Envy-free policy teaching to multiple agents. NeurIPS '22.
    J. Gan, R. Majumdar, G. Radanovic, A. Singla. [pdf | link]

    We study envy-free policy teaching. A number of agents independently explore a common Markov decision process (MDP), but each with their own reward function and discounting rate. A teacher wants to teach a target policy to this diverse group of agents, by means of modifying the agents' reward functions: providing additional bonuses to certain actions, or penalizing them. When personalized reward modification programs are used, an important question is how to design the programs so that the agents think they are treated fairly. We adopt the notion of envy-freeness (EF) from the literature on fair division to formalize this problem and investigate several fundamental questions about the existence of EF solutions in our setting, the computation of cost-minimizing solutions, as well as the price of fairness (PoF), which measures the increase of cost due to the consideration of fairness. We show that 1) an EF solution may not exist if penalties are not allowed in the modifications, but otherwise always exists. 2) Computing a cost-minimizing EF solution can be formulated as convex optimization and hence solved efficiently. 3) The PoF increases but at most quadratically with the geometric sum of the discount factor, and at most linearly with the size of the MDP and the number of agents involved; we present tight asymptotic bounds on the PoF. These results indicate that fairness can be incorporated in multi-agent teaching without significant computational or PoF burdens.

  • Defense coordination in security games: equilibrium analysis and mechanism design. Artificial Intelligence.
    J. Gan, E. Elkind, S. Kraus, M. Wooldridge. [link]

    Real-world security scenarios sometimes involve multiple defenders: security agencies of two or more countries might patrol the same border areas, and domestic security agencies might also operate in the same locations when their areas of jurisdiction overlap. Motivated by these scenarios and the observation that uncoordinated movements of the defenders may lead to an inefficient defense, we introduce a model of multi-defender security games and explore the possibility of improving efficiency by coordinating the defenders — specifically, by pooling the defenders' resources and allocating them jointly. The model generalizes the standard model of Stackelberg security games, where a defender (now a group of defenders) allocates security resources to protect a set of targets, and an attacker picks the best target to attack. In particular, we are interested in the situation with heterogeneous defenders, who may value the same target differently. Our task is twofold. First, we need to develop a good understanding of the uncoordinated situation, as the baseline to be improved. To this end we formulate a new equilibrium concept, and prove that an equilibrium under this concept always exists and can be computed efficiently. Second, to coordinate the heterogeneous defenders we take a mechanism design perspective and aim to find a mechanism to generate joint resource allocation strategies. We seek a mechanism that improves the defenders' utilities upon the uncoordinated baseline, achieves Pareto efficiency, and incentivizes the defenders to report their true incentives and execute the recommended strategies. Our analysis establishes several impossibility results, indicating the intrinsic difficulties of defense coordination. Specifically, we show that even the basic properties listed above are in conflict with each other: no mechanism can simultaneously satisfy them all, or even some proper subsets of them. In terms of positive results, we present mechanisms that satisfy all combinations of the properties that are not ruled out by our impossibility results, thereby providing a comprehensive profile of the mechanism design problem with respect to the properties considered.

    (Preliminary versions appeared in AAMAS '18 and '20.)

  • Schelling games on graphs. Artificial Intelligence.
    A. Agarwal, E. Elkind, J. Gan, A. Igarashi, W. Suksompong, A. Voudouris. [link]

    We study strategic games inspired by Schelling's seminal model of residential segregation. These games are played on undirected graphs, with the set of agents partitioned into multiple types; each agent either aims to maximize the fraction of her neighbors who are of her own type, or occupies a node of the graph and never moves away. We consider two natural variants of this model: in jump games agents can jump to empty nodes of the graph to increase their utility, while in swap games they can swap positions with other agents. We investigate the existence, computational complexity, and quality of equilibrium assignments in these games, both from a social welfare perspective and from a diversity perspective. Some of our results extend to a more general setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.

    (Preliminary versions appeared in IJCAI '19 and AAAI '20.)

  • Envy-freeness in house allocation problems. Mathematical Social Sciences.
    J. Gan, W. Suksompong, A. Voudouris. [pdf | arXiv]

    We consider the house allocation problem, where m houses are to be assigned to n agents so that each agent gets exactly one house. We present a polynomial-time algorithm that determines whether an envy-free assignment exists, and if so, computes one such assignment. We also show that an envy-free assignment exists with high probability if the number of houses exceeds the number of agents by a logarithmic factor.

(See more in full list.)

PAPERS đŸ§» [view full list]

[αÎČ] denotes alphabetical ordering of authors. (Why?)


  • preprints
  • J. Gan, R. Majumdar, D. Mandal, G. Radanovic.
    Stochastic principal-agent problems: efficient computation and learning. [arXiv]

    We introduce a stochastic principal-agent model. A principal and an agent interact in a stochastic environment, each privy to observations about the state not available to the other. The principal has the power of commitment, both to elicit information from the agent and to provide signals about her own information. The players communicate with each other and then select actions independently. Each of them receives a payoff based on the state and their joint action, and the environment transitions to a new state. The interaction continues over a finite time horizon. Both players are far-sighted, aiming to maximize their total payoffs over the time horizon. The model encompasses as special cases extensive-form games (EFGs) and stochastic games of incomplete information, partially observable Markov decision processes (POMDPs), as well as other forms of sequential principal-agent interactions, including Bayesian persuasion and automated mechanism design problems.
    We consider both the computation and learning of the principal's optimal policy. Since the general problem, which subsumes POMDPs, is intractable, we explore algorithmic solutions under hindsight observability, where the state and the interaction history are revealed at the end of each step. Though the problem becomes more amenable under this condition, the number of possible histories remains exponential in the length of the time horizon, making approaches for EFG-based models infeasible. We present an efficient algorithm based on the inducible value sets. The algorithm computes an $\epsilon$-approximate optimal policy in time polynomial in $1/\epsilon$. Additionally, we show an efficient learning algorithm for an episodic reinforcement learning setting where the transition probabilities are unknown. The algorithm guarantees sublinear regret $\tilde{O}(T^{2/3})$ for both players over $T$ episodes.

  • J. Gan, A. Ghosh, N. Teh.
    Persuading a credible agent. [arXiv]

    How to optimally persuade an agent who has a private type? When elicitation is feasible, this amounts to a fairly standard principal-agent-style mechanism design problem, where the persuader employs a mechanism to first elicit the agent's type and then plays the corresponding persuasion strategy based on the agent's report. The optimal mechanism design problem in this setting is relatively well-understood in the literature, with incentive compatible (IC) mechanisms known to be optimal and computationally tractable. In this paper, we study this problem given a credible agent, i.e., if the agent claims they are of a certain type in response to the mechanism's elicitation, then they will act optimally with respect to the claimed type, even if they are actually not of that type.
    We present several interesting findings in this new setting that differ significantly from results in the non-credible setting. In terms of the structure of optimal mechanisms, we show that not only may IC mechanisms fail to be optimal, but all mechanisms following the standard 'eliciting-then-persuading' mechanism design structure may be suboptimal. To achieve optimality requires two additional instruments — pre-signaling and non-binding elicitation — which naturally result in multi-stage mechanisms. We characterize optimal mechanisms under these design choices. Based on our characterization, we provide a polynomial-time algorithm for computing optimal multi-stage mechanisms. We also discover that in scenarios that allow for it, partial information elicitation can be employed to improve the principal's payoff even further. Though, surprisingly, an unbounded number of rounds of information exchange between the principal and the agent may be necessary to achieve optimality.

  • Y. Chen, X. Deng, J. Gan, Y. Li.
    Learning to manipulate a commitment optimizer. [arXiv]

    It is shown in recent studies that in a Stackelberg game the follower can manipulate the leader by deviating from their true best-response behavior. Such manipulations are computationally tractable and can be highly beneficial for the follower. Meanwhile, they may result in significant payoff losses for the leader, sometimes completely defeating their first-mover advantage. A warning to commitment optimizers, the risk these findings indicate appears to be alleviated to some extent by a strict information advantage the manipulations rely on. That is, the follower knows the full information about both players' payoffs whereas the leader only knows their own payoffs. In this paper, we study the manipulation problem with this information advantage relaxed. We consider the scenario where the follower is not given any information about the leader's payoffs to begin with but has to learn to manipulate by interacting with the leader. The follower can gather necessary information by querying the leader's optimal commitments against contrived best-response behaviors. Our results indicate that the information advantage is not entirely indispensable to the follower's manipulations: the follower can learn the optimal way to manipulate in polynomial time with polynomially many queries of the leader's optimal commitment.

  • D. Mandal, J. Gan.
    Socially fair reinforcement learning. [arXiv]

    We consider the problem of episodic reinforcement learning where there are multiple stakeholders with different reward functions. Our goal is to output a policy that is socially fair with respect to different reward functions. Prior works have proposed different objectives that a fair policy must optimize including minimum welfare, and generalized Gini welfare. We first take an axiomatic view of the problem, and propose four axioms that any such fair objective must satisfy. We show that the Nash social welfare is the unique objective that uniquely satisfies all four objectives, whereas prior objectives fail to satisfy all four axioms. We then consider the learning version of the problem where the underlying model i.e. Markov decision process is unknown. We consider the problem of minimizing regret with respect to the fair policies maximizing three different fair objectives -- minimum welfare, generalized Gini welfare, and Nash social welfare. Based on optimistic planning, we propose a generic learning algorithm and derive its regret bound with respect to the three different policies. For the objective of Nash social welfare, we also derive a lower bound in regret that grows exponentially with n, the number of agents. Finally, we show that for the objective of minimum welfare, one can improve regret by a factor of $O(H)$ for a weaker notion of regret.


  • 2024
  • J. Gan, M. Han, J. Wu, H. Xu.
    Generalized principal-agency: contracts, information, games and beyond. WINE '24. [arXiv]

    In the principal-agent problem formulated by Myerson'82, agents have private information (type) and make private decisions (action), both of which are unobservable to the principal. Myerson pointed out an elegant linear programming solution that relies on the revelation principle. This paper extends Myerson's results to a more general setting where the principal's action space can be infinite and subject to additional design constraints. Our generalized principal-agent model unifies several important design problems including contract design, information design, and Bayesian Stackelberg games, and encompasses them as special cases. We first extend the revelation principle to this general model, based on which a polynomial-time algorithm is then derived for computing the optimal mechanism for the principal. This algorithm not only implies new efficient solutions simultaneously for all the aforementioned special cases but also significantly simplifies previously known algorithms designed for special cases. Inspired by the recent interest in the algorithmic design of a single contract and menu of contracts, we study such constrained design problems to our general principal-agent model. In contrast to the above unification, our results here illustrate the other facet of diversity among different principal-agent design problems and demonstrate how their different structures can lead to different complexities: some are tractable whereas others are APX-hard. Finally, we reveal an interesting connection of our model to the problem of information acquisition for decision making and study its algorithmic properties in general.

  • H. Aziz, J. Gan, G. Lisowski, A. Pourmiri.
    The team order problem: maximizing the probability of matching being large enough. SAGT '24. [link]

    We consider a matching problem, which is meaningful in team competitions, as well as in information theory, recommender systems, and assignment problems. In the competitions which we study, each competitor in a team order plays a match with the corresponding opposing player. The team that wins more matches wins. We consider a problem where the input is the graph of probabilities that a team 1 player can win against the team 2 player, and the output is the optimal ordering of team 1 players given the fixed ordering of team 2. Our central result is a polynomial-time approximation scheme (PTAS) to compute a matching whose winning probability is at most $\epsilon$ less than the winning probability of the optimal matching. We also provide tractability results for several special cases of the problem, as well as an analytical bound on how far the winning probability of a maximum weight matching of the underlying graph is from the best achievable winning probability.


  • 2023
  • J. Gan, M. Han, J. Wu, H. Xu.
    Robust Stackelberg equilibrium. EC '23. [link | arXiv]

    This paper provides a systematic study of the robust Stackelberg equilibrium (RSE), which naturally generalizes the widely adopted solution concept of the strong Stackelberg equilibrium (SSE). The RSE accounts for any possible up-to-ή suboptimal follower responses in Stackelberg games and is adopted to improve the robustness of the leader's strategy. While a few variants of robust Stackelberg equilibrium have been considered in previous literature, the RSE solution concept we consider is importantly different — in some sense, it relaxes previously studied robust Stackelberg strategies and is applicable to much broader sources of uncertainties.
    We provide a thorough investigation of several fundamental properties of RSE, including its utility guarantees, algorithmics, and learnability. We first show that the RSE we defined always exists and thus is well-defined. Then we characterize how the leader's utility in RSE changes with the robustness level considered. On the algorithmic side, we show that, in sharp contrast to the tractability of computing an SSE, it is NP-hard to obtain a fully polynomial approximation scheme (FPTAS) for any constant robustness level. Nevertheless, we develop a quasi-polynomial approximation scheme (QPTAS) for RSE. Finally, we examine the learnability of the RSE in a natural learning scenario, where both players' utilities are not known in advance, and provide almost tight sample complexity results on learning the RSE. As a corollary of this result, we also obtain an algorithm for learning SSE, which strictly improves a key result of Bai et al. in terms of both utility guarantee and computational efficiency.

  • W. Lee, D. Hyland, A. Abate, E. Elkind, J. Gan, J. Gutierrez, P. Harrenstein, M. Wooldridge
    k-Prize weighted voting game. AAMAS '23. [arXiv | link]

    We introduce a natural variant of weighted voting games, which we refer to as k-Prize Weighted Voting Games. Such games consist of n players with weights, and k prizes, of possibly differing values. The players form coalitions, and the i-th largest coalition (by the sum of weights of its members) wins the i-th largest prize, which is then shared among its members. We present four solution concepts to analyse the games in this class, and characterise the existence of stable outcomes in games with three players and two prizes, and in games with uniform prizes. We then explore the efficiency of stable outcomes in terms of Pareto optimality and utilitarian social welfare. Finally, we study the computational complexity of finding stable outcomes.

  • J. Gan, B. Li, X. Wu.
    Approximation algorithm for computing budget-feasible EF1 allocations. AAMAS '23. [arXiv | link]

    We study algorithmic fairness in a budget-feasible resource allocation problem. In this problem, a set of items with varied sizes and values are to be allocated to a group of agents, while each agent has a budget constraint on the total size of items she can receive. An envy-free (EF) allocation is defined in this context as one in which no agent envies another for the items they get and, in addition, no agent envies the charity, who is automatically endowed with all the unallocated items. Since EF allocations barely exist even without budget constraints, we are interested in the relaxed notion of envy-freeness up to one item (EF1). In this paper, we further the recent progress towards understanding the existence and approximations of EF1 (or EF2) allocations. We propose a polynomial-time algorithm that computes a 1/2-approximate EF1 allocation for an arbitrary number of agents with heterogeneous budgets. For the uniform-budget and two-agent cases, we present a polynomial-time algorithm that computes an exact EF1 allocation. We also consider the large budget setting, where the item sizes are infinitesimal relative to the agents' budgets. We show that both the allocations that maximize the Nash social welfare and the allocations that our main algorithm computes are EF1 in the limit.

  • J. Gan, B. Li, Y. Li.
    Your college dorm and dormmates: fair resource sharing with externalities. JAIR. [arXiv | link]

    We study a fair resource sharing problem, where a set of resources are to be shared among a group of agents. Each agent demands one resource and each resource can serve a limited number of agents. An agent cares about what resource they get as well as the externalities imposed by their mates, who share the same resource with them. Clearly, the strong notion of envy-freeness, where no agent envies another for their resource or mates, cannot always be achieved and we show that even deciding the existence of such a strongly envy-free assignment is an intractable problem. Hence, a more interesting question is whether (and in what situations) a relaxed notion of envy-freeness, the Pareto envy-freeness, can be achieved. Under this relaxed notion, an agent envies another only when they envy both the resource and the mates of the other agent. In particular, we are interested in a dorm assignment problem, where students are to be assigned to dorms with the same capacity and they have dichotomous preference over their dormmates. We show that when the capacity of each dorm is 2, a Pareto envy-free assignment always exists and we present a polynomial-time algorithm to compute such an assignment. Nevertheless, the result breaks immediately when the capacity increases to 3, in which case even Pareto envy-freeness cannot be guaranteed. In addition to the existential results, we also investigate the utility guarantees of (Pareto) envy-free assignments in our model.

  • J. Gan, A. Hennes, R. Majumdar, D. Mandal, G. Radanovic.
    Markov decision processes with time-varying geometric discounting. AAAI '23. [link | pdf]

    Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective—whereby each time step is treated as an independent decision maker with their own (fixed) discount factor—and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of Ï”-SPE and show that an Ï”-SPE exists under milder assumptions. An algorithm is presented to compute an Ï”-SPE, of which an upper bound of the time complexity, as a function of the convergence property of the time-varying discount factor, is provided.

  • D. Mandal, G. Radanovic, J. Gan, A. Singla, R. Majumdar.
    Online reinforcement learning with uncertain episode lengths. AAAI '23. [link]

    Existing episodic reinforcement algorithms assume that the length of an episode is fixed across time and known a priori. In this paper, we consider a general framework of episodic reinforcement learning when the length of each episode is drawn from a distribution. We first establish that this problem is equivalent to online reinforcement learning with general discounting where the learner is trying to optimize the expected discounted sum of rewards over an infinite horizon, but where the discounting function is not necessarily geometric. We show that minimizing regret with this new general discounting is equivalent to minimizing regret with uncertain episode lengths. We then design a reinforcement learning algorithm that minimizes regret with general discounting but acts for the setting with uncertain episode lengths. We instantiate our general bound for different types of discounting, including geometric and polynomial discounting. We also show that we can obtain similar regret bounds even when the uncertainty over the episode lengths is unknown, by estimating the unknown distribution over time. Finally, we compare our learning algorithms with existing value-iteration based episodic RL algorithms on a grid-world environment.


  • 2022
  • J. Gan, R. Majumdar, G. Radanovic, A. Singla.
    Envy-free policy teaching to multiple agents. NeurIPS '22. [pdf | link]

    We study envy-free policy teaching. A number of agents independently explore a common Markov decision process (MDP), but each with their own reward function and discounting rate. A teacher wants to teach a target policy to this diverse group of agents, by means of modifying the agents' reward functions: providing additional bonuses to certain actions, or penalizing them. When personalized reward modification programs are used, an important question is how to design the programs so that the agents think they are treated fairly. We adopt the notion of envy-freeness (EF) from the literature on fair division to formalize this problem and investigate several fundamental questions about the existence of EF solutions in our setting, the computation of cost-minimizing solutions, as well as the price of fairness (PoF), which measures the increase of cost due to the consideration of fairness. We show that 1) an EF solution may not exist if penalties are not allowed in the modifications, but otherwise always exists. 2) Computing a cost-minimizing EF solution can be formulated as convex optimization and hence solved efficiently. 3) The PoF increases but at most quadratically with the geometric sum of the discount factor, and at most linearly with the size of the MDP and the number of agents involved; we present tight asymptotic bounds on the PoF. These results indicate that fairness can be incorporated in multi-agent teaching without significant computational or PoF burdens.

  • J. Gan, E. Elkind, S. Kraus, M. Wooldridge.
    Defense coordination in security games: equilibrium analysis and mechanism design. Artificial Intelligence. [link]

    Real-world security scenarios sometimes involve multiple defenders: security agencies of two or more countries might patrol the same border areas, and domestic security agencies might also operate in the same locations when their areas of jurisdiction overlap. Motivated by these scenarios and the observation that uncoordinated movements of the defenders may lead to an inefficient defense, we introduce a model of multi-defender security games and explore the possibility of improving efficiency by coordinating the defenders — specifically, by pooling the defenders' resources and allocating them jointly. The model generalizes the standard model of Stackelberg security games, where a defender (now a group of defenders) allocates security resources to protect a set of targets, and an attacker picks the best target to attack. In particular, we are interested in the situation with heterogeneous defenders, who may value the same target differently. Our task is twofold. First, we need to develop a good understanding of the uncoordinated situation, as the baseline to be improved. To this end we formulate a new equilibrium concept, and prove that an equilibrium under this concept always exists and can be computed efficiently. Second, to coordinate the heterogeneous defenders we take a mechanism design perspective and aim to find a mechanism to generate joint resource allocation strategies. We seek a mechanism that improves the defenders' utilities upon the uncoordinated baseline, achieves Pareto efficiency, and incentivizes the defenders to report their true incentives and execute the recommended strategies. Our analysis establishes several impossibility results, indicating the intrinsic difficulties of defense coordination. Specifically, we show that even the basic properties listed above are in conflict with each other: no mechanism can simultaneously satisfy them all, or even some proper subsets of them. In terms of positive results, we present mechanisms that satisfy all combinations of the properties that are not ruled out by our impossibility results, thereby providing a comprehensive profile of the mechanism design problem with respect to the properties considered.

    (Preliminary versions appeared in AAMAS '18 and '20.)

  • J. Gan, R. Majumdar, G. Radanovic, A. Singla.
    Sequential decision making with information asymmetry. CONCUR '22 (invited talk) [link]

    We survey some recent results in sequential decision making under uncertainty, where there is an information asymmetry among the decision-makers. We consider two versions of the problem: persuasion and mechanism design. In persuasion, a more-informed principal influences the actions of a less-informed agent by signaling information. In mechanism design, a less-informed principal incentivizes a more-informed agent to reveal information by committing to a mechanism, so that the principal can make more informed decisions. We define Markov persuasion processes and Markov mechanism processes that model persuasion and mechanism design into dynamic models. Then we survey results on optimal persuasion and optimal mechanism design on myopic and far-sighted agents. These problems are solvable in polynomial time for myopic agents but hard for far-sighted agents.

  • J. Gan, R. Majumdar, G. Radanovic, A. Singla.
    Bayesian persuasion in sequential decision-making. AAAI '22 [pdf | arXiv | link]
    (đŸ’„ Outstanding Paper Honorable Mention @ AAAI '22)

    We study a dynamic model of Bayesian persuasion in sequential decision-making settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal's advice/signal, and beliefs about the external parameter. The action of the agent updates the state according to a stochastic process. The model arises naturally in many applications, e.g., an app (the principal) can advise the user (the agent) on possible choices between actions based on additional real-time information the app has. We study the problem of designing a signaling strategy from the principal's point of view. We show that the principal has an optimal strategy against a myopic agent, who only optimizes their rewards locally, and the optimal strategy can be computed in polynomial time. In contrast, it is NP-hard to approximate an optimal policy against a far-sighted agent. Further, we show that if the principal has the power to threaten the agent by not providing future signals, then we can efficiently design a threat-based strategy. This strategy guarantees the principal's payoff as if playing against an agent who is far-sighted but myopic to future signals.

  • K. Banihashem, A. Singla, J. Gan, G. Radanovic.
    Admissible policy teaching through reward design. AAAI '22 [arXiv | link]

    We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy from a set of admissible policies. The goal of the reward designer is to modify the underlying reward function cost-efficiently while ensuring that any approximately optimal deterministic policy under the new reward function is admissible and performs well under the original reward function. This problem can be viewed as a dual to the problem of optimal reward poisoning attacks: instead of forcing an agent to adopt a specific policy, the reward designer incentivizes an agent to avoid taking actions that are inadmissible in certain states. Perhaps surprisingly, and in contrast to the problem of optimal reward poisoning attacks, we first show that the reward design problem for admissible policy teaching is computationally challenging, and it is NP-hard to find an approximately optimal reward modification. We then proceed by formulating a surrogate problem whose optimal solution approximates the optimal solution to the reward design problem in our setting, but is more amenable to optimization techniques and analysis. For this surrogate problem, we present characterization results that provide bounds on the value of the optimal solution. Finally, we design a local search algorithm to solve the surrogate problem and showcase its utility using simulation-based experiments.


  • 2021
  • A. Agarwal, E. Elkind, J. Gan, A. Igarashi, W. Suksompong, A. Voudouris.
    Schelling games on graphs. Artificial Intelligence. [link]

    We study strategic games inspired by Schelling's seminal model of residential segregation. These games are played on undirected graphs, with the set of agents partitioned into multiple types; each agent either aims to maximize the fraction of her neighbors who are of her own type, or occupies a node of the graph and never moves away. We consider two natural variants of this model: in jump games agents can jump to empty nodes of the graph to increase their utility, while in swap games they can swap positions with other agents. We investigate the existence, computational complexity, and quality of equilibrium assignments in these games, both from a social welfare perspective and from a diversity perspective. Some of our results extend to a more general setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.

    (Preliminary versions appeared in IJCAI '19 and AAAI '20.)

  • G. Birmpas, J. Gan, A. Hollender, F. Marmolejo-CossĂ­o, N. Rajgopal, A. Voudouris.
    Optimally deceiving a learning leader in Stackelberg games. Journal of AI Research (JAIR). [pdf]

    Recent results have shown that algorithms for learning the optimal commitment in a Stackelberg game are susceptible to manipulation by the follower. These learning algorithms operate by querying the best responses of the follower, who consequently can deceive the algorithm by using fake best responses, typically by responding according to fake payoffs that are different from the actual ones. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the fake payoffs that would make the learning algorithm output a commitment that benefits them the most. While this problem has been considered before, the related literature has only focused on a simple setting where the follower can only choose from a finite set of payoff matrices, thus leaving the general version of the problem unanswered. In this paper, we fill this gap by showing that it is always possible for the follower to efficiently compute (near-)optimal fake payoffs, for various scenarios of learning interaction between the leader and the follower. Our results also establish an interesting connection between the follower’s deception and the leader’s maximin utility: through deception, the follower can induce almost any (fake) Stackelberg equilibrium if and only if the leader obtains at least their maximin utility in this equilibrium.

    (A preliminary version appeared in NeurIPS '20.)

  • W. Xiaowei, B. Li, J. Gan.
    Budget-feasible maximum Nash social welfare allocation is almost envy-free. IJCAI '21. [pdf | arXiv]

    The Nash social welfare (NSW) is a well-known social welfare measurement that balances individual utilities and the overall efficiency. In the context of fair allocation of indivisible goods, it has been shown by Caragiannis et al. (EC 2016 and TEAC 2019) that an allocation maximizing the NSW is envy-free up to one good (EF1). In this paper, we are interested in the fairness of the NSW in a budget-feasible allocation problem, in which each item has a cost that will be incurred to the agent it is allocated to, and each agent has a budget constraint on the total cost of items she receives. We show that a budget-feasible allocation that maximizes the NSW achieves a 1/4-approximation of EF1 and the approximation ratio is tight. The approximation ratio improves gracefully when the items have small costs compared with the agents' budgets; it converges to 1/2 when the budget-cost ratio approaches infinity.

  • D. Mutzari, J. Gan, S. Kraus.
    Coalition formation in multi-defender security games. AAAI '21. [pdf]

    We study Stackelberg security game (SSG) with multiple defenders, where heterogeneous defenders need to allocate security resources to protect a set of targets against a strategic attacker. In such games, coordination and cooperation between the defenders can increase their ability to protect their assets, but the heterogeneous preferences of the self-interested defenders often make such cooperation very difficult. In this paper, we approach the problem from the perspective of cooperative game theory and study coalition formation among the defenders. Our main contribution is a number of algorithmic results for the computation problems that arise in this model. We provide a poly-time algorithm for computing a solution in the {\em core} of the game and show that all of the elements in the core are Pareto efficient. We show that the problem of computing the entire core is NP-hard and then delve into a special setting where the size of a coalition is limited up to some threshold. We analyse the parameterized complexity of deciding if a coalition structure is in the core under this special setting, and provide a poly-time algorithm for computing successful deviation strategies for a given coalition.

  • E. Elkind, J. Gan, S. Obraztsova, Z. Rabinovich, A. Voudouris.
    Protecting elections by recounting ballots. Artificial Intelligence. [link]

    Complexity of voting manipulation is a prominent topic in computational social choice. In this work, we consider a two-stage voting manipulation scenario. First, a malicious party (an attacker) attempts to manipulate the election outcome in favor of a preferred candidate by changing the vote counts in some of the voting districts. Afterwards, another party (a defender), which cares about the voters' wishes, demands a recount in a subset of the manipulated districts, restoring their vote counts to their original values. We investigate the resulting Stackelberg game for the case where votes are aggregated using two variants of the Plurality rule, and obtain an almost complete picture of the complexity landscape, both from the attacker's and from the defender's perspective.

    (A preliminary version appeared in IJCAI '19.)


  • 2020
  • G. Birmpas, J. Gan, A. Hollender, F. Marmolejo-CossĂ­o, N. Rajgopal, A. Voudouris.
    Optimally deceiving a learning leader in Stackelberg games. NeurIPS '20. [pdf]

    Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game, are susceptible to manipulation by the follower. Such a learning algorithm operates by querying the best responses or the payoffs of the follower, who consequently can deceive the algorithm by responding as if his payoffs were much different than what they actually are. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the payoffs that would make the learning algorithm compute a commitment so that best responding to it maximizes the follower's utility, according to his true payoffs. While this problem has been considered before, the related literature only focused on the simplified scenario in which the payoff space is finite, thus leaving the general version of the problem unanswered. In this paper, we fill in this gap, by showing that it is always possible for the follower to compute (near-)optimal payoffs for various scenarios about the learning interaction between leader and follower.

  • J. Gan, E. Elkind, S. Kraus, M. Wooldridge.
    Mechanism design for defense coordination in security games. AAMAS '20. [pdf]

    Recent work studied Stackelberg security games with multiple defenders, in which heterogeneous defenders allocate security resources to protect a set of targets against a strategic attacker. Equilibrium analysis was conducted to characterize outcomes of these games when defenders act independently. Our starting point is the observation that the use of resources in equilibria may be inefficient due to lack of coordination. We explore the possibility of reducing this inefficiency by coordinating the defenders—specifically, by pooling the defenders' resources and allocating them jointly. The defenders’ heterogeneous preferences then give rise to a collective decision-making problem, which calls for a mechanism to generate joint allocation strategies. We seek a mechanism that encourages coordination, produces efficiency gains, and incentivizes the defenders to report their true preferences and to execute the recommended strategies. Our results show that, unfortunately, even these basic properties clash with each other and no mechanism can achieve them simultaneously, which reveals the intrinsic difficulty of achieving meaningful defense coordination in security games. On the positive side, we put forward mechanisms that fulfill some of these properties and we identify special cases of our setting where more of these properties are compatible.

  • A. Agarwal, E. Elkind, J. Gan, A. Voudouris.
    Swap stability in Schelling games on graphs. AAAI '20. [pdf | arXiv]

    We study a recently introduced class of strategic games that is motivated by and generalizes Schelling's well-known residential segregation model. These games are played on undirected graphs, with the set of agents partitioned into multiple types; each agent either occupies a node of the graph and never moves away or aims to maximize the fraction of her neighbors who are of her own type. We consider a variant of this model that we call swap Schelling games, where the number of agents is equal to the number of nodes of the graph, and agents may swap positions with other agents to increase their utility. We study the existence, computational complexity and quality of equilibrium assignments in these games, both from a social welfare perspective and from a diversity perspective.


  • 2019
  • J. Gan, Q. Guo, L. Tran-Thanh, B. An, M. Wooldridge.
    Manipulating a learning defender and ways to counteract. NeurIPS '19. [pdf | arXiv | poster]

    In Stackelberg security games when information about the attacker's payoffs is uncertain, algorithms have been proposed to learn the optimal defender commitment by interacting with the attacker and observing their best responses. In this paper, we show that, however, these algorithms can be easily manipulated if the attacker responds untruthfully. As a key finding, attacker manipulation normally leads to the defender learning a maximin strategy, which effectively renders the learning attempt meaningless as to compute a maximin strategy requires no additional information about the other player at all. We then apply a game-theoretic framework at a higher level to counteract such manipulation, in which the defender commits to a policy that specifies her strategy commitment according to the learned information. We provide a polynomial-time algorithm to compute the optimal such policy, and in addition, a heuristic approach that applies even when the attacker's payoff space is infinite or completely unknown. Empirical evaluation shows that our approaches can improve the defender's utility significantly as compared to the situation when attacker manipulation is ignored.

  • J. Gan, W. Suksompong, A. Voudouris.
    Envy-freeness in house allocation problems. Mathematical Social Sciences. [pdf | arXiv]

    We consider the house allocation problem, where m houses are to be assigned to n agents so that each agent gets exactly one house. We present a polynomial-time algorithm that determines whether an envy-free assignment exists, and if so, computes one such assignment. We also show that an envy-free assignment exists with high probability if the number of houses exceeds the number of agents by a logarithmic factor.

  • J. Gan, H. Xu, Q. Guo, L. Tran-Thanh, Z. Rabinovich, M. Wooldridge.
    Imitative follower deception in Stackelberg games. EC '19. [pdf | arXiv | poster]

    Information uncertainty is one of the major challenges facing applications of game theory. In the context of Stackelberg games, various approaches have been proposed to deal with the leader's incomplete knowledge about the follower's payoffs, typically by gathering information from the leader's interaction with the follower. Unfortunately, these approaches rely crucially on the assumption that the follower will not strategically exploit this information asymmetry, i.e., the follower behaves truthfully during the interaction according to their actual payoffs. As we show in this paper, the follower may have strong incentives to deceitfully imitate the behavior of a different follower type and, in doing this, benefit significantly from inducing the leader into choosing a highly suboptimal strategy. This raises a fundamental question: how to design a leader strategy in the presence of a deceitful follower? To answer this question, we put forward a basic model of Stackelberg games with (imitative) follower deception and show that the leader is indeed able to reduce the loss due to follower deception with carefully designed policies. We then provide a systematic study of the problem of computing the optimal leader policy and draw a relatively complete picture of the complexity landscape; essentially matching positive and negative complexity results are provided for natural variants of the model. Our intractability results are in sharp contrast to the situation with no deception, where the leader's optimal strategy can be computed in polynomial time, and thus illustrate the intrinsic difficulty of handling follower deception. Through simulations we also examine the benefit of considering follower deception in randomly generated games.

  • E. Elkind, J. Gan, A. Igarashi, W. Suksompong, A. Voudouris.
    Schelling games on graphs. IJCAI '19. [pdf | arXiv]

    We consider strategic games that are inspired by Schelling's model of residential segregation. In our model, the agents are partitioned into k types and need to select locations on an undirected graph. Agents can be either stubborn, in which case they will always choose their preferred location, or strategic, in which case they aim to maximize the fraction of agents of their own type in their neighborhood. We investigate the existence of equilibria in these games, study the complexity of finding an equilibrium outcome or an outcome with high social welfare, and also provide upper and lower bounds on the price of anarchy and stability. Some of our results extend to the setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.

  • E. Elkind, J. Gan, S. Obraztsova, Z. Rabinovich, A. Voudouris.
    Protecting elections by recounting ballots. IJCAI '19. [pdf | arXiv | poster]

    Complexity of voting manipulation is a prominent topic in computational social choice. In this work, we consider a two-stage voting manipulation scenario. First, a malicious party (an attacker) attempts to manipulate the election outcome in favor of a preferred candidate by changing the vote counts in some of the voting districts. Afterwards, another party (a defender), which cares about the voters' wishes, demands a recount in a subset of the manipulated districts, restoring their vote counts to their original values. We investigate the resulting Stackelberg game for the case where votes are aggregated using two variants of the Plurality rule, and obtain an almost complete picture of the complexity landscape, both from the attacker's and from the defender's perspective.

  • Q. Guo, J. Gan, F. Fang, L. Tran-Thanh, M. Tambe, B. An.
    On the inducibility of Stackelberg equilibrium for security games. AAAI '19. [pdf | arXiv]

    Strong Stackelberg equilibrium (SSE) is the standard solution concept of Stackelberg security games. As opposed to the weak Stackelberg equilibrium (WSE), the SSE assumes that the follower breaks ties in favor of the leader and this is widely acknowledged and justified by the assertion that the defender can often induce the attacker to choose a preferred action by making an infinitesimal adjustment to her strategy. Unfortunately, in security games with resource assignment constraints, the assertion might not be valid; it is possible that the defender cannot induce the desired outcome. As a result, many results claimed in the literature may be overly optimistic. To remedy, we first formally define the utility guarantee of a defender strategy and provide examples to show that the utility of SSE can be higher than its utility guarantee. Second, inspired by the analysis of leader's payoff by Von Stengel and Zamir (2004), we provide the solution concept called the inducible Stackelberg equilibrium (ISE), which owns the highest utility guarantee and always exists. Third, we show the conditions when ISE coincides with SSE and the fact that in general case, SSE can be extremely worse with respect to utility guarantee. Moreover, introducing the ISE does not invalidate existing algorithmic results as the problem of computing an ISE polynomially reduces to that of computing an SSE. We also provide an algorithmic implementation for computing ISE, with which our experiments unveil the empirical advantage of the ISE over the SSE.


  • 2018
  • J. Gan, E. Elkind, M. Wooldridge.
    Stackelberg security games with multiple uncoordinated defenders. AAMAS '18. [pdf]

    Stackelberg security games have received much attention in recent years. While most existing work focuses on single-defender settings, there are many real-world scenarios that involve multiple defenders (e.g., multi-national anti-crime actions in international waters, different security agencies patrolling the same area). In this paper, we consider security games with uncoordinated defenders who jointly protect a set of targets, but may have different valuations for these targets; each defender schedules their own resources and selfishly optimizes their own utility. We generalize the standard (single-defender) model of Stackelberg security games to this setting and formulate an equilibrium concept that captures the nature of strategic interaction among the players. We argue that an exact equilibrium may fail to exist, and, in fact, deciding whether it exists is NP-hard. However, under mild assumptions, every multi-defender security game admits an Ï”-equilibrium for every Ï” > 0, and the limit points corresponding to Ï” → 0 can be efficiently approximated.

  • Y. Xiong, J. Gan, B. An, C. Miao, A. Bazzan.
    Optimal electric vehicle fast charging station placement based on game theoretical framework. IEEE Trans. Intelligent Transportation Systems. [pdf]

    To reduce the air pollution and improve the energy efficiency, many countries and cities (e.g., Singapore) are on the way of introducing electric vehicles (EVs) to replace the vehicles serving in current traffic system. Effective placement of charging stations is essential for the rapid development of EVs, because it is necessary for providing convenience for EVs and ensuring the efficiency of the traffic network. However, existing works mostly concentrate on the mileage anxiety from EV users but ignore their strategic and competitive charging behaviors. To capture the competitive and strategic charging behaviors of the EV users, we consider that an EV user's charging cost, which is dependent on other EV users' choices, consists of the travel cost to access the charging station and the queuing cost in charging stations. First, we formulate the Charging Station Placement Problem (CSPP) as a bilevel optimization problem. Then, by exploiting the equilibrium of the EV charging game, we convert the bilevel optimization problem to a single-level one, following which we analyze the properties of CSPP and propose an algorithm Optimizing eleCtric vEhicle chArging statioN (OCEAN) to compute the optimal allocation of charging stations. Due to OCEAN's scalability issue, we furthermore present a heuristic algorithm OCEAN with Continuous variables to deal with large-scale real-world problems. Finally, we demonstrate and discuss the results of the extensive experiments we did. It is shown that our approach outperform baseline methods significantly.


  • 2017
  • J. Gan, B. An, Y. Vorobeychik, B. Gauch.
    Security games on a plane. AAAI '17. [pdf]

    Most existing models of Stackelberg security games ignore the underlying topology of the space in which targets and defence resources are located. As a result, allocation of resources is restricted to a discrete collection of exogenously defined targets. However, in many practical security settings, defense resources can be located on a continuous plane. Better defense solutions could therefore be potentially achieved by placing resources in a space outside of actual targets (e.g., between targets). To address this limitation, we propose a model called Security Game on a Plane (SGP) in which targets are distributed on a 2-dimensional plane, and security resources, to be allocated on the same plane, protect targets within a certain effective distance. We investigate the algorithmic aspects of SGP. We find that computing a strong Stackelberg equilibrium of an SGP is NP-hard even for zerosum games, and these are inapproximable in general. On the positive side, we find an exact solution technique for general SGPs based on an existing approach, and develop a PTAS (polynomial-time approximation scheme) for zero-sum SGP to more fundamentally overcome the computational obstacle. Our experiments demonstrate the value of considering SGP and effectiveness of our algorithms.

  • Y. Zhang, B. An, L. Tran-Thanh, Z. Wang, J. Gan, N. Jennings.
    Optimal escape interdiction on transportation networks. IJCAI '17. [pdf]

    Preventing crimes or terrorist attacks in urban areas is challenging. Law enforcement officers need to respond quickly to catch the attacker on his escape route, which is subject to time-dependent traffic conditions on transportation networks. The attacker can strategically choose his escape path and driving speed to avoid being captured. Existing work on security resource allocation has not considered such scenarios with time-dependent strategies for both players. Therefore, in this paper, we study the problem of efficiently scheduling security resources for interdicting the escaping attacker. We propose: 1) a new defender-attacker security game model for escape interdiction on transportation networks; and 2) an efficient double oracle algorithm to compute the optimal defender strategy, which combines mixed-integer linear programming formulations for best response problems and effective approximation algorithms for improving the scalability of the algorithms. Experimental evaluation shows that our approach significantly outperforms baselines in solution quality and scales up to realistic-sized transportation networks with hundreds of intersections.

  • J. Gan, B. An.
    Game-theoretic considerations for optimizing taxi system efficiency. IEEE Intelligent Systems. [pdf]

    Taxi service is an indispensable part of public transport in modern cities. To support its unique features, a taxi system adopts a decentralized operation mode in which thousands of taxis freely decide their working schedules and routes. Taxis compete with each other for individual profits regardless of system-level efficiency, making the taxi system inefficient and hard to optimize. Most research into the management and economics of taxi markets has focused on modeling from a macro level the effects of and relationships between various market factors. Less has been done regarding a more important component--drivers' strategic behavior under the decentralized operation mode. The authors propose looking at the problem from a game-theoretic perspective. Combining game-theoretic solution concepts with existing models of taxi markets, they model taxi drivers' strategy-making process as a game and transform the problem of optimizing taxi system efficiency into finding a market policy that leads to the desired equilibrium.


  • 2016
  • Q. Guo, B. An, Y. Vorobeychik, L. Tran-Thanh, J. Gan, C. Miao.
    Coalitional security games. AAMAS '16. [pdf]

    Game theoretic models of security, and associated computational methods, have emerged as critical components of security posture across a broad array of domains, including airport security and coast guard. These approaches consider terrorists as motivated but independent entities. There is, however, increasing evidence that attackers, be it terrorists or cyber attackers, communicate extensively and form coalitions that can dramatically increase their ability to achieve malicious goals. To date, such cooperative decision making among attackers has been ignored in the security games literature. To address the issue of cooperation among attackers, we introduce a novel coalitional security game (CSG) model. A CSG consists of a set of attackers connected by a (communication or trust) network who can form coalitions as connected subgraphs of this network so as to attack a collection of targets. A defender in a CSG can delete a set of edges, incurring a cost for deleting each edge, with the goal of optimally limiting the attackers’ ability to form effective coalitions (in terms of successfully attacking high value targets). We first show that a CSG is, in general, hard to approximate. Nevertheless, we develop a novel branch and price algorithm, leveraging a combination of column generation, relaxation, greedy approximation, and stabilization methods to enable scalable high-quality approximations of CSG solutions on realistic problem instances.

  • Y. Xiong, J. Gan, B. An, C. Miao.
    Optimal pricing for efficient electric vehicle charging station management. AAMAS '16. [pdf]

    The rapid development of Electric Vehicles (EVs) seen in recent years has been drawing increasing attentions from the public, markets, decision-makers, and academia. Notwithstanding the progress, issues still remain. Because of the widely complained disadvantages of limited battery capacity and long charging time, charging convenience has become a top concern that greatly hinders the adoption of EVs. Specialized EV charging station, which provides more than 10 times faster charging speed than domestic charging, is therefore a critical element for successful EV promotion. While most existing researches focus on optimizing spatial placement of charging stations, they are inflexible and inefficient against rapidly changing urban structure and traffic pattern. Therefore, this paper approaches the management of EV charging stations from the pricing perspective as a more flexible and adaptive complement to established charging station placement. In this paper, we build a realistic pricing model in consideration of residential travel pattern and EV drivers’ self-interested charging behavior, traffic congestion, and operating expense of charging stations. We formulate the pricing problem as a mixed integer non-convex optimization problem, and propose a scalable algorithm to solve it. Experiments on both mock and real data are also conducted, which show scalability of our algorithm as well as our solution’s significant improvement over existing approaches.


  • 2015
  • Y. Yin, H. Xu, J. Gan, B. An, A. Jiang.
    Computing optimal mixed strategies for security games with dynamic payoffs. IJCAI '15. [pdf]

    Security agencies in the real world often need to protect targets with time-dependent values, e.g., tourist sites where the number of travelers changes over time. Since the values of different targets often change asynchronously, the defender can relocate security resources among targets dynamically to make the best use of limited resources. We propose a game-theoretic scheme to develop dynamic, randomized security strategies in consideration of adversary’s surveillance capability. This differs from previous studies on security games by considering varying target values and continuous strategy spaces of the security agency and the adversary. The main challenge lies in the computational intensiveness due to the continuous, hence infinite strategy spaces. We propose an optimal algorithm and an arbitrarily near-optimal algorithm to compute security strategies under different conditions. Experimental results show that both algorithms significantly outperform existing approaches.

  • Y. Xiong, J. Gan, B. An, C. Miao, A. Bazzan.
    Optimal electric vehicle charging station placement. IJCAI '15. [pdf]

    Many countries like Singapore are planning to introduce Electric Vehicles (EVs) to replace traditional vehicles to reduce air pollution and improve energy efficiency. The rapid development of EVs calls for efficient deployment of charging stations both for the convenience of EVs and maintaining the efficiency of the road network. Unfortunately, existing work makes unrealistic assumption on EV drivers’ charging behaviors and focus on the limited mobility of EVs. This paper studies the Charging Station PLacement (CSPL) problem, and takes into consideration 1) EV drivers’ strategic behaviors to minimize their charging cost, and 2) the mutual impact of EV drivers’ strategies on the traffic conditions of the road network and service quality of charging stations. We first formulate the CSPL problem as a bilevel optimization problem, which is subsequently converted to a single-level optimization problem by exploiting structures of the EV charging game. Properties of CSPL problem are analyzed and an algorithm called OCEAN is proposed to compute the optimal allocation of charging stations. We further propose a heuristic algorithm OCEAN-C to speed up OCEAN. Experimental results show that the proposed algorithms significantly outperform baseline methods.

  • J. Gan, B. An, C. Miao.
    Optimizing efficiency of taxi systems: scaling-up and handling arbitrary constraints. AAMAS '15. [pdf]

    Taxi service is an indispensable part of public transport in modern cities. However, due to its decentralized operation mode, taxi services in many cities are inefficient. Besides, the decentralized nature also poses significant challenges to analyzing and regulating taxi services. State of the art computational methods for optimizing taxi market efficiency suffer from two important limitations: 1) they cannot be scaled up efficiently; and 2) they cannot address complex real-world market situations where additional scheduling constraints need to be handled. In this paper, we propose two novel algorithms—FLORA and FLORA-A—to address the inadequacies. Using convex polytope representation techniques, FLORA provides a fully compact representation for taxi drivers’ strategy space and scales up more efficiently than existing algorithms. FLORA-A avoids enumerating the entire exponentially large pure strategy space by gradually expanding the strategy space. It is the first known method capable of handling arbitrary scheduling constraints for optimizing taxi system efficiency. Experimental results show orders of magnitude improvement in speed FLORA provides, and the necessity of using FLORA-A as suggested by changes in the taxi drivers’ operation strategy under different market conditions.

  • J. Gan, B. An, Y. Vorobeychik.
    Security games with protection externality. AAAI'15. [pdf | appendix]

    Stackelberg security games have been widely deployed in recent years to schedule security resources. An assumption in most existing security game models is that one security resource assigned to a target only protects that target. However, in many important real-world security scenarios, when a resource is assigned to a target, it exhibits protection externalities: that is, it also protects other “neighbouring” targets. We investigate such Security Games with Protection Externalities (SPEs). First, we demonstrate that computing a strong Stackelberg equilibrium for an SPE is NP-hard, in contrast with traditional Stackelberg security games which can be solved in polynomial time. On the positive side, we propose a novel column generation based approach—CLASPE—to solve SPEs. CLASPE features the following novelties: 1) a novel mixed-integer linear programming formulation for the slave problem; 2) an extended greedy approach with a constant-factor approximation ratio to speed up the slave problem; and 3) a linear-scale linear programming that efficiently calculates the upper bounds of target-defined subproblems for pruning. Our experimental evaluation demonstrates that CLASPE enable us to scale to realistic-sized SPE problem instances.


  • 2013
  • J. Gan, B. An, H. Wang, X. Sun, Z. Shi.
    Optimal pricing for improving efficiency of taxi systems. IJCAI ’13. [pdf]

    In Beijing, most taxi drivers intentionally avoid working during peak hours despite of the huge customer demand within these peak periods. This dilemma is mainly due to the fact that taxi drivers' congestion costs are not reflected in the current taxi fare structure. To resolve this problem, we propose a new pricing scheme to provide taxi drivers with extra incentives to work during peak hours. This differs from previous studies of taxi market by considering market variance over multiple periods, taxi drivers' profit-driven decisions, and their scheduling constraints regarding the interdependence among different periods. The major challenge of this research is the computational intensiveness to identify optimal strategy due to the exponentially large size of a taxi driver's strategy space and the scheduling constraints. We develop an atom schedule method to overcome these issues. It reduces the magnitude of the problem while satisfying the constraints to filter out infeasible pure strategies. Simulation results based on real data show the effectiveness of the proposed methods, which opens up a new door to improving the efficiency of taxi market in megacities (e.g., Beijing).

SERVICES 🧞
Conference PC Member & Reviewer
  • AAAI 19-25, ICLR 22-25,
    SAGT 22, ICML 21-24, IJCAI 19-24, AAMAS 20-24, NeurIPS 20-25,
    EC 22, ECAI 20&23, EAMO 21, MFCS 19, GameSec 19.
Journal Reviewer
  • Artificial Intelligence (AIJ), Journal of Artificial Intelligence Research (JAIR), Theoretical Computer Science (TCS), IEEE Trans. on AI (TAI), IEEE Trans. on Information Forensics and Security (TIFS), Autonomous Agents and Multi-agent Systems (JAAMAS).
TEACHING 🎓

dblp | Google scholar | department webpage

CONTACT 📍
jiarui.gan@cs.ox.ac.uk
https://www.cs.ox.ac.uk/people/jiarui.gan/
Department of Computer Science
University of Oxford
Parks Rd, Oxford OX1 3QD, UK

© Jiarui Gan 2025