I'm a lecturer at the Department of Computer Science, University of Oxford. I work mainly in the area of computational game theory, and more broadly, in the fields of multi-agent systems, EconCS, and AI. My research focuses on questions about finding good strategies for interacting with intelligent agents, especially in real-world scenarios with all sorts of complexities. I am interested in exploring the power of computer science to tackle this challenge, and enjoy solving the related algorithmic "puzzles" and and uncovering what can and cannot be achieved through understanding the computational complexity of the problems. My work aims to deepen our understanding of strategic decision-making and multi-agent interactions, and to develop effective algorithms and models for these domains.
🔍 I'm looking for PhD students. Feel free to get in touch if you're interested! (See more about PhD at Oxford here.) For any application-related inquiries, please directly contact me (email@example.com) or the Department of Computer Science.
PAPERS 🧻 [view full list]
※ denotes alphabetical ordering of authors. (Why?)
J. Gan, R. Majumdar, D. Mandal, G. Radanovic.
Sequential principal-agent problems with communication: efficient computation and learning.
We study a sequential decision making problem between a principal and an agent with incomplete information on both sides. In this model, the principal and the agent interact in a stochastic environment, and each is privy to observations about the state not available to the other. The principal has the power of commitment, both to elicit information from the agent and to provide signals about her own information. The principal and the agent communicate their signals to each other, and select their actions independently based on this communication. Each player receives a payoff based on the state and their joint actions, and the environment moves to a new state. The interaction continues over a finite time horizon, and both players act to optimize their own total payoffs over the horizon. Our model encompasses as special cases stochastic games of incomplete information and POMDPs, as well as sequential Bayesian persuasion and mechanism design problems. We study both computation of optimal policies and learning in our setting. While the general problems are computationally intractable, we study algorithmic solutions under a conditional independence assumption on the underlying state-observation distributions. We present an polynomial-time algorithm to compute the principal's optimal policy up to an additive approximation. Additionally, we show an efficient learning algorithm in the case where the transition probabilities are not known beforehand. The algorithm guarantees sublinear regret for both players.
Y. Chen, X. Deng, J. Gan, Y. Li.
Learning to manipulate a commitment optimizer.
It is shown in recent studies that in a Stackelberg game the follower can manipulate the leader by deviating from their true best-response behavior. Such manipulations are computationally tractable and can be highly beneficial for the follower. Meanwhile, they may result in significant payoff losses for the leader, sometimes completely defeating their first-mover advantage. A warning to commitment optimizers, the risk these findings indicate appears to be alleviated to some extent by a strict information advantage the manipulations rely on. That is, the follower knows the full information about both players' payoffs whereas the leader only knows their own payoffs. In this paper, we study the manipulation problem with this information advantage relaxed. We consider the scenario where the follower is not given any information about the leader's payoffs to begin with but has to learn to manipulate by interacting with the leader. The follower can gather necessary information by querying the leader's optimal commitments against contrived best-response behaviors. Our results indicate that the information advantage is not entirely indispensable to the follower's manipulations: the follower can learn the optimal way to manipulate in polynomial time with polynomially many queries of the leader's optimal commitment.
J. Gan, M. Han, J. Wu, H. Xu.
Robust Stackelberg equilibrium. EC '23.
This paper provides a systematic study of the robust Stackelberg equilibrium (RSE), which naturally generalizes the widely adopted solution concept of the strong Stackelberg equilibrium (SSE). The RSE accounts for any possible up-to-δ suboptimal follower responses in Stackelberg games and is adopted to improve the robustness of the leader's strategy. While a few variants of robust Stackelberg equilibrium have been considered in previous literature, the RSE solution concept we consider is importantly different -- in some sense, it relaxes previously studied robust Stackelberg strategies and is applicable to much broader sources of uncertainties.
We provide a thorough investigation of several fundamental properties of RSE, including its utility guarantees, algorithmics, and learnability. We first show that the RSE we defined always exists and thus is well-defined. Then we characterize how the leader's utility in RSE changes with the robustness level considered. On the algorithmic side, we show that, in sharp contrast to the tractability of computing an SSE, it is NP-hard to obtain a fully polynomial approximation scheme (FPTAS) for any constant robustness level. Nevertheless, we develop a quasi-polynomial approximation scheme (QPTAS) for RSE. Finally, we examine the learnability of the RSE in a natural learning scenario, where both players' utilities are not known in advance, and provide almost tight sample complexity results on learning the RSE. As a corollary of this result, we also obtain an algorithm for learning SSE, which strictly improves a key result of Bai et al. in terms of both utility guarantee and computational efficiency.
J. Gan, B. Li, X. Wu.
Approximation algorithm for computing budget-feasible EF1 allocations. AAMAS '23.
We study algorithmic fairness in a budget-feasible resource allocation problem. In this problem, a set of items with varied sizes and values are to be allocated to a group of agents, while each agent has a budget constraint on the total size of items she can receive. An envy-free (EF) allocation is defined in this context as one in which no agent envies another for the items they get and, in addition, no agent envies the charity, who is automatically endowed with all the unallocated items. Since EF allocations barely exist even without budget constraints, we are interested in the relaxed notion of envy-freeness up to one item (EF1). In this paper, we further the recent progress towards understanding the existence and approximations of EF1 (or EF2) allocations. We propose a polynomial-time algorithm that computes a 1/2-approximate EF1 allocation for an arbitrary number of agents with heterogeneous budgets. For the uniform-budget and two-agent cases, we present a polynomial-time algorithm that computes an exact EF1 allocation. We also consider the large budget setting, where the item sizes are infinitesimal relative to the agents' budgets. We show that both the allocations that maximize the Nash social welfare and the allocations that our main algorithm computes are EF1 in the limit.
J. Gan, B. Li, Y. Li.
Your college dorm and dormmates: fair resource sharing with externalities. JAIR.
We study a fair resource sharing problem, where a set of resources are to be shared among a group of agents. Each agent demands one resource and each resource can serve a limited number of agents. An agent cares about what resource they get as well as the externalities imposed by their mates, who share the same resource with them. Clearly, the strong notion of envy-freeness, where no agent envies another for their resource or mates, cannot always be achieved and we show that even deciding the existence of such a strongly envy-free assignment is an intractable problem. Hence, a more interesting question is whether (and in what situations) a relaxed notion of envy-freeness, the Pareto envy-freeness, can be achieved. Under this relaxed notion, an agent envies another only when they envy both the resource and the mates of the other agent. In particular, we are interested in a dorm assignment problem, where students are to be assigned to dorms with the same capacity and they have dichotomous preference over their dormmates. We show that when the capacity of each dorm is 2, a Pareto envy-free assignment always exists and we present a polynomial-time algorithm to compute such an assignment. Nevertheless, the result breaks immediately when the capacity increases to 3, in which case even Pareto envy-freeness cannot be guaranteed. In addition to the existential results, we also investigate the utility guarantees of (Pareto) envy-free assignments in our model.
J. Gan, A. Hennes, R. Majumdar, D. Mandal, G. Radanovic.
Markov decision processes with time-varying geometric discounting. AAAI '23.
Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective—whereby each time step is treated as an independent decision maker with their own (fixed) discount factor—and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of ϵ-SPE and show that an ϵ-SPE exists under milder assumptions. An algorithm is presented to compute an ϵ-SPE, of which an upper bound of the time complexity, as a function of the convergence property of the time-varying discount factor, is provided.
J. Gan, R. Majumdar, G. Radanovic, A. Singla.
Envy-free policy teaching to multiple agents. NeurIPS '22.
We study envy-free policy teaching. A number of agents independently explore a common Markov decision process (MDP), but each with their own reward function and discounting rate. A teacher wants to teach a target policy to this diverse group of agents, by means of modifying the agents' reward functions: providing additional bonuses to certain actions, or penalizing them. When personalized reward modification programs are used, an important question is how to design the programs so that the agents think they are treated fairly. We adopt the notion of envy-freeness (EF) from the literature on fair division to formalize this problem and investigate several fundamental questions about the existence of EF solutions in our setting, the computation of cost-minimizing solutions, as well as the price of fairness (PoF), which measures the increase of cost due to the consideration of fairness. We show that 1) an EF solution may not exist if penalties are not allowed in the modifications, but otherwise always exists. 2) Computing a cost-minimizing EF solution can be formulated as convex optimization and hence solved efficiently. 3) The PoF increases but at most quadratically with the geometric sum of the discount factor, and at most linearly with the size of the MDP and the number of agents involved; we present tight asymptotic bounds on the PoF. These results indicate that fairness can be incorporated in multi-agent teaching without significant computational or PoF burdens.
J. Gan, E. Elkind, S. Kraus, M. Wooldridge.
Defense coordination in security games: equilibrium analysis and mechanism design. Artificial Intelligence.
Real-world security scenarios sometimes involve multiple defenders: security agencies of two or more countries might patrol the same border areas, and domestic security agencies might also operate in the same locations when their areas of jurisdiction overlap. Motivated by these scenarios and the observation that uncoordinated movements of the defenders may lead to an inefficient defense, we introduce a model of multi-defender security games and explore the possibility of improving efficiency by coordinating the defenders — specifically, by pooling the defenders' resources and allocating them jointly. The model generalizes the standard model of Stackelberg security games, where a defender (now a group of defenders) allocates security resources to protect a set of targets, and an attacker picks the best target to attack. In particular, we are interested in the situation with heterogeneous defenders, who may value the same target differently. Our task is twofold. First, we need to develop a good understanding of the uncoordinated situation, as the baseline to be improved. To this end we formulate a new equilibrium concept, and prove that an equilibrium under this concept always exists and can be computed efficiently. Second, to coordinate the heterogeneous defenders we take a mechanism design perspective and aim to find a mechanism to generate joint resource allocation strategies. We seek a mechanism that improves the defenders' utilities upon the uncoordinated baseline, achieves Pareto efficiency, and incentivizes the defenders to report their true incentives and execute the recommended strategies. Our analysis establishes several impossibility results, indicating the intrinsic difficulties of defense coordination. Specifically, we show that even the basic properties listed above are in conflict with each other: no mechanism can simultaneously satisfy them all, or even some proper subsets of them. In terms of positive results, we present mechanisms that satisfy all combinations of the properties that are not ruled out by our impossibility results, thereby providing a comprehensive profile of the mechanism design problem with respect to the properties considered.
(Preliminary versions appeared in AAMAS '18 and '20.)
J. Gan, R. Majumdar, G. Radanovic, A. Singla.
Bayesian persuasion in sequential decision-making. AAAI '22
(✧ Outstanding Paper Honorable Mention @ AAAI '22)
We study a dynamic model of Bayesian persuasion in sequential decision-making settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal's advice/signal, and beliefs about the external parameter. The action of the agent updates the state according to a stochastic process. The model arises naturally in many applications, e.g., an app (the principal) can advise the user (the agent) on possible choices between actions based on additional real-time information the app has. We study the problem of designing a signaling strategy from the principal's point of view. We show that the principal has an optimal strategy against a myopic agent, who only optimizes their rewards locally, and the optimal strategy can be computed in polynomial time. In contrast, it is NP-hard to approximate an optimal policy against a far-sighted agent. Further, we show that if the principal has the power to threaten the agent by not providing future signals, then we can efficiently design a threat-based strategy. This strategy guarantees the principal's payoff as if playing against an agent who is far-sighted but myopic to future signals.
A. Agarwal, E. Elkind, J. Gan, A. Igarashi, W. Suksompong, A. Voudouris.
Schelling games on graphs. Artificial Intelligence.
We study strategic games inspired by Schelling's seminal model of residential segregation. These games are played on undirected graphs, with the set of agents partitioned into multiple types; each agent either aims to maximize the fraction of her neighbors who are of her own type, or occupies a node of the graph and never moves away. We consider two natural variants of this model: in jump games agents can jump to empty nodes of the graph to increase their utility, while in swap games they can swap positions with other agents. We investigate the existence, computational complexity, and quality of equilibrium assignments in these games, both from a social welfare perspective and from a diversity perspective. Some of our results extend to a more general setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.
(Preliminary versions appeared in IJCAI '19 and AAAI '20.)
G. Birmpas, J. Gan, A. Hollender, F. Marmolejo-Cossío, N. Rajgopal, A. Voudouris.
Optimally deceiving a learning leader in Stackelberg games. Journal of AI Research (JAIR).
Recent results have shown that algorithms for learning the optimal commitment in a Stackelberg game are susceptible to manipulation by the follower. These learning algorithms operate by querying the best responses of the follower, who consequently can deceive the algorithm by using fake best responses, typically by responding according to fake payoffs that are different from the actual ones. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the fake payoffs that would make the learning algorithm output a commitment that benefits them the most. While this problem has been considered before, the related literature has only focused on a simple setting where the follower can only choose from a finite set of payoff matrices, thus leaving the general version of the problem unanswered. In this paper, we fill this gap by showing that it is always possible for the follower to efficiently compute (near-)optimal fake payoffs, for various scenarios of learning interaction between the leader and the follower. Our results also establish an interesting connection between the follower’s deception and the leader’s maximin utility: through deception, the follower can induce almost any (fake) Stackelberg equilibrium if and only if the leader obtains at least their maximin utility in this equilibrium.
(A preliminary version appeared in NeurIPS '20.)
E. Elkind, J. Gan, S. Obraztsova, Z. Rabinovich, A. Voudouris.
Protecting elections by recounting ballots. Artificial Intelligence.
Complexity of voting manipulation is a prominent topic in computational social choice. In this work, we consider a two-stage voting manipulation scenario. First, a malicious party (an attacker) attempts to manipulate the election outcome in favor of a preferred candidate by changing the vote counts in some of the voting districts. Afterwards, another party (a defender), which cares about the voters' wishes, demands a recount in a subset of the manipulated districts, restoring their vote counts to their original values. We investigate the resulting Stackelberg game for the case where votes are aggregated using two variants of the Plurality rule, and obtain an almost complete picture of the complexity landscape, both from the attacker's and from the defender's perspective.
(A preliminary version appeared in IJCAI '19.)
J. Gan, Q. Guo, L. Tran-Thanh, B. An, M. Wooldridge.
Manipulating a learning defender and ways to counteract. NeurIPS '19.
In Stackelberg security games when information about the attacker's payoffs is uncertain, algorithms have been proposed to learn the optimal defender commitment by interacting with the attacker and observing their best responses. In this paper, we show that, however, these algorithms can be easily manipulated if the attacker responds untruthfully. As a key finding, attacker manipulation normally leads to the defender learning a maximin strategy, which effectively renders the learning attempt meaningless as to compute a maximin strategy requires no additional information about the other player at all. We then apply a game-theoretic framework at a higher level to counteract such manipulation, in which the defender commits to a policy that specifies her strategy commitment according to the learned information. We provide a polynomial-time algorithm to compute the optimal such policy, and in addition, a heuristic approach that applies even when the attacker's payoff space is infinite or completely unknown. Empirical evaluation shows that our approaches can improve the defender's utility significantly as compared to the situation when attacker manipulation is ignored.
J. Gan, H. Xu, Q. Guo, L. Tran-Thanh, Z. Rabinovich, M. Wooldridge.
Imitative follower deception in Stackelberg games. EC '19.
Information uncertainty is one of the major challenges facing applications of game theory. In the context of Stackelberg games, various approaches have been proposed to deal with the leader's incomplete knowledge about the follower's payoffs, typically by gathering information from the leader's interaction with the follower. Unfortunately, these approaches rely crucially on the assumption that the follower will not strategically exploit this information asymmetry, i.e., the follower behaves truthfully during the interaction according to their actual payoffs. As we show in this paper, the follower may have strong incentives to deceitfully imitate the behavior of a different follower type and, in doing this, benefit significantly from inducing the leader into choosing a highly suboptimal strategy. This raises a fundamental question: how to design a leader strategy in the presence of a deceitful follower? To answer this question, we put forward a basic model of Stackelberg games with (imitative) follower deception and show that the leader is indeed able to reduce the loss due to follower deception with carefully designed policies. We then provide a systematic study of the problem of computing the optimal leader policy and draw a relatively complete picture of the complexity landscape; essentially matching positive and negative complexity results are provided for natural variants of the model. Our intractability results are in sharp contrast to the situation with no deception, where the leader's optimal strategy can be computed in polynomial time, and thus illustrate the intrinsic difficulty of handling follower deception. Through simulations we also examine the benefit of considering follower deception in randomly generated games.
AAAI 23/22/21/20, ICLR 23/22,
SAGT 22, ICML 22/21, IJCAI 22/21/20, AAMAS 22/20, NeurIPS 22/21/20,
- EC 22, AAMAS 20, ECAI 20, AAAI 19, IJCAI 19, MFCS 19, GameSec 19.
- Artificial Intelligence (AIJ), Theoretical Computer Science (TCS), IEEE Trans. on AI (TAI), IEEE Trans. on Information Forensics and Security (TIFS), Autonomous Agents and Multi-agent Systems (JAAMAS).
Department of Computer Science
University of Oxford
Parks Rd, Oxford OX1 3QD, UK
© Jiarui Gan 2022 | Powered by Skeleton