 preprints

J. Gan, R. Majumdar, D. Mandal, G. Radanovic.
Stochastic principalagent problems: efficient computation and learning.
[arXiv]
We introduce a stochastic principalagent model. A principal and an agent
interact in a stochastic environment, each privy to observations about the
state not available to the other. The principal has the power of commitment,
both to elicit information from the agent and to provide signals about her own
information. The players communicate with each other and then select actions
independently. Each of them receives a payoff based on the state and their
joint action, and the environment transitions to a new state. The interaction
continues over a finite time horizon. Both players are farsighted, aiming to
maximize their total payoffs over the time horizon. The model encompasses as
special cases extensiveform games (EFGs) and stochastic games of incomplete
information, partially observable Markov decision processes (POMDPs), as well
as other forms of sequential principalagent interactions, including Bayesian
persuasion and automated mechanism design problems.
We consider both the computation and learning of the principal's optimal
policy. Since the general problem, which subsumes POMDPs, is intractable, we
explore algorithmic solutions under hindsight observability, where the state
and the interaction history are revealed at the end of each step. Though the
problem becomes more amenable under this condition, the number of possible
histories remains exponential in the length of the time horizon, making
approaches for EFGbased models infeasible. We present an efficient algorithm
based on the inducible value sets. The algorithm computes an
$\epsilon$approximate optimal policy in time polynomial in $1/\epsilon$.
Additionally, we show an efficient learning algorithm for an episodic
reinforcement learning setting where the transition probabilities are unknown.
The algorithm guarantees sublinear regret $\tilde{O}(T^{2/3})$ for both players
over $T$ episodes.

J. Gan, A. Ghosh, N. Teh.
Persuading a credible agent.
[arXiv]
How to optimally persuade an agent who has a private type? When elicitation is feasible, this amounts to a fairly standard principalagentstyle mechanism design problem, where the persuader employs a mechanism to first elicit the agent's type and then plays the corresponding persuasion strategy based on the agent's report. The optimal mechanism design problem in this setting is relatively wellunderstood in the literature, with incentive compatible (IC) mechanisms known to be optimal and computationally tractable. In this paper, we study this problem given a credible agent, i.e., if the agent claims they are of a certain type in response to the mechanism's elicitation, then they will act optimally with respect to the claimed type, even if they are actually not of that type.
We present several interesting findings in this new setting that differ significantly from results in the noncredible setting. In terms of the structure of optimal mechanisms, we show that not only may IC mechanisms fail to be optimal, but all mechanisms following the standard 'elicitingthenpersuading' mechanism design structure may be suboptimal. To achieve optimality requires two additional instruments  presignaling and nonbinding elicitation  which naturally result in multistage mechanisms. We characterize optimal mechanisms under these design choices. Based on our characterization, we provide a polynomialtime algorithm for computing optimal multistage mechanisms. We also discover that in scenarios that allow for it, partial information elicitation can be employed to improve the principal's payoff even further. Though, surprisingly, an unbounded number of rounds of information exchange between the principal and the agent may be necessary to achieve optimality.

Y. Chen, X. Deng, J. Gan, Y. Li.
Learning to manipulate a commitment optimizer.
[arXiv]
It is shown in recent studies that in a Stackelberg game the follower can manipulate the leader by deviating from their true bestresponse behavior. Such manipulations are computationally tractable and can be highly beneficial for the follower. Meanwhile, they may result in significant payoff losses for the leader, sometimes completely defeating their firstmover advantage. A warning to commitment optimizers, the risk these findings indicate appears to be alleviated to some extent by a strict information advantage the manipulations rely on. That is, the follower knows the full information about both players' payoffs whereas the leader only knows their own payoffs. In this paper, we study the manipulation problem with this information advantage relaxed. We consider the scenario where the follower is not given any information about the leader's payoffs to begin with but has to learn to manipulate by interacting with the leader. The follower can gather necessary information by querying the leader's optimal commitments against contrived bestresponse behaviors. Our results indicate that the information advantage is not entirely indispensable to the follower's manipulations: the follower can learn the optimal way to manipulate in polynomial time with polynomially many queries of the leader's optimal commitment.

D. Mandal, J. Gan.
Socially fair reinforcement learning.
[arXiv]
We consider the problem of episodic reinforcement learning where there are multiple stakeholders with different reward functions. Our goal is to output a policy that is socially fair with respect to different reward functions. Prior works have proposed different objectives that a fair policy must optimize including minimum welfare, and generalized Gini welfare. We first take an axiomatic view of the problem, and propose four axioms that any such fair objective must satisfy. We show that the Nash social welfare is the unique objective that uniquely satisfies all four objectives, whereas prior objectives fail to satisfy all four axioms. We then consider the learning version of the problem where the underlying model i.e. Markov decision process is unknown. We consider the problem of minimizing regret with respect to the fair policies maximizing three different fair objectives  minimum welfare, generalized Gini welfare, and Nash social welfare. Based on optimistic planning, we propose a generic learning algorithm and derive its regret bound with respect to the three different policies. For the objective of Nash social welfare, we also derive a lower bound in regret that grows exponentially with n, the number of agents. Finally, we show that for the objective of minimum welfare, one can improve regret by a factor of $O(H)$ for a weaker notion of regret.
 2024

J. Gan, M. Han, J. Wu, H. Xu.
Generalized principalagency: contracts, information, games and beyond. WINE '24.
[arXiv]
In the principalagent problem formulated by Myerson'82, agents have private information (type) and make private decisions (action), both of which are unobservable to the principal. Myerson pointed out an elegant linear programming solution that relies on the revelation principle. This paper extends Myerson's results to a more general setting where the principal's action space can be infinite and subject to additional design constraints. Our generalized principalagent model unifies several important design problems including contract design, information design, and Bayesian Stackelberg games, and encompasses them as special cases. We first extend the revelation principle to this general model, based on which a polynomialtime algorithm is then derived for computing the optimal mechanism for the principal. This algorithm not only implies new efficient solutions simultaneously for all the aforementioned special cases but also significantly simplifies previously known algorithms designed for special cases. Inspired by the recent interest in the algorithmic design of a single contract and menu of contracts, we study such constrained design problems to our general principalagent model. In contrast to the above unification, our results here illustrate the other facet of diversity among different principalagent design problems and demonstrate how their different structures can lead to different complexities: some are tractable whereas others are APXhard. Finally, we reveal an interesting connection of our model to the problem of information acquisition for decision making and study its algorithmic properties in general.

H. Aziz, J. Gan, G. Lisowski, A. Pourmiri.
The team order problem: maximizing the probability of matching being large enough. SAGT '24.
[link]
We consider a matching problem, which is meaningful in team competitions, as well as in information theory, recommender systems, and assignment problems. In the competitions which we study, each competitor in a team order plays a match with the corresponding opposing player. The team that wins more matches wins. We consider a problem where the input is the graph of probabilities that a team 1 player can win against the team 2 player, and the output is the optimal ordering of team 1 players given the fixed ordering of team 2. Our central result is a polynomialtime approximation scheme (PTAS) to compute a matching whose winning probability is at most $\epsilon$ less than the winning probability of the optimal matching. We also provide tractability results for several special cases of the problem, as well as an analytical bound on how far the winning probability of a maximum weight matching of the underlying graph is from the best achievable winning probability.
 2023

J. Gan, M. Han, J. Wu, H. Xu.
Robust Stackelberg equilibrium. EC '23.
[link 
arXiv]
This paper provides a systematic study of the robust Stackelberg equilibrium (RSE), which naturally generalizes the widely adopted solution concept of the strong Stackelberg equilibrium (SSE). The RSE accounts for any possible uptoÎ´ suboptimal follower responses in Stackelberg games and is adopted to improve the robustness of the leader's strategy. While a few variants of robust Stackelberg equilibrium have been considered in previous literature, the RSE solution concept we consider is importantly different  in some sense, it relaxes previously studied robust Stackelberg strategies and is applicable to much broader sources of uncertainties.
We provide a thorough investigation of several fundamental properties of RSE, including its utility guarantees, algorithmics, and learnability. We first show that the RSE we defined always exists and thus is welldefined. Then we characterize how the leader's utility in RSE changes with the robustness level considered. On the algorithmic side, we show that, in sharp contrast to the tractability of computing an SSE, it is NPhard to obtain a fully polynomial approximation scheme (FPTAS) for any constant robustness level. Nevertheless, we develop a quasipolynomial approximation scheme (QPTAS) for RSE. Finally, we examine the learnability of the RSE in a natural learning scenario, where both players' utilities are not known in advance, and provide almost tight sample complexity results on learning the RSE. As a corollary of this result, we also obtain an algorithm for learning SSE, which strictly improves a key result of Bai et al. in terms of both utility guarantee and computational efficiency.

W. Lee, D. Hyland, A. Abate, E. Elkind, J. Gan, J. Gutierrez, P. Harrenstein, M. Wooldridge
kPrize weighted voting game. AAMAS '23.
[arXiv 
link]
We introduce a natural variant of weighted voting games, which we refer to as kPrize Weighted Voting Games. Such games consist of n players with weights, and k prizes, of possibly differing values. The players form coalitions, and the ith largest coalition (by the sum of weights of its members) wins the ith largest prize, which is then shared among its members. We present four solution concepts to analyse the games in this class, and characterise the existence of stable outcomes in games with three players and two prizes, and in games with uniform prizes. We then explore the efficiency of stable outcomes in terms of Pareto optimality and utilitarian social welfare. Finally, we study the computational complexity of finding stable outcomes.

J. Gan, B. Li, X. Wu.
Approximation algorithm for computing budgetfeasible EF1 allocations. AAMAS '23.
[arXiv 
link]
We study algorithmic fairness in a budgetfeasible resource allocation problem. In this problem, a set of items with varied sizes and values are to be allocated to a group of agents, while each agent has a budget constraint on the total size of items she can receive. An envyfree (EF) allocation is defined in this context as one in which no agent envies another for the items they get and, in addition, no agent envies the charity, who is automatically endowed with all the unallocated items. Since EF allocations barely exist even without budget constraints, we are interested in the relaxed notion of envyfreeness up to one item (EF1). In this paper, we further the recent progress towards understanding the existence and approximations of EF1 (or EF2) allocations. We propose a polynomialtime algorithm that computes a 1/2approximate EF1 allocation for an arbitrary number of agents with heterogeneous budgets. For the uniformbudget and twoagent cases, we present a polynomialtime algorithm that computes an exact EF1 allocation. We also consider the large budget setting, where the item sizes are infinitesimal relative to the agents' budgets. We show that both the allocations that maximize the Nash social welfare and the allocations that our main algorithm computes are EF1 in the limit.

J. Gan, B. Li, Y. Li.
Your college dorm and dormmates: fair resource sharing with externalities. JAIR.
[arXiv 
link]
We study a fair resource sharing problem, where a set of resources are to be shared among a group of agents. Each agent demands one resource and each resource can serve a limited number of agents. An agent cares about what resource they get as well as the externalities imposed by their mates, who share the same resource with them. Clearly, the strong notion of envyfreeness, where no agent envies another for their resource or mates, cannot always be achieved and we show that even deciding the existence of such a strongly envyfree assignment is an intractable problem. Hence, a more interesting question is whether (and in what situations) a relaxed notion of envyfreeness, the Pareto envyfreeness, can be achieved. Under this relaxed notion, an agent envies another only when they envy both the resource and the mates of the other agent. In particular, we are interested in a dorm assignment problem, where students are to be assigned to dorms with the same capacity and they have dichotomous preference over their dormmates. We show that when the capacity of each dorm is 2, a Pareto envyfree assignment always exists and we present a polynomialtime algorithm to compute such an assignment. Nevertheless, the result breaks immediately when the capacity increases to 3, in which case even Pareto envyfreeness cannot be guaranteed. In addition to the existential results, we also investigate the utility guarantees of (Pareto) envyfree assignments in our model.

J. Gan, A. Hennes, R. Majumdar, D. Mandal, G. Radanovic.
Markov decision processes with timevarying geometric discounting. AAAI '23.
[link 
pdf]
Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling timevarying discounting in certain applications. This paper studies a model of infinitehorizon MDPs with timevarying discount factors. We take a gametheoretic perspectiveâ€”whereby each time step is treated as an independent decision maker with their own (fixed) discount factorâ€”and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIMEhardness of computing an SPE. We also turn to the approximate notion of ĎµSPE and show that an ĎµSPE exists under milder assumptions. An algorithm is presented to compute an ĎµSPE, of which an upper bound of the time complexity, as a function of the convergence property of the timevarying discount factor, is provided.

D. Mandal, G. Radanovic, J. Gan, A. Singla, R. Majumdar.
Online reinforcement learning with uncertain episode lengths. AAAI '23.
[link]
Existing episodic reinforcement algorithms assume that the length of an episode is fixed across time and known a priori. In this paper, we consider a general framework of episodic reinforcement learning when the length of each episode is drawn from a distribution. We first establish that this problem is equivalent to online reinforcement learning with general discounting where the learner is trying to optimize the expected discounted sum of rewards over an infinite horizon, but where the discounting function is not necessarily geometric. We show that minimizing regret with this new general discounting is equivalent to minimizing regret with uncertain episode lengths. We then design a reinforcement learning algorithm that minimizes regret with general discounting but acts for the setting with uncertain episode lengths. We instantiate our general bound for different types of discounting, including geometric and polynomial discounting. We also show that we can obtain similar regret bounds even when the uncertainty over the episode lengths is unknown, by estimating the unknown distribution over time. Finally, we compare our learning algorithms with existing valueiteration based episodic RL algorithms on a gridworld environment.
 2022

J. Gan, R. Majumdar, G. Radanovic, A. Singla.
Envyfree policy teaching to multiple agents. NeurIPS '22.
[pdf 
link]
We study envyfree policy teaching. A number of agents independently explore a common Markov decision process (MDP), but each with their own reward function and discounting rate. A teacher wants to teach a target policy to this diverse group of agents, by means of modifying the agents' reward functions: providing additional bonuses to certain actions, or penalizing them. When personalized reward modification programs are used, an important question is how to design the programs so that the agents think they are treated fairly. We adopt the notion of envyfreeness (EF) from the literature on fair division to formalize this problem and investigate several fundamental questions about the existence of EF solutions in our setting, the computation of costminimizing solutions, as well as the price of fairness (PoF), which measures the increase of cost due to the consideration of fairness. We show that 1) an EF solution may not exist if penalties are not allowed in the modifications, but otherwise always exists. 2) Computing a costminimizing EF solution can be formulated as convex optimization and hence solved efficiently. 3) The PoF increases but at most quadratically with the geometric sum of the discount factor, and at most linearly with the size of the MDP and the number of agents involved; we present tight asymptotic bounds on the PoF. These results indicate that fairness can be incorporated in multiagent teaching without significant computational or PoF burdens.

J. Gan, E. Elkind, S. Kraus, M. Wooldridge.
Defense coordination in security games: equilibrium analysis and mechanism design. Artificial Intelligence.
[link]
Realworld security scenarios sometimes involve multiple defenders: security agencies of two or more countries might patrol the same border areas, and domestic security agencies might also operate in the same locations when their areas of jurisdiction overlap. Motivated by these scenarios and the observation that uncoordinated movements of the defenders may lead to an inefficient defense, we introduce a model of multidefender security games and explore the possibility of improving efficiency by coordinating the defenders â€” specifically, by pooling the defenders' resources and allocating them jointly. The model generalizes the standard model of Stackelberg security games, where a defender (now a group of defenders) allocates security resources to protect a set of targets, and an attacker picks the best target to attack. In particular, we are interested in the situation with heterogeneous defenders, who may value the same target differently. Our task is twofold. First, we need to develop a good understanding of the uncoordinated situation, as the baseline to be improved. To this end we formulate a new equilibrium concept, and prove that an equilibrium under this concept always exists and can be computed efficiently. Second, to coordinate the heterogeneous defenders we take a mechanism design perspective and aim to find a mechanism to generate joint resource allocation strategies. We seek a mechanism that improves the defenders' utilities upon the uncoordinated baseline, achieves Pareto efficiency, and incentivizes the defenders to report their true incentives and execute the recommended strategies. Our analysis establishes several impossibility results, indicating the intrinsic difficulties of defense coordination. Specifically, we show that even the basic properties listed above are in conflict with each other: no mechanism can simultaneously satisfy them all, or even some proper subsets of them. In terms of positive results, we present mechanisms that satisfy all combinations of the properties that are not ruled out by our impossibility results, thereby providing a comprehensive profile of the mechanism design problem with respect to the properties considered.
(Preliminary versions appeared in AAMAS '18 and '20.)

J. Gan, R. Majumdar, G. Radanovic, A. Singla.
Sequential decision making with information asymmetry. CONCUR '22 (invited talk)
[link]
We survey some recent results in sequential decision making under uncertainty, where there is an information asymmetry among the decisionmakers. We consider two versions of the problem: persuasion and mechanism design. In persuasion, a moreinformed principal influences the actions of a lessinformed agent by signaling information. In mechanism design, a lessinformed principal incentivizes a moreinformed agent to reveal information by committing to a mechanism, so that the principal can make more informed decisions. We define Markov persuasion processes and Markov mechanism processes that model persuasion and mechanism design into dynamic models. Then we survey results on optimal persuasion and optimal mechanism design on myopic and farsighted agents. These problems are solvable in polynomial time for myopic agents but hard for farsighted agents.

J. Gan, R. Majumdar, G. Radanovic, A. Singla.
Bayesian persuasion in sequential decisionmaking. AAAI '22
[arXiv 
link]
(đź’Ą Outstanding Paper Honorable Mention @ AAAI '22)
We study a dynamic model of Bayesian persuasion in sequential decisionmaking settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal's advice/signal, and beliefs about the external parameter. The action of the agent updates the state according to a stochastic process. The model arises naturally in many applications, e.g., an app (the principal) can advise the user (the agent) on possible choices between actions based on additional realtime information the app has. We study the problem of designing a signaling strategy from the principal's point of view. We show that the principal has an optimal strategy against a myopic agent, who only optimizes their rewards locally, and the optimal strategy can be computed in polynomial time. In contrast, it is NPhard to approximate an optimal policy against a farsighted agent. Further, we show that if the principal has the power to threaten the agent by not providing future signals, then we can efficiently design a threatbased strategy. This strategy guarantees the principal's payoff as if playing against an agent who is farsighted but myopic to future signals.

K. Banihashem, A. Singla, J. Gan, G. Radanovic.
Admissible policy teaching through reward design. AAAI '22
[arXiv 
link]
We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy from a set of admissible policies. The goal of the reward designer is to modify the underlying reward function costefficiently while ensuring that any approximately optimal deterministic policy under the new reward function is admissible and performs well under the original reward function. This problem can be viewed as a dual to the problem of optimal reward poisoning attacks: instead of forcing an agent to adopt a specific policy, the reward designer incentivizes an agent to avoid taking actions that are inadmissible in certain states. Perhaps surprisingly, and in contrast to the problem of optimal reward poisoning attacks, we first show that the reward design problem for admissible policy teaching is computationally challenging, and it is NPhard to find an approximately optimal reward modification. We then proceed by formulating a surrogate problem whose optimal solution approximates the optimal solution to the reward design problem in our setting, but is more amenable to optimization techniques and analysis. For this surrogate problem, we present characterization results that provide bounds on the value of the optimal solution. Finally, we design a local search algorithm to solve the surrogate problem and showcase its utility using simulationbased experiments.
 2021

A. Agarwal, E. Elkind, J. Gan, A. Igarashi, W. Suksompong, A. Voudouris.
Schelling games on graphs. Artificial Intelligence.
[link]
We study strategic games inspired by Schelling's seminal model of residential segregation. These games are played on undirected graphs, with the set of agents partitioned into multiple types; each agent either aims to maximize the fraction of her neighbors who are of her own type, or occupies a node of the graph and never moves away. We consider two natural variants of this model: in jump games agents can jump to empty nodes of the graph to increase their utility, while in swap games they can swap positions with other agents. We investigate the existence, computational complexity, and quality of equilibrium assignments in these games, both from a social welfare perspective and from a diversity perspective. Some of our results extend to a more general setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.
(Preliminary versions appeared in IJCAI '19 and AAAI '20.)

G. Birmpas, J. Gan, A. Hollender, F. MarmolejoCossĂo, N. Rajgopal, A. Voudouris.
Optimally deceiving a learning leader in Stackelberg games.
Journal of AI Research (JAIR).
[pdf]
Recent results have shown that algorithms for learning the optimal commitment in a Stackelberg game are susceptible to manipulation by the follower. These learning algorithms operate by querying the best responses of the follower, who consequently can deceive the algorithm by using fake best responses, typically by responding according to fake payoffs that are different from the actual ones. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the fake payoffs that would make the learning algorithm output a commitment that benefits them the most. While this problem has been considered before, the related literature has only focused on a simple setting where the follower can only choose from a finite set of payoff matrices, thus leaving the general version of the problem unanswered. In this paper, we fill this gap by showing that it is always possible for the follower to efficiently compute (near)optimal fake payoffs, for various scenarios of learning interaction between the leader and the follower. Our results also establish an interesting connection between the followerâ€™s deception and the leaderâ€™s maximin utility: through deception, the follower can induce almost any (fake) Stackelberg equilibrium if and only if the leader obtains at least their maximin utility in this equilibrium.
(A preliminary version appeared in NeurIPS '20.)

W. Xiaowei, B. Li, J. Gan.
Budgetfeasible maximum Nash social welfare allocation is almost envyfree. IJCAI '21.
[pdf 
arXiv]
The Nash social welfare (NSW) is a wellknown social welfare measurement that balances individual utilities and the overall efficiency. In the context of fair allocation of indivisible goods, it has been shown by Caragiannis et al. (EC 2016 and TEAC 2019) that an allocation maximizing the NSW is envyfree up to one good (EF1). In this paper, we are interested in the fairness of the NSW in a budgetfeasible allocation problem, in which each item has a cost that will be incurred to the agent it is allocated to, and each agent has a budget constraint on the total cost of items she receives. We show that a budgetfeasible allocation that maximizes the NSW achieves a 1/4approximation of EF1 and the approximation ratio is tight. The approximation ratio improves gracefully when the items have small costs compared with the agents' budgets; it converges to 1/2 when the budgetcost ratio approaches infinity.

D. Mutzari, J. Gan, S. Kraus.
Coalition formation in multidefender security games. AAAI '21.
[pdf]
We study Stackelberg security game (SSG) with multiple defenders, where heterogeneous defenders need to allocate security resources to protect a set of targets against a strategic attacker. In such games, coordination and cooperation between the defenders can increase their ability to protect their assets, but the heterogeneous preferences of the selfinterested defenders often make such cooperation very difficult. In this paper, we approach the problem from the perspective of cooperative game theory and study coalition formation among the defenders. Our main contribution is a number of algorithmic results for the computation problems that arise in this model. We provide a polytime algorithm for computing a solution in the {\em core} of the game and show that all of the elements in the core are Pareto efficient. We show that the problem of computing the entire core is NPhard and then delve into a special setting where the size of a coalition is limited up to some threshold. We analyse the parameterized complexity of deciding if a coalition structure is in the core under this special setting, and provide a polytime algorithm for computing successful deviation strategies for a given coalition.

E. Elkind, J. Gan, S. Obraztsova, Z. Rabinovich, A. Voudouris.
Protecting elections by recounting ballots. Artificial Intelligence.
[link]
Complexity of voting manipulation is a prominent topic in computational social choice. In this work, we consider a twostage voting manipulation scenario. First, a malicious party (an attacker) attempts to manipulate the election outcome in favor of a preferred candidate by changing the vote counts in some of the voting districts. Afterwards, another party (a defender), which cares about the voters' wishes, demands a recount in a subset of the manipulated districts, restoring their vote counts to their original values. We investigate the resulting Stackelberg game for the case where votes are aggregated using two variants of the Plurality rule, and obtain an almost complete picture of the complexity landscape, both from the attacker's and from the defender's perspective.
(A preliminary version appeared in IJCAI '19.)
 2020

G. Birmpas, J. Gan, A. Hollender, F. MarmolejoCossĂo, N. Rajgopal, A. Voudouris.
Optimally deceiving a learning leader in Stackelberg games.
NeurIPS '20.
[pdf]
Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game, are susceptible to manipulation by the follower. Such a learning algorithm operates by querying the best responses or the payoffs of the follower, who consequently can deceive the algorithm by responding as if his payoffs were much different than what they actually are. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the payoffs that would make the learning algorithm compute a commitment so that best responding to it maximizes the follower's utility, according to his true payoffs. While this problem has been considered before, the related literature only focused on the simplified scenario in which the payoff space is finite, thus leaving the general version of the problem unanswered. In this paper, we fill in this gap, by showing that it is always possible for the follower to compute (near)optimal payoffs for various scenarios about the learning interaction between leader and follower.

J. Gan, E. Elkind, S. Kraus, M. Wooldridge.
Mechanism design for defense coordination in security games. AAMAS '20.
[pdf]
Recent work studied Stackelberg security games with multiple defenders, in which heterogeneous defenders allocate security resources to protect a set of targets against a strategic attacker. Equilibrium analysis was conducted to characterize outcomes of these games when defenders act independently. Our starting point is the observation that the use of resources in equilibria may be inefficient due to lack of coordination. We explore the possibility of reducing this inefficiency by coordinating the defendersâ€”specifically, by pooling the defendersâ€™ resources and allocating them jointly. The defendersâ€™ heterogeneous preferences then give rise to a collective decisionmaking problem, which calls for a mechanism to generate joint allocation strategies. We seek a mechanism that encourages coordination, produces efficiency gains, and incentivizes the defenders to report their true preferences and to execute the recommended strategies. Our results show that, unfortunately, even these basic properties clash with each other and no mechanism can achieve them simultaneously, which reveals the intrinsic difficulty of achieving meaningful defense coordination in security games. On the positive side, we put forward mechanisms that fulfill some of these properties and we identify special cases of our setting where more of these properties are compatible.

A. Agarwal, E. Elkind, J. Gan, A. Voudouris.
Swap stability in Schelling games on graphs. AAAI '20.
[pdf 
arXiv]
We study a recently introduced class of strategic games that is motivated by and generalizes Schelling's wellknown residential segregation model. These games are played on undirected graphs, with the set of agents partitioned into multiple types; each agent either occupies a node of the graph and never moves away or aims to maximize the fraction of her neighbors who are of her own type. We consider a variant of this model that we call swap Schelling games, where the number of agents is equal to the number of nodes of the graph, and agents may swap positions with other agents to increase their utility. We study the existence, computational complexity and quality of equilibrium assignments in these games, both from a social welfare perspective and from a diversity perspective.
 2019

J. Gan, Q. Guo, L. TranThanh, B. An, M. Wooldridge.
Manipulating a learning defender and ways to counteract. NeurIPS '19.
[pdf  arXiv  poster]
In Stackelberg security games when information about the attacker's payoffs is uncertain, algorithms have been proposed to learn the optimal defender commitment by interacting with the attacker and observing their best responses. In this paper, we show that, however, these algorithms can be easily manipulated if the attacker responds untruthfully. As a key finding, attacker manipulation normally leads to the defender learning a maximin strategy, which effectively renders the learning attempt meaningless as to compute a maximin strategy requires no additional information about the other player at all. We then apply a gametheoretic framework at a higher level to counteract such manipulation, in which the defender commits to a policy that specifies her strategy commitment according to the learned information. We provide a polynomialtime algorithm to compute the optimal such policy, and in addition, a heuristic approach that applies even when the attacker's payoff space is infinite or completely unknown. Empirical evaluation shows that our approaches can improve the defender's utility significantly as compared to the situation when attacker manipulation is ignored.

J. Gan, W. Suksompong, A. Voudouris.
Envyfreeness in house allocation problems. Mathematical Social Sciences.
[pdf  arXiv]
We consider the house allocation problem, where m houses are to be assigned to n agents so that each agent gets exactly one house. We present a polynomialtime algorithm that determines whether an envyfree assignment exists, and if so, computes one such assignment. We also show that an envyfree assignment exists with high probability if the number of houses exceeds the number of agents by a logarithmic factor.

J. Gan, H. Xu, Q. Guo, L. TranThanh, Z. Rabinovich, M. Wooldridge.
Imitative follower deception in Stackelberg games. EC '19.
[pdf  arXiv  poster]
Information uncertainty is one of the major challenges facing applications of game theory. In the context of Stackelberg games, various approaches have been proposed to deal with the leader's incomplete knowledge about the follower's payoffs, typically by gathering information from the leader's interaction with the follower. Unfortunately, these approaches rely crucially on the assumption that the follower will not strategically exploit this information asymmetry, i.e., the follower behaves truthfully during the interaction according to their actual payoffs. As we show in this paper, the follower may have strong incentives to deceitfully imitate the behavior of a different follower type and, in doing this, benefit significantly from inducing the leader into choosing a highly suboptimal strategy. This raises a fundamental question: how to design a leader strategy in the presence of a deceitful follower? To answer this question, we put forward a basic model of Stackelberg games with (imitative) follower deception and show that the leader is indeed able to reduce the loss due to follower deception with carefully designed policies. We then provide a systematic study of the problem of computing the optimal leader policy and draw a relatively complete picture of the complexity landscape; essentially matching positive and negative complexity results are provided for natural variants of the model. Our intractability results are in sharp contrast to the situation with no deception, where the leader's optimal strategy can be computed in polynomial time, and thus illustrate the intrinsic difficulty of handling follower deception. Through simulations we also examine the benefit of considering follower deception in randomly generated games.

E. Elkind, J. Gan, A. Igarashi, W. Suksompong, A. Voudouris.
Schelling games on graphs. IJCAI '19.
[pdf 
arXiv]
We consider strategic games that are inspired by Schelling's model of residential segregation. In our model, the agents are partitioned into k types and need to select locations on an undirected graph. Agents can be either stubborn, in which case they will always choose their preferred location, or strategic, in which case they aim to maximize the fraction of agents of their own type in their neighborhood. We investigate the existence of equilibria in these games, study the complexity of finding an equilibrium outcome or an outcome with high social welfare, and also provide upper and lower bounds on the price of anarchy and stability. Some of our results extend to the setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.

E. Elkind, J. Gan, S. Obraztsova, Z. Rabinovich, A. Voudouris.
Protecting elections by recounting ballots. IJCAI '19.
[pdf  arXiv  poster]
Complexity of voting manipulation is a prominent topic in computational social choice. In this work, we consider a twostage voting manipulation scenario. First, a malicious party (an attacker) attempts to manipulate the election outcome in favor of a preferred candidate by changing the vote counts in some of the voting districts. Afterwards, another party (a defender), which cares about the voters' wishes, demands a recount in a subset of the manipulated districts, restoring their vote counts to their original values. We investigate the resulting Stackelberg game for the case where votes are aggregated using two variants of the Plurality rule, and obtain an almost complete picture of the complexity landscape, both from the attacker's and from the defender's perspective.

Q. Guo, J. Gan, F. Fang, L. TranThanh, M. Tambe, B. An.
On the inducibility of Stackelberg equilibrium for security games. AAAI '19.
[pdf 
arXiv]
Strong Stackelberg equilibrium (SSE) is the standard solution concept of Stackelberg security games. As opposed to the weak Stackelberg equilibrium (WSE), the SSE assumes that the follower breaks ties in favor of the leader and this is widely acknowledged and justified by the assertion that the defender can often induce the attacker to choose a preferred action by making an infinitesimal adjustment to her strategy. Unfortunately, in security games with resource assignment constraints, the assertion might not be valid; it is possible that the defender cannot induce the desired outcome. As a result, many results claimed in the literature may be overly optimistic. To remedy, we first formally define the utility guarantee of a defender strategy and provide examples to show that the utility of SSE can be higher than its utility guarantee. Second, inspired by the analysis of leader's payoff by Von Stengel and Zamir (2004), we provide the solution concept called the inducible Stackelberg equilibrium (ISE), which owns the highest utility guarantee and always exists. Third, we show the conditions when ISE coincides with SSE and the fact that in general case, SSE can be extremely worse with respect to utility guarantee. Moreover, introducing the ISE does not invalidate existing algorithmic results as the problem of computing an ISE polynomially reduces to that of computing an SSE. We also provide an algorithmic implementation for computing ISE, with which our experiments unveil the empirical advantage of the ISE over the SSE.
 2018

J. Gan, E. Elkind, M. Wooldridge.
Stackelberg security games with multiple uncoordinated defenders. AAMAS '18.
[pdf]
Stackelberg security games have received much attention in recent years. While most existing work focuses on singledefender settings, there are many realworld scenarios that involve multiple defenders (e.g., multinational anticrime actions in international waters, different security agencies patrolling the same area). In this paper, we consider security games with uncoordinated defenders who jointly protect a set of targets, but may have different valuations for these targets; each defender schedules their own resources and selfishly optimizes their own utility. We generalize the standard (singledefender) model of Stackelberg security games to this setting and formulate an equilibrium concept that captures the nature of strategic interaction among the players. We argue that an exact equilibrium may fail to exist, and, in fact, deciding whether it exists is NPhard. However, under mild assumptions, every multidefender security game admits an Ďµequilibrium for every Ďµ > 0, and the limit points corresponding to Ďµ â†’ 0 can be efficiently approximated.

Y. Xiong, J. Gan, B. An, C. Miao, A. Bazzan.
Optimal electric vehicle fast charging station placement based on game theoretical framework. IEEE Trans. Intelligent Transportation Systems.
[pdf]
To reduce the air pollution and improve the energy efficiency, many countries and cities (e.g., Singapore) are on the way of introducing electric vehicles (EVs) to replace the vehicles serving in current traffic system. Effective placement of charging stations is essential for the rapid development of EVs, because it is necessary for providing convenience for EVs and ensuring the efficiency of the traffic network. However, existing works mostly concentrate on the mileage anxiety from EV users but ignore their strategic and competitive charging behaviors. To capture the competitive and strategic charging behaviors of the EV users, we consider that an EV user's charging cost, which is dependent on other EV users' choices, consists of the travel cost to access the charging station and the queuing cost in charging stations. First, we formulate the Charging Station Placement Problem (CSPP) as a bilevel optimization problem. Then, by exploiting the equilibrium of the EV charging game, we convert the bilevel optimization problem to a singlelevel one, following which we analyze the properties of CSPP and propose an algorithm Optimizing eleCtric vEhicle chArging statioN (OCEAN) to compute the optimal allocation of charging stations. Due to OCEAN's scalability issue, we furthermore present a heuristic algorithm OCEAN with Continuous variables to deal with largescale realworld problems. Finally, we demonstrate and discuss the results of the extensive experiments we did. It is shown that our approach outperform baseline methods significantly.
 2017

J. Gan, B. An, Y. Vorobeychik, B. Gauch.
Security games on a plane. AAAI '17.
[pdf]
Most existing models of Stackelberg security games ignore the underlying topology of the space in which targets and defence resources are located. As a result, allocation of resources is restricted to a discrete collection of exogenously defined targets. However, in many practical security settings, defense resources can be located on a continuous plane. Better defense solutions could therefore be potentially achieved by placing resources in a space outside of actual targets (e.g., between targets). To address this limitation, we propose a model called Security Game on a Plane (SGP) in which targets are distributed on a 2dimensional plane, and security resources, to be allocated on the same plane, protect targets within a certain effective distance. We investigate the algorithmic aspects of SGP. We find that computing a strong Stackelberg equilibrium of an SGP is NPhard even for zerosum games, and these are inapproximable in general. On the positive side, we find an exact solution technique for general SGPs based on an existing approach, and develop a PTAS (polynomialtime approximation scheme) for zerosum SGP to more fundamentally overcome the computational obstacle. Our experiments demonstrate the value of considering SGP and effectiveness of our algorithms.

Y. Zhang, B. An, L. TranThanh, Z. Wang, J. Gan, N. Jennings.
Optimal escape interdiction on transportation networks. IJCAI '17.
[pdf]
Preventing crimes or terrorist attacks in urban areas is challenging. Law enforcement officers need to respond quickly to catch the attacker on his escape route, which is subject to timedependent traffic conditions on transportation networks. The attacker can strategically choose his escape path and driving speed to avoid being captured. Existing work on security resource allocation has not considered such scenarios with timedependent strategies for both players. Therefore, in this paper, we study the problem of efficiently scheduling security resources for interdicting the escaping attacker. We propose: 1) a new defenderattacker security game model for escape interdiction on transportation networks; and 2) an efficient double oracle algorithm to compute the optimal defender strategy, which combines mixedinteger linear programming formulations for best response problems and effective approximation algorithms for improving the scalability of the algorithms. Experimental evaluation shows that our approach significantly outperforms baselines in solution quality and scales up to realisticsized transportation networks with hundreds of intersections.

J. Gan, B. An.
Gametheoretic considerations for optimizing taxi system efficiency. IEEE Intelligent Systems.
[pdf]
Taxi service is an indispensable part of public transport in modern cities. To support its unique features, a taxi system adopts a decentralized operation mode in which thousands of taxis freely decide their working schedules and routes. Taxis compete with each other for individual profits regardless of systemlevel efficiency, making the taxi system inefficient and hard to optimize. Most research into the management and economics of taxi markets has focused on modeling from a macro level the effects of and relationships between various market factors. Less has been done regarding a more important componentdrivers' strategic behavior under the decentralized operation mode. The authors propose looking at the problem from a gametheoretic perspective. Combining gametheoretic solution concepts with existing models of taxi markets, they model taxi drivers' strategymaking process as a game and transform the problem of optimizing taxi system efficiency into finding a market policy that leads to the desired equilibrium.
 2016

Q. Guo, B. An, Y. Vorobeychik, L. TranThanh, J. Gan, C. Miao.
Coalitional security games. AAMAS '16.
[pdf]
Game theoretic models of security, and associated computational methods, have emerged as critical components of security posture across a broad array of domains, including airport security and coast guard. These approaches consider terrorists as motivated but independent entities. There is, however, increasing evidence that attackers, be it terrorists or cyber attackers, communicate extensively and form coalitions that can dramatically increase their ability to achieve malicious goals. To date, such cooperative decision making among attackers has been ignored in the security games literature. To address the issue of cooperation among attackers, we introduce a novel coalitional security game (CSG) model. A CSG consists of a set of attackers connected by a (communication or trust) network who can form coalitions as connected subgraphs of this network so as to attack a collection of targets. A defender in a CSG can delete a set of edges, incurring a cost for deleting each edge, with the goal of optimally limiting the attackersâ€™ ability to form effective coalitions (in terms of successfully attacking high value targets). We first show that a CSG is, in general, hard to approximate. Nevertheless, we develop a novel branch and price algorithm, leveraging a combination of column generation, relaxation, greedy approximation, and stabilization methods to enable scalable highquality approximations of CSG solutions on realistic problem instances.

Y. Xiong, J. Gan, B. An, C. Miao.
Optimal pricing for efficient electric vehicle charging station management.
AAMAS '16.
[pdf]
The rapid development of Electric Vehicles (EVs) seen in recent years has been drawing increasing attentions from the public, markets, decisionmakers, and academia. Notwithstanding the progress, issues still remain. Because of the widely complained disadvantages of limited battery capacity and long charging time, charging convenience has become a top concern that greatly hinders the adoption of EVs. Specialized EV charging station, which provides more than 10 times faster charging speed than domestic charging, is therefore a critical element for successful EV promotion. While most existing researches focus on optimizing spatial placement of charging stations, they are inflexible and inefficient against rapidly changing urban structure and traffic pattern. Therefore, this paper approaches the management of EV charging stations from the pricing perspective as a more flexible and adaptive complement to established charging station placement. In this paper, we build a realistic pricing model in consideration of residential travel pattern and EV driversâ€™ selfinterested charging behavior, traffic congestion, and operating expense of charging stations. We formulate the pricing problem as a mixed integer nonconvex optimization problem, and propose a scalable algorithm to solve it. Experiments on both mock and real data are also conducted, which show scalability of our algorithm as well as our solutionâ€™s significant improvement over existing approaches.
 2015

Y. Yin, H. Xu, J. Gan, B. An, A. Jiang.
Computing optimal mixed strategies for security games with dynamic payoffs. IJCAI '15.
[pdf]
Security agencies in the real world often need to protect targets with timedependent values, e.g., tourist sites where the number of travelers changes over time. Since the values of different targets often change asynchronously, the defender can relocate security resources among targets dynamically to make the best use of limited resources. We propose a gametheoretic scheme to develop dynamic, randomized security strategies in consideration of adversaryâ€™s surveillance capability. This differs from previous studies on security games by considering varying target values and continuous strategy spaces of the security agency and the adversary. The main challenge lies in the computational intensiveness due to the continuous, hence infinite strategy spaces. We propose an optimal algorithm and an arbitrarily nearoptimal algorithm to compute security strategies under different conditions. Experimental results show that both algorithms significantly outperform existing approaches.

Y. Xiong, J. Gan, B. An, C. Miao, A. Bazzan.
Optimal electric vehicle charging station placement. IJCAI '15.
[pdf]
Many countries like Singapore are planning to introduce Electric Vehicles (EVs) to replace traditional vehicles to reduce air pollution and improve energy efficiency. The rapid development of EVs calls for efficient deployment of charging stations both for the convenience of EVs and maintaining the efficiency of the road network. Unfortunately, existing work makes unrealistic assumption on EV driversâ€™ charging behaviors and focus on the limited mobility of EVs. This paper studies the Charging Station PLacement (CSPL) problem, and takes into consideration 1) EV driversâ€™ strategic behaviors to minimize their charging cost, and 2) the mutual impact of EV driversâ€™ strategies on the traffic conditions of the road network and service quality of charging stations. We first formulate the CSPL problem as a bilevel optimization problem, which is subsequently converted to a singlelevel optimization problem by exploiting structures of the EV charging game. Properties of CSPL problem are analyzed and an algorithm called OCEAN is proposed to compute the optimal allocation of charging stations. We further propose a heuristic algorithm OCEANC to speed up OCEAN. Experimental results show that the proposed algorithms significantly outperform baseline methods.

J. Gan, B. An, C. Miao.
Optimizing efficiency of taxi systems: scalingup and handling arbitrary constraints. AAMAS '15.
[pdf]
Taxi service is an indispensable part of public transport in modern cities. However, due to its decentralized operation mode, taxi services in many cities are inefficient. Besides, the decentralized nature also poses significant challenges to analyzing and regulating taxi services. State of the art computational methods for optimizing taxi market efficiency suffer from two important limitations: 1) they cannot be scaled up efficiently; and 2) they cannot address complex realworld market situations where additional scheduling constraints need to be handled. In this paper, we propose two novel algorithmsâ€”FLORA and FLORAAâ€”to address the inadequacies. Using convex polytope representation techniques, FLORA provides a fully compact representation for taxi driversâ€™ strategy space and scales up more efficiently than existing algorithms. FLORAA avoids enumerating the entire exponentially large pure strategy space by gradually expanding the strategy space. It is the first known method capable of handling arbitrary scheduling constraints for optimizing taxi system efficiency. Experimental results show orders of magnitude improvement in speed FLORA provides, and the necessity of using FLORAA as suggested by changes in the taxi driversâ€™ operation strategy under different market conditions.

J. Gan, B. An, Y. Vorobeychik.
Security games with protection externality.
AAAI'15.
[pdf  appendix]
Stackelberg security games have been widely deployed in recent years to schedule security resources. An assumption in most existing security game models is that one security resource assigned to a target only protects that target. However, in many important realworld security scenarios, when a resource is assigned to a target, it exhibits protection externalities: that is, it also protects other â€śneighbouringâ€ť targets. We investigate such Security Games with Protection Externalities (SPEs). First, we demonstrate that computing a strong Stackelberg equilibrium for an SPE is NPhard, in contrast with traditional Stackelberg security games which can be solved in polynomial time. On the positive side, we propose a novel column generation based approachâ€”CLASPEâ€”to solve SPEs. CLASPE features the following novelties: 1) a novel mixedinteger linear programming formulation for the slave problem; 2) an extended greedy approach with a constantfactor approximation ratio to speed up the slave problem; and 3) a linearscale linear programming that efficiently calculates the upper bounds of targetdefined subproblems for pruning. Our experimental evaluation demonstrates that CLASPE enable us to scale to realisticsized SPE problem instances.
 2013

J. Gan, B. An, H. Wang, X. Sun, Z. Shi.
Optimal pricing for improving efficiency of taxi systems.
IJCAI â€™13.
[pdf]
In Beijing, most taxi drivers intentionally avoid working during peak hours despite of the huge customer demand within these peak periods. This dilemma is mainly due to the fact that taxi drivers' congestion costs are not reflected in the current taxi fare structure. To resolve this problem, we propose a new pricing scheme to provide taxi drivers with extra incentives to work during peak hours. This differs from previous studies of taxi market by considering market variance over multiple periods, taxi drivers' profitdriven decisions, and their scheduling constraints regarding the interdependence among different periods. The major challenge of this research is the computational intensiveness to identify optimal strategy due to the exponentially large size of a taxi driver's strategy space and the scheduling constraints. We develop an atom schedule method to overcome these issues. It reduces the magnitude of the problem while satisfying the constraints to filter out infeasible pure strategies. Simulation results based on real data show the effectiveness of the proposed methods, which opens up a new door to improving the efficiency of taxi market in megacities (e.g., Beijing).