 2023

J. Gan, A. Hennes, R. Majumdar, D. Mandal, G. Radanovic.
Markov decision processes with timevarying geometric discounting. AAAI '23.
[pdf]
We study envyfree policy teaching. A number of agents independently explore a common Markov decision process (MDP), but each with their own reward function and discounting rate. A teacher wants to teach a target policy to this diverse group of agents, by means of modifying the agents' reward functions: providing additional bonuses to certain actions, or penalizing them. When personalized reward modification programs are used, an important question is how to design the programs so that the agents think they are treated fairly. We adopt the notion of envyfreeness (EF) from the literature on fair division to formalize this problem and investigate several fundamental questions about the existence of EF solutions in our setting, the computation of costminimizing solutions, as well as the price of fairness (PoF), which measures the increase of cost due to the consideration of fairness. We show that 1) an EF solution may not exist if penalties are not allowed in the modifications, but otherwise always exists. 2) Computing a costminimizing EF solution can be formulated as convex optimization and hence solved efficiently. 3) The PoF increases but at most quadratically with the geometric sum of the discount factor, and at most linearly with the size of the MDP and the number of agents involved; we present tight asymptotic bounds on the PoF. These results indicate that fairness can be incorporated in multiagent teaching without significant computational or PoF burdens.

D. Mandal, G. Radanovic, J. Gan, A. Singla, R. Majumdar.
Online reinforcement learning with uncertain episode lengths. AAAI '23.
Existing episodic reinforcement algorithms assume that the length of an episode is fixed across time and known a priori. In this paper, we consider a general framework of episodic reinforcement learning when the length of each episode is drawn from a distribution. We first establish that this problem is equivalent to online reinforcement learning with general discounting where the learner is trying to optimize the expected discounted sum of rewards over an infinite horizon, but where the discounting function is not necessarily geometric. We show that minimizing regret with this new general discounting is equivalent to minimizing regret with uncertain episode lengths. We then design a reinforcement learning algorithm that minimizes regret with general discounting but acts for the setting with uncertain episode lengths. We instantiate our general bound for different types of discounting, including geometric and polynomial discounting. We also show that we can obtain similar regret bounds even when the uncertainty over the episode lengths is unknown, by estimating the unknown distribution over time. Finally, we compare our learning algorithms with existing valueiteration based episodic RL algorithms on a gridworld environment.
 2022

J. Gan, R. Majumdar, G. Radanovic, A. Singla.
Envyfree policy teaching to multiple agents. NeurIPS '22.
[pdf 
link]
We study envyfree policy teaching. A number of agents independently explore a common Markov decision process (MDP), but each with their own reward function and discounting rate. A teacher wants to teach a target policy to this diverse group of agents, by means of modifying the agents' reward functions: providing additional bonuses to certain actions, or penalizing them. When personalized reward modification programs are used, an important question is how to design the programs so that the agents think they are treated fairly. We adopt the notion of envyfreeness (EF) from the literature on fair division to formalize this problem and investigate several fundamental questions about the existence of EF solutions in our setting, the computation of costminimizing solutions, as well as the price of fairness (PoF), which measures the increase of cost due to the consideration of fairness. We show that 1) an EF solution may not exist if penalties are not allowed in the modifications, but otherwise always exists. 2) Computing a costminimizing EF solution can be formulated as convex optimization and hence solved efficiently. 3) The PoF increases but at most quadratically with the geometric sum of the discount factor, and at most linearly with the size of the MDP and the number of agents involved; we present tight asymptotic bounds on the PoF. These results indicate that fairness can be incorporated in multiagent teaching without significant computational or PoF burdens.

J. Gan, E. Elkind, S. Kraus, M. Wooldridge.
Defense coordination in security games: equilibrium analysis and mechanism design. Artificial Intelligence.
[link]
Realworld security scenarios sometimes involve multiple defenders: security agencies of two or more countries might patrol the same border areas, and domestic security agencies might also operate in the same locations when their areas of jurisdiction overlap. Motivated by these scenarios and the observation that uncoordinated movements of the defenders may lead to an inefficient defense, we introduce a model of multidefender security games and explore the possibility of improving efficiency by coordinating the defenders — specifically, by pooling the defenders' resources and allocating them jointly. The model generalizes the standard model of Stackelberg security games, where a defender (now a group of defenders) allocates security resources to protect a set of targets, and an attacker picks the best target to attack. In particular, we are interested in the situation with heterogeneous defenders, who may value the same target differently. Our task is twofold. First, we need to develop a good understanding of the uncoordinated situation, as the baseline to be improved. To this end we formulate a new equilibrium concept, and prove that an equilibrium under this concept always exists and can be computed efficiently. Second, to coordinate the heterogeneous defenders we take a mechanism design perspective and aim to find a mechanism to generate joint resource allocation strategies. We seek a mechanism that improves the defenders' utilities upon the uncoordinated baseline, achieves Pareto efficiency, and incentivizes the defenders to report their true incentives and execute the recommended strategies. Our analysis establishes several impossibility results, indicating the intrinsic difficulties of defense coordination. Specifically, we show that even the basic properties listed above are in conflict with each other: no mechanism can simultaneously satisfy them all, or even some proper subsets of them. In terms of positive results, we present mechanisms that satisfy all combinations of the properties that are not ruled out by our impossibility results, thereby providing a comprehensive profile of the mechanism design problem with respect to the properties considered.
(Preliminary versions appeared in AAMAS '18 and '20.)

J. Gan, R. Majumdar, G. Radanovic, A. Singla.
Sequential decision making with information asymmetry. CONCUR '22 (invited talk)
[link]
We survey some recent results in sequential decision making under uncertainty, where there is an information asymmetry among the decisionmakers. We consider two versions of the problem: persuasion and mechanism design. In persuasion, a moreinformed principal influences the actions of a lessinformed agent by signaling information. In mechanism design, a lessinformed principal incentivizes a moreinformed agent to reveal information by committing to a mechanism, so that the principal can make more informed decisions. We define Markov persuasion processes and Markov mechanism processes that model persuasion and mechanism design into dynamic models. Then we survey results on optimal persuasion and optimal mechanism design on myopic and farsighted agents. These problems are solvable in polynomial time for myopic agents but hard for farsighted agents.

J. Gan, R. Majumdar, G. Radanovic, A. Singla.
Bayesian persuasion in sequential decisionmaking. AAAI '22
[arXiv 
link]
(✧ Outstanding Paper Honorable Mention @ AAAI '22)
We study a dynamic model of Bayesian persuasion in sequential decisionmaking settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal's advice/signal, and beliefs about the external parameter. The action of the agent updates the state according to a stochastic process. The model arises naturally in many applications, e.g., an app (the principal) can advise the user (the agent) on possible choices between actions based on additional realtime information the app has. We study the problem of designing a signaling strategy from the principal's point of view. We show that the principal has an optimal strategy against a myopic agent, who only optimizes their rewards locally, and the optimal strategy can be computed in polynomial time. In contrast, it is NPhard to approximate an optimal policy against a farsighted agent. Further, we show that if the principal has the power to threaten the agent by not providing future signals, then we can efficiently design a threatbased strategy. This strategy guarantees the principal's payoff as if playing against an agent who is farsighted but myopic to future signals.

K. Banihashem, A. Singla, J. Gan, G. Radanovic.
Admissible policy teaching through reward design. AAAI '22
[arXiv 
link]
We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy from a set of admissible policies. The goal of the reward designer is to modify the underlying reward function costefficiently while ensuring that any approximately optimal deterministic policy under the new reward function is admissible and performs well under the original reward function. This problem can be viewed as a dual to the problem of optimal reward poisoning attacks: instead of forcing an agent to adopt a specific policy, the reward designer incentivizes an agent to avoid taking actions that are inadmissible in certain states. Perhaps surprisingly, and in contrast to the problem of optimal reward poisoning attacks, we first show that the reward design problem for admissible policy teaching is computationally challenging, and it is NPhard to find an approximately optimal reward modification. We then proceed by formulating a surrogate problem whose optimal solution approximates the optimal solution to the reward design problem in our setting, but is more amenable to optimization techniques and analysis. For this surrogate problem, we present characterization results that provide bounds on the value of the optimal solution. Finally, we design a local search algorithm to solve the surrogate problem and showcase its utility using simulationbased experiments.
 2021

A. Agarwal, E. Elkind, J. Gan, A. Igarashi, W. Suksompong, A. Voudouris.
Schelling games on graphs. Artificial Intelligence.
[link]
We study strategic games inspired by Schelling's seminal model of residential segregation. These games are played on undirected graphs, with the set of agents partitioned into multiple types; each agent either aims to maximize the fraction of her neighbors who are of her own type, or occupies a node of the graph and never moves away. We consider two natural variants of this model: in jump games agents can jump to empty nodes of the graph to increase their utility, while in swap games they can swap positions with other agents. We investigate the existence, computational complexity, and quality of equilibrium assignments in these games, both from a social welfare perspective and from a diversity perspective. Some of our results extend to a more general setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.
(Preliminary versions appeared in IJCAI '19 and AAAI '20.)

G. Birmpas, J. Gan, A. Hollender, F. MarmolejoCossío, N. Rajgopal, A. Voudouris.
Optimally deceiving a learning leader in Stackelberg games.
Journal of AI Research (JAIR).
[pdf]
Recent results have shown that algorithms for learning the optimal commitment in a Stackelberg game are susceptible to manipulation by the follower. These learning algorithms operate by querying the best responses of the follower, who consequently can deceive the algorithm by using fake best responses, typically by responding according to fake payoffs that are different from the actual ones. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the fake payoffs that would make the learning algorithm output a commitment that benefits them the most. While this problem has been considered before, the related literature has only focused on a simple setting where the follower can only choose from a finite set of payoff matrices, thus leaving the general version of the problem unanswered. In this paper, we fill this gap by showing that it is always possible for the follower to efficiently compute (near)optimal fake payoffs, for various scenarios of learning interaction between the leader and the follower. Our results also establish an interesting connection between the follower’s deception and the leader’s maximin utility: through deception, the follower can induce almost any (fake) Stackelberg equilibrium if and only if the leader obtains at least their maximin utility in this equilibrium.
(A preliminary version appeared in NeurIPS '20.)

W. Xiaowei, B. Li, J. Gan.
Budgetfeasible maximum Nash social welfare allocation is almost envyfree. IJCAI '21.
[pdf 
arXiv]
The Nash social welfare (NSW) is a wellknown social welfare measurement that balances individual utilities and the overall efficiency. In the context of fair allocation of indivisible goods, it has been shown by Caragiannis et al. (EC 2016 and TEAC 2019) that an allocation maximizing the NSW is envyfree up to one good (EF1). In this paper, we are interested in the fairness of the NSW in a budgetfeasible allocation problem, in which each item has a cost that will be incurred to the agent it is allocated to, and each agent has a budget constraint on the total cost of items she receives. We show that a budgetfeasible allocation that maximizes the NSW achieves a 1/4approximation of EF1 and the approximation ratio is tight. The approximation ratio improves gracefully when the items have small costs compared with the agents' budgets; it converges to 1/2 when the budgetcost ratio approaches infinity.

D. Mutzari, J. Gan, S. Kraus.
Coalition formation in multidefender security games. AAAI '21.
[pdf]
We study Stackelberg security game (SSG) with multiple defenders, where heterogeneous defenders need to allocate security resources to protect a set of targets against a strategic attacker. In such games, coordination and cooperation between the defenders can increase their ability to protect their assets, but the heterogeneous preferences of the selfinterested defenders often make such cooperation very difficult. In this paper, we approach the problem from the perspective of cooperative game theory and study coalition formation among the defenders. Our main contribution is a number of algorithmic results for the computation problems that arise in this model. We provide a polytime algorithm for computing a solution in the {\em core} of the game and show that all of the elements in the core are Pareto efficient. We show that the problem of computing the entire core is NPhard and then delve into a special setting where the size of a coalition is limited up to some threshold. We analyse the parameterized complexity of deciding if a coalition structure is in the core under this special setting, and provide a polytime algorithm for computing successful deviation strategies for a given coalition.

E. Elkind, J. Gan, S. Obraztsova, Z. Rabinovich, A. Voudouris.
Protecting elections by recounting ballots. Artificial Intelligence.
[link]
Complexity of voting manipulation is a prominent topic in computational social choice. In this work, we consider a twostage voting manipulation scenario. First, a malicious party (an attacker) attempts to manipulate the election outcome in favor of a preferred candidate by changing the vote counts in some of the voting districts. Afterwards, another party (a defender), which cares about the voters' wishes, demands a recount in a subset of the manipulated districts, restoring their vote counts to their original values. We investigate the resulting Stackelberg game for the case where votes are aggregated using two variants of the Plurality rule, and obtain an almost complete picture of the complexity landscape, both from the attacker's and from the defender's perspective.
(A preliminary version appeared in IJCAI '19.)
 2020

G. Birmpas, J. Gan, A. Hollender, F. MarmolejoCossío, N. Rajgopal, A. Voudouris.
Optimally deceiving a learning leader in Stackelberg games.
NeurIPS '20.
[pdf]
Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game, are susceptible to manipulation by the follower. Such a learning algorithm operates by querying the best responses or the payoffs of the follower, who consequently can deceive the algorithm by responding as if his payoffs were much different than what they actually are. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the payoffs that would make the learning algorithm compute a commitment so that best responding to it maximizes the follower's utility, according to his true payoffs. While this problem has been considered before, the related literature only focused on the simplified scenario in which the payoff space is finite, thus leaving the general version of the problem unanswered. In this paper, we fill in this gap, by showing that it is always possible for the follower to compute (near)optimal payoffs for various scenarios about the learning interaction between leader and follower.

J. Gan, E. Elkind, S. Kraus, M. Wooldridge.
Mechanism design for defense coordination in security games. AAMAS '20.
[pdf]
Recent work studied Stackelberg security games with multiple defenders, in which heterogeneous defenders allocate security resources to protect a set of targets against a strategic attacker. Equilibrium analysis was conducted to characterize outcomes of these games when defenders act independently. Our starting point is the observation that the use of resources in equilibria may be inefficient due to lack of coordination. We explore the possibility of reducing this inefficiency by coordinating the defenders—specifically, by pooling the defenders’ resources and allocating them jointly. The defenders’ heterogeneous preferences then give rise to a collective decisionmaking problem, which calls for a mechanism to generate joint allocation strategies. We seek a mechanism that encourages coordination, produces efficiency gains, and incentivizes the defenders to report their true preferences and to execute the recommended strategies. Our results show that, unfortunately, even these basic properties clash with each other and no mechanism can achieve them simultaneously, which reveals the intrinsic difficulty of achieving meaningful defense coordination in security games. On the positive side, we put forward mechanisms that fulfill some of these properties and we identify special cases of our setting where more of these properties are compatible.

A. Agarwal, E. Elkind, J. Gan, A. Voudouris.
Swap stability in Schelling games on graphs. AAAI '20.
[pdf 
arXiv]
We study a recently introduced class of strategic games that is motivated by and generalizes Schelling's wellknown residential segregation model. These games are played on undirected graphs, with the set of agents partitioned into multiple types; each agent either occupies a node of the graph and never moves away or aims to maximize the fraction of her neighbors who are of her own type. We consider a variant of this model that we call swap Schelling games, where the number of agents is equal to the number of nodes of the graph, and agents may swap positions with other agents to increase their utility. We study the existence, computational complexity and quality of equilibrium assignments in these games, both from a social welfare perspective and from a diversity perspective.
 2019

J. Gan, Q. Guo, L. TranThanh, B. An, M. Wooldridge.
Manipulating a learning defender and ways to counteract. NeurIPS '19.
[pdf  arXiv  poster]
In Stackelberg security games when information about the attacker's payoffs is uncertain, algorithms have been proposed to learn the optimal defender commitment by interacting with the attacker and observing their best responses. In this paper, we show that, however, these algorithms can be easily manipulated if the attacker responds untruthfully. As a key finding, attacker manipulation normally leads to the defender learning a maximin strategy, which effectively renders the learning attempt meaningless as to compute a maximin strategy requires no additional information about the other player at all. We then apply a gametheoretic framework at a higher level to counteract such manipulation, in which the defender commits to a policy that specifies her strategy commitment according to the learned information. We provide a polynomialtime algorithm to compute the optimal such policy, and in addition, a heuristic approach that applies even when the attacker's payoff space is infinite or completely unknown. Empirical evaluation shows that our approaches can improve the defender's utility significantly as compared to the situation when attacker manipulation is ignored.

J. Gan, W. Suksompong, A. Voudouris.
Envyfreeness in house allocation problems. Mathematical Social Sciences.
[pdf  arXiv]
We consider the house allocation problem, where m houses are to be assigned to n agents so that each agent gets exactly one house. We present a polynomialtime algorithm that determines whether an envyfree assignment exists, and if so, computes one such assignment. We also show that an envyfree assignment exists with high probability if the number of houses exceeds the number of agents by a logarithmic factor.

J. Gan, H. Xu, Q. Guo, L. TranThanh, Z. Rabinovich, M. Wooldridge.
Imitative follower deception in Stackelberg games. EC '19.
[pdf  arXiv  poster]
Information uncertainty is one of the major challenges facing applications of game theory. In the context of Stackelberg games, various approaches have been proposed to deal with the leader's incomplete knowledge about the follower's payoffs, typically by gathering information from the leader's interaction with the follower. Unfortunately, these approaches rely crucially on the assumption that the follower will not strategically exploit this information asymmetry, i.e., the follower behaves truthfully during the interaction according to their actual payoffs. As we show in this paper, the follower may have strong incentives to deceitfully imitate the behavior of a different follower type and, in doing this, benefit significantly from inducing the leader into choosing a highly suboptimal strategy. This raises a fundamental question: how to design a leader strategy in the presence of a deceitful follower? To answer this question, we put forward a basic model of Stackelberg games with (imitative) follower deception and show that the leader is indeed able to reduce the loss due to follower deception with carefully designed policies. We then provide a systematic study of the problem of computing the optimal leader policy and draw a relatively complete picture of the complexity landscape; essentially matching positive and negative complexity results are provided for natural variants of the model. Our intractability results are in sharp contrast to the situation with no deception, where the leader's optimal strategy can be computed in polynomial time, and thus illustrate the intrinsic difficulty of handling follower deception. Through simulations we also examine the benefit of considering follower deception in randomly generated games.

E. Elkind, J. Gan, A. Igarashi, W. Suksompong, A. Voudouris.
Schelling games on graphs. IJCAI '19.
[pdf 
arXiv]
We consider strategic games that are inspired by Schelling's model of residential segregation. In our model, the agents are partitioned into k types and need to select locations on an undirected graph. Agents can be either stubborn, in which case they will always choose their preferred location, or strategic, in which case they aim to maximize the fraction of agents of their own type in their neighborhood. We investigate the existence of equilibria in these games, study the complexity of finding an equilibrium outcome or an outcome with high social welfare, and also provide upper and lower bounds on the price of anarchy and stability. Some of our results extend to the setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.

E. Elkind, J. Gan, S. Obraztsova, Z. Rabinovich, A. Voudouris.
Protecting elections by recounting ballots. IJCAI '19.
[pdf  arXiv  poster]
Complexity of voting manipulation is a prominent topic in computational social choice. In this work, we consider a twostage voting manipulation scenario. First, a malicious party (an attacker) attempts to manipulate the election outcome in favor of a preferred candidate by changing the vote counts in some of the voting districts. Afterwards, another party (a defender), which cares about the voters' wishes, demands a recount in a subset of the manipulated districts, restoring their vote counts to their original values. We investigate the resulting Stackelberg game for the case where votes are aggregated using two variants of the Plurality rule, and obtain an almost complete picture of the complexity landscape, both from the attacker's and from the defender's perspective.

Q. Guo, J. Gan, F. Fang, L. TranThanh, M. Tambe, B. An.
On the inducibility of Stackelberg equilibrium for security games. AAAI '19.
[pdf 
arXiv]
Strong Stackelberg equilibrium (SSE) is the standard solution concept of Stackelberg security games. As opposed to the weak Stackelberg equilibrium (WSE), the SSE assumes that the follower breaks ties in favor of the leader and this is widely acknowledged and justified by the assertion that the defender can often induce the attacker to choose a preferred action by making an infinitesimal adjustment to her strategy. Unfortunately, in security games with resource assignment constraints, the assertion might not be valid; it is possible that the defender cannot induce the desired outcome. As a result, many results claimed in the literature may be overly optimistic. To remedy, we first formally define the utility guarantee of a defender strategy and provide examples to show that the utility of SSE can be higher than its utility guarantee. Second, inspired by the analysis of leader's payoff by Von Stengel and Zamir (2004), we provide the solution concept called the inducible Stackelberg equilibrium (ISE), which owns the highest utility guarantee and always exists. Third, we show the conditions when ISE coincides with SSE and the fact that in general case, SSE can be extremely worse with respect to utility guarantee. Moreover, introducing the ISE does not invalidate existing algorithmic results as the problem of computing an ISE polynomially reduces to that of computing an SSE. We also provide an algorithmic implementation for computing ISE, with which our experiments unveil the empirical advantage of the ISE over the SSE.
 2018

J. Gan, E. Elkind, M. Wooldridge.
Stackelberg security games with multiple uncoordinated defenders. AAMAS '18.
[pdf]
Stackelberg security games have received much attention in recent years. While most existing work focuses on singledefender settings, there are many realworld scenarios that involve multiple defenders (e.g., multinational anticrime actions in international waters, different security agencies patrolling the same area). In this paper, we consider security games with uncoordinated defenders who jointly protect a set of targets, but may have different valuations for these targets; each defender schedules their own resources and selfishly optimizes their own utility. We generalize the standard (singledefender) model of Stackelberg security games to this setting and formulate an equilibrium concept that captures the nature of strategic interaction among the players. We argue that an exact equilibrium may fail to exist, and, in fact, deciding whether it exists is NPhard. However, under mild assumptions, every multidefender security game admits an ϵequilibrium for every ϵ > 0, and the limit points corresponding to ϵ → 0 can be efficiently approximated.

Y. Xiong, J. Gan, B. An, C. Miao, A. Bazzan.
Optimal electric vehicle fast charging station placement based on game theoretical framework. IEEE Trans. Intelligent Transportation Systems.
[pdf]
To reduce the air pollution and improve the energy efficiency, many countries and cities (e.g., Singapore) are on the way of introducing electric vehicles (EVs) to replace the vehicles serving in current traffic system. Effective placement of charging stations is essential for the rapid development of EVs, because it is necessary for providing convenience for EVs and ensuring the efficiency of the traffic network. However, existing works mostly concentrate on the mileage anxiety from EV users but ignore their strategic and competitive charging behaviors. To capture the competitive and strategic charging behaviors of the EV users, we consider that an EV user's charging cost, which is dependent on other EV users' choices, consists of the travel cost to access the charging station and the queuing cost in charging stations. First, we formulate the Charging Station Placement Problem (CSPP) as a bilevel optimization problem. Then, by exploiting the equilibrium of the EV charging game, we convert the bilevel optimization problem to a singlelevel one, following which we analyze the properties of CSPP and propose an algorithm Optimizing eleCtric vEhicle chArging statioN (OCEAN) to compute the optimal allocation of charging stations. Due to OCEAN's scalability issue, we furthermore present a heuristic algorithm OCEAN with Continuous variables to deal with largescale realworld problems. Finally, we demonstrate and discuss the results of the extensive experiments we did. It is shown that our approach outperform baseline methods significantly.
 2017

J. Gan, B. An, Y. Vorobeychik, B. Gauch.
Security games on a plane. AAAI '17.
[pdf]
Most existing models of Stackelberg security games ignore the underlying topology of the space in which targets and defence resources are located. As a result, allocation of resources is restricted to a discrete collection of exogenously defined targets. However, in many practical security settings, defense resources can be located on a continuous plane. Better defense solutions could therefore be potentially achieved by placing resources in a space outside of actual targets (e.g., between targets). To address this limitation, we propose a model called Security Game on a Plane (SGP) in which targets are distributed on a 2dimensional plane, and security resources, to be allocated on the same plane, protect targets within a certain effective distance. We investigate the algorithmic aspects of SGP. We find that computing a strong Stackelberg equilibrium of an SGP is NPhard even for zerosum games, and these are inapproximable in general. On the positive side, we find an exact solution technique for general SGPs based on an existing approach, and develop a PTAS (polynomialtime approximation scheme) for zerosum SGP to more fundamentally overcome the computational obstacle. Our experiments demonstrate the value of considering SGP and effectiveness of our algorithms.

Y. Zhang, B. An, L. TranThanh, Z. Wang, J. Gan, N. Jennings.
Optimal escape interdiction on transportation networks. IJCAI '17.
[pdf]
Preventing crimes or terrorist attacks in urban areas is challenging. Law enforcement officers need to respond quickly to catch the attacker on his escape route, which is subject to timedependent traffic conditions on transportation networks. The attacker can strategically choose his escape path and driving speed to avoid being captured. Existing work on security resource allocation has not considered such scenarios with timedependent strategies for both players. Therefore, in this paper, we study the problem of efficiently scheduling security resources for interdicting the escaping attacker. We propose: 1) a new defenderattacker security game model for escape interdiction on transportation networks; and 2) an efficient double oracle algorithm to compute the optimal defender strategy, which combines mixedinteger linear programming formulations for best response problems and effective approximation algorithms for improving the scalability of the algorithms. Experimental evaluation shows that our approach significantly outperforms baselines in solution quality and scales up to realisticsized transportation networks with hundreds of intersections.

J. Gan, B. An.
Gametheoretic considerations for optimizing taxi system efficiency. IEEE Intelligent Systems.
[pdf]
Taxi service is an indispensable part of public transport in modern cities. To support its unique features, a taxi system adopts a decentralized operation mode in which thousands of taxis freely decide their working schedules and routes. Taxis compete with each other for individual profits regardless of systemlevel efficiency, making the taxi system inefficient and hard to optimize. Most research into the management and economics of taxi markets has focused on modeling from a macro level the effects of and relationships between various market factors. Less has been done regarding a more important componentdrivers' strategic behavior under the decentralized operation mode. The authors propose looking at the problem from a gametheoretic perspective. Combining gametheoretic solution concepts with existing models of taxi markets, they model taxi drivers' strategymaking process as a game and transform the problem of optimizing taxi system efficiency into finding a market policy that leads to the desired equilibrium.
 2016

Q. Guo, B. An, Y. Vorobeychik, L. TranThanh, J. Gan, C. Miao.
Coalitional security games. AAMAS '16.
[pdf]
Game theoretic models of security, and associated computational methods, have emerged as critical components of security posture across a broad array of domains, including airport security and coast guard. These approaches consider terrorists as motivated but independent entities. There is, however, increasing evidence that attackers, be it terrorists or cyber attackers, communicate extensively and form coalitions that can dramatically increase their ability to achieve malicious goals. To date, such cooperative decision making among attackers has been ignored in the security games literature. To address the issue of cooperation among attackers, we introduce a novel coalitional security game (CSG) model. A CSG consists of a set of attackers connected by a (communication or trust) network who can form coalitions as connected subgraphs of this network so as to attack a collection of targets. A defender in a CSG can delete a set of edges, incurring a cost for deleting each edge, with the goal of optimally limiting the attackers’ ability to form effective coalitions (in terms of successfully attacking high value targets). We first show that a CSG is, in general, hard to approximate. Nevertheless, we develop a novel branch and price algorithm, leveraging a combination of column generation, relaxation, greedy approximation, and stabilization methods to enable scalable highquality approximations of CSG solutions on realistic problem instances.

Y. Xiong, J. Gan, B. An, C. Miao.
Optimal pricing for efficient electric vehicle charging station management.
AAMAS '16.
[pdf]
The rapid development of Electric Vehicles (EVs) seen in recent years has been drawing increasing attentions from the public, markets, decisionmakers, and academia. Notwithstanding the progress, issues still remain. Because of the widely complained disadvantages of limited battery capacity and long charging time, charging convenience has become a top concern that greatly hinders the adoption of EVs. Specialized EV charging station, which provides more than 10 times faster charging speed than domestic charging, is therefore a critical element for successful EV promotion. While most existing researches focus on optimizing spatial placement of charging stations, they are inflexible and inefficient against rapidly changing urban structure and traffic pattern. Therefore, this paper approaches the management of EV charging stations from the pricing perspective as a more flexible and adaptive complement to established charging station placement. In this paper, we build a realistic pricing model in consideration of residential travel pattern and EV drivers’ selfinterested charging behavior, traffic congestion, and operating expense of charging stations. We formulate the pricing problem as a mixed integer nonconvex optimization problem, and propose a scalable algorithm to solve it. Experiments on both mock and real data are also conducted, which show scalability of our algorithm as well as our solution’s significant improvement over existing approaches.
 2015

Y. Yin, H. Xu, J. Gan, B. An, A. Jiang.
Computing optimal mixed strategies for security games with dynamic payoffs. IJCAI '15.
[pdf]
Security agencies in the real world often need to protect targets with timedependent values, e.g., tourist sites where the number of travelers changes over time. Since the values of different targets often change asynchronously, the defender can relocate security resources among targets dynamically to make the best use of limited resources. We propose a gametheoretic scheme to develop dynamic, randomized security strategies in consideration of adversary’s surveillance capability. This differs from previous studies on security games by considering varying target values and continuous strategy spaces of the security agency and the adversary. The main challenge lies in the computational intensiveness due to the continuous, hence infinite strategy spaces. We propose an optimal algorithm and an arbitrarily nearoptimal algorithm to compute security strategies under different conditions. Experimental results show that both algorithms significantly outperform existing approaches.

Y. Xiong, J. Gan, B. An, C. Miao, A. Bazzan.
Optimal electric vehicle charging station placement. IJCAI '15.
[pdf]
Many countries like Singapore are planning to introduce Electric Vehicles (EVs) to replace traditional vehicles to reduce air pollution and improve energy efficiency. The rapid development of EVs calls for efficient deployment of charging stations both for the convenience of EVs and maintaining the efficiency of the road network. Unfortunately, existing work makes unrealistic assumption on EV drivers’ charging behaviors and focus on the limited mobility of EVs. This paper studies the Charging Station PLacement (CSPL) problem, and takes into consideration 1) EV drivers’ strategic behaviors to minimize their charging cost, and 2) the mutual impact of EV drivers’ strategies on the traffic conditions of the road network and service quality of charging stations. We first formulate the CSPL problem as a bilevel optimization problem, which is subsequently converted to a singlelevel optimization problem by exploiting structures of the EV charging game. Properties of CSPL problem are analyzed and an algorithm called OCEAN is proposed to compute the optimal allocation of charging stations. We further propose a heuristic algorithm OCEANC to speed up OCEAN. Experimental results show that the proposed algorithms significantly outperform baseline methods.

J. Gan, B. An, C. Miao.
Optimizing efficiency of taxi systems: scalingup and handling arbitrary constraints. AAMAS '15.
[pdf]
Taxi service is an indispensable part of public transport in modern cities. However, due to its decentralized operation mode, taxi services in many cities are inefficient. Besides, the decentralized nature also poses significant challenges to analyzing and regulating taxi services. State of the art computational methods for optimizing taxi market efficiency suffer from two important limitations: 1) they cannot be scaled up efficiently; and 2) they cannot address complex realworld market situations where additional scheduling constraints need to be handled. In this paper, we propose two novel algorithms—FLORA and FLORAA—to address the inadequacies. Using convex polytope representation techniques, FLORA provides a fully compact representation for taxi drivers’ strategy space and scales up more efficiently than existing algorithms. FLORAA avoids enumerating the entire exponentially large pure strategy space by gradually expanding the strategy space. It is the first known method capable of handling arbitrary scheduling constraints for optimizing taxi system efficiency. Experimental results show orders of magnitude improvement in speed FLORA provides, and the necessity of using FLORAA as suggested by changes in the taxi drivers’ operation strategy under different market conditions.

J. Gan, B. An, Y. Vorobeychik.
Security games with protection externality.
AAAI'15.
[pdf  appendix]
Stackelberg security games have been widely deployed in recent years to schedule security resources. An assumption in most existing security game models is that one security resource assigned to a target only protects that target. However, in many important realworld security scenarios, when a resource is assigned to a target, it exhibits protection externalities: that is, it also protects other “neighbouring” targets. We investigate such Security Games with Protection Externalities (SPEs). First, we demonstrate that computing a strong Stackelberg equilibrium for an SPE is NPhard, in contrast with traditional Stackelberg security games which can be solved in polynomial time. On the positive side, we propose a novel column generation based approach—CLASPE—to solve SPEs. CLASPE features the following novelties: 1) a novel mixedinteger linear programming formulation for the slave problem; 2) an extended greedy approach with a constantfactor approximation ratio to speed up the slave problem; and 3) a linearscale linear programming that efficiently calculates the upper bounds of targetdefined subproblems for pruning. Our experimental evaluation demonstrates that CLASPE enable us to scale to realisticsized SPE problem instances.
 2013

J. Gan, B. An, H. Wang, X. Sun, Z. Shi.
Optimal pricing for improving efficiency of taxi systems.
IJCAI ’13.
[pdf]
In Beijing, most taxi drivers intentionally avoid working during peak hours despite of the huge customer demand within these peak periods. This dilemma is mainly due to the fact that taxi drivers' congestion costs are not reflected in the current taxi fare structure. To resolve this problem, we propose a new pricing scheme to provide taxi drivers with extra incentives to work during peak hours. This differs from previous studies of taxi market by considering market variance over multiple periods, taxi drivers' profitdriven decisions, and their scheduling constraints regarding the interdependence among different periods. The major challenge of this research is the computational intensiveness to identify optimal strategy due to the exponentially large size of a taxi driver's strategy space and the scheduling constraints. We develop an atom schedule method to overcome these issues. It reduces the magnitude of the problem while satisfying the constraints to filter out infeasible pure strategies. Simulation results based on real data show the effectiveness of the proposed methods, which opens up a new door to improving the efficiency of taxi market in megacities (e.g., Beijing).