Chris Miles Sushil J. Louis Nicholas Cole
Evolutionary Computing Systems Lab
Department of Computer Science
John McDonnell
SPAWAR Systems Center
San Diego, CA
We use case injected genetic algorithms to learn how to competently play computer strategy games. Strategic computer games involve long range planning across complex dynamics and imperfect knowledge presented to players requires them to anticipate opponent moves and adapt their strategies accordingly. In this paper, we address the problem of acquiring knowledge learned from human players, in particular we learn general routing information from a human player in the context of a strike planning game. By incorporating case injection into a genetic algorithm, we show methods for learning general knowledge from human players to incorporate into future plans. Results show that with an appropriate representation, case injection is effective at biasing the genetic algorithm toward producing plans that contain important strategic elements used by human players.
We use case injected genetic algorithms to learn to play strategic games [1]. In particular we attack the problem of using case injection to bias a genetic algorithm player such that it better emulates the playing styles of human players it has seen in the past. Our research focuses on a strike force asset allocation game which maps to a broad category of resource allocation problems in industry and the military. Genetic algorithms can be used in our game to robustly search for effective strategies. These strategies may approach game optimal strategies but they do not necessarily approach real world optima as the game is an imperfect reflection of reality. Humans with past experience playing the real world game tend to include external knowledge when producing strategies for the simulated game. Incorporating knowledge from the way these humans play should allow us to carry over some of this external knowledge. Our results show that case injection combined with a flexible representation can bias the genetic algorithm towards producing strategies similar to those learned from human players. Beyond playing similarly in a particular mission, the genetic algorithm can use strategic knowledge across a range of similar missions to continue to play as the human would.
We seek to produce a genetic algorithm player (GAP) that can play on a strategic level and learn to emulate aspects of strategies used by human players. Our goals in learning to play like humans are:
These roles require GAP to play with objectives in mind besides that of winning - these objectives would be difficult to quantify inside the evaluator. As humans can function effectively in these regards, learning from them should help GAP better fulfill these responsibilities.
We attack the problem of modeling humans through case-injected genetic algorithms. The genetic algorithm searches for an optimal strategy while case injection biases it to contain elements from strategies used by humans in the past, how it does this is explained in the next paragraph. Used in conjunction we hope to produce near optimal strategies that incorporate important information external to the evaluation function. We work on a strategic game, specifically, strike force asset allocation which consists primarily of allocating a collection of strike assets to a set of targets. We have implemented this game on top of a professional game engine making it more interesting than a pure optimization problem. The game involves two sides: Blue and Red, Blue allocates a set of platforms (aircraft) to attack Red's targets (buildings). Red has defensive installations (threats) that complicate Blue's planning, as does the varying effectiveness of Blue's weapons against each target. Potential new threats and targets can also "pop-up" on Red's command in the middle of a mission, requiring Blue to be able to respond to changing game dynamics. Both players seek to minimize the damage they receive while maximizing the damage dealt to their opponent. Red plays by organizing defenses in order to best protect its targets. Red's ability to play popups can also affect its strategy. For example, feigning vulnerability can lure Blue into a pop-up trap, or keep Blue from exploiting a weakness out of fear of such a trap. Blue plays by allocating its platforms and the assets (weapons) they carry as efficiently as possible in order to destroy the targets while minimizing risk. Risk is determined by many factors, including the platform's route, the effect of accompanying wingmen, and the presence of threats around chosen targets. GAP develops strategies for the attacking strike force, including flight plans and weapon targeting for all available aircraft. When confronted with popups, GAP responds by replanning with the genetic algorithm to produce a new plan of action. Beyond producing near optimal strategies we would like to bias it towards producing solutions similar to those it has seen used by humans playing Blue in the past. We do this so that GAP can be a better and more versatile player by learning strategies from human experts. Specifically we want to show that when using case injection the genetic algorithm more frequently produces plans similar to those a human has played in the past on the same mission. When that information from the human is used by GAP playing a different mission, we show that GAP continues to play strategies closer to the style of the human than GAP's non injected counterpart.
Case-injected genetic algorithms work by saving individuals from the population of a GA, and later introducing them into a GA solving a similar but different problem. Louis showed that case injection improves convergence speed and the quality of solutions found by biasing the current search toward promising regions identified from experience [1,2]. In this paper, the system acquires cases from humans for injection into GAP's population. The idea is to automatically acquire cases by instrumenting the game interface to record all human decision making during game play. Our goal is not only to improve the GA's performance, but to bias the search using knowledge that is external to the actual evaluation and fitness of an individual plan - knowledge being expressed by expert humans in their formation of game plans.
Previous work in strike force asset allocation has been done in optimizing the allocation of assets to targets, the majority of it focusing on static pre-mission planning. Griggs [3] formulated a mixed-integer problem (MIP) to allocate platforms and assets for each objective. The MIP is augmented with a decision tree that determines the best plan based upon weather data. Li [4] converts a nonlinear programming formulation into a MIP problem. Yost [5] provides a survey of the work that has been conducted to address the optimization of strike allocation assets. Louis [6] applied case injected genetic algorithms to strike force asset allocation, showing results consistent with the effectiveness of our GA.
A large body of work exists in which evolutionary methods have been applied to games [7,8,9,10,11]. However the majority of this work has been applied to board, card, and other well defined games. Such games have many differences from popular real time strategy (RTS) games such as Starcraft, Total Annihilation, and Homeworld[12,13,14]. Many traditional (board, card, paper) games use entities (pieces) that have a limited space of positions (such as on a board) and restricted sets of actions (well defined movement). Players in these games also have well defined roles and the domain of knowledge available to each player is clearly identified. These characteristics make the game state easier to specify and analyze. In contrast, entities in our game exist and interact over time in continuous three dimensional space. Entities are not directly controlled by players but instead sets of algorithms control them in order to meet goals outlined by players. This adds a level of abstraction not found in those traditional games. In most of these computer games, players have incomplete knowledge of the game state, and even this domain of each a player's knowledge is difficult to identify. John Laird [15,16,17] surveys the state of research in using Artificial Intelligence (AI) techniques in interactive computers games. He describes the importance of such research and provides a taxonomy of games. Several military simulations share some of our game's properties [18,19,20], these however are military simulations while ours is not intended to perfectly model the dynamics of real situations but to provide a platform for research in strategic planning and to have fun.
In this paper we first define the problem being attacked, including the mission being played. Then we outline how we use a GA to play the game, including delving into the architecture and how case injection works. Next we discuss how case injection is used to acquire and use knowledge from human players. Finally we show results that the GA can play the game, and by using case injection we can significantly increase its likelihood of playing like a human.
The mission being played is shown in Figure 2. This
mission was chosen to be simple, to have easily analyzable results,
and to allow the GA to learn external knowledge from the human. As
many games show similar dynamics, this mission is a good arena for
examining the general effectiveness of using case injection for
learning from humans. The mission takes place in Northern Nevada and
California, Lake Tahoe is visible near the bottom of the map. Blue
possesses one platform which is armed with 8 assets (weapons) and the
platform takes off from and returns to the lower left hand corner of
the map. Red possesses eight targets distributed in the top right
region of the map, and six threats that defend them. The first stage
in Blue's planning is determining the allocation of the eight assets.
Each asset can be allocated to any of the eight targets, giving
allocations. The second stage in Blue's planning involves
finding routes for each of the platforms to follow during their
mission. These routes should be short and simple but still minimize
exposure to risk. We categorize Blue's possible routes into two
categories. Yellow routes fly through the corridor between the
threats, while green routes fly around. The evaluator has no direct
knowledge of potential danger presented to platforms inside the
corridor area. Because of this, the evaluator optimal solution is the
yellow route, since it is the shortest. The human expert however,
understands the potential for danger as the corridor provides the
greatest potential for a pop-up trap. Knowing this the green route is
the human optimal solution. Our goal is now to bias the GA to produce
the green route, while still optimizing the allocation. Teaching GAP
to learn from the human and produce green strategies even though
yellow strategies have higher fitness is the goal of this research.
Our way of measuring the state of entities in the game is
probabilistic and we describe it in the next section.
In many games, entities posses hit-points which represents their
ability to take damage. Each attack then removes a number of
hit-points and when reduced to zero (
) hit-points that entity is
destroyed. In reality weapons have a more hit or miss effect, whereby
they entirely destroy things or leave them functional. A single
attack may destroy an entity or multiple attacks may have no effect.
This paradigm introduces a high amount of stochastic error into the
game. Evaluating a plan can result in outcomes ranging from total
failure to perfect success, which makes it difficult to compare two
plans. By taking a statistical analysis we achieve better results.
Consider the state of each entity at the end of the mission as a
random variable. Identifying the expected values for those variables
becomes one means to judge the effectiveness of a plan. These
expected values can be estimated by playing a number of games for each
plan and averaging the results. However doing multiple runs to
determine a single evaluation increases the computational expense
many-fold.
We use a different approach based on probabilistic health metrics.
Instead of monitoring whether or not an object has been destroyed we
monitor the probability of its survival up until that point in time.
Being attacked no longer destroys objects and removes them from the
game, it reduces their probability of survival from then on according
to Equation 1.
Figure 3 shows our system's architecture. The two players, human and GAP, are presented with the mission and given time to prepare their strategy. GAP works by applying its GA to the mission. The GA creates populations of bit strings, which are converted into plans and evaluated. Based on this evaluated fitness individuals are recombined and new plans are produced. We combine a steady state population model, roulette wheel selection, two point crossover and bitwise mutation to form our GA. Production of a full flight plan is complicated, so we next detail the steps involved.
GAP must be able to produce routing data for Blue's platforms in order to play the game. Figure 4 shows how the A* algorithm is used to build routes [21]. From the allocation of assets to targets we produce an ordered list of waypoints for each platform to visit. In order for platform P1 to use asset A1 on target T3, it has to fly to T3's location. Applying this to all of P1's assets produces a list of waypoints for P1 to visit during its mission. Future work would include ordering information into the chromosome, currently, targets are visited according to the ordering of their respective assets. From the list of waypoints we use a path-finding algorithm to produce more intelligent routes. A number of path-finding algorithms exist, A* was chosen as it is very widely used and comprises the vast majority of path-finding algorithms in games.
Path-finding between destinations is accomplished in two stages. First, the system discretizes the world into a voxel grid. A voxel is a three dimensional cube, with each voxel having a value representing the danger presented to platforms inside it. The voxels are then formed into a graph, where edges connect adjacent voxels in 3D space. By starting A* at the voxel containing the start location, and searching out the most promising neighbor voxels we can explore the graph until we locate a route to the voxel containing the destination. A* is guaranteed to always find the shortest route if it is given a proper underestimate of distance to the goal. This route is a list of waypoints to fly to in order to reach the destination and in two dimensions it is equivalent to a list of street intersections to drive through in order to get from A to B. Due to the large number of waypoints, the routes produced are too cumbersome to use efficiently. Because platforms are not limited to moving through space orthogonally we can remove the majority of waypoints from the route. Imagine a rubber band stretched around a set of pegs placed along a grid, removing unnecessary pegs from the graph leaves pegs outlining threats to avoid, and pegs marking places to visit. Specifically the algorithm works by starting at the beginning of the route, and removing subsequent pegs (waypoints) so long as they do not increase the total risk involved in the route. Once a node has been found that cannot be removed the algorithm repeats from that location.
A* is shown to always find the optimal route based on its cost
functions, in our case the shortest route avoiding known threats.
Since our game includes traps, the shortest route is not always
desirable. In order to do more interesting routing, we must be able
to bias A* towards producing routes that are longer, or more dangerous
then those immediately apparent. We do this by modifying the graph A*
searches, producing a variety of effects on the kinds of routes
produced with relative ease. For example penalizing each voxel based
on how far south it is provides a bias that tends to produce north
traveling routes, thus producing an overall strategy of attacking from
the north. Routing two groups of platforms, one with a southern bias
and one with a northern bias is likely to produce pincer attacks.
However the human in our game is trying to avoid confined areas, and
to do this we need to modify the voxels in order to identify areas
that are confined. If we can increase the cost of those voxels, then
the router will avoid those areas. In our representation we identify
these confined areas by extending the effective radii of threats when
we build the voxel graph. The extension is calculated by a simple
multiplication of each radius by a coefficient
, which determines
the kind of routes produced. Figure 5 shows the
effect
has on routing. The inner circle outlines the range of
one of the threats, the outer sphere outlines the radius being used by
the routing system for this particular mission. At this
the
threats have expanded together and filled the corridor, leading the
router to produce green routes.
In our mission,
values less
then
lead to yellow routes while larger
values lead to
green routes. As encoded,
uses 8 bits to produce a range from 0
to 3. Future work will include additional parameters to deal with the
different factors involved in human routing.
Imagine playing a game and seeing your opponents do something you had not considered that worked out to great effect. Seeing something new, you are likely to try to learn some of the dynamics of that move so you can incorporate it into your own play and become a more versatile player. Ideally you would like perfect understanding of when and where this move is effective and ineffective, and how to best execute the move under those circumstances. Whether the move is using a combination of chess pieces in a particular way, bluffing in poker, or doing a reaver drop in Starcraft the general idea remains. In order to imitate this process we use a two step approach with case injection. First we learn knowledge from human players by saving their decision making during game play and encoding it in for storage in the case-base. Second we apply this knowledge by periodically injecting these stored cases into GAP's evolving population.
Actually converting human strategies into chromosomes to save in the case-base is non-trivial. The focus of this research is on knowledge application so the chromosome has been directly engineered by the human player. Representation is key, and we are investigating different encodigs and different methods of translating human decisions to encoded plans.
Unless otherwise stated, GAP uses a population size of
, two-point
crossover with a probability of
, and point mutation with a
probability of
. We use elitist selection, where offspring and
parents compete for population slots in the next
generation [22]. All results are averages over
runs.
We first show that GAP can form efficient strategies. GAP plays the
mission 50 times, and we graph the average fitness of individuals
inside the population against their generation in
Figure 7. The graph shows a strong approach
toward the optimum. GAP approaches within
of optimal allocation
and routing
of the time. This indicates that GAP can form
effective strategies for playing the game to the extent possible from
the evaluator.
|
To test GAP's ability to capture external knowledge and emulate humans we produce green route plans, convert them to chromosomes, and then try to bias our search towards those chromosomes. The human plan is show in white in Figure 8. Converting this to the closest plan representable in our encoding gives the plan shown in green in Figure 8. The plans are not identical because the chromosome does not contain exact routing information. Note the overall fitness difference between these two plans is less then 2%.
The category of routes produced is determined by the values of
.
GAP's ability to produce the human like route (green) is based on the
values of
it chooses. Figures 9 and
10 show the distribution of
produced by the
non-injected genetic algorithm and the case-injected genetic
algorithm. Comparing Figure 9 with
Figure 10 shows a significant shift in the
's
produced, which leads to a large increase in the number of green
routes generated by the case injected GA. Without case injection GAP
produced no green routes, using case injection biased GAP to produce
64% green routes, this difference is statistically significant.
These results were based on
different runs of the system with
different random seeds and show that case injection does bias the
search towards the human strategy.
Moving to the mission shown in Figure 11 and repeating the
process produces the histograms shown in
Figures 12 and 13. The
same effect on
can be observed even though the missions are
significantly different, and even though we use the cases from the
previous mission. Our general routing representation allows GAP to
learn to avoid confined areas from play by the human expert.
|
Case injection applies a bias to the GA search, the number and frequency of individuals injected determines the strength of this bias. However the fitness function also contains a term that biases against producing longer routes. As the number of evaluations allotted to the GA is increased, the bias against longer routes outweighs the bias towards human strategies and fewer green routes are produced. The effect is shown in Figure V.
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.71)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 1 paper.tex
The translation was initiated by Sushil Louis on 2004-11-30