Evolving Competent Game AI
Sushil J. Louis et al.
We use case-injected genetic algorithms for learning how to competently play computer strategy games. Case-injected genetic algorithms combine genetic algorithm search with a case-based memory of past problem solving attempts to improve performance on subsequent similar problems. The case-injected genetic algorithm improves performance on later problems in the sequence by learning from cases recorded earlier in the sequence. Since game-play in strategy games usually boils down to optimally allocating resources to achieve in-game mission objectives, we describe how a case-injected genetic algorithm player can play our game by solving the sequence of resource allocation problems generated by opponent moves during game-play. When retrieving and using cases recorded from human game-play, results show that case injection effectively biases the genetic algorithm toward producing plans that contain appropriate elements of plans produced by human players.
We have developed a 3D strategy game as a research platform for investigating our non-traditional approach to computer game playing. Casting game-play in our strategy game as a resource allocation optimization problem makes it suitable for a genetic algorithm-based approach. However, to play well, a genetic algorithm player must learn to anticipate opponent moves from game playing experience. This anticipatory knowledge can come from cases recorded during past CIGAR game playing experience or from cases extracted during human game-play. This paper describes the design of a case-injected Genetic Algorithm Player (GAP) that uses CIGAR to play strategy games and attacks the problem of automatically acquiring game playing knowledge from human players. Results show that GAP can play competently and that injecting cases into the evolving population of a genetic algorithm effectively biases GAP toward producing plans that contain important strategic elements used by human players.
We introduce case-injected genetic algorithms in the next section. We then describe our real-time strategy game. Section 6 explains CIGAR's representation of allocations and routing. The last two sections provide results and conclusions.
Genetic algorithms (GAs) are randomized population-based search
algorithms that search from a population of
points [1,2]. GAs use feedback about the utility
(fitness) of points in the search space to progress towards a goal
(the optimum). Current genetic algorithm-based machine learning
systems, such as classifier systems, use rules to store past
experience in order to improve their performance over
time [1,2,3,4,5]. However,
many application areas are more suited to a case-based storage of past
experience [6,7].
CIGAR combines genetic algorithm search with case-based memory to
improve performance with experience. Previous results on a variety
of problems show 1) that CIGAR, with experience, takes less time
to solve new problems and produces better quality solutions and 2) that
simple syntactic similarity metrics for case indexing lead to this performance
improvement [8].
Search as a problem solving technique has a long history in AI research. Casting game-play as a resource allocation problem means that we can define a search space of possible allocations of assets to targets. Each point in the search space represents a possible allocation. In the absence of admissible heuristics, if we can evaluate the fitness of points in this search space, we can devise search algorithms that use this fitness measure to guide heuristic search. Genetic algorithms were designed to search poorly-understood spaces such as those that arise in combinatorial optimization problems like the resource allocation problem. GAs are the search method of choice in this paper.
Typically, a genetic algorithm randomly initializes its starting population of individuals (candidate solutions to the problem) so that the GA can proceed from an unbiased sample of the search space. However, problems do not usually exist in isolation. We expect a system deployed in an industrial setting to confront a large number of problems over the system's lifetime and many of these problems may be similar. In this case it makes little sense to start a problem solving search attempt from scratch (random initialization) when previous searches may have yielded useful information about the problem domain. Instead, injecting a genetic algorithm's population with relevant solutions or partial solutions from previously solved similar problems can provide information (a search bias) that reduces the time taken to find an acceptable solution. Our approach borrows ideas from case-based reasoning (CBR) in which old problem and solution information, stored as cases in a case-base, helps solve new problems [9,10,11]. In our system, the case-base of problems and their solutions supplies the genetic problem solver with a long term memory.
Figure 1 depicts the genetic algorithm and its interaction with the case-base and game. As the genetic algorithm searches through the space of possible allocations, it periodically saves high fitness individuals to the case-base. Cases in CIGAR are individuals from a genetic algorithm's population along with some ancillary information like a unique case identifier, the individual's fitness, pointers to the individual's parents, and a problem identifier. These individuals, or cases, encode candidate solutions or partial solutions to the game's underlying resource allocation problem. The case-base also obtains cases from human game-play - we reverse-engineer human player generated allocations (game-play decisions) into the encoding used by the genetic algorithm and store these cases in the case-base. On subsequent missions in the game, the system injects appropriate cases into the genetic algorithm player's evolving population thus biasing search towards solutions similar to injected cases. If injected cases come from human players, the knowledge encapsulated in these cases biases GAP. Alternatively, if cases come from previous GAP attempts at playing the game, then the system learns from its own previous game playing experience. The system does not require a case-base to start with and can bootstrap itself by learning new cases from the genetic algorithm's attempts at solving a problem.
Using CBR terminology, the genetic algorithm adapts retrieved cases to generate solutions to the current in-game mission's resource allocation problem. We do not need domain specific adaptation operators because domain knowledge encapsulated within the fitness function and encoding suffice to guide genetic algorithm-based domain-independent adaptation of injected cases. The GA takes care of adaptation - indexing for retrieval, however, remains an issue.
Assuming that similar problems have similar solutions, the typical CBR
approach would inject cases representing solutions to similar
problems. When we have problem similarity metrics, this approach
works quite well and is described
in [12]. However, genetic algorithms are supposed
to work in poorly-understood spaces and we may thus find it difficult,
if not impossible, to come up with a problem similarity metric in such
spaces. Thus, instead of assuming that similar problems have similar
solutions, we assume that similar solutions must have come from
similar problems. If this assumption is true, because genetic
algorithms use simple string representations of solutions, equally
simple similarity metrics like hamming distance and euclidean distance
should work well for indexing cases. Previous work provides empirical
evidence in support of the assumption and
approach [8,13]. This paper uses hamming
distance between bit-strings that encode game-playing strategies for
its distance metric.
Figure 2 depicts a case-injected genetic algorithm that uses solution similarity metrics for case-retrieval and injection.
Initially the case-base is empty. When confronted with the first problem,When CIGAR operates on the basis of solution similarity, we periodically retrieve and inject a small number of solutions similar to the current best member of the GA population into the current population. Injected individuals replace the worst members in the current population and the GA continues searching with this combined population. The idea is to cycle through the following steps. Let the GA make some progress. Next, find cases (individuals) in the case-base that are similar to the current best individual in the population and inject these individuals into the population replacing the worst individuals. If injected individuals contain useful cross problem information, the GA's performance will be boosted.
We have described one particular implementation of such a system. Other less elitist approaches for choosing population members to replace are possible, as are different strategies for choosing individuals from the case-base. We can also vary the injection percentage; the fraction of the population replaced by chosen injected cases.
We have to periodically inject solutions based on the makeup of the current population because we do not know which previously solved problems are similar to the current one. We do not have a problem similarity metric. By finding and injecting cases in the case-base that are similar to the current best individual in the population, we are assuming that similar solutions must have come from similar problems and that these similar solutions retrieved from the case-base contain useful information to guide genetic search.
What happens if our similarity measure is noisy and/or leads to unsuitable retrieved cases? By definition, unsuitable cases will have low fitness and will quickly be eliminated from the GA's population. CIGAR may suffer from a slight performance hit in this situation but will not break or fail - the genetic search component will continue making progress towards a solution. CIGAR is thus robust.
Note also that every individual in the population is a potential case. For a population of size 50, run for 50 generations, we could end up with 2500 potential cases on every new problem attempted. We reduce the number of cases stored in the case-base by only storing individuals that represent an increase in the maximum fitness seen thus far in the evolving population.
The system that we have described injects individuals in the case-base that are deterministically closest, in hamming distance, to the current best individual in the population. We can also choose schemes other than injecting the closest to the best. For example, we have experimented with injecting cases that are the furthest (in the case-base) from the current worst member of the population. Probabilistic versions of both have proved effective.
We have designed and built OPS, an air strike attack planning and defense game. OPS maps to a broad category of resource allocation problems in industry and the military. The game's objective is to plan an air attack against defended targets (Blue player) or to defend against such an air attack (Red player). Blue's goal is thus allocating weapons on the air platforms to targets on the ground and Red's goal is to allocate threats to defend targets. Target priorities and the varying effectiveness of Blue's weapon types against target types further complicate planning. Potential new threats and targets can also "pop-up" on Red's command in the middle of a mission, requiring Blue to be able to respond to changing game dynamics. Both players seek to minimize the damage they receive while maximizing the damage dealt to their opponent. Red plays by organizing defenses in order to best protect its targets and Red's ability to play popups can affect this planning. For example, feigning vulnerability can lure Blue into a pop-up trap, or keep Blue from exploiting a weakness out of fear of such a trap. Blue plays by allocating platforms and their assets (weapons) as efficiently as possible in order to destroy targets while minimizing risk. Many factors determine risk: the platform's route, the effect of accompanying wingmen, weather, and the presence of threats around chosen targets. Using a case-injected genetic algorithm, GAP develops strategies for the attacking strike force, including flight plans and weapon targeting for all available aircraft. When confronted with popups, GAP responds by re-planning with the case injected genetic algorithm to produce a new plan of action.
The left side of Figure 3 shows a screen shot of our game while the right side shows a top-level view of a game mission.
We have implemented this game on top of an open-source game engine making it more interesting than a pure optimization problem. An attractive, ``fun,'' game would entice more people to play and allow us to acquire player knowledge during game-play. After all, beyond producing near optimal strategies, we would like to bias CIGAR toward producing solutions similar to those used by humans playing Blue in the past. We do this so that GAP can be a better and more versatile player by learning strategies from human experts. Specifically, we want to show that GAP more frequently produces plans similar to those a human has played in the past on the same mission. When that information from the human is used by GAP playing a different mission, we show that GAP continues to play strategies closer to the style of the human than GAP's non injected counterpart.
A large body of work exists in which evolutionary methods have been applied to games [14,15,16,17,18]. However the majority of this work has been applied to board, card, and other well defined games which have many differences from popular real time strategy (RTS) computer games such as Starcraft, Total Annihilation, and Homeworld [19,20,21]. Entities in our game exist and interact over time in continuous three-dimensional space, they are are not directly controlled by players but instead sets of algorithms (Game AIs) control them in order to meet strategic goals outlined by players. This adds a level of abstraction not found in traditional games. In addition, computer game players usually have incomplete knowledge of the game state. John Laird [22,23,24] surveys the state of research in using Artificial Intelligence (AI) techniques in interactive computers games. He describes the importance of such research and provides a taxonomy of games.
In simulation games like SimCity, the decision maker controls many non-linearly related variables (inhabitants and environment) and makes decisions that affect the health of her world (in this case a city). Fasciano's work using a case-based approach to build the mayor for the original SimCity game stands out as an example of using computer gaming to study decision support and planning issues [25,26,27].
Several military simulations share some of our game's properties [28,29,30] however, not only do we want to provide a platform for research in strategic planning - we also want to have fun.
Figure 4 outlines our system's architecture. Starting at the left, the Red team (human) and the Blue team (GAP) get the scenario and have time to prepare their strategy. GAP works by applying the genetic algorithm to the underlying resource allocation and routing problem. We chose the best plan produced by the GA in the time available, to play against Red. These plans then execute and during execution, Red can script the emergence of a popup threat. When GAP detects the popup, the genetic algorithm re-plans and begins execution of the new plan.
To play the game GAP must produce routing data for each of Blue's platforms.
Figure 5 shows how routes are built using the A* algorithm [31]. A* builds routes from current platform locations to target locations and back and tends to prefer short routes that avoid threats while seeking targets.
In order to avoid traps the routing system must be somehow
parameterized to avoid areas with particular characteristics. Note
that traps are most effective in areas confined by other threats. If
we artificially inflate threat radii, threats will expand to fill in
potential trap corridors and A* will find routes that go around these
expanded threats. We therefore add a multiplier parameter
that
modifies threats' effective radii. Larger
's expand threats and
fill in confined areas. A* then routes around those confined areas.
Combined with case injection,
allows GAP to learn coefficients
that avoid traps and re-use them in new scenarios. In our scenario
produce shorter, more direct routes, that might go through
a threat's area of influence,
produce direct routes
that avoid threats, and
produce roundabout routes.
is limited to the range
and encoded with eight (
) bits at
the end of our chromosome.
Most of the encoding specifies the asset to target allocation with
encoded at the end as detailed above. Figure 6
shows how we represent the allocation data as an enumeration of assets
to targets. The scenario involves two platforms (P1, P2), each with a
pair of assets, attacking four targets. The left box illustrates the
allocation of asset A1 on platform P1 to target T3, asset A2 to target
T1 and so on. Tabulating the asset to target allocation gives the
table in the center. Letting the position denote the asset and
reducing the target id to binary then produces a binary string
representation for the allocation.
Blue's goals are to maximize damage done to red targets, while
minimizing damage done to its platforms. Shorter simpler routes are
also desirable, so we include a penalty in the fitness function based
on the total distance traveled. This gives the fitness shown in
Equation 1
We use the formulas above for computing the fitness of a particular individual in the genetic algorithm's population. Evaluating an individual thus means that we have to run the game (simulation). Although our game can run much faster than real-time, evaluating a population of, say 50, individuals takes far too long for interactive game-play. However, parallelizing CIGAR by simultaneously evaluating multiple individuals on our cluster significantly speeds up evaluations. At the current time, we can respond competently to a human opponent's moves in less than one minute.
Figure 7-Left shows an overview of our test scenario - chosen to be simple, easily analyzable but to still encapsulate the dynamics of traps and anticipation.
The scenario takes place in Northern Nevada and California with Walker Lake
visible below (south) of the popup on the bottom of the map. Red
has four targets on the right-hand side of the map with their
locations denoted by the white cross-hair. As the targets represent
different buildings comprising a larger facility, they appear as a
single cross-hair from our point of view which is at a significant
distance. Red has twenty two (
) threats placed to defend the targets
and the translucent blue hemispheres show the effective radii of these
threats. Red has the potential to play a popup threat to trap
platforms venturing into the corridor formed by the threats and this
trap is displayed as the solid white circle near the middle.
Blue has eight platforms, all of which start in the lower left-hand
corner. Each platform has one weapon, with three classes of weapons
being distributed among the platforms. A weapon-target effectiveness
table determines the effectiveness of each weapon against each target.
Each of the eight weapons can be allocated to any of the four targets,
giving
allocations. This space is exhaustively
searchable, but more complex scenarios quickly become
intractable. The second stage in Blue's planning involves finding
routes for each of the platforms to follow during their mission.
These routes should be short and simple but still minimize exposure to
risk. We categorize Blue's possible routes into two
categories: yellow routes fly through the corridor between the
threats, while green routes fly around. The evaluator has no direct
knowledge of potential danger presented to platforms inside the
corridor area. Because of this, the evaluator optimal solution is the
yellow route, since it is the shortest. The human expert however,
understands the potential for danger as the corridor provides the
greatest potential for a pop-up trap. Knowing this, the green route is
the human optimal solution. Our goal is now to bias the GA to produce
the green route, while still optimizing the allocation. Teaching GAP
to learn from a human and produce green strategies even though
yellow strategies have higher fitness is a goal of this research.
We first present results showing that GAP can play the game effectively. GAP runs against our test scenario 50 times, and we plot the minimum, maximum, and average population fitness over the number of genetic algorithm iterations in Figure 8-Left. The graph shows a strong approach toward the optimum and in more the 95% of runs it gets within 5% of the optimum. This indicates that GAP can form effective strategies for playing the game.
To deal with opponent moves and the dynamic nature of the game we look at the effects of re-planning.
|
How do we obtain green routes that avoid traps despite CIGAR's
eventual preference for shorter more direct routes? CIGAR's preference
for shorter routes is driven by our evaluation function - knowledge
acquired from human players must change this preference. We thus
artificially inflate the fitness of individuals descended from
injected cases to simulate the change in the evaluation function.
With fitness inflation we get strong CIGAR convergence to
trap-avoiding green routes. Figure 9 shows the value of
the routing parameter,
, on the x-axis and the number of runs of
the GA or CIGAR that converged to a particular value of
on the
y-axis.
|
This paper considered two broad issues in developing competent opponents for computer gaming - using a genetic algorithm for playing strategic computer games and knowledge acquisition from expert players to bias genetic search. We showed that our genetic algorithm player can play the game competently. We also showed that GAP learns to prefer solutions similar to injected cases acquired from humans and to deal with the external-to-the-evaluator concept of a trap. Our preliminary results thus indicate that case injection seems to be a promising approach toward acquiring knowledge and biasing genetic search toward human preferred solutions. We are currently using these techniques in developing a training simulation (computer game) for strike planning.
This material is based upon work supported by the Office of Naval Research under contract number N00014-03-1-0104.
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.71)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 1 iccbr
The translation was initiated by Sushil Louis on 2005-08-20