Trap Avoidance
Chris Miles
Sushil J. Louis
Rich Drewes
We use case injected genetic algorithms to learn to competently play computer strategy games. Such games are characterized by player decision in anticipation of opponent moves and imperfect knowledge of game state. Within the broad goal of developing effective and general methods of anticipatory play, this paper investigates anticipation in the context of trap avoidance in an immersive, 3D strike planning game. Case injection allows acquiring player knowledge from experience and incorporating acquired knowledge into future game play. Results show that with an appropriate representation case injection is effective at biasing the genetic algorithm toward producing plans that both avoid traps and carry out the mission effectively.
The computer gaming industry is now bigger than the movie industry and both gaming and entertainment drive research in graphics and modeling. Although AI research has in the past been interested in games like checkers and chess, popular computer games like starcraft and counter-strike are very different from chess and checkers. These games are situated in a virtual world, are asymmetrical, involve both long-term and reactive planning, and provide an immersive, fun experience. At the same time, we can pose many business, training, planning, and scientific problems as games where player decisions determine the final solution. A decision support system for a player in such games corresponds closely with a decision support system in the ``real'' world.
This paper applies a case-injected genetic algorithm that combines genetic algorithms with case-based reasoning to provide player decision support in the context of domains modeled by computer games [1]. The genetic algorithm ``plays'' the game by attempting to solve the underlying decision problem. Specifically, we develop and use a strike force asset allocation game, which maps to a broad category of resource allocation problems in industry, as our test problem. Strike force planning consists of allocating a collection of strike assets on flying platforms to a set of targets and threats on the ground. The problem is dynamic; weather and other environmental factors affect asset performance, unknown threats can popup, and new targets can be assigned. These complications as well as the varying effectiveness of assets on targets make the problem suitable for genetic and evolutionary computing approaches.
The idea behind a case-injected genetic algorithm is that as the genetic algorithm component iterates over a problem it selects members of its population and caches them (in memory) for future storage into a case base. Cases are therefore members of the genetic algorithm's population and represent an encoded candidate solution to the problem at hand. Periodically, the system injects appropriate cases from the case base, containing cases from previous attempts at other problems, into the evolving population replacing low fitness population members. When done with the current problem, the system stores the cached population members into the case base for retrieval and use on new problems
Case injection is used to handle the dynamic nature of the game which places a premium on re-planning or re-allocation of assets when needed. We have shown that case-injected genetic algorithms learn to increase performance with experience at solving similar problems [1,2,3,4,5]. This implies that a case-injected genetic algorithm should quickly produce new plans (a new allocation) in response to changing game dynamics. Beyond purely responding to immediate scenario changes we use case injection in order to produce plans that anticipate opponent moves in the future. Doing this teaches our Genetic Algorithm Player (GAP) where traps are likely to occur, so that GAP acts in anticipation of changing game states. Specifically we try to influence GAP to produce plans that avoid areas similar to those in which it has encountered traps in the past. The scenario being used in this paper is simple so that we know the optimum and can better analyze results. We have found that the system works well on more complicated scenarios. Our results show that GAP makes an effective Blue player with the ability to quickly replan to deal with changing game dynamics, and that case-injection can bias GAP to produce solutions that are suboptimal with respect to the game simulation's evaluation function but that avoid potential traps.
In the rest of the paper we define the game, the particular scenario being played, and the trap being encountered. We outline GAP's architecture; detailing the use of genetic algorithms, the encoding of strategy, the routing system, and the incorporation of case injection. Section 6 presents results showing that GAP can effectively play the game in the absence of the trap, and that GAP can quickly re-plan in the face of changing game dynamics. Preliminary results also show the effectiveness of case injection in acquiring and using player knowledge in learning to avoid traps. The last section presents our conclusions and directions for future work.
The strike planning game is based on an underlying resource allocation and routing problem and the genetic algorithm plays by solving the underlying problem. Our game involves two sides: Blue and Red, both seeking to allocate their respective resources to minimize damage received while maximizing the effectiveness of the strike.
Blue plays by allocating its resources, a set of assets on aircraft (platforms), to Red's buildings (targets) and defensive installations (threats). Blue determines which targets to attack, which weapons (assets) to use on them, and how to route each platform to minimize risk and maximize effectiveness.
Red's defensive installations (threats) protect targets by threatening platforms that come within range. Red plays by placing threats in space and time to best protect targets. Potential threats and targets can also "pop-up" on Red's command in the middle of a mission, allowing a range of strategic game-playing options. By cleverly locating threats Red can feign vulnerability and lure Blue into a deviously located popup trap, or keep Blue from exploiting such a weakness out of fear of a trap. The scenario in this paper involves Red presenting Blue with a corridor of easy access to unprotected targets, a corridor containing a popup threat.
In this paper, a human player scripts Red's play while a Genetic Algorithm Player (GAP) plays Blue. The fitness of an individual in GAP's population solving the underlying allocation problem is evaluated, essentially, by running the game. We explain our fitness evaluation in more detail in a later section. GAP develops strategies for the attacking strike force, including flight plans and weapon targeting for all available aircraft. When confronted with popups, GAP responds by replanning with the genetic algorithm in order to produce a new plan of action that responds to changes. Beyond purely responding to immediate scenario changes we use case injection in order to produce plans that anticipate opponent moves in the future.
Previous work in strike force asset allocation has been done in optimizing the allocation of assets to targets, the majority of it focusing on static pre-mission planning. Griggs [6] formulated a mixed-integer problem (MIP) to allocate platforms and assets for each objective. The MIP is augmented with a decision tree that determines the best plan based upon weather data. Li [7] converts a nonlinear programming formulation into a MIP problem. Yost [8] provides a survey of the work that has been conducted to address the optimization of strike allocation assets. Louis [9] applied case injected genetic algorithms to strike force asset allocation. A large body of work exists in which evolutionary methods have been applied to games [10,11,12,13,14]. However the majority of this work has been applied to board, card, and other well defined games. Such games have many differences from popular real time strategy (RTS) games such as Starcraft, Total Annihilation, and Homeworld[15,16,17]. Chess, checkers and many others use entities (pieces) that have a limited space of positions (such as on a board) and restricted sets of actions. Players in these games also have well defined roles and the domain of knowledge available to each player is well identified. These characteristics make the game state easier to specify and analyze.
In contrast, entities in our game exist and interact over time in continuous three dimensional space. Entities are not directly controlled by players but instead sets of parametrized algorithms control them in order to meet goals outlined by players. This adds a level of abstraction not found in more traditional games. In most such computer games, players have incomplete knowledge of the game state and even the domain of this incomplete knowledge is difficult to determine. John Laird [18,19,20] surveys the state of research in using Artificial Intelligence (AI) techniques in interactive computers games. He describes the importance of such research and provides a taxonomy of games. Several military simulations share some of our game's properties [21,22,23], however these attempt to model reality while ours is designed to provide a platform for research in strategic planning, knowledge acquisition and re-use, and to have fun. The next section describes the scenario (or mission) used in our experiments.
Figure 1-Left shows an overview of our test scenario - chosen to be simple, easily analyzable but to still encapsulate the dynamics of traps and anticipation.
The scenario takes place in Northern Nevada and California, Lake Tahoe
is visible below (south) of the popup on the bottom of the map. Red
has four targets on the right hand side of the map with their
locations denoted by the white cross-hair. As the targets represent
different buildings comprising a larger facility, they appear as a
single cross-hair from our point of view which is at a significant
distance. Red has twenty two (
) threats placed to defend target
and the translucent blue hemispheres show the effective radii of these
threats. Red has the potential to play a popup threat to trap
platforms venturing into the corridor formed by the threats and this
trap is displayed as the solid white circle near the middle.
Blue has eight platforms, all of which start within the region in the
lower left hand corner. Each platform has one weapon, with three
classes of weapons being distributed among the platforms. A
weapon-target effectiveness table determines the effectiveness of each
weapon against each target. Each of the eight weapons can be
allocated to any of the four targets, giving
allocations. This space is exhaustively search-able, but more complex scenarios
quickly become intractable.
In this scenario, GAP's router can produce the three broad typese of routes shown in Figure 1-Right.
Black routes expose platforms to unneccessary risk from threats and thus receive very low fitness. The naively optimal strategy contains yellow routes which are the shortest most direct routes to the target that still manage to avoid known danger. However in the presence of the popup, Green routes are optimal although they are longer than yellow routes. With experience GAP should learn to anticipate traps and to prefer green routes even though green routes have lower fitness then yellow routes.
In order to search for good routes and allocations, GAP must be able to compute and compare their fitnesses. Computing this fitness is dependent on the representation of entities states inside the game, and our way of representing this state is rather unusual so we next detail it.
In many games, entities (platforms, threats and targets in our game) posses hit-points which represents their ability to take damage. Each attack removes a number of hit-points and when reduced to zero hit-points the entity is destroyed and cannot participate further. However, weapons have a more hit or miss effect, entirely destroying an entity or leaving it functional. A single attack may be effective while multiple attacks may have no effect. Although more realistic, this introduces a degree of stochastic error into the game. In the worst case, evaluating a individual plan can result in outcomes ranging from total failure to perfect success making it difficult to compare two plans based on a single evaluation. Lacking a good comparison it is difficult to search for an optimal strategy. By taking a statistical analysis of survival we can achieve better results. Consider the state of each entity at the end of the mission as a random variable. Comparing the expected values for those variables becomes an effective means to judge the effectiveness of a plan. These expected values can then be estimated by executing each plan a number of times and averaging the results. However, doing multiple runs to determine a single evaluation increases the computational expense many-fold.
We use a different approach based on probabilistic health
metrics. Instead of monitoring whether or not an object has been
destroyed we monitor the probability of its survival. Being attacked
no longer destroys objects and removes them from the game, it just
reduces their probability of survival according to
Equation 1 below.
The gaming system's architecture reflects the flow of action in the game and is described next.
To play the game GAP must produce routing data for each of Blue's platforms.
Figure 3 shows how routes are built using the A* algorithm [24]. A* builds routes from current platform locations to target locations and back and tends to prefer short routes that avoid threats while seeking targets.
In order to avoid traps the routing system must be somehow
parameterized to avoid areas with particular characteristics. Note
that traps are most effective in areas confined by other threats. If
we artificially inflate threat radii, threats will expand to fill in
potential trap corridors and A* will find routes that go around these
expanded threats. We therefore add a multiplier parameter
that
increases threats' effective radii. Larger
's expand threats and
fill in confined areas. A* then routes around those confined areas.
Combined with case injection,
allows GAP to learn coefficients
that avoid traps and re-use them in new scenarios. In our scenario
produce black routes,
produce yellow
routes and
produce green routes.
is limited to the
range
and encoded with eight (
) bits at the end of our
chromosome. We are encoding a single
for each plan, future work
may include
's for each section of routing contained in the plan.
Most of the encoding specifies the asset to target allocation with
encoded at the end as detailed above. Figure 4
shows how we represent the allocation data as an enumeration of assets
to targets. The scenario involves two platforms (P1, P2), each with a
pair of assets, attacking four targets. The left box illustrates the
allocation of asset A1 on platform P1 to target T3, asset A2 to target
T1 and so on. Tabulating the asset to target allocation gives the
table in the center. Letting the position denote the asset and
reducing the target id to binary then produces a binary string
representation for the allocation.
Blue's goals are to maximize damage done to red targets, while
minimizing damage done to its platforms. Shorter simpler routes are
also desirable, so we include a penalty in the fitness function based
on the total distance traveled. This gives the fitness calculated as
shown in Equation 3
GAP records games played against opponents and runs offline after a game playing episode in order to determine the optimal way to win that game. The simulation now contains knowledge about opponents moves, in our case, the game contains the popup trap. Allowing the search to progress towards the optimal strategy in the presence of the popup, GAP saves individuals from this search into the case-base, building a case-base with routes that go around the popup trap - green routes. When faced with other opponents, GAP then injects individuals from the case-base, biasing the current search towards containing this learned anticipatory knowledge.
In this paper GAP first plays the scenario, likely picking a yellow route and falling into Red's trap. Afterwards GAP replays the game, including Red's trap into the evaluator. Yellow routes then receive poor fitness, and GAP searches towards the optimal green route. Saving individuals to the case-base from this search stores a cross-section of plans containing ''trap avoiding'' knowledge.
The process produces a case-base of individuals containing important knowledge about how we should play, but how can we use that knowledge in order to play smarter in the future? Case Injection has been shown [2] to increase the search speed and the quality of the final solution produced by a GA working on a similar problem. It also tends to produce answers similar to old ones by biasing the search to look in areas that were previously successful - exploiting this effect gives our GA its learning behavior. When playing the game we periodically inject a number of individuals from the case-base into the population, biasing our current search towards information from those individuals. Injection occurs by replacing the worst members of the population with individuals chosen from the case database through a "Probabilistic Closest to the Best" strategy [1]. Those individuals bring their ''trap avoiding'' knowledge into the population, increasing the likelihood of that knowledge being used in the final solution and therefore increasing GAP's ability to avoid the trap.
We first show that GAP can form efficient strategies. GAP is run against our test scenario 50 times, and we graph the min, max, and average population fitness against generation in Figure 5-Left. The graph shows a strong approach toward the optimum and in more the 95% of runs it gets within 5% of the optimum. This indicates that GAP can form effective strategies for playing the game.
To deal with opponent moves and the dynamic nature of the game we look at the effects of re-planning.
|
GAP's ability to learn to avoid the trap is shown in Figure
6. The figure compares the histograms of
values
produced by GAP with and without case injection. Case injection leads
to a strong shift in the kinds of
's produced, biasing the
population towards using green routes. The effect of this bias being
a large and statistically significant increase in the frequency at
which strategies containing green routes were produced (
). These results were based on
independent runs of the
system and show that case injection does bias the search toward
avoiding the trap.
Figure 7-left compares the fitnesses with and without case injection. Without case injection the search shows a strong approach toward the optimal yellow plan; with injection the population quickly converges toward the optimal green plan. Case injection applies a bias towards green routes, however the GA has a tendency to act in opposition of this bias, trying to search towards ever shorter routes. The ability of the GA to overcome the bias through manipulation of injected material is dependent on the size of the population and the number of generations it runs. Figure 7-Right illustrates this effect. As the number of evaluations alloted to the GA is increased, the frequency of green routes being produced as a final solution decrease. Counteracting this tendency requires a careful balance of GA and case-injetcion parameters.
|
There are a large number of interesting avenues in which to continue this research. Fitness inflation appears to solve one of our major problem in using case injection, further exploration of this technique is underway. We are also interested in capturing information from human players in order to better emulate their style of play. The game itself is also under major expansion, the next phase of research should involve a symmetric game involving aspects of resource management and much deeper strategies than those seen at the current level.
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.71)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 1 414
The translation was initiated by Sushil Louis on 2004-11-30