next_inactive up previous


Case-Initialized Genetic Algorithms for Knowledge Extraction and Incorporation Case-Initialized Genetic Algorithms

Judy Johnson

Sushil J. Louis

Abstract:

This paper investigates case-initialized genetic algorithms for extracting knowledge from past problem solving to solve subsequent problems. We develop a test problem class with similar solutions and the genetic algorithm is run for randomly chosen problems from the class. As the algorithm runs on a particular problem, solution strings are stored in a case-base and on subsequent problems, solutions from the case-base are used to initialize the population of a genetic algorithm. We investigate the effect of selection strategy and choice of appropriate cases for injection. Scaled roulette and scaled elitist selection both show improvement over a randomly initialized GA and elitist selection performs better than roulette. Over $50$ problems the case-initialized genetic algorithm system shows a statistically significant decrease in the time taken to the best solution and solutions are of higher fitness. Several strategies for choosing cases from the case base for injection all provide measurable improvement over random initialization.

Introduction

Problems seldom exist in isolation. Any useful system must expect to confront many related problems over its lifetime and we would like such a system to be able to improve its performance with experience. Such a learning system requires memory; a place for storing past experiences to guide future operations. The storage area may be distributed or localized, but a system without a memory is forced to start from scratch in trying to solve every given problem. Genetic algorithms (GAs) are randomized parallel search algorithms that search from a population of points [6,4]. Current genetic algorithm based machine learning systems, such as classifier systems, use rules to store past experience in order to improve their performance over time [6,4,18,8,5]. However, many application areas are more suited to a case-based storage of past experience [14,7,19,3]. We investigate a system that uses a case-base as a long term knowledge store in a new genetic algorithm based machine learning system. Constructing a well understood test problem, we investigate the effect of selection strategy and algorithms to choose cases from the case base for injection. Results on the test problem show that our system, with experience, takes less time to solve new problems and produces better quality solutions. To combat premature convergence due to biasing the initial population we use linear scaling with both roulette wheel and the CHC elitist selection strategy [2]. We show that scaled roulette and scaled elitist selection both show improvement over a randomly initialized GA and elitist selection performs better than roulette. Over $50$ problems the case-initialized genetic algorithm system shows a statistically significant decrease in the time taken to the best solution and solutions are of higher fitness. Several strategies for choosing cases from the case base for injection all provide measurable improvement over random initialization.

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E [13].

Although we cater to the definition of machine learning in Mitchell's book, unlike classifier systems and many other machine learning algorithms, we do not explicitly induce patterns (generalize) from experiencing data nor do we expect to obtain concept descriptions from exposure to exemplars, or to learn weights, or rules. Instead think of repeated searches using the system as exploring a domain, gleaning useful information about the domain, storing this in a long term memory, then retrieving and using this information to bias future searches in the domain. More specifically, we use cases to store domain information in a case-base, then retrieve a subset of these cases from the case-base and inject them into the genetic algorithm's population to bias future searches in the domain. Note that cases stored in long term memory may in fact implicitly embody a domain model. It must be pointed out that our system differs significantly from classifier systems [6]. One way that classifier systems differ is that they use rules to represent domain knowledge; our system uses cases.

Although we only consider using genetic algorithms and a case-base of past experience, we believe that our approach is not limited to either genetic algorithms or to case-based memory. We conjecture that properly combining a robust search algorithm with some implementation of an associative memory can result in a learning system that learns, with experience, to solve similar problems quickly. What we mean by similarity and how to identify similar problems and solutions will be discussed at length in subsequent sections.

Genetic Algorithms

A Genetic Algorithm (GA) is a randomized parallel search algorithm that searches from a population of points [6]. GAs provide an efficient tool for searching large poorly understood spaces, and have been applied successfully to many NP-hard problems [4]. The genetic algorithm works with a population of potential solutions encoded in some way, usually as bit strings. It proceeds by evaluating and modifying the strings using the genetic operators of selection, crossover, and mutation. The classical genetic algorithm begins with a random population of strings. Evaluation of each solution string is based on a fitness function that is problem dependent and this fitness function determines which of the individuals are better. The classical genetic algorithm uses roulette wheel selection, which is a probabilistic method of choosing the fittest individuals for mutation and crossover; the most fit strings will be chosen more often in direct correspondence to their relative fitness.

Crossover allows for information exchange between two strings. One point crossover is implemented by randomly choosing a crossover point in the selected strings and exchanging complementary substrings as shown in figure 1.

Figure 1: One Point Crossover
\includegraphics[height=1.0in,width=3.0in]{crossover.ps}
Selection according to fitness combined with crossover provides the main power behind the GA. The selection operator can cause the loss of genetic material, thus decreasing the exploration of the search space and causing premature convergence. Mutation guards against this loss of genetic material by periodically, with low probability, flipping a bit position in the string [4].

Although GAs work well for finding solutions to large, poorly understood problems, they don't learn from previous problem solving attempts and their performance with each new attempt remains the same. We often apply GAs in poorly understood domains, thus, it is desirable to learn from previous attempts and be able to store the knowledge gained from them.

Case-Base Reasoning

Case-based reasoning attempts to mimic the technique of solving new problems by remembering similar problems, their results, and possibly the reasoning behind those results. A case-base reasoning system proceeds on the assumption that the best way to solve a problem is by reference to prior experience, stored as cases [17]. When given a new problem, the system gets the most similar case from memory and either uses or adapts it to solve the new problem. A case-based reasoning system gets it's power from a large library of cases [16].
The strategy involves answering the following questions:
  1. How are cases organized in the memory?
  2. How are cases determined to be relevant retrieved from memory?
  3. How can the retrieved cases be adapted to the new problem?
  4. How are the cases initially acquired?

Combining the GA with the Case-Base

The method described in this paper attempts to combine the two approaches. By combining these two approaches, a problem solving search can make use of information already gathered about the search space during a previous problem solving attempt.

The idea is that systems in an application setting expect to confront similar problems. When using a genetic algorithm based system the process of running the GA extracts and uses knowledge of the search space in converging to a solution. When members of the population are stored in a case-base this knowledge of the search space for the solved problems is available for future problem solving attempts. Our premise in this paper is that this knowledge can be extracted and stored in a memory or case-base and used to initialize the population of the GA to solve the new problem. The GA will be given a head start in finding a solution, since it can make use of information previously gleaned by running the GA on the earlier problem. If the solutions to these problems are similar, seeding a genetic algorithm with the solutions to previously solved problems should produce better results in a shorter period of time, thus improving the efficiency of the GA.

In this paper, a case is a member of the population (a candidate solution) together with other information including its fitness and the timestep at which this case was generated [12]. During GA search, we periodically store the best individual in the population to the case-base.

Our GA starts with no case-base and saves cases to memory as it solves more problems. Finding cases to store causes no difficulty. A GA with a population of $100$ running for $100$ generations creates $10,000$ possible cases for the case-base. Choosing which of these potential cases to save poses a greater problem. Our CBR-GA system uses the stored cases to initialize it's population when a new problem is attempted. Cases are chosen for injection based on an index of similarity between the new problem and the problems stored in the case-base. As the GA proceeds, these solutions are changed by the GA operators of crossover and mutation to adapt to become solutions to the new problem.

The schema theorem of genetic algorithms says that short, low order, above-average schemata are given exponentially increasing trials in subsequent generations. These schemata are sampled, recombined, and sampled again to form potentially higher fitness strings; they become building blocks to form strings with higher fitness [4]. We hope that, by storing solutions to similar problems in a case-base, we will be storing building blocks common to solutions to our new problems. When we inject these stored solutions into our initial population for the new problem, we will already have some of the schemata that are needed to build the solution we want.

Early work in this field was done by Ramsey and Grefenstette [15]. They used a case-base to initialize populations for a GA finding the best strategies in a tracker/target simulation with a periodically changing environment. Solution strategies for similar environments were used to initialize the population each time the environment changed. Improvements in results with the case-base were observed both when compared to the GA running with no knowledge of the change in environment and when the GA was restarted when the environment was altered.

Louis, McGraw [12] and Wyckoff used a similar approach to explain the process of generating solutions using a genetic algorithm. Their approach was more concerned with using the case-base to gather information about the development of the solutions over the intermediate generations of the genetic algorithm. This study also had initial promising results from seeding their populations with promising schema generated early in the GA run and later lost [12]. Later work by Louis dealt with the open-shop scheduling problem and with circuit design. The thrust of this work involved research into selecting appropriate cases to inject and the number of cases to use in seeding the population. Again, results were promising, with better solutions being found in a shorter time period [9] and [20].

Work in using solution similarity metrics for choosing cases to inject shows that solution similarity metrics work well for improving performance [10,11] with experience. This paper investigates the properties of case-initialization for problem sets for which a problem similarity metric exists and studies injection of appropriate cases into the initial population of a genetic algorithm.

In section 2 we look at two selection algorithms, roulette and a modified version of CHC or elitist selection to test the feasibility of our system and to choose a selection operator. In this section we study only one limited set of problems, and we index based on a simple linear problem distance calculation. Section 3 looks at an expanded version of the problem set from section two, using only an elitist selection strategy. We examined the time taken to find the best solutions as the system solves more problems to test how well our system learns. Section 4 discusses different methods of indexing to choose solutions for injection from the case-base. We studied the effect of changing the problem distance calculation from linear to quadratic, exponential, random, and misleading which adds a random integer to the linear problem distance for a specified proportion of the problems in the case-base. We also looked at a hamming distance based method of choosing solutions. This allows us to draw some conclusions about how well our system will work with other sets of problems where the similarity of solutions may not be strictly linear or may be unknown. Section 5 contains our conclusions and some suggestions for future work.

Methodology

The Problem Set

We tested the feasibility of our combined system using a problem set with the following properties:

  1. Similar problems have demonstrably similar solution strings. To test the validity of combining the GA with the case-base we need a good measure of similarity for the problem solutions we inject. Liu discovered in his work that the degree of similarity between two problems is an important factor in choosing which case to use when seeding the population [9]. Previous solutions of higher fitness can be seeded with good results when problems are very similar, but less similar problems did better with seeded individuals of lower fitness. This measure of similarity should provide for easy indexing. We want a set of problems that can be sorted by the similarity of the solutions so that we can measure the improvements caused by injecting more or less similar solutions.
  2. A large problem set. We want a large number of possible problems to study the effect of experience on our system. We are developing a system that will generate better solutions in less time than a standard GA. We need a large number of problems to test what happens as more and more problems are solved and the case-base grows.

We considered several problems before choosing a variation of One-Max. One-Max is the problem of maximizing the number of ones in a string consisting of ones and zeros [1]. One variation we looked at was the set of problems consisting of counting the number of ones in a binary string. The number of possible problems in the class is the length of the string plus one. Some examples of possible problems and solutions are shown in table 1.


Table 1: The class of size $10$ unordered problems
Problem Index ($P_i$) Possible Solution Possible Solution
0 0000000000 0000000000
1 0010000000 0000100000
2 1000010000 0010000010
3 0101010000 0011000010
. .  
. .  
. .  
9 1111111110 0111111111
10 1111111111 1111111111


This set of problems fulfilled properties $2$ and $3$ but, as can be seen in table 1 solutions to problems which are close (in problem space) are not necessarily close to each other (in solution space). Even for the same problem, it is possible to have two widely different solution strings. For example, the strings

1001111000 0110000111

both solve problem number $5$ but the the two strings don't share any bit positions; they are complements.

The other set of problems we looked at was the set of problems consisting of a string of ones followed by a string of zeros. Here the number of possible problems in the class is also the length of the string plus one. If the string length is ten, there are eleven possible problems whose solutions are shown in table 2.1. This set is a subset of the previous, unordered set of problems.

Table 2: The class of size $10$ ordered problems
Problem Index ($P_i$) Solution
0 0000000000
1 1000000000
2 1100000000
3 1110000000
. .
. .
. .
9 1111111110
10 1111111111


The Sequential One-Max (SOM) set of problems had all of the properties we were looking for. Indexing is easy; the problem index, $P_i$ is a measure of the degree of similarity between two problems. The distance between two problems $P_i$ and $P_j$ is $\mid P_i - P_j \mid$. When the distance between two problems is small it is easy to see in table 2.1 that the solution strings are similar; if two problems are within one of each other, the difference in the solution is one bit position. The number of problems in the class is equal to the length of the string used for the solution, therefore the class can contain any number of possible problems.

Roulette Selection

We first tested our system on the SOM problem set with the usual roulette selection operator and one point crossover described in Section 1 to try out the case-base with the classical GA. The GA was run for $200$ generations. In the initial run there is no case-base and it is built as more problems are solved. The best solution from the population is saved at regularly chosen intervals of $40$ generations. The best string found over all generations was also saved; thus six solutions were saved to the case-base. As more problems are solved, the case-base grows and solutions to more problems are available. The GA is able to take advantage of the information saved in the case-base in solving subsequent problems.

The GA was first run using a scaled roulette selection scheme. Scaling was used to prevent premature convergence with a scaling factor of $1.2$. String length was set to $100$ and we used a population size of $200$. This provided for $101$ possible problems in the class. For this initial test of our system, we restricted the problem set to the problems between $45$ and $55$ or eleven possible problems. The GA was run for $10$ problems with the possibility that the same problem could be attempted by the GA more than once. Probability of crossover was set to $0.66$, probability of mutation was $0.001$ and the percent of the population initialized from the case-base was $5$%, with the first run always randomly initialized since the case-base did not contain any stored solutions.

Choosing the correct cases to inject is an important parameter. Liu found the following general trends in case selection [9].

  1. As problem distance increases, injecting cases with lower fitness results in better solutions.
  2. This trend is emphasized with larger problem sizes.
  3. A quicker flattening out of the performance curve (average or maximum fitness versus time) is seen when higher fitness individuals are injected.
Keeping these trends in mind, we wanted an indexing strategy that would chose lower fitness individuals when problem distance was high and higher fitness individuals when problem distance was low. Our indexing was done as follows: a threshold value was set. The linear distance, $\mbox{dist}(P_i,P-j)$, of the new problem to a problem stored in the case-base was calculated.

\begin{displaymath}\mbox{dist}(P_i,P_j) = \mid P_i - P_j \mid\end{displaymath}

where $P_i$ is the new problem and $P_j$ is a problem from the case-base. If this distance was less than the threshold value, the solutions to problem $P_j$ were put in the usable list. Seed individuals were chosen from the strings in the usable list using distance proportional selection. The probability $\mbox{Prob}_{P_j}$ of solutions to problem $\mbox{P}_j$ being chosen for injection is:

\begin{displaymath}\mbox{Prob}_{P_j} = 1 - \frac{\mbox{dist}(P_i, P_j)}{\sum_{P_j \in CB} \mbox{dist}(P_i,P_j)}\end{displaymath}

where CB denotes the case-base and $\mbox{dist}(P_i,P_j)$ designates the linear distance between problems $P_i$ and $P_j$. The distance proportional selection is the same as the roulette selection used for the classical GA. More strings are chosen from those solutions in the case-base that are solutions to a problem linearly close to the new problem. If no previous solutions are within the threshold distance, problems are randomly chosen from the case-base and solutions to those problems are placed in the usable list. In either case, individuals are chosen from generations based on the problem distance, with the probability of choosing an individual from a particular generation inversely proportional to the problem distance. When $P_j$ is close to the new problem, $P_i$, solutions are chosen from later generations or from the best solutions which are also stored in the case-base. If $P_j$ is not close to the new problem, $P_i$, the GA is seeded with solutions from earlier generations which are presumed to be of lower fitness in solving the $P_j$, the problem stored in the case-base.

The solutions to problems that are distance one apart have solutions that are one bit different using the SOM problem set, therefore we initially attempted to use only mutation with a probability of crossover of $0.00$. The expectation was that mutation would change the one bit and thus find a solution. This had poor results. When probability of mutation is .01, one bit in each string will be changed in each generation. The probability of changing the wrong bit and decreasing the fitness of the seeded string is $99$%. In the next generation, the number of bits that need to be changed to arrive at the solution is $2$ with $98$% probability that the wrong bits are changed. With each succeeding generation, the number of incorrectly placed bits increases and the probability of finding the correct solution using only mutation decreases.

We believe that using the crossover operator is the correct approach. Solutions to similar problems contain schemata which are building blocks for the solution to the new problem. The building block hypothesis says that short, low-order, highly fit schemata are recombined to form strings of potentially higher fitness (Goldberg, 1989). The premise here is that our seeded individuals will become the building blocks for solutions to the new problems.

Figure 2: Left: Average fitness with roulette selection. Right: Maximum fitness with roulette selection.

Using the above parameters, average fitness, is approximately the same with and without the Case Initialized Genetic AlgoRithm (CIGAR) in the early generations and displays a gradual increase in fitness (Figure 2 left). The run of $10$ problems using CIGAR shows a higher average fitness and maintains a fairly stable advantage over the unseeded run but does not show a great improvement in fitness. Maximum fitness, shown in Figure 2 (right), displays a different behavior. Initial generations have much higher maximum fitness with CIGAR than without and then dip to lower fitness before climbing again, though fitness never drops below the random GA. This dip can be explained by the nature of the initialized cases. The strings are close to the correct solution and are made up of schemata of long defining length and high order. These schemata appear in the early generations when the population has not converged. Therefore, the probability that the schemata will be disrupted is high. This accounts for the dip in fitness displayed in Figure 2 (right). Maximum fitness increases after this dip as the schemata of shorter length are recombined to find better solutions of longer length. In later generations, the maximum fitness for the unseeded run approaches that of the seeded run but does not overtake it.

Elitist Selection

Roulette selection did not offer a large improvement for the case-base over random initialization. We tried an elitist selection scheme next, attempting to see a greater improvement in performance than we achieved with the classical GA. We used a modified version of CHC selection [2] where, if N is the population size, the best N individuals in the combined pool of parents and offspring make up the next generation. [*]

Elitist selection emphasizes exploitation over exploration; therefore, to increase exploration we increased the probability of crossover to $100$% and probability of mutation to $0.05$%. The GA was run for $100$ generations with population size of $200$ and string length of $100$. Maintaining the number of cases saved to the case-base, we stored the best solution every $20$th generation instead of every $40$th generation plus we also stored the best overall solution.

The average fitness with this type of selection was higher with seeded individuals than without. Elitist selection got better results in less time with this problem than roulette selection either with or without CIGAR. Initializing the population from the case-base, average fitness starts out approximately the same as random initialization, but it takes a sharp upward curve as shown in Figure 3 (left).

Figure 3: Left: Average fitness with elitist selection. Right: Maximum fitness with elitist selection.
Maximum fitness values are also higher with seeded individuals than without. Elitist selection starts at a high fitness with seeded individuals and makes small gains over the 100 generations, finishing with a slightly higher fitness than the randomly initialized populations achieved in the final generations(Figure 3 right).

Figure 4 shows that elitist selection generates higher maximum and higher average fitness than roulette selection when both are used with case-based injection. The average fitness for elitist selection is higher than the maximum fitness for roulette selection after the initial generations. There is an even larger disparity in the maximum fitness of the solutions found with elitist selection over the maximum fitness found using roulette.

Figure 4: Elitist and roulette selection, maximum and average fitness

Comparison

Further comparing roulette selection with elitist, we looked at the maximum fitness in the zero generation. We saw that the beginning maximum fitnesses for both types of selection rose as more problems were solved. Elitist selection showed consistently higher maximum fitnesses with each problem solved, while roulette selection displayed beginning maximum fitnesses that varied more. With an elitist selection method, the best strings are not lost over generations. This guarantees that the best strings will be saved to the case-base and used to initialize new populations for the GA. and accounts for the sharp increase in the initial maximum fitness manifested in Figure 5 (left) using elitist selection.

Figure 5: Left: Beginning maximum fitness. Right: Ending maximum fitness.
Roulette selection can and often does lose the best individuals over generations, thus, the cases saved are more varied than with elitist selection. Even though the absolute best string is saved to the case-base, other strings with high fitness may be lost if they do not appear in the generations in which solutions are stored. This accounts for the larger variance in fitness among the seeded individuals (Figure 5 left).

The maximum fitnesses at the last generation of each GA run were also higher with seeded populations than with random populations. This result was consistently true with elitist selection, with the maximum fitness increasing steadily as more problems were solved. The solution was found for the last two problems when the population was seeded from the case-base (Figure 5 right). Elitist selection had higher maximum fitness at the final generation with or without CIGAR than roulette selection was able to find. The ending maximum fitness for roulette selection was higher for seeded populations in all but two cases, one of which was the second problem to be solved, when the case-base contained the fewest cases available for injection. Maximum fitness did not rise as consistently as more problems were solved as it did with elitist selection, but it was higher in most cases with seeding than without (Figure 5 right).

The hamming distance between two strings is the number of bit positions which are different from one string to the other. The average hamming distance of the population is the average of the hamming distance between each pair of strings in the population. As more cases are input to the case-base, we expected the average hamming distance between strings in the case-base to decrease. With each problem attempted, the GA generates higher fitness solutions to be stored to the case-base. High fitness solutions are strings that have a block of ones followed by a block of zeros. These strings are close to each other in hamming distance, therefore, as the case-base stores more of these high fitness strings, the average hamming distance within the case-base decreases. The injected strings become more alike, since the case-base has converged upon similar strings, thus, the initial population hamming distance is expected to decrease. Assuming the hamming distance between the seeded individuals approaches zero, the population hamming distance should approach:

\begin{displaymath}\frac{(n - p*n) * (n + p*n - 1)}{n * (n -1)}\ * \frac{l}{2}\end{displaymath}

where $p$ is the percent of the population initialized, $n$ is the population size and $l$ is the length of the string.

Both roulette and elitist selection had lower hamming distance in the first generation for seeded populations than random populations. Elitist selection displayed decreasing hamming distance as more problems were solved. Random initialization had an initial hamming distance of approximately $50$ for both types of selection. This is the expected value of $\frac{l}{2}$ for a random population (Figure 6 left).

Figure 6: Left: Beginning hamming distance. Right: Endings hamming distance.
Looking at roulette selection, we saw that seeded populations had lower hamming distances than random for all iterations, but without the uniform decrease displayed with elitist selection. This may be explained by the higher hamming distance within the case-base with roulette selection. Since roulette selection did not arrive at solutions that were as good as those found with elitist selection, the solutions stored are not as good and are not as close to each other.

The ending hamming distances are also lower with seeded populations for both types of selection with elitist again being lower than roulette selection. By the last generations of the GA run the population has converged more with CIGAR than without. (Figure 6 right).

We also studied the performance of the descendants of seeded individuals. Table 3 shows the percentage of the population descended from injected individuals in the last generation divided into segments of the population based on fitness. For all segments of the population, more than $5$% of its members are descendants of injected individuals. This indicates that the injected strings were of higher fitness than random strings and thus reproduced more often. Once again elitist selection performed better than roulette. When we looked at the strings that composed the top 10% of the population in terms of fitness, almost half were descendants of seeded individuals as compared to 17% descendants using roulette selection.

Table 3: Percent in population segment descended from injected individuals
Category Roulette Elitist
Best 10% 17.22% 47.78%
Best 25% 14.44% 42.00%
Best 50% 12.78% 40.22%
Worst 50% 18.00% 28.56%
Worst 25% 17.11% 26.44%
Worst 10% 15.00% 27.78%


Time to Best Solution

Elitist selection provided results that were much better than roulette selection, and we used it for the rest of the work presented in this paper. In this section we study the time taken to find the best solution to the problem. We expanded the problem set to $51$ possible problems and tested whether our system would find it's best solutions in less time than a randomly initialized GA. The GA was run $10$ times with different random seeds, solving $50$ randomly chosen problems between $25$ and $75$. Results were averaged over these $10$ runs. Each run of the GA generated it's own set of $50$ problems, and during each run of $50$ problems it was possible that the same problem could be chosen more than once.

Figure 7 (left) shows the number of generations taken to find the best solution, plotted on the y-axis, for each problem attempt, plotted on the x-axis. Using injected cases, we see a decrease in the number of generations taken to arrive at the best solution. Without injected cases the time taken to the best solution remains approximately constant. By the time approximately $1/2$ of the problems have been attempted, there is a statistically significant decrease in the time take to solve a new problem.

Figure 7: Random vs CIGAR initialization, different problem sets. Left: Time to best solution. Right: Best fitness

When we look at the quality of solutions generated, it can be seen in Figure 7 (right) that the best fitness found with injected cases is higher than without. The maximum fitness found is fairly constant without injection. Using CIGAR the maximum fitness increases sharply as more problems are solved.

We next ran the GA for $50$ problems using the same set of problems in the same order for both the case-based initialized GA and the random GA and for each of the $10$ runs of the GA. It was still possible to repeat a problem during any of the $10$ runs. Once again, the time taken to the best solution was shorter with our case-base injection than without (Figure 8 left). The best fitness found was also better with the injected cases than without (Figure 8 right).

Figure 8: Random vs. CIGAR initialization, same problems. Left: Time to best solution. Right: Best fitness.

Looking at the set of $50$ problems where each problem is only attempted once and the same problems are evaluated in the same order in each of the $10$ runs, we get similar results again. Time taken to best solution is shorter (Figure 9 left) and best fitness found is higher (Figure 9 right). In this case the quality of the solutions found using CIGAR was approximately the same as when repetition of problems was allowed. The time taken to the best solutions was slightly longer on average without repetition than when repetition was allowed, but once again there was a statistically significant decrease in time to best solution once approximately $1/2$ of the problems were solved.

Figure 9: Random vs CIGAR initialization, no repetition. Left: Time to best solution. Right: Best fitness.

These are the results we want from a learning system; the system improves with experience, taking less time to arrive at better solutions. Our system conforms to our earlier definition of a learning system as one in which an improvement in information processing ability results from it's information processing activity.

We tested our system on the unordered One-Max problem class, described in section $2$. The first thing we noted was that this problem was much easier for the GA alone to solve. To make the problem harder for the GA, we used the problem ranges $0$ to $24$ and $75$ to $100$. The same problems were solved by the case-base initialized and the random GA in each of the $10$ runs, with the possibility of repeated problems.

Figure 10: Time to best solution, random vs CIGAR initialization, unordered One-Max

Looking at figure 10 it is clear that, while the best solutions are found more quickly with cases injected than without, the difference in time to best solution is not as dramatic as with the earlier problem. This can be explained by the nature of the new problem. The indexing scheme outlined earlier chooses solution strings to inject based on the distance between the problem to be solved and the problems stored in the case-base. For the SOM problem set, a small problem distance implies a small hamming distance between solution strings. Injecting strings with this small hamming distance to the solution string provides the GA with building blocks to a solution for the new problem. In the new set of problems, a low problem distance does not imply small hamming distance to the solution that the GA ultimately finds. The strings

1001111000 0110000111

both solve problem number 5 but the hamming distance between them is 10; as far apart as possible. The injected cases do not provide as much improvement in performance for this problem, because of this difference in the solution strings.

Indexing

Our system does generate better results in less time using the SOM set of problems. Real world problems, however, don't usually have the properties of this problem set that allowed us to use a linear problem distance indexing scheme. To test our system on other types of problem distances we changed the method we used to choose cases for injection.

The linear indexing method that we had been using was based on a probabalistic or roulette type selection. We next tested simply choosing solutions that solved problems in the case-base that were closest in distance to the new problem. We sorted the cases based on their distance to the new problem. Again, distance was measured as $\mid \mbox{Prob}_{new} - \mbox{Prob}_j \mid$ where $P_j$ is a problem stored in the case-base. The best solution from each of the closest problems in the case-base was chosen regardless of the problem distance. This is different from the previous indexing method, where the cases chosen for injection were selected from generations in inverse proportion to the problem distance. Again both methods outperform random initialization, with neither one clearly outperforming the other with respect to the time taken (Figure 11 left). $CIGAR$ refers to the indexed method of choosing cases and $CIGAR_{Take Best}$ corresponds to choosing the best solutions to the closest available problems. Take best initialization also produces higher fitness solutions than random, getting similar results to the linearly initialized GA (Figure 11 (right). This similarity to the indexed type of initialization seems to contradict earlier work that got better results by using lower fitness cases when problem distance is high and higher fitness cases when problem distance is low. Take best initialization simply takes the best solution to the closest problems in the case-base. It is possible that the case-base contained problems close enough to the new problem that taking the best solution did not cause premature convergence.

Figure 11: Take best initialization vs linear. Left: time to best solution. Right: Best fitness

We studied the cases where the distance was not linear next. First we looked at a randomized case, where the problems were randomly inserted into an array and the array indices were used to calculate problem distance. Distance was calculated as

\begin{displaymath}\mbox{Dist}_{calc} =
\mid \mbox{i} - \mbox{j} \mid \end{displaymath}

where i and j are indices into the array of randomly ordered problem numbers. The actual distance is

\begin{displaymath}\mbox{Dist}_{actual} = \mid \mbox{ProbArray[i]} - \mbox{ProbArray[j]} \mid \end{displaymath}

and, therefore,

\begin{displaymath}\mbox{Dist}_{actual} \not= \mbox{Dist}_{Calc}\end{displaymath}

This allowed us to compare a random problem distance based injection with the linear problem distance based injection. Cases to inject are chosen based on the probabalistic indexing method described in Section $2$ using the indices to measure problem distance. Thus, a problem in the case-base whose index in the problem array is close to the index of the new problem would get more copies of it's solutions in the injected cases than problems whose indices are farther from the index of the new problem. The generation of the solution chosen is based on the indices also, with cases from later generations chosen when the distance between the case-base problem index and the new problem index is small, and earlier solutions chosen when this index distance is large.

As can be seen from Figure 12 (left) the linear array arrived at the best fitness faster than the random array based injection, but, the random injection still reaches it's best fitness faster than a purely random initialization. Figure 12 (right) shows the best fitness found is higher with a linear distance calculation than with random distance, but both are better than a purely random initialization. This is the expected behavior, since using the random array to choose cases for injection creates a situation where cases are chosen randomly with no respect to problem distance. We expect that using the true measure of problem distance to choose cases for injection will result in selecting more appropriate cases than a system which merely chooses them at random. The random case injection still has better performance than random initialization because, even though the cases injected are randomly chosen, they are still solutions to a problem which requires a block of ones in a row and then a block of zeros. These blocks of ones and zeros become building blocks to solutions for problems which require similar blocks of ones and zeros, therefore they help the GA to find a solution faster than a random population.

Figure 12: Random Array Initialization vs Linear. Left: Time to best solution. Right: Best fitness

Quadratic and exponential problem distances were studied by creating functions which returned the $\mbox{Dist}_{calc} = \mbox{problem
distance}^2$ and $\mbox{Dist}_{calc} = \mbox{2}^{problem
distance}$. For this test, we used a population of $50$ and ran the GA for $50$ generations. In Figure 13 (left) it can be seen that the quadratic distance gets to it's best fitness in the least time, with linear indexing producing similar results Exponential distances get poor improvement in the time to best solution, producing similar results to random initialization.

Figure 13: Quadratic, exponential and linear initialization. Left: Time to best solution. Right: Best fitness.

Looking at Figure 13 (right), both linear and quadratic indexing achieve similar best fitnesses with exponential indexing generating lower fitness solutions. Changing the distance function to a quadratic rather than a linear function, places more emphasis on choosing cases from those problems in the case-base that are close to the new problem. This additional emphasis helps with this problem set, and similar results are achieved in less time than with a linear distance function. However, when we increase this emphasis on closeness even more by using an exponential distance function, we decrease the fitness of the solutions we generate and increase the time taken to arrive at those solutions. For this problem, exponential distances place too much emphasis on the cases closer to the new problem, causing too much exploitation and not enough exploration.

Misleading problem distances were tested by adding a randomly chosen number between $-5$ and $5$ to the actual problem distance. This was studied for a $20$, $50$ or $80$ % probability of adding the random factor. With respect to the time taken to the best solution, basing injection on a misleading distance calculation produced results similar to linear distance injection. There was a downward trend in the time taken as more problems were solved

When we looked at the best fitness found, the misleading distances generated better fitness solutions than both the random initialization and the strictly linear distance indexing. The different probabilities did not make a great difference in results. Figure 14 shows random initialization, linear injection and misleading injection with a probability of $80$%. In this figure $CIGAR$ is linear indexing injection, $CIGAR_{80}$ is misleading injection with $80$% of the problem distances being misleading and $RANDOM$ is the randomly initialized GA. Here again we see a tradeoff between exploration and exploitation. The linear injection is able to exploit the closeness of the injected solutions to the new problem solution and arrive at the best fitness in less time than the GA using the misleading injection technique. Misleading injection, however, was able to achieve better fitness solutions because of increased exploration of the search space.

Figure 14: Misleading vs Linear Initialization, best fitness found

Conclusions

We proposed and evaluated a new technique for combining genetic algorithms with case-based reasoning for learning from experience. In our system, while a genetic algorithm is solving a problem, the system saves members of the population to a case-base. Simultaneously, appropriate cases from previously solved problems in the case-base are injected into the initial population of the genetic algorithm. This paper investigated the properties of our case-initialized genetic algorithm and showed that the combined system learns to improve performance with experience.

The results show an elitist selection strategy getting better performance than the classical roulette strategy. Both maximum and average fitness increased as the case-base grows and individuals are injected into the population. Hamming distances decrease as the population of the case-base converges more tightly around better solutions. Descendants of seeded individuals make up a large proportion of the best individuals in the population with an elitist selection strategy and injected individuals tend to survive and multiply at high rates. This result is not as pronounced with roulette selection, but the descendants of seeded individuals still make up a greater percentage of the population than their original 5% share.

The time taken to find the best solution decreases as more problems are solved and more solutions are saved to the case-base. Better performance is achieved when problems are allowed to repeat, but improvement is statistically significant with and without repetition.

Changing the means of choosing cases still provides measurable improvement in performance. Even a random or misleading selection of cases to inject produces improvement over the randomly initialized GA run. This seems to indicate that using a case-base will improve performance even among problems where the similarities in solutions are not clearly understood and an indexing measure is not readily apparent. Our current work in using this technique on a more significant set of problems than SOM considers variations of this technique and shows wide applicability on real world problems [10,11]. We believe that this fairly straightforward addition to a genetic algorithm can pay dividends in many industrial settings.

Much work remains to be done. The effect of different recombination operators was not considered in this paper. Also, individuals can be injected into the population at generations other than the initial generation. This may be effective when using a roulette wheel selection strategy. Injecting individuals into later generations when the population has had time to converge may prevent some of the loss in fitness observed when cases are injected into the initial population. Later injection of individuals may also improve performance using the hamming distance method of selecting cases for injection. Selecting cases that are close to the best individual in a later generation should inject strings that are closer to the solution of the problem than the strings that we selected based on their nearness to the best individual in generation zero.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 9624130 and in part upon work supported by the Office of Naval Research under contract number N00014-03-1-0104.

Bibliography

1
David A. Ackley.
A Connectionist Machine for Genetic Hillclimbing.
Kluwer Academic Publishers, 1987.

2
Larry J. Eshelman.
The chc adaptive search algorithm: How to have safe search when engaging in nontraditional genetic recombination.
In Gregory J. E. Rawlins, editor, Foundations of Genetic Algorithms-1, pages 265-283. Morgan Kauffman, 1991.

3
A. Goel and B. Chandresekaran.
Case-based design: A task analysis.
In Christopher Tong and Duvvuru Sriram, editors, Artificial Intelligence in Engineering Design, Vol II, pages 165-184. Academic Press, Inc., 1992.

4
David E. Goldberg.
Genetic Algorithms in Search, Optimization, and Machine Learning.
Addison-Wesley, 1989.

5
J. Grefenstette, C. Ramsey, and A. Shultz.
Learning sequential decision rules using simulation models and competition.
Machine Learning, 5:355-381, 1990.

6
John Holland.
Adaptation In Natural and Artificial Systems.
The University of Michigan Press, Ann Arbour, 1975.

7
M. Huhns and R. Acosta.
Argo: An analogical reasoning system for solving design problems.
In Christopher Tong and Duvvuru Sriram, editors, Artificial Intelligence in Engineering Design, Vol II, pages 105-144. Academic Press, Inc., 1992.

8
Cezary Z. Janikow.
A knowledge-intensive genetic algorithm for supervised learning.
Machine Learning, 13:189-228, 1993.

9
Xiaohua Liu.
Combining Genetic Algorithm and Case-based Reasoning for Structure Design.
University of Nevada, Reno, 1996.
M.S. Thesis, Department of Computer Science.

10
Sushil J. Louis.
Learning from experience: Case injected genetic algorithm design of combinational logic circuits.
In Proceedings of the Fifth International Conference on Adaptive Computing in Design and Manufacturing, page to appear. Springer-Verlag, 2002.

11
Sushil J. Louis and John McDonnell.
Learning with case injected genetic algorithms.
IEEE Transactions on Evolutionary Computation, To Appear in 2004.

12
Sushil J. Louis, Gary McGraw, and Richard Wyckoff.
Case-based reasoning assisted explanation of genetic algorithm results.
Journal of Experimental and Theoretical Artificial Intelligence, 5:21-37, 1993.

13
Tom M. Mitchell.
Machine Learning.
WCB McGraw-Hill, Boston, MA, 1997.

14
J. Mostow, M. Barley, and T. Weinrich.
Automated reuse of design plans in bogart.
In Christopher Tong and Duvvuru Sriram, editors, Artificial Intelligence in Engineering Design, Vol II, pages 57-104. Academic Press, Inc., 1992.

15
C. Ramsey and J. Grefensttete.
Case-based initialization of genetic algorithms.
In Stephanie Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 84-91, San Mateo, California, 1993. Morgan Kauffman.

16
Elaine Rich and Kevin Knight.
Artificial Intelligence.
McGraw Hill, Inc., 1991.

17
C. K. Riesbeck and R. C. Schank.
Inside Case-Based Reasoning.
Lawrence Erlbaum Associates, Cambridge, MA, 1989.

18
D Smith.
Bin packing with adaptive search.
In Proceedings of an International Conference on Genetic Algorithms, pages 202-206. Morgan Kauffman, 1985.

19
K. Sycara and D. Navinchandra.
Retrieval strategies in case-based design system.
In Christopher Tong and Duvvuru Sriram, editors, Artificial Intelligence in Engineering Design, Vol II, pages 145-164. Academic Press, Inc., 1992.

20
Zhijie Xu and Sushil J. Louis.
Genetic algorithms for open shop scheduling and re-scheduling.
In Proceedings of the ISCA 11th International Conference on Computers and Their Applications., pages 99-102, Raleigh, NC, USA, 1996. International Society for Computers and Their Applications.


Index

About this document ...

This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.71)

Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The command line arguments were:
latex2html -split 1 chapter

The translation was initiated by Sushil Louis on 2005-01-06


next_inactive up previous
Sushil Louis 2005-01-06