Indexing, or how we define similarity, is a basic issue in all the systems described above. Previous work has dealt with the simpler case where a similarity metric exists in the problem space [Louis & Johnson1997b,Ramsey & Grefensttete1993]. Simply put, when we know that two problems are similar, the system can use information from attempting one problem to improve performance on the other. However, a problem similarity metric is not easy to come by and remains a major issue for case-based reasoning systems. In this paper, we study the more realistic case where we do not need a similarity measure on the problem space. Since genetic algorithms solutions are encoded as binary or real strings, purely syntactic similarity measures lead to surprisingly good performance - a needed property for applications to poorly-understood systems. Solution similarity measures completely avoid CBR's indexing problems.
Combinational circuit design is an example of a configuration design
problem. The general problem can be stated as: Given a function and a target
technology to work within, design an artifact that performs the function
subject to constraints. For parity checkers, the function is parity checking
and the target technology is combinational logic gates such as boolean AND and
OR gates.
A genetic algorithm can be used to
design combinational circuits as described in [Louis & Rawlins1991].
A five-bit parity checker is one of
225 = 232 = 4294967296 different
five-input one-output combinational circuits (boolean functions). If we are
trying to solve this class of problems, one way of indexing, defining
similarity of problems, can be:
Concatenate the output bit for all possible 5-bit input combinations counting
from 0 through 31 in binary. This results in a binary string of output
bits, So, of length 32. Strings that are one bit different from Sodefine a set of boolean functions, as do strings that are two bits different
and so on. This way of naming boolean functions provides a simple distance
metric and indexing mechanism for the combinational circuit design problems
that we consider in this paper. That is, we have a metric for measuring problem similarity.
Problems are constructed from the parity problem by randomly changing
bits in the output bit string. The fitness of a candidate circuit is
the number of correct output bits. Thus 5-bit problems would have a
maximum fitness of 25 = 32. As a simple example, consider the 3-bit
parity checker as shown in Table
. The first column
in Table
shows all possible inputs for a 3-bit party
checker, the second column shows the correct output for each input. We
construct a 3-input, 1-output problem that is similar to the 3-bit
parity checker by randomly choosing and flipping one of the output
bits of the 3-bit parity checker as shown by the boxed 0 in the
table. The correct output bits for the new problem (1 bit different
from the 3-bit parity checker) is shown in the third table. This new
problem is distance 1 away from the parity problem in terms of our
problem similarity metric.
|
The solution similarity metric needs to measure the distance between encoded circuits (solutions). We are assuming that similar circuits have similar function. Since we use a binary string to encode combinational circuits, the hamming distance between these bit strings suffices as our similarity metric on this problem. We show that this simple similarity metric works well for these parity checker design problems. Real number strings could use Euclidean distance.
Using the parity problem as a basis, we randomly generate 50problems that are similar in terms of our problem similarity metric by
randomly flipping
of the output bits of the 6-bit parity
checker's output bit string.
Figure 4
In this paper, CIGAR uses a population of size 30 run for a maximum
of 30 generations to solve 6-bit combinational circuit design
problems (similar to parity checkers). This results in a 150 length
chromosome (6 rows, 5 columns, 5 bits per gate) with a corresponding
search space of 2150. The probability of crossover is 0.95 and
the probability of mutation is 0.05. We inject three cases (
of population size) into the population every five generations.
Previous work explains these parameter values [Louis & Johnson1997a]. All plots are
averages over ten (10) runs.
Figure 5 compares performance in terms of time taken to find
a solution for CIGAR and a randomly initialized genetic algorithm.
The figure plots time taken to find the best solution on the vertical
axis and the number of problems solved on the horizontal axis.
Figure 7 plots the distribution of
solutions' similarities produced by CIGAR and RIGA.
Figure 8 plots the solution quality as a function of experience for the following injection strategies.