next up previous
Next: Discussion and Conclusions Up: Learning from Experience: Case Previous: Related Work

Subsections

Indexing and similarity

 

Indexing, or how we define similarity, is a basic issue in all the systems described above. Previous work has dealt with the simpler case where a similarity metric exists in the problem space [Louis & Johnson1997b,Ramsey & Grefensttete1993]. Simply put, when we know that two problems are similar, the system can use information from attempting one problem to improve performance on the other. However, a problem similarity metric is not easy to come by and remains a major issue for case-based reasoning systems. In this paper, we study the more realistic case where we do not need a similarity measure on the problem space. Since genetic algorithms solutions are encoded as binary or real strings, purely syntactic similarity measures lead to surprisingly good performance - a needed property for applications to poorly-understood systems. Solution similarity measures completely avoid CBR's indexing problems.

Problem Similarity

Combinational circuit design is an example of a configuration design problem. The general problem can be stated as: Given a function and a target technology to work within, design an artifact that performs the function subject to constraints. For parity checkers, the function is parity checking and the target technology is combinational logic gates such as boolean AND and OR gates. A genetic algorithm can be used to design combinational circuits as described in  [Louis & Rawlins1991]. A five-bit parity checker is one of 225 = 232 = 4294967296 different five-input one-output combinational circuits (boolean functions). If we are trying to solve this class of problems, one way of indexing, defining similarity of problems, can be: Concatenate the output bit for all possible 5-bit input combinations counting from 0 through 31 in binary. This results in a binary string of output bits, So, of length 32. Strings that are one bit different from Sodefine a set of boolean functions, as do strings that are two bits different and so on. This way of naming boolean functions provides a simple distance metric and indexing mechanism for the combinational circuit design problems that we consider in this paper. That is, we have a metric for measuring problem similarity. Problems are constructed from the parity problem by randomly changing bits in the output bit string. The fitness of a candidate circuit is the number of correct output bits. Thus 5-bit problems would have a maximum fitness of 25 = 32. As a simple example, consider the 3-bit parity checker as shown in Table [*]. The first column in Table [*] shows all possible inputs for a 3-bit party checker, the second column shows the correct output for each input. We construct a 3-input, 1-output problem that is similar to the 3-bit parity checker by randomly choosing and flipping one of the output bits of the 3-bit parity checker as shown by the boxed 0 in the table. The correct output bits for the new problem (1 bit different from the 3-bit parity checker) is shown in the third table. This new problem is distance 1 away from the parity problem in terms of our problem similarity metric.

 
Table: Outputs of the 3-bit parity checker (a 3-0) and a 3-1 problem.
Inputs 3-bit parity problem 3-1 problem
000 0 0
001 1 \fbox{ {\bf 0} }
010 1 1
011 0 0
100 1 1
101 0 0
110 0 0
111 1 1
 

Representation


  
Figure: Mapping from 1D chromosome to 2D circuit.
\begin{figure}
\centerline{
\psfig{figure=figs/mp12d.ps,height=1.2in,width=2.0in}
}
{}
\end{figure}

Since a solution similarity metric depends on how solutions/circuits are encoded, we describe our representation. An individual in the GA's population encoded as a bit string maps to a two-dimensional circuit as shown in figures 2 and  3. We use four (4) bits to encode all 221 = 16 possible two-input, one-output gates. An additional bit specifies the location of the second input. A gate Si,j gets its first input from Si,j-1 and its second, from one of Si+1,j-1 or Si-1,j-1 as shown in figure 3.
  
Figure: A gate in a 2D template gets its second input from one of two gates in the previous column.
\begin{figure}
\centerline{
\psfig{figure=figs/gate_in.ps,height=1.2in,width=2.0in}
}
{}
\end{figure}

If the gate is in the first or last rows, the row number for the second input is calculated modulo the number of rows. The gates in the first column, Si,0 receive the input to the circuit.

Solution Similarity

The solution similarity metric needs to measure the distance between encoded circuits (solutions). We are assuming that similar circuits have similar function. Since we use a binary string to encode combinational circuits, the hamming distance between these bit strings suffices as our similarity metric on this problem. We show that this simple similarity metric works well for these parity checker design problems. Real number strings could use Euclidean distance.

Configuration Design

Using the parity problem as a basis, we randomly generate 50problems that are similar in terms of our problem similarity metric by randomly flipping $12\%$ of the output bits of the 6-bit parity checker's output bit string.

Figure 4

  
Figure: Frequency distribution of problem similarities
\begin{figure}
\centerline{
\psfig{figure=figs/5pProbDist0.5.ps,height=1.6in,width=2.0in}
}
{}
\end{figure}

shows the distribution of problems in terms of problem similarity. Most problems' output strings are between 6 and 7 bits different, no two problems are the same, and the similarities are uniformly distributed.

In this paper, CIGAR uses a population of size 30 run for a maximum of 30 generations to solve 6-bit combinational circuit design problems (similar to parity checkers). This results in a 150 length chromosome (6 rows, 5 columns, 5 bits per gate) with a corresponding search space of 2150. The probability of crossover is 0.95 and the probability of mutation is 0.05. We inject three cases ($10\%$of population size) into the population every five generations. Previous work explains these parameter values [Louis & Johnson1997a]. All plots are averages over ten (10) runs. Figure 5 compares performance in terms of time taken to find a solution for CIGAR and a randomly initialized genetic algorithm. The figure plots time taken to find the best solution on the vertical axis and the number of problems solved on the horizontal axis.

  
Figure: Convergence speed. As more problems are attempted, CIGAR reduces the time to convergence.
\begin{figure}
\centerline{
\psfig{figure=figs/5pTime0.5.ps,height=1.6in,width=2.0in}
}
{}
\end{figure}

The Randomly Initialized Genetic Algorithm (RIGA) uses the same set of genetic algorithm parameters. We ran both ten times with different random seeds.
  
Figure: Solution quality. As more problems are solved CIGAR produces better solutions.
\begin{figure}
\centerline{
\psfig{figure=figs/5pFit0.5.ps,height=1.6in,width=2.0in}
}
{}
\end{figure}

Figure 6 plots the fitness of the best solution found on the vertical axis and number of problems solved on the horizontal axis. The figures clearly show that CIGAR takes less time to produce better quality solutions as it gains experience from attempting more problems.

Figure 7 plots the distribution of solutions' similarities produced by CIGAR and RIGA.

  
Figure: Solution distribution.
\begin{figure}
\centerline{
\psfig{figure=figs/5pSolnDist0.5.ps,height=1.6in,width=2.0in}
}
{}
\end{figure}

CIGAR solutions tend to be more similar indicating that CIGAR is storing and reusing solutions and partial solutions to previously solved problems.

Figure 8 plots the solution quality as a function of experience for the following injection strategies.

1.
Closest to the best
2.
Probabilistic closest to the best
3.
Furthest from the worst
4.
Probabilistic furthest from the worst
5.
Randomly choosing individuals from the case-base to inject
6.
Creating and injecting randomly generated individuals

  
Figure: Quality of design solution versus number of problems attempted (experience).
\begin{figure}
\centerline{
\psfig{figure=figs/10pFit0.12all.ps,height=1.6in,width=2.0in}
}
{}
\end{figure}

In all cases, we replace the worst individuals in the population with the individuals chosen (or created) by the the injection strategy. The figure shows that the probabilistic strategies and the strategy of randomly injecting individuals from the case-base result in the largest increase in quality with experience. As expected, creating and injecting random individuals does not lead to better quality with experience. Both deterministic strategies show intermediate gains in quality. Figure 9 shows a similar trend for the time to solution measure of performance.
  
Figure: Average time to find the best design versus number of problems attempted (experience).
\begin{figure}
\centerline{
\psfig{figure=figs/10pTime0.12all.ps,height=1.6in,width=2.0in}
}
{}
\end{figure}


next up previous
Next: Discussion and Conclusions Up: Learning from Experience: Case Previous: Related Work
Sushil Louis
2001-10-24