Explanation Facilities
in
Neural Nets

Research Project
for
CS 692
(Database Mining)

Matthias E. Johnson

matthias.johnson@mankato.msus.edu


Abstract:

Artificial neural networks have proven very useful and powerful in learning systems. Unfortunately, a neural net is a kind of black box: it produces output without any explanation of how that output was computed. This paper examines two approaches to incorporating explanation facilities into neural nets: extending the network with a fuzzy logic structure, and processing its weights with genetic algorithms.

Introduction

Artificial neural networks (ANNs) are powerful machine learning constructs. One of their key advantages is their robustness, i.e. their ability to cope with erroneous or missing data. Unfortunately, an ANN provides no explanation of how a given output was generated, and this is the primary obstacle to more widespread use of ANNs. The purpose of this paper is to examine two possible ways of providing an explanation facility for neural networks.

Both of the discussed approaches provide explanation facilities for back-propagation networks. Other explanatory mechanisms, not covered in this discussion, apply to more complex types of neural networks such as connectionist semantic networks [Die92].

Fuzzy Logic

The algorithm discussed in [HNY96] uses fuzzy logic to provide an explanation facility for neural networks. Narazaki et al. describe the application of fuzzy logic to a logical formula as the extension of a truth value to a real number. For example, given the height of a person we can classify the individual as either tall or not tall using traditional sets; there is no way to describe people who are only somewhat tall. Fuzzy sets remedy this by providing degrees of membership [PT92]. This is illustrated in Figure 1.
Figure 1: Illustration of fuzzy set
\includegraphics{fuzzy.ps}
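The degrees-of-membership idea can be sketched with a simple piecewise-linear membership function. The 160 cm and 190 cm thresholds below are illustrative assumptions, not values taken from [PT92]:

```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Degree of membership in the fuzzy set 'tall'.

    Heights at or below `low` get degree 0.0, at or above `high` get 1.0,
    and heights in between get a linearly interpolated degree.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)

print(tall_membership(150))  # crisp non-member -> 0.0
print(tall_membership(175))  # somewhat tall   -> 0.5
print(tall_membership(200))  # crisp member    -> 1.0
```

A traditional (crisp) set would force the middle case into one of the two extremes; the fuzzy set records exactly how tall "somewhat tall" is.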

At the heart of the algorithm lies a grouping of training instances into disjoint subsets \(D_i\) based on their sensitivity patterns. The sensitivity pattern of an input pattern \(x\) is given as follows

\begin{displaymath}S(x) = \left(Sign\left(\frac{\partial z}{\partial
x_1}\right), Sign\left(\frac{\partial z}{\partial x_2}\right), \ldots
, Sign\left(\frac{\partial z}{\partial x_n}\right)\right)\end{displaymath}

where \(z\) is the output of the neural network and \(n\) is the size of the input vector. These patterns provide the fuzzy structure needed for the rule production and can be generated using the same input vectors used to train the network.
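A sensitivity pattern can be sketched numerically, assuming the trained network is available as a black-box function \(f\); the central finite-difference approximation here is an illustration, not the paper's exact procedure:

```python
def sensitivity_pattern(f, x, eps=1e-5):
    """Approximate S(x): the sign of dz/dx_i for each input component,
    using central finite differences on the network function f."""
    pattern = []
    for i in range(len(x)):
        hi = list(x); hi[i] += eps
        lo = list(x); lo[i] -= eps
        d = (f(hi) - f(lo)) / (2 * eps)
        pattern.append(1 if d > 0 else (-1 if d < 0 else 0))
    return tuple(pattern)

# Toy "network": z increases with x1 and decreases with x2.
z = lambda x: 0.8 * x[0] - 0.3 * x[1]
print(sensitivity_pattern(z, [0.5, 0.5]))  # -> (1, -1)
```

Two training instances belong to the same subset \(D_i\) exactly when these sign tuples agree.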

The algorithm then proceeds to further partition each subset \(D_i\) into disjoint subsets

\begin{displaymath}D_i = D_{i,1} \cup
D_{i,2} \cup \ldots \cup D_{i,m},\qquad D_{i,j} \cap D_{i,k} = \emptyset\
\mathrm{if}\ j \neq k\end{displaymath}

This partitioning is done subject to the following conditions:
  1. every point \(p\) on the segment between a point in \(D_{i,j}\) and the center \(G(D_{i,j})\) is itself part of \(D_{i,j}\);
  2. \(S(p) = S(D_{i,j})\), i.e. the sensitivity pattern of \(p\) matches that of the subset containing it.
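The first-level grouping into the subsets \(D_i\) can be sketched as follows; the second-level, convexity-based split described by the two conditions above is omitted for brevity:

```python
from collections import defaultdict

def sign_pattern(f, x, eps=1e-5):
    """Sign of each partial derivative of f at x (central differences)."""
    signs = []
    for i in range(len(x)):
        hi = list(x); hi[i] += eps
        lo = list(x); lo[i] -= eps
        d = (f(hi) - f(lo)) / (2 * eps)
        signs.append(1 if d > 0 else (-1 if d < 0 else 0))
    return tuple(signs)

def partition_by_sensitivity(instances, f):
    """Group training instances into disjoint subsets D_i, one per
    distinct sensitivity pattern."""
    groups = defaultdict(list)
    for x in instances:
        groups[sign_pattern(f, x)].append(x)
    return dict(groups)

# Toy output function whose slope in x1 flips sign at x1 = 0.
z = lambda x: x[0] ** 2 + x[1]
subsets = partition_by_sensitivity([[-1.0, 0.0], [1.0, 0.0], [2.0, 1.0]], z)
print(sorted(subsets))  # -> [(-1, 1), (1, 1)]
```

The same training vectors used to fit the network drive the grouping, so no extra data is needed.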

The resulting disjoint subsets are then used to calculate a closed interval for each input variable by projecting the instances in \(D_{i,j}\) onto that variable. The intervals represent a monotonic region of the input space, i.e. every instance that falls in this region belongs to the subset used to find the region.

At this point a rule can be generated by using the center of the subset \(G(D_{i,j})\).
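In outline, the projection and rule-anchor steps might look like this; the subset below is invented for illustration:

```python
def monotonic_region(subset):
    """Project the instances of one subset D_ij onto each input variable,
    yielding a closed interval [min, max] per dimension."""
    dims = len(subset[0])
    return [(min(x[d] for x in subset), max(x[d] for x in subset))
            for d in range(dims)]

def rule_center(subset):
    """Centroid G(D_ij) of the subset, used as the rule's anchor point."""
    dims = len(subset[0])
    n = len(subset)
    return tuple(sum(x[d] for x in subset) / n for d in range(dims))

D_ij = [[0.2, 0.8], [0.4, 0.6], [0.3, 1.0]]
print(monotonic_region(D_ij))  # -> [(0.2, 0.4), (0.6, 1.0)]
print(rule_center(D_ij))       # -> approximately (0.3, 0.8)
```

A rule then reads, roughly: "if each \(x_d\) lies in its interval, the output behaves as it does at \(G(D_{i,j})\)".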

It is interesting to note that while this algorithm does provide an explanation facility via rule extraction, it does not match the accuracy of, for example, a decision tree. According to [HNY96] the algorithm is an approximation, since there is a trade-off between readability and accuracy. The authors give the example that

we say ``Birds fly'', knowing that there is at least one bird (e.g. a penguin) that does not fly. In this example, the accuracy gives way to the simplicity of the knowledge description to a certain degree, and the occurrence of exceptions is allowed.

Genetic Algorithms

Genetic algorithms can also be the basis for an explanation facility in neural networks [ED91]. This type of algorithm attempts to model biological systems with respect to evolution [NS92]. The primary operators used by genetic algorithms are reproduction, crossover, and mutation. Reproduction populates the mating pool; crossover combines the genetic information of two parents to produce the next generation; and mutation serves to retain good genetic material as well as dispose of unwanted material. The overall performance of the genetic algorithm is governed by a fitness function, which measures the goodness of a candidate solution.
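The three operators can be sketched in a minimal generational GA; the OneMax fitness function, tournament selection, and all parameter values below are illustrative choices, not details from [ED91]:

```python
import random

def run_ga(fitness, n_bits=10, pop_size=20, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Minimal GA: tournament selection (reproduction), one-point
    crossover, and bit-flip mutation, driven by a fitness function."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select():
        # Reproduction: the fitter of two random individuals mates.
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)       # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            child = [(bit ^ 1) if rng.random() < mutation_rate else bit
                     for bit in child]               # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# OneMax: fitness is the number of 1-bits; the optimum is all ones.
best = run_ga(fitness=sum)
print(sum(best))  # close to 10 after 40 generations
```

Swapping in a different fitness function changes what the population evolves toward without touching the operators, which is what makes the technique reusable for the boundary search described next.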

Genetic algorithms can be used to find points on the decision hyper-surface of the input space, the surface that separates one classification from another. Eberhart [Ebe92] does not offer an actual mechanism for rule extraction. However, once the input space has been mapped in this way, only a partitioning process is needed to group the points, and the groupings can then be used to extract rules. This process is notably similar to the later steps of the fuzzy set algorithm.

The points on the decision hyper-surface are obtained by using an initially randomized genetic algorithm population as input to the network. The weight matrix of the neural net serves as the fitness function governing the genetic operations. The inputs provided by the genetic algorithm are propagated through the network in an attempt to find the activation value of the decision hyper-surface; this value is often \(0.5\) and serves to distinguish between classifications. Throughout this process the weights of the neural net are frozen.
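A sketch of this search, under the assumption that "the weight matrix as the fitness function" amounts to scoring a candidate input by how close the frozen network's activation is to the decision value; the real-coded operators and the toy sigmoid unit are illustrative choices:

```python
import math
import random

def boundary_search(net, n_inputs, target=0.5, pop_size=30,
                    generations=60, seed=1):
    """Evolve input vectors toward the decision hyper-surface of a
    frozen network: fitness rewards activations close to `target`."""
    rng = random.Random(seed)

    def fitness(x):
        return -abs(net(x) - target)   # best when net(x) == target

    pop = [[rng.uniform(0.0, 1.0) for _ in range(n_inputs)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]            # reproduction
        children = []
        for _ in range(pop_size - len(survivors)):
            p1, p2 = rng.sample(survivors, 2)
            child = [(a + b) / 2 for a, b in zip(p1, p2)]   # blend crossover
            i = rng.randrange(n_inputs)
            child[i] += rng.gauss(0.0, 0.05)        # Gaussian mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Frozen toy "network": a single sigmoid unit; its weights never change.
net = lambda x: 1.0 / (1.0 + math.exp(-(2.0 * x[0] + 2.0 * x[1] - 2.0)))
x_star = boundary_search(net, n_inputs=2)
print(round(net(x_star), 2))  # near the 0.5 decision value
```

Repeating the search from different seeds yields a cloud of near-boundary points that a partitioning step could then group into rules.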

An additional point of interest is that this technique may also be used to generate additional training samples for a partially trained neural net when only a few training samples are available. Instances whose activations lie close to the decision value are presented to an expert; once the expert classifies an instance, it can be added to the training set.

Conclusion

While the genetic algorithm approach does not provide an explicit explanation facility, a mechanism for rule extraction could, as stated earlier, be built on it fairly easily. It also seems that combining the two algorithms may yield even better results: the fuzzy set approach provides an elegant means of partitioning the input space, and the genetic algorithm could supply additional input instances to make that partitioning more precise, so it should be possible to obtain generally useful rules.

Bibliography

Die92
Joachim Diederich.
Explanation in artificial neural networks.
International Journal of Man-Machine Studies, 37:335-355, 1992.

Ebe92
R. C. Eberhart.
The role of genetic algorithms in neural network query-based learning and explanation facilities.
In International Workshop on Combinations of Genetic Algorithms and Neural Networks (COGANN-92), pages 169-183, 1992.

ED91
R. C. Eberhart and R. W. Dobbins.
Designing neural network explanation facilities using genetic algorithms.
In Proceedings of 1991 IEEE International Joint Conference on Neural Networks, volume 2, pages 1758-1763, New York, NY, Nov 1991. IEEE.

HNY96
Hiroshi Narazaki, Toshihiko Watanabe, and Masaki Yamamoto.
Reorganizing knowledge in neural networks: An explanatory mechanism for neural networks in data classification problems.
IEEE Transactions on Systems, Man, and Cybernetics, 26(1):107-117, February 1996.

NS92
Kendall E. Nygard and Susan M. Schilling.
Metastrategies for heuristic search.
In Proceedings of the Small College Computing Symposium, pages 213-222. SCCS, 1992.

PT92
Peter Pacini and Andrew Thorson.
Fuzzy logic primer.
Technical report, Togai InfraLogic, Inc., 1992.