KNOWLEDGE EXTRACTION FROM NEURAL NETS
THROUGH
GRAPH THEORETICAL ALGORITHMS

by

Matthias E. Johnson

Mankato, Minnesota
August, 1999


Date

This paper has been examined and approved.

Examining Committee:

Chairperson








Abstract:

KNOWLEDGE EXTRACTION FROM NEURAL NETS THROUGH
GRAPH THEORETICAL ALGORITHMS
Johnson, Matthias E., M.S. Minnesota State University, Mankato, 1999.

The possibilities of treating an artificial neural net as a weighted directed graph are explored. The extraction of knowledge from neural nets is attempted through the use of various algorithms from the field of graph theory. Convex subsets, directed vertex covers, and spanning trees are examined as possible approaches for rule extraction from back-propagation neural networks.


Contents


List of Tables

  1. Boolean expressions used for training and testing
  2. Total Number of Non-trivial Convex Subsets
  3. Overlap of Convex Subsets
  4. Number of Directed Vertex Covers in each Digraph
  5. Overlap of Directed Vertex Covers
  6. Overlap of Maximum Weight Spanning Tree in 2-Variable Expressions
  7. Overlap of Minimum Weight Spanning Tree in 2-Variable Expressions
  8. Overlap of Maximum Weight Spanning Tree in 3-Variable Expressions
  9. Overlap of Minimum Weight Spanning Tree in 3-Variable Expressions
  10. Overlap of Maximum Weight Spanning Tree in other Expressions
  11. Overlap of Minimum Weight Spanning Tree in other Expressions
  12. Edge Weights for $x_1 \wedge x_2$
  13. Edge Weights for $\neg x_1 \wedge x_2$
  14. Edge Weights for $( x_1 \wedge x_2 ) \vee x_3$
  15. Edge Weights for $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$
  16. Edge Weights for $x_1 \vee x_2$
  17. Edge Weights for $\neg x_1 \wedge x_2 \wedge x_3$
  18. Edge Weights for $\neg x_1 \wedge \neg x_2 \wedge x_3$
  19. Edge Weights for $\neg x_1 \vee x_2$
  20. Edge Weights for $( x_1 \vee x_2 ) \wedge x_3$
  21. Edge Weights for $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$
  22. Convex subsets for $( x_1 \wedge x_2 ) \vee x_3$
  23. Convex subsets for $x_1 \wedge x_2$
  24. Convex subsets for $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$
  25. Convex subsets for $\neg x_1 \wedge x_2$
  26. Convex subsets for $\neg x_1 \vee x_2$
  27. Convex subsets for $\neg x_1 \wedge x_2 \wedge x_3$
  28. Convex subsets for $( x_1 \vee x_2 ) \wedge x_3$
  29. Convex subsets for $\neg x_1 \wedge \neg x_2 \wedge x_3$
  30. Convex subsets for $x_1 \vee x_2$
  31. Convex subsets for $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$
  32. Spanning Trees for $x_1 \wedge x_2$
  33. Spanning Trees for $( x_1 \wedge x_2 ) \vee x_3$
  34. Spanning Trees for $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$
  35. Spanning Trees for $\neg x_1 \wedge x_2$
  36. Spanning Trees for $\neg x_1 \wedge x_2 \wedge x_3$
  37. Spanning Trees for $\neg x_1 \wedge \neg x_2 \wedge x_3$
  38. Spanning Trees for $\neg x_1 \vee x_2$
  39. Spanning Trees for $x_1 \vee x_2$
  40. Spanning Trees for $( x_1 \vee x_2 ) \wedge x_3$
  41. Spanning Trees for $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$
  42. Directed Vertex Covers for $x_1 \wedge x_2$
  43. Directed Vertex Covers for $\neg x_1 \wedge x_2$
  44. Directed Vertex Covers for $( x_1 \wedge x_2 ) \vee x_3$
  45. Directed Vertex Covers for $( x_1 \vee x_2 ) \wedge x_3$
  46. Directed Vertex Covers for $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$
  47. Directed Vertex Covers for $\neg x_1 \wedge x_2 \wedge x_3$
  48. Directed Vertex Covers for $\neg x_1 \wedge \neg x_2 \wedge x_3$
  49. Directed Vertex Covers for $\neg x_1 \vee x_2$
  50. Directed Vertex Covers for $x_1 \vee x_2$
  51. Directed Vertex Covers for $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$


List of Figures

  1. Layers of a neural net
  2. A simple neural network with weights
  3. Example of Convex Subset, Winners, Losers
  4. Example of a Directed Vertex Cover
  5. Example of a Maximum Absolute Weight Spanning Tree
  6. Illustration of fuzzy set

INTRODUCTION

Artificial neural nets (ANNs) have been a tool of the artificial intelligence community since the 1960s. Numerous applications have since been found for these nets due to their ability to deal with noisy and incomplete data. Artificial neural nets have been applied successfully to problems as diverse as the diagnosis of medical problems and speech recognition [MP95]. Neural networks are generally used as classifiers to determine the membership of a given data instance in one group or another.

Neural networks are under-used, despite the fact that the answers they offer are statistically equal to or better than those provided by other artificial intelligence constructs [Die92]. Today, neural networks are used predominantly by researchers and academics. Outside of this realm, artificial neural nets are met with some skepticism due to the absence of explanation facilities [Die92]. Other artificial intelligence constructs, such as decision trees and genetic algorithms, provide their human users with explanations in the form of rules and probabilities, based on statistical analysis of the applicability of the given rule. Although neural networks are powerful tools which seem to perform well under many circumstances, their human users want an understanding of the results and not just a result.

This research examines a possible approach for the extraction of rule-based knowledge from artificial neural nets. An attempt is made to extract knowledge from neural networks by treating the networks as weighted directed graphs and partitioning the graphs based on graph theoretical properties.

Purpose

Very little work has been done to find a means of extracting rules from neural nets. Much of the research into knowledge extraction from neural nets attempts to extract knowledge primarily through additional data structures and supplemental constructs instead of relying on the data and information stored in the neural net. The only alternative to the use of additional data structures discussed in the current literature is an exhaustive brute force method to extract such knowledge.

The extraction of knowledge from neural nets should be based directly on the neural net, since the addition of extra data structures may have an adverse impact on the rules found. Any approach to finding rules and knowledge in ANNs which draws only on the structure and information in the neural nets should therefore provide a cleaner set of answers and rules. Since neural nets provide statistically appropriate answers without the help of additional data structures, construction of rules encoded within the neural net should be based only on the network itself.

Theory

Artificial neural networks, also known as connectionist systems, are commonly described as ``massively parallel interconnections of simple neurons ...'' which collaborate to function as a larger unit [MP95]. In this sense, neural networks are not unlike parallel computers where many processors work together to perform a much larger task.

The individual processing elements in a neural network are commonly known as neurons, perceptrons or cells. The function performed by these neurons is the calculation of an activation value, the summation of their inputs. The output of a neuron is discrete (with values of $\left\{-1,0,1\right\}$ or $\left\{0,1\right\}$) or continuous (with values in a range such as $\left[0,1\right]$, $\left[-1,1\right]$ or $\left[-0.5,0.5\right]$), but is always bounded. This bounding necessitates an adjustment of the activation value to satisfy the range or value requirement placed on the activation values. Sigmoid or Gaussian functions are commonly used to achieve this adjustment. Typically, each neuron in a neural network uses the same function for this adjustment.

The activation value is then passed to the next neuron via a weighted connection. The weight of the connection is used to multiply the output of a neuron before it is input to the next neuron. Thus, positive weights indicate a reinforcement of the activation value, whereas negative weights inhibit the effects of the given activation value.
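As a concrete illustration of this computation, the following Python sketch (illustrative only, with hypothetical names and weights; not part of the tooling used in this study) applies the connection weights to the inputs, sums them, and adjusts the resulting activation value with a sigmoid.

\begin{verbatim}
import math

def neuron_output(inputs, weights):
    """Multiply each input by its connection weight, sum the results, and
    squash the activation value into (0, 1) with a sigmoid."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-activation))

# A neuron with two weighted input connections (hypothetical weights).
print(neuron_output([1.0, 0.0], [3.2, -3.0]))
\end{verbatim}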

The neurons in the neural net are partitioned into three layers: input, output, and hidden. Information enters the network at the input layer. The number of input neurons is therefore the same as the number of variables in the data set. Since the values of the variables in the data set may well differ from the acceptable range or values for a neuron's activation value, there may be a need to pre-process the input variables to fit within the acceptable range or values.

The output layer is where the neural net produces its answer or classification. The output layer must have at least one output node, but may very well have several, depending upon the complexity and nature of the problem presented.

It is in the hidden layer where the actual computation takes place. Unlike the input and output layers, of which there is only one each, there may be more than one hidden layer. Figure 1 shows a small neural net. Neural nets are, in general, much larger than the one shown in order to allow for a possibly large amount of knowledge to be distributed across the entire artificial neural net.

Figure 1: Layers of a neural net
\includegraphics{ann-paper.eps}

Before neural nets are usable they must be trained. This is done by presenting sample data to the network in the form of a training set which includes the desired output. The weights in the network are adjusted if the produced result does not match the expected output and left unchanged if the result is correct. The training process continues until the rate of error is acceptable, which is determined by the problem domain and the user of the neural net. Thus, the acceptable rate of misclassification for a problem upon which lives depend will be much smaller than the acceptable error rate for the optimal usage of dinner ingredients.
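The following is a deliberately simplified sketch of this error-driven adjustment for a single neuron; it is a schematic illustration under the assumption of a sigmoid output, not the back-propagation procedure of the NevProp4 package used later in this research.

\begin{verbatim}
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_step(inputs, weights, target, rate=0.1):
    """One presentation of a training instance to a single neuron: the
    weights are adjusted only if the produced result misses the target."""
    output = sigmoid(sum(x * w for x, w in zip(inputs, weights)))
    error = target - output
    if abs(error) > 0.1:   # tolerate small deviations; correct answers leave weights unchanged
        weights = [w + rate * error * x for x, w in zip(inputs, weights)]
    return weights

weights = [0.0, 0.0]
for _ in range(200):       # repeated presentations of the same training instance
    weights = train_step([1.0, 1.0], weights, target=1.0)
print(weights)
\end{verbatim}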

After training is complete, the network is tested with a test set. The test set differs from the training set to ensure that the network performs as expected on data outside of the training set. Careful attention must be paid to obtain a properly trained neural net able to handle data from across the entire problem domain. Both under- and over-training are undesirable. Under-training results when the training data are presented to the neural net too few times, leading to higher error rates. Over-training occurs when the network becomes too closely attuned to the training data. This manifests itself in low error rates for instances from the training set, but a much higher rate of misclassification for instances from outside the training set.

The structure of neural nets is very similar to that of weighted directed graphs or digraphs. The nodes in the neural nets become vertices in the digraphs. The neural net's connections become the weighted edges of the graph. By using the weights of the connections it is possible to treat a given neural net as a weighted digraph. The terms net, graph and digraph are used interchangeably throughout, as are edge and connection. Figure 2 shows a small neural net and the weights on its connections.

Figure 2: A simple neural network with weights
\includegraphics{ann-weight-graph.eps}

Although the normal path of travel followed by data through a trained neural net is unidirectional from the input nodes to the output node(s), it is possible to use the weights as a direction indicator. Positive weights indicate direction from input toward output, and negative weights are directed from the output toward the input.
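A minimal sketch of this sign-based orientation, assuming each connection is recorded as a (lower node, higher node, weight) triple in the natural input-to-output direction; the helper is illustrative rather than the tooling used in this research, and the weights are rounded from Table 12 of Appendix A.

\begin{verbatim}
def to_digraph(connections):
    """connections: (lower_node, higher_node, weight) triples in the natural
    input-to-output direction.  Positive weights keep that direction;
    negative weights reverse the edge."""
    return {(u, v) if w >= 0 else (v, u) for u, v, w in connections}

connections = [(1, 3, 3.2), (2, 3, 3.2), (1, 4, -3.0), (2, 4, -3.0),
               (3, 5, 7.7), (4, 5, -7.3)]
print(sorted(to_digraph(connections)))
\end{verbatim}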

Definitions

For the purpose of this research only feed-forward back-propagation neural nets are considered. Feed-forward means that, during the classification process, data are only fed forward through the network, and there are no cycles which provide feedback to an earlier neuron. Back-propagation is the method of training, where the adjustments of connection weights, due to misclassification of training instances, are propagated backwards through the neural net. Note that back-propagation does not stand in contradiction to the definition of feed-forward, since the back-propagation during training does not affect the classification process, which only feeds forward.

The graph structures examined in this research are convex subsets, directed vertex covers, and spanning trees. A convex subset is a subset of vertices such that each vertex not in the subset either dominates or is dominated by all vertices within the convex subset. A vertex $s$ is said to dominate another vertex $t$ if there exists an edge leading from $s$ to $t$. All 2-paths between vertices in the same convex subset are completely contained within that convex subset. In other words, it is impossible to begin at a vertex inside the convex subset, then leave and immediately re-enter the subset with only two hops across edges. The vertices which dominate all vertices in the convex subset are called winners, whereas those dominated by the vertices in the convex subset are the losers. Figure 3 provides an example of the relationships among convex subset, winners and losers.

Figure 3: Example of Convex Subset, Winners, Losers
\includegraphics{ann-convex-subset.eps}
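The 2-path characterization above translates directly into a membership test. The following sketch is one interpretation rather than the thesis code: an outside vertex is counted as a winner if it dominates at least one member of the subset and as a loser if it is dominated by at least one member; the example digraph is the one obtained from the weights in Table 12 of Appendix A under the sign convention described earlier.

\begin{verbatim}
def is_convex(subset, edges, vertices):
    """2-path test: no vertex outside the subset may both be dominated by
    some member of the subset and dominate some member of it."""
    s = set(subset)
    for x in vertices - s:
        dominated = any((u, x) in edges for u in s)   # edge from the subset to x
        dominates = any((x, v) in edges for v in s)   # edge from x into the subset
        if dominated and dominates:
            return False
    return True

def winners_losers(subset, edges, vertices):
    """Winners dominate into the subset; losers are dominated by it."""
    s = set(subset)
    winners = {x for x in vertices - s if any((x, v) in edges for v in s)}
    losers = {x for x in vertices - s if any((u, x) in edges for u in s)}
    return winners, losers

# Digraph derived from Table 12 with sign-based edge orientation.
edges = {(1, 3), (2, 3), (4, 1), (4, 2), (3, 5), (5, 4)}
vertices = {1, 2, 3, 4, 5}
print(is_convex({4, 5}, edges, vertices))        # True
print(winners_losers({4, 5}, edges, vertices))   # ({3}, {1, 2})
\end{verbatim}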

A directed vertex cover is a set of vertices such that each vertex not in the cover is dominated by at least one vertex inside the directed vertex cover. In general, the smallest vertex cover is sought. In this research however, all directed vertex covers are considered to allow for greater diversity in the potential rules extracted. An example of such a structure is provided in Figure 4.

Figure 4: Example of a Directed Vertex Cover
\includegraphics{ann-vertex-cover.eps}
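A direct, if naive, implementation of this definition is sketched below; the exhaustive enumeration mirrors the exponential search noted later, and the example digraph is hypothetical.

\begin{verbatim}
from itertools import chain, combinations

def is_directed_vertex_cover(cover, edges, vertices):
    """Every vertex outside the cover must be dominated by (receive an edge
    from) at least one vertex inside the cover."""
    c = set(cover)
    return all(any((u, x) in edges for u in c) for x in vertices - c)

def all_directed_vertex_covers(edges, vertices):
    """Exhaustive enumeration over all non-empty vertex subsets (exponential)."""
    verts = sorted(vertices)
    candidates = chain.from_iterable(
        combinations(verts, k) for k in range(1, len(verts) + 1))
    return [set(c) for c in candidates
            if is_directed_vertex_cover(c, edges, vertices)]

# A small hypothetical digraph.
edges = {(1, 2), (2, 3), (3, 1), (1, 4)}
print(all_directed_vertex_covers(edges, {1, 2, 3, 4}))
\end{verbatim}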

A spanning tree is defined as a collection of edges without cycles, providing exactly one path from each vertex to every other vertex in the graph. Two types of spanning trees are considered in this research. The first type of spanning tree consists of the maximum weight edges, and the second type is based on the minimum non-zero weights. For both types of spanning trees all edge weights are treated as absolute values by dropping the sign of each weight and using only the remaining absolute value. This leads to the largest (positive or negative) weights comprising the maximum weight spanning tree, and the smallest non-zero (positive or negative) weights comprising the minimum weight spanning tree. Figure 5 shows a maximum absolute weight spanning tree.

Figure 5: Example of a Maximum Absolute Weight Spanning Tree
\includegraphics{ann-spanning-tree.eps}
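Both variants can be produced with a Kruskal-style construction over the absolute edge weights. The sketch below is illustrative rather than the study's tooling; applied to the weights listed in Table 12 of Appendix A, it yields the same edges as the MAX and MIN rows of Table 32.

\begin{verbatim}
def spanning_tree(weighted_edges, maximize=True):
    """Kruskal's algorithm on the absolute values of the weights, ignoring
    edge direction and zero-weight edges."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    tree = []
    for u, v, w in sorted((e for e in weighted_edges if e[2] != 0),
                          key=lambda e: abs(e[2]), reverse=maximize):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # the edge joins two components
            parent[root_u] = root_v
            tree.append((u, v))
    return tree

# Weights from Table 12 ($x_1 \wedge x_2$).
edges = [(3, 1, 3.15202), (3, 2, 3.16385), (4, 1, -2.9848),
         (4, 2, -2.96822), (5, 3, 7.66868), (5, 4, -7.3074)]
print(spanning_tree(edges, maximize=True))   # same edges as MAX in Table 32
print(spanning_tree(edges, maximize=False))  # same edges as MIN in Table 32
\end{verbatim}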


REVIEW OF LITERATURE

The literature dealing with knowledge extraction from artificial neural networks is sparse. A search for available research revealed only three previously studied methods for extracting knowledge from neural nets.

The first method of knowledge extraction from neural nets is network inversion. This method of knowledge extraction is a brute force approach and works in the following way [Ebe92].

  1. Choose a neuron
  2. Examine all of its inputs
  3. Try all possible values of inputs
  4. Generate rules based on inputs
The approach examines what values the inputs to a neuron must have in order to receive a specific value as the output. The result is that the entire input space is mapped. The rules are then generated based on the values of the input neurons at which the classification provided by the output neuron changes. The combined set of cut-off points is also referred to as a decision hyper-surface [ED91], since it essentially defines a set of points which partitions the input domain. While this method does work, its usefulness is limited by the exponential complexity involved, and it will only work for very small networks.
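In outline, this mapping can be sketched as follows for a small, hypothetical thresholded neuron; as noted above, the enumeration is exponential in the number of inputs.

\begin{verbatim}
from itertools import product

def invert(neuron, n_inputs, values=(0, 1)):
    """Brute-force network inversion: try every possible combination of input
    values and record the output each one produces."""
    return {combo: neuron(combo) for combo in product(values, repeat=n_inputs)}

# A hypothetical neuron: a thresholded weighted sum standing in for steps 1-2.
neuron = lambda x: int(3.2 * x[0] + 3.2 * x[1] - 4.8 > 0)
for inputs, output in invert(neuron, 2).items():
    print(inputs, "->", output)   # rules are read off where the output changes
\end{verbatim}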

The algorithm discussed in [HNY96] uses fuzzy logic to provide an explanation facility for neural networks. Narazaki et al. describe the application of fuzzy logic to a logical formula as the extension of a truth value to a real number. For example, given the height of a person we can classify the individual as either ``tall'' or ``not tall'' using traditional sets. There is no different description for people who are somewhat tall. Fuzzy sets remedy this by providing degrees of membership [PT92]. Figure 6 illustrates how fuzzy sets provide much finer granularity in the classification process.

Figure 6: Illustration of fuzzy set
\includegraphics{fuzzy-paper.eps}

At the heart of the approach discussed in [HNY96] lies a partitioning of the input space $X$ into monotonic regions. The membership in a monotonic region $M$ depends on two conditions. First, any two input vectors in $M$ must be classified the same by the neural net, and second they must have the same sensitivity pattern. The sensitivity pattern of an input vector \(x\) is given as follows

\begin{displaymath}
S(x) = \left( Sign\left(\frac{\delta z}{\delta x_1}\right), Sign\left(\frac{\delta z}{\delta x_2}\right), \ldots, Sign\left(\frac{\delta z}{\delta x_n}\right) \right)
\end{displaymath}

where \(z\) is the output of the neural network, \(n\) is the size of the input vector, and the function $Sign()$ returns '$+$' if the value examined is greater than or equal to $0$ and '$-$' if it is strictly less than $0$. These patterns of '$+$'s and '$-$'s are used to group the training instances and provide the fuzzy structure needed for the rule production. The sensitivity patterns can be generated using the same input vectors used to train the network.
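A sketch of this computation, estimating the partial derivatives by finite differences on a hypothetical network output (names and values are illustrative, not taken from [HNY96]):

\begin{verbatim}
import math

def sensitivity_pattern(net, x, eps=1e-4):
    """S(x): the sign of the partial derivative of the network output z with
    respect to each input variable, estimated here by finite differences."""
    pattern = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps
        derivative = (net(bumped) - net(x)) / eps
        pattern.append('+' if derivative >= 0 else '-')
    return tuple(pattern)

# A hypothetical smooth network output used only for illustration.
net = lambda x: 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 3.0 * x[1])))
print(sensitivity_pattern(net, [0.2, 0.7]))   # ('+', '-')
\end{verbatim}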

The above process divides the input space into a set of monotonic regions $\left\{M_i : i = 1,2, \ldots, j\right\}$. Rules are then generated by projecting each monotonic region $M_i$ onto each variable domain, which yields rules such as ``If $x_1$ is approximately $g_1^i$, and $x_2$ is approximately $g_2^i$, $\ldots$, and $x_n$ is approximately $g_n^i$, then it belongs to class $P$,'' where $P$ is one of the classes of the problem domain and $\left(g_1^i, g_2^i, \ldots, g_n^i\right)$ are the coordinates of the center of the monotonic region $M_i$.

It is interesting to note that while this algorithm does provide an explanation facility via rule extraction, it does not provide the same accuracy as a decision tree. According to [HNY96] the algorithm is an approximation since there is a trade-off between readability and accuracy. The authors give the example that

we say ``Birds fly'', knowing that there is at least one bird (e.g., a penguin) that does not fly. In this example, that accuracy gives way to the simplicity of the knowledge description to a certain degree, and the occurrence of exceptions is allowed.

Genetic algorithms can also be the foundation for an explanation facility in neural networks [ED91]. Genetic algorithms attempt to model the evolutionary behavior of biological systems [NS92]. The operators used by genetic algorithms are reproduction, crossover and mutation and the genetic material used in this research consists of the input vectors to the neural network. Reproduction is used to preserve input vectors between evolutionary cycles. Crossover produces a new vector by combining parts of two different input vectors. The last operator is mutation and serves the purpose of preventing the pool of input vectors from evolving into a dead end. Although mutation may eliminate desirable input vectors as well as disposing of unwanted input vectors, it is very useful in keeping the evolutionary process going. The overall performance of the genetic algorithm is controlled by a fitness function, which determines the goodness of a given genetic string. In this research the neural net acts as the fitness function by comparing the actual output to the expected output and using this to determine the fitness of a given input vector. This fitness value is then used to determine which genetic operator is to be applied to the given vector.
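In outline, the three operators and the net-based fitness evaluation described above can be sketched as follows; the helper names and the stand-in network are hypothetical, not the implementation used in [ED91].

\begin{verbatim}
import random

def fitness(net, input_vector, expected):
    """The trained net acts as the fitness function: fitness increases as the
    actual output approaches the expected output for this input vector."""
    return 1.0 - abs(net(input_vector) - expected)

def crossover(parent_a, parent_b):
    """Combine parts of two input vectors at a random cut point."""
    cut = random.randint(1, len(parent_a) - 1)
    return parent_a[:cut] + parent_b[cut:]

def mutate(vector, amount=0.1):
    """Perturb one randomly chosen position to keep the pool from stagnating."""
    i = random.randrange(len(vector))
    changed = list(vector)
    changed[i] += random.uniform(-amount, amount)
    return tuple(changed)

# A hypothetical trained net standing in for the fitness evaluation.
net = lambda x: min(1.0, max(0.0, 0.5 * x[0] + 0.5 * x[1]))
print(fitness(net, (1.0, 0.0), expected=1.0))
print(crossover((0.1, 0.2, 0.3, 0.4), (0.9, 0.8, 0.7, 0.6)))
\end{verbatim}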

A neural net's input domain can be viewed as a multi-dimensional space where each input vector represents a point in this space. Genetic algorithms are used in finding points on the decision hyper-surface within this space of the neural net's input domain. This decision hyper-surface provides the separation of one classification from another. Alternatively, by connecting the points found by the genetic algorithm we find that the input vectors above the curve are members of one classification and those below the curve are members of another classification.

Eberhart [Ebe92] does not offer an actual mechanism for rule extraction. However, it is clear that once the input space is mapped, only a partitioning process is needed to group the points. The groupings can then be used to extract rules. For example, we find the minimum and maximum values for each variable within a grouping to generate rules of the form

\begin{displaymath}
if \;\; min_1 < x_1 < max_1 \wedge \ldots \wedge min_y < x_y < max_y \;\Rightarrow\; x \in group_z.
\end{displaymath}

This process is notably similar to the later steps in the fuzzy set algorithm.
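A minimal sketch of this rule construction, assuming the input vectors of one grouping have already been collected (names are hypothetical):

\begin{verbatim}
def group_rule(points, group_name):
    """Form an interval rule from the minimum and maximum value of each
    variable within one grouping of input vectors."""
    n = len(points[0])
    bounds = [(min(p[i] for p in points), max(p[i] for p in points))
              for i in range(n)]
    clauses = ["%g < x%d < %g" % (lo, i + 1, hi)
               for i, (lo, hi) in enumerate(bounds)]
    return " AND ".join(clauses) + "  =>  x in " + group_name

# A hypothetical grouping of input vectors from one side of the decision hyper-surface.
print(group_rule([(0.1, 0.9), (0.3, 0.7), (0.2, 0.8)], "group_1"))
\end{verbatim}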

Another point of interest is that this technique may also be used to generate additional training samples for partially trained neural nets, if only a few training samples are available. This generation of additional training instances is done by searching out borderline instances through the use of genetic algorithms and then presenting the instances close to the activation value to an expert. After the expert classifies the instance it can be used as part of the training set.

METHODS

In order to examine the graph structures in neural nets, several such networks are trained and searched for the presence of convex subsets, directed vertex covers, and spanning trees. The neural net package used is NevProp4 developed at the University of Nevada, Reno [Goo]. This package offers the back-propagation, configuration and output features necessary for the research at hand, while being very fast and simple to use.

In order to evaluate the usefulness of the graph theoretical properties considered in this research, a number of different neural nets are trained. The resulting network weight files are then treated as weighted digraphs. If knowledge extraction is possible through the use of convex subsets, directed vertex covers, or spanning trees, then it should be possible to find certain characteristics which hold true across different neural nets. For example, the presence or absence of overlapping convex subsets within a single digraph may indicate a logical OR, while convex subsets which share no vertices may point towards a logical NEGATION. Thus, the primary focus in this research lies in the search for similarities and differences which point toward possible rules.

In order to determine how the presence and structure of graph theoretical properties relates to potential rules, different neural nets are compared to one another. The basis for the comparison is the degree of similarity between the properties found in different neural nets. It is expected that neural networks trained with similar data, and encoding similar knowledge, also share some low-level graph properties.

Generally, graphs and sub-graphs are not compared based on the labels of their vertices, since this labeling is arbitrary. However, in this research the labeling of the vertices is fixed and crucial to the results. The size of all neural nets used in this research is a direct function of the number of input variables, and therefore the numeric label assigned to a vertex also denotes the function of the vertex. For example, the highest numbered vertex is always the output node, whereas the vertices with the smallest numeric labels always serve as input.

As an additional note, it must be pointed out that the properties used in this research do not necessarily improve upon the performance of even the brute force network inversion discussed in Chapter II. The search for all directed vertex covers, for example, requires exponential running time. However, if directed vertex cover is useful in the extraction of knowledge from neural nets, then there may be some optimization that can be used with respect to this problem, to improve upon the efficiency of the approach.

Instrument Development

The tools needed for the generation of test data and the extraction of graph structures from the trained networks are custom developed for this research. The graph extraction tools perform no evaluation or filtering of the data found. Several trained networks are used to compare the graph structures found and search for possible correlations between different networks. Similarities and differences are potential indicators of rules.
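The comparison itself reduces to set arithmetic over the extracted structures. The following sketch is a hypothetical helper, not the custom tooling described above; it produces the shared and unique counts in the style later reported in Tables 3 and 5.

\begin{verbatim}
def overlap(structures_a, structures_b):
    """Count the structures shared by two digraphs of the same size and those
    unique to each, treating every structure as a set of vertex labels."""
    a = {frozenset(s) for s in structures_a}
    b = {frozenset(s) for s in structures_b}
    return len(a & b), len(a - b), len(b - a)

# Hypothetical collections of convex subsets extracted from two trained nets.
print(overlap([{4, 5}, {3, 5}, {1, 3}], [{4, 5}, {2, 3}]))   # (1, 2, 1)
\end{verbatim}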

For example, it is possible that for the graphs derived from boolean expressions with a high degree of similarity there will be a high degree of overlap among the convex subsets of each of the graphs. It may also be the case that overlapping and disjoint graph properties in a single network indicate the operators of the boolean expression used during training. More precisely, the overlapping substructures in a neural net may indicate a conjunction or logical AND, while completely separate substructures could point towards a disjunction or logical OR.

Population

The test data for this research consist of truth tables based on boolean expressions.

Table 1: Boolean expressions used for training and testing
(1) $x_1 \wedge x_2$
(2) $x_1 \vee x_2$
(3) $\neg x_1 \wedge x_2$
(4) $\neg x_1 \vee x_2$
(5) $\neg x_1 \wedge \neg x_2 \wedge x_3$
(6) $\neg x_1 \wedge x_2 \wedge x_3$
(7) $( x_1 \wedge x_2 ) \vee x_3$
(8) $( x_1 \vee x_2 ) \wedge x_3$
(9) $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$
(10) $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$


The boolean expressions are numbered and listed in Table 1. Boolean expressions are used for the training and testing process, since they allow for complete prior knowledge of the problem domain. Unlike other datasets from fields such as medicine and science, where the correlation between input and output is not as clearly defined, this approach allows for a concentrated search for graph structures resembling a potential encoding of the boolean expression within the neural net. Additionally, by using the boolean expressions it is possible to define and constrain the complexity and size of resulting neural networks, since it is known that there are no hidden or missing relations in the data.

In order to gain a good understanding, care is taken to use a number of different expressions. The first four expressions in Table 1 use two distinct variables. The next four expressions are based on three variables each. The last two expressions have four input variables. Although these last two expressions are technically based on only two variables, it was necessary to treat each occurrence of a variable as a separate input in order to properly train the network. Proper training of a neural network depends on the application and the acceptable margin of error. In this research proper training is assumed to be perfect classification of all training data.

Expression $(9)$ in Table 1 is based on the XOR operator, but expressed using only the operators AND, OR and NEGATION. The truth table based on the XOR operator failed to train the network without errors, and always misclassified some of the training and testing instances. The re-written expression, however, manages to achieve proper training of the network, with perfect classification of all training and testing data, when four distinct input variables are used. Expression $(10)$, the complement of expression $(9)$, is present to allow a point of comparison for expression $(9)$. Each occurrence of the variables is fed to a different input node of the neural network and thus requires four input nodes.

Several tests are used to examine potential opportunities for rule extraction from neural nets.

  1. Given that the expressions $(1)$ and $(3)$, both use conjunctions, it is expected that there exists some degree of overlap in the convex subsets, directed vertex covers and spanning trees found in the digraphs based on those expressions.
  2. Expressions $(2)$ and $(4)$ use disjunctions and should therefore contain a low level of overlap of convex subsets, directed vertex covers and spanning trees.
  3. The change of the only operator in the expressions $(1)$ and $(2)$ is expected to produce very little overlap of convex subsets, directed vertex covers and spanning trees.
  4. Similarities in expressions $(3)$ and $(4)$ should generate some low level of overlap in the convex subsets, directed vertex covers and spanning trees found.
  5. Comparing the expressions $(1)$ and $(4)$ is expected to yield very little overlap in convex subsets, directed vertex covers and spanning trees since their respective operators vary greatly.
  6. A lack of overlap in convex subsets, directed vertex covers and spanning trees is expected for the expressions $(2)$ and $(3)$ due to the significant differences between them.
  7. The expressions $(5)$ and $(6)$ are expected to have a significant overlap of convex subsets, directed vertex covers and spanning trees, since the expressions only differ in one operator.
  8. The change in precedence and operators for expressions $(7)$ and $(8)$ should result in very low overlap of convex subsets, directed vertex covers and spanning trees.
  9. $(9)$ and $(10)$ are complements of one another and as such are expected to have very little overlap of convex subsets, directed vertex covers and spanning trees.
In addition to comparing expressions to one another, the influence of the operators and the number of variables is examined as well.

Data Collection

In order to minimize the size and thus the complexity of the ANNs several tests were conducted to determine the smallest networks that can properly classify the entire training set. These tests show that at least $2n+1$ nodes are necessary, where $n$ is the number of input nodes to the neural net. This calculation yields a network configuration with $n$ input nodes, $n$ hidden nodes and one output node. As mentioned earlier, each occurrence of a variable in a boolean expression requires its own input node. Another result of these preliminary tests is that each neural net must be trained by presenting it with the training and testing data $200$ times. Using this number of iterations results in the perfect classification of all training and testing data for each of the neural nets. Perfect classification in this case means that the neural net makes no mistakes given the testing or training data.
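The training data themselves are simple to reproduce; the sketch below generates the truth-table rows for a given expression (the NevProp4 input file format is not reproduced here).

\begin{verbatim}
from itertools import product

def truth_table(expression, n_variables):
    """All 2^n input rows together with the target output; the same rows are
    used for both training and testing in this study."""
    return [bits + (int(expression(*bits)),)
            for bits in product((0, 1), repeat=n_variables)]

# Expression (1), x1 AND x2; the corresponding net uses 2*2 + 1 = 5 nodes.
for row in truth_table(lambda x1, x2: x1 and x2, 2):
    print(row)
\end{verbatim}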

While it is generally necessary to split the available data into two independent sets to verify that the network can handle data other than that contained in the training set, this is not necessary here. Since all data for the problem are available, all data, or the entire truth table, is used for both training and testing. This choice of training and testing data also averts the problem of over-training in which the network is able to handle only the training and not the testing datasets, which in this case are identical.

To make the neural nets more uniform, a fully connected model is chosen. Fully connected with respect to neural nets means that each node on a given layer is connected to all nodes on the following layer. No connections exist between nodes of the same layer. In all tests, the neural nets had their weights initialized to $0$ to further level the playing field between the different nets.

FINDINGS OF THE STUDY

In general, comparing digraphs based on their vertex labels is not practical. In this research, however, the label is significant since it describes the function of the vertex. For example, the highest numbered vertex is always the output node of the neural net, whereas the $n$ lowest numbered vertices are those receiving input. This observation is also relevant when considering that any potentially generated rules must be somehow related to the problem domain. The only means of doing so is to rely on the input and output nodes. It is for this reason that the labeling is considered to be fixed: identical vertex subsets from different digraphs are treated as identical when comparing properties of those digraphs. For example, the subset $\left\{1,3,4,5\right\}$ occurring in two digraphs of equal size is considered to be identical for the purpose of this research.

Convex Subsets

In order to find the convex subsets, the neural nets are treated as digraphs, where the direction of a given edge is dependent upon the sign of its weight. A negative weight is interpreted as flowing in the opposite direction of the normal ANN from-input-to-output layer path. The convex subsets are used to compare the digraphs to one another. The digraphs based on similar boolean expressions are examined for an overlap in the sets of convex subsets. A small overlap is expected for digraphs based on expressions with similar features, such as conjunction, negation and number of variables.


Table 2: Total Number of Non-trivial Convex Subsets
Expression    Number of Convex Subsets
$x_1 \wedge x_2$ 8
$x_1 \vee x_2$ 8
$\neg x_1 \wedge x_2$ 13
$\neg x_1 \vee x_2$ 8
$\neg x_1 \wedge \neg x_2 \wedge x_3$ 32
$\neg x_1 \wedge x_2 \wedge x_3$ 43
$( x_1 \wedge x_2 ) \vee x_3$ 57
$( x_1 \vee x_2 ) \wedge x_3$ 15
$( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$ 28
$( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$ 28


All of the tested boolean expressions generated convex subsets. Table 2 lists the number of convex subsets found for each boolean expression. The number of convex subsets present in the digraphs increases as the complexity is increased through the addition of precedence operators, variables and additional operators including negation. Note that the last two expressions both use only two variables.


Table 3: Overlap of Convex Subsets
Expr. 1    Expr. 2    Convex Subsets Shared by Both    Unique to Expr. 1    Unique to Expr. 2
$x_1 \wedge x_2$ $x_1 \vee x_2$ 8 0 0
$\neg x_1 \wedge x_2$ $\neg x_1 \vee x_2$ 8 5 0
$x_1 \wedge x_2$ $\neg x_1 \wedge x_2$ 6 2 6
$x_1 \vee x_2$ $\neg x_1 \vee x_2$ 6 2 2
$x_1 \wedge x_2$ $\neg x_1 \vee x_2$ 6 2 2
$x_1 \vee x_2$ $\neg x_1 \wedge x_2$ 6 2 7
$\neg x_1 \wedge x_2 \wedge x_3$ $\neg x_1 \wedge \neg x_2 \wedge x_3$ 20 23 12
$( x_1 \wedge x_2 ) \vee x_3$ $( x_1 \vee x_2 ) \wedge x_3$ 15 42 0
$( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$ $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$ 28 0 0


Table 3 shows the overlap found between digraphs based on similar boolean expressions. Values are provided for the number of convex subsets found in both expressions and those unique to each expression. Examining both tables, we find that the number of convex subsets remains the same or increases when negation is added or a disjunction is replaced with a conjunction. This behavior is visible in the expressions $\neg x_1 \wedge x_2$ and $\neg x_1 \vee x_2$, and $\neg x_1 \wedge x_2 \wedge x_3$ and $\neg x_1 \wedge \neg x_2 \wedge x_3$.

A comparison of the convex subsets found for neural networks trained with truth tables from different boolean expressions yields two digraphs with identical sets of convex subsets: the digraphs based on the expressions $x_1 \wedge x_2$ and $x_1 \vee x_2$. The expression $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$ and the XOR expression $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$ also yield the exact same set of convex subsets, but the sets of winners and losers differ in this case. A full listing of the convex subsets found for each of the neural nets is available in Appendix A.

Directed Vertex Covers

In the search for potential directed vertex covers, the ANN is treated similarly to the case of the convex subsets. Again, the natural direction of the edges is reversed if the given weight of the edge is negative. All digraphs produced numerous directed vertex covers. The tests are centered around a comparison of the directed vertex covers found for different digraphs. The overlap in the directed vertex covers is examined based on the similarity of the boolean expressions used in the creation of the neural net. The occurrence of similar operators, such as conjunction and disjunction, as well as negation and the number of variables, are examined for the presence of overlap in the vertex covers discovered. As is the case with the convex subsets, attention is also paid to the effect which operators and the number of variables have on the overall number of directed vertex covers discovered.

Table 4: Number of Directed Vertex Covers in each Digraph
Expression    Number of Directed Vertex Covers
$x_1 \wedge x_2$ 18
$x_1 \vee x_2$ 18
$\neg x_1 \wedge x_2$ 8
$\neg x_1 \vee x_2$ 22
$\neg x_1 \wedge \neg x_2 \wedge x_3$ 94
$\neg x_1 \wedge x_2 \wedge x_3$ 32
$( x_1 \wedge x_2 ) \vee x_3$ 16
$( x_1 \vee x_2 ) \wedge x_3$ 76
$( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$ 248
$( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$ 248


The number of directed vertex covers found in each digraph is shown in Table 4. Unlike with convex subsets, there is no discernible trend in the quantity of directed vertex covers present with the addition of negation or the change of disjunction and conjunction operators.


Table 5: Overlap of Directed Vertex Covers
Expr. 1    Expr. 2    Directed Vertex Covers Shared by Both    Unique to Expr. 1    Unique to Expr. 2
$x_1 \wedge x_2$ $x_1 \vee x_2$ 18 0 0
$\neg x_1 \wedge x_2$ $\neg x_1 \vee x_2$ 6 2 16
$x_1 \wedge x_2$ $\neg x_1 \wedge x_2$ 6 12 2
$x_1 \vee x_2$ $\neg x_1 \vee x_2$ 18 0 4
$x_1 \wedge x_2$ $\neg x_1 \vee x_2$ 18 0 4
$x_1 \vee x_2$ $\neg x_1 \wedge x_2$ 6 12 2
$\neg x_1 \wedge x_2 \wedge x_3$ $\neg x_1 \wedge \neg x_2 \wedge x_3$ 24 8 70
$( x_1 \wedge x_2 ) \vee x_3$ $( x_1 \vee x_2 ) \wedge x_3$ 16 0 60
$( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$ $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$ 124 124 124


As with the convex subsets, the expressions $x_1 \wedge x_2$ and $x_1 \vee x_2$ yield the exact same set of directed vertex covers. The expressions compared in Table 5 are identical to those in Table 3. The pairing of similar expressions in Table 5 shows that those expressions with conjunctions and/or negation produce at least as many directed vertex covers as the expressions which lack negation or use disjunctions.

Spanning Trees

In the search for spanning trees the sign of an edge weight is ignored, since it does not directly affect the flow in the neural net, and forcing the search to honor the sign can potentially affect the presence of a spanning tree. Using the absolute values of the weights it is possible to direct the search towards an overall activation value. The activation value has a direct effect on how strongly or subtly the output of a neuron or vertex is changed. For this reason two searches are conducted: one for the spanning tree with the smallest combined non-zero edge weight and one for the spanning tree with the largest combined edge weight. The degree of overlap between the spanning trees for different digraphs is examined as a possible indicator of knowledge. Unlike the tests for convex subsets and directed vertex covers, the effect on the size of the set found is ignored, since the size of a spanning tree is fixed by the size of the digraph.

Table 6: Overlap of Maximum Weight Spanning Tree in 2-Variable Expressions
  $x_1 \wedge x_2$ $x_1 \vee x_2$ $\neg x_1 \wedge x_2$ $\neg x_1 \vee x_2$
$x_1 \wedge x_2$ NA 2 2 2
$x_1 \vee x_2$ 2 NA 4 4
$\neg x_1 \wedge x_2$ 2 4 NA 4
$\neg x_1 \vee x_2$ 2 4 4 NA



Table 7: Overlap of Minimum Weight Spanning Tree in 2-Variable Expressions
  $x_1 \wedge x_2$ $x_1 \vee x_2$ $\neg x_1 \wedge x_2$ $\neg x_1 \vee x_2$
$x_1 \wedge x_2$ NA 2 2 2
$x_1 \vee x_2$ 2 NA 4 3
$\neg x_1 \wedge x_2$ 2 4 NA 3
$\neg x_1 \vee x_2$ 2 3 3 NA



Table 8: Overlap of Maximum Weight Spanning Tree in 3-Variable Expressions
  $\neg x_1 \wedge \neg x_2 \wedge x_3$ $\neg x_1 \wedge x_2 \wedge x_3$ $( x_1 \wedge x_2 ) \vee x_3$ $( x_1 \vee x_2 ) \wedge x_3$
$\neg x_1 \wedge \neg x_2 \wedge x_3$ NA 3 3 3
$\neg x_1 \wedge x_2 \wedge x_3$ 3 NA 3 4
$( x_1 \wedge x_2 ) \vee x_3$ 3 3 NA 5
$( x_1 \vee x_2 ) \wedge x_3$ 3 4 5 NA



Table 9: Overlap of Minimum Weight Spanning Tree in 3-Variable Expressions
  $\neg x_1 \wedge \neg x_2 \wedge x_3$ $\neg x_1 \wedge x_2 \wedge x_3$ $( x_1 \wedge x_2 ) \vee x_3$ $( x_1 \vee x_2 ) \wedge x_3$
$\neg x_1 \wedge \neg x_2 \wedge x_3$ NA 2 2 5
$\neg x_1 \wedge x_2 \wedge x_3$ 2 NA 4 2
$( x_1 \wedge x_2 ) \vee x_3$ 2 4 NA 3
$( x_1 \vee x_2 ) \wedge x_3$ 5 2 3 NA



Table 10: Overlap of Maximum Weight Spanning Tree in other Expressions
  $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$ $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$
$( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$ NA 8
$( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$ 8 NA



Table 11: Overlap of Minimum Weight Spanning Tree in other Expressions
  $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$ $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$
$( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$ NA 3
$( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$ 3 NA


Table 6 and Table 7 show the number of edges common between the maximum and minimum spanning trees respectively. The expressions based on two variables produce digraphs with five vertices. The number of edges for the spanning tree in such digraphs is thus four. Table 6 shows that the digraphs based on the expressions $x_1 \vee x_2$, $\neg x_1 \wedge x_2$, and $\neg x_1 \vee x_2$ have identical maximum spanning trees. The minimum spanning tree for the expressions $x_1 \vee x_2$ and $\neg x_1 \wedge x_2$ are also identical according to the data presented in Table 7.

The number of edges needed for a spanning tree in the 3-variable expressions is six, given that there are $2 \cdot 3 + 1$ vertices. The overlap in spanning trees for digraphs based on 3-variable expressions is shown in Table 8 and Table 9. In this set of digraphs there are no identical maximum or minimum spanning trees between the different digraphs.

The expression $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$ and the XOR expression $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$ exhibited a perfect match for the maximum spanning tree as illustrated in Table 10. However, the minimum spanning trees for those expressions differed, as illustrated in Table 11.

CONCLUSIONS AND RECOMMENDATIONS

There are a number of trends apparent across the data collected on convex subsets and directed vertex covers. For example, the presence of negation and conjunction appears to increase the number of convex subsets found in the digraph. This trend is reversed with respect to directed vertex covers, where the lack of negation and disjunction yields a more simplified graph in terms of the number of directed vertex covers. A notable exception to these trends is the expression $\neg x_1 \wedge x_2$, which provided the smallest number of directed vertex covers.

The presence of identical convex subsets and directed vertex covers for the expressions $x_1 \wedge x_2$ and $x_1 \vee x_2$ is a strong indication that neither of these two approaches offers great hope for the extraction of knowledge from neural nets, since it is impossible to distinguish between the two expressions. Given that the expressions are identical from the perspective of convex subsets and directed vertex covers, it is impossible to extract rules which reflect the different original expressions.

The spanning trees approach also provided identical matches for the expressions $x_1 \vee x_2$, $\neg x_1 \wedge x_2$, and $\neg x_1 \vee x_2$ as well as $x_1 \vee x_2$ and $\neg x_1 \wedge x_2$. For the first group, the maximum spanning tree was identical among all three digraphs, and the second group of expressions shared the minimum spanning tree in common. This again renders these approaches ineffective as means of knowledge extraction from neural nets for the aforementioned reasons.

A brief examination of the weights produced by the neural net package, which is provided in Appendix A, shows that the weights appear to occur in pairs of similar absolute values. This symmetry is most likely due to the dichotomous nature of the input used and raises some additional doubts about the usefulness of the spanning tree approach.

Overall it appears that the structure of the neural network alone is not enough to extract rules from a neural net. The performance of convex subsets and directed vertex covers likely suffers because these approaches fail to consider the actual weight of the edges beyond the sign. However, even the spanning tree approach did not produce unique data for each expression, despite taking the weights into consideration. The weakness of the spanning tree approach likely stems from its disregard of the rest of the graph, since the spanning tree is only a substructure. In addition to these observations, it must again be noted that the graph properties examined here are not necessarily faster to compute than even the brute force approach discussed in Chapter II.

The trends found with respect to the occurrence of convex subsets and directed vertex covers, which seem to complement each other in a very slight way, may be a useful addition to other potential approaches to knowledge extraction from neural nets. Many other graph theoretical properties exist and may provide better explanation facilities. To further explore the effectiveness of graph algorithms in the extraction of rules from neural nets, it seems prudent to concentrate on techniques which take the weights into account. Possibly, the combination of two or more algorithms could yield a desirable result. It may also be fruitful to examine the weights more closely before assigning a direction to the connections. Perhaps the average weight could be used as the cutoff for assigning a direction instead of $0$, as is the case in this research. Another possibility is to use the weights to prune or eliminate some of the edges. Pruning may lead to more distinct networks with respect to the structures examined in this research. In addition to these approaches, it is also feasible that an alteration of the examined structures may yield positive results. For example, building directed vertex covers and convex subsets based on standard deviations of the connection weights could make it possible to distinguish between different networks and perhaps enable rule extraction. In the end, the information is still stored in the network and, thus, it must be possible to retrieve it.

Bibliography

Die92
Joachim Diederich.
Explanation in artificial neural networks.
International Journal of Man-Machine Studies, 37:335-355, 1992.

Ebe92
R. C. Eberhart.
The role of genetic algorithms in neural network query-based learning and explanation facilities.
In International Workshop on Combinations of Genetic Algorithms and Neural Networks (COGANN-92), pages 169-183, 1992.

ED91
R. C. Eberhart and R. W. Dobbins.
Designing neural network explanation facilities using genetic algorithms.
In Proceedings of 1991 IEEE International Joint Conference on Neural Networks, volume 2, pages 1758-1763, New York, NY, Nov 1991. IEEE.

Goo
Philip H. Goodman.
NevProp4: Artificial neural network software for statistical prediction.
http://www.scsr.nevada.edu/nevprop/.

HNY96
Hiroshi Narazaki, Toshihiko Watanabe, and Masaki Yamamoto.
Reorganizing knowledge in neural networks: An explanatory mechanism for neural networks in data classification problems.
IEEE Transactions on Systems, Man, and Cybernetics, 26(1):107-117, February 1996.

MP95
Sushmita Mitra and Sankar K. Pal.
Fuzzy multi-layer perceptron, inferencing and rule generation.
IEEE Transactions on Neural Networks, 6(1):51-63, January 1995.

NS92
Kendall E. Nygard and Susan M. Schilling.
Metastrategies for heuristic search.
In Proceedings of the Small College Computing Symposium, pages 213-222. SCCS, 1992.

PT92
Peter Pacini and Andrew Thorson.
Fuzzy logic primer.
Technical report, Togai InfraLogic, Inc., 1992.


COLLECTED DATA

Neural Net Weights
The following pages contain the weights of the trained neural networks.


Table 12: Edge Weights for $x_1 \wedge x_2$
Edge Weight
(3,1) 3.15202
(3,2) 3.16385
(4,1) -2.9848
(4,2) -2.96822
(5,3) 7.66868
(5,4) -7.3074



Table 13: Edge Weights for $\neg x_1 \wedge x_2$
Edge Weight
(3,1) 3.74577
(3,2) -3.37266
(4,1) 3.74639
(4,2) -3.37361
(5,3) -6.85934
(5,4) -6.86085



Table 14: Edge Weights for $( x_1 \wedge x_2 ) \vee x_3$
Edge Weight
(4,1) 2.24311
(4,2) 2.26489
(4,3) 4.69642
(5,1) 2.04371
(5,2) 2.05979
(5,3) 4.34941
(6,1) 1.91655
(6,2) 1.88848
(6,3) 4.10906
(7,4) 7.19451
(7,5) 6.74664
(7,6) 6.32229



Table 15: Edge Weights for $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$
Edge Weight
(4,1) 2.24311
(4,2) 2.26489
(4,3) 4.69642
(5,1) 2.24383
(5,2) 1.89193
(5,3) 2.24383
(5,4) 1.89818
(6,1) -2.86743
(6,2) -2.92232
(6,3) -2.8508
(6,4) -2.98965
(7,1) 1.61428
(7,2) -1.97341
(7,3) 1.69459
(7,4) -1.98589
(8,1) -1.49351
(8,2) 0.807372
(8,3) -1.49146
(8,4) 1.05283
(9,5) 9.25905
(9,6) 9.31185
(9,7) -4.01554
(9,8) -3.25489



Table 16: Edge Weights for $x_1 \vee x_2$
Edge Weight
(3,1) 3.47937
(3,2) 3.47937
(4,1) -3.47953
(4,2) -3.47951
(5,3) 6.70906
(5,4) -6.70935



Table 17: Edge Weights for $\neg x_1 \wedge x_2 \wedge x_3$
Edge Weight
(4,1) 2.58427
(4,2) -2.39833
(4,3) -2.39955
(5,1) 2.62855
(5,2) -2.41021
(5,3) -2.44096
(6,1) 2.46888
(6,2) -2.22714
(6,3) -2.19911
(7,4) -6.35295
(7,5) -6.41536
(7,6) -5.94335



Table 18: Edge Weights for $\neg x_1 \wedge \neg x_2 \wedge x_3$
Edge Weight
(4,1) 2.81053
(4,2) 2.81574
(4,3) -2.5222
(5,1) 2.80223
(5,2) 2.80732
(5,3) -2.5168
(6,1) -2.8642
(6,2) -2.84914
(6,3) 2.61967
(7,4) -5.71686
(7,5) -5.69239
(7,6) 5.82244



Table 19: Edge Weights for $\neg x_1 \vee x_2$
Edge Weight
(3,1) 3.37041
(3,2) -3.74498
(4,1) -3.37602
(4,2) 3.74728
(5,3) -6.85684
(5,4) 6.86319



Table 20: Edge Weights for $( x_1 \vee x_2 ) \wedge x_3$
Edge Weight
(4,1) -4.45428
(4,2) -4.48298
(4,3) -0.865505
(5,1) 0.871718
(5,2) 0.921963
(5,3) -4.99305
(6,1) 1.37743
(6,2) 1.39556
(6,3) 3.17766
(7,4) -8.40771
(7,5) -7.59491
(7,6) 7.91462



Table 21: Edge Weights for $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$
Edge Weight
(5,1) 1.69364
(5,2) -1.01507
(5,3) 1.65296
(5,4) -1.0253
(6,1) 3.22234
(6,2) 2.91392
(6,3) 3.16291
(6,4) 2.95664
(7,1) -0.215332
(7,2) 0.585917
(7,3) -0.0334094
(7,4) 0.569212
(8,1) -2.28083
(8,2) -2.38674
(8,3) -2.17656
(8,4) -2.53572
(9,5) -1.97741
(9,6) 9.84791
(9,7) -1.14815
(9,8) 10.1616


Convex Subsets

The tables on the following pages list all convex subsets, as well as the respective sets of winners and losers for each digraph.


Table 22: Convex subsets for $( x_1 \wedge x_2 ) \vee x_3$
Convex Winners Losers
6 7 1 2 3 4 5  
5 7 1 2 3 4 6  
5 6 7 1 2 3 4  
4 7 1 2 3 5 6  
4 6 7 1 2 3 5  
4 5 7 1 2 3 6  
4 5 6 1 2 3 7
4 5 6 7 1 2 3  
3 6 1 2 4 5 7
3 5 1 2 4 6 7
3 5 6 1 2 4 7
3 4 1 2 5 6 7
3 4 6 1 2 5 7
3 4 5 1 2 6 7
3 4 5 6 1 2 7
2 6 1 3 4 5 7
2 5 1 3 4 6 7
2 5 6 1 3 4 7
2 4 6 1 3 5 7
2 4 1 3 5 6 7
2 4 6 1 3 5 7
2 4 5 1 3 6 7
2 4 5 6 1 3 7
2 3 6 1 4 5 7
2 3 5 1 4 6 7
2 3 5 6 1 4 7
2 3 4 1 5 6 7
2 3 4 6 1 5 7
2 3 4 5 1 6 7
2 3 4 5 6 1 7
1 6 2 3 4 5 7
1 5 2 3 4 6 7
1 5 6 2 3 4 7
1 4 2 3 5 6 7
1 4 6 2 3 5 7
1 4 5 2 3 6 7
1 4 5 6 2 3 7


1 3 6 2 4 5 7
1 3 5 2 4 6 7
1 3 5 6 2 4 7
1 3 4 2 5 6 7
1 3 4 6 2 5 7
1 3 4 5 2 6 7
1 3 4 5 6 2 7
1 2 6 3 4 5 7
1 2 5 3 4 6 7
1 2 5 6 3 4 7
1 2 4 3 5 6 7
1 2 4 6 3 5 7
1 2 4 5 3 6 7
1 2 4 5 6 3 7
1 2 3 6   4 5 7
1 2 3 5   4 6 7
1 2 3 5 6   4 7
1 2 3 4   5 6 7
1 2 3 4 6   5 7
1 2 3 4 5   6 7
1 2 3 4 5 6   7



Table 23: Convex subsets for $x_1 \wedge x_2$
Convex Winners Losers
4 5 3 1 2
3 5 1 2 4
2 4 5 1 3
2 3 1 4 5
1 4 5 2 3
1 3 2 4 5
1 2 4 5 3
1 2 3 4 5



Table 24: Convex subsets for $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$
Convex Winners Losers
8 9 2 4 5 6 1 3 7
7 9 1 3 5 6 2 4 8
6 9 5 1 2 3 4 7 8
5 9 1 2 3 4 6 7 8
4 8 2 6 7 9 1 3 5
4 7 1 3 6 9 2 5 8
4 6 7 1 2 3 5 8 9
4 5 1 2 3 6 7 8 9
3 8 2 4 6 9 1 5 7
3 7 1 6 8 9 2 4 5
3 6 8 1 2 4 5 7 9
3 5 1 2 4 6 8 7 9
2 8 4 6 7 9 1 3 5
2 7 1 3 6 9 4 5 8
2 6 7 1 3 4 5 8 9
2 5 1 3 4 6 7 8 9
2 4 8 6 7 9 1 3 5
2 4 7 1 3 6 9 5 8
2 4 6 7 1 3 5 8 9
2 4 5 1 3 6 7 8 9
1 8 2 4 6 9 3 5 7
1 7 3 6 8 9 2 4 5
1 6 8 2 3 4 5 7 9
1 5 2 3 4 6 8 7 9
1 3 8 2 4 6 9 5 7
1 3 7 6 8 9 2 4 5
1 3 6 8 2 4 5 7 9
1 3 5 2 4 6 8 7 9



Table 25: Convex subsets for $\neg x_1 \wedge x_2$
Convex Winners Losers
4 5 1 2 3
3 5 1 2 4
3 4 1 5 2
3 4 5 1 2
2 4 1 3 5  
2 3 1 4 5  
2 3 4 1 5  
1 4 5 2 3
1 4 5   2 3
1 3 5 2 4
1 3 5   2 4
1 3 4 5 2
1 3 4 5   2



Table 26: Convex subsets for $\neg x_1 \vee x_2$
Convex Winners Losers
4 5 2 1 3
3 5 1 4 2
2 4 3 1 5
2 3 1 5 4
1 4 2 3 5
1 4 5 2 3
1 3 4 5 2
1 3 5 4 2



Table 27: Convex subsets for $\neg x_1 \wedge x_2 \wedge x_3$
Convex Winners Losers
6 7 1 2 3 4 5
5 7 1 2 3 4 6
5 6 7 1 2 3 4
4 7 1 2 3 5 6
4 6 7 1 2 3 5
4 5 7 1 2 3 6
4 5 6 1 7 2 3
4 5 6 7 1 2 3
3 6 1 4 5 7 2
3 5 1 4 6 7 2
3 5 6 1 4 7 2
3 4 1 5 6 7 2
3 4 6 1 5 7 2
3 4 5 1 6 7 2
3 4 5 6 1 7 2
2 6 1 4 5 7 3
2 5 1 4 6 7 3
2 5 6 1 4 7 3
2 4 1 5 6 7 3
2 4 6 1 5 7 3
2 4 5 1 6 7 3
2 4 5 6 1 7 3
2 3 6 1 4 5 7  
2 3 5 1 4 6 7  
2 3 5 6 1 4 7  
2 3 4 1 5 6 7  
2 3 4 6 1 5 7  
2 3 4 5 1 6 7  
2 3 4 5 6 1 7  
1 6 7 2 3 4 5
1 6 7   2 3 4 5
1 5 7 2 3 4 6
1 5 7   2 3 4 6
1 5 6 7 2 3 4
1 5 7   2 3 4 6
1 5 6 7 2 3 4
1 5 7   2 3 4 6
1 5 6 7   2 3 4


1 4 7 2 3 5 6
1 4 7   2 3 5 6
1 4 6 7 2 3 5
1 4 6 7   2 3 5
1 4 5 7 2 3 6
1 4 5 7   2 3 6
1 4 5 6 7 2 3
1 4 5 6 7   2 3



Table 28: Convex subsets for $( x_1 \vee x_2 ) \wedge x_3$
Convex Winners Losers
6 7 1 2 3 4 5
5 7 1 2 6 3 4
4 7 6 1 2 3 5
3 6 1 2 4 5 7
3 5 1 2 4 7 6
3 4 5 7 1 2 6
2 6 1 3 4 5 7
2 5 1 4 7 3 6
2 4 7 1 3 5 6
1 6 2 3 4 5 7
1 5 2 4 7 3 6
1 4 7 2 3 5 6
1 2 6 3 4 5 7
1 2 5 4 7 3 6
1 2 4 7 3 5 6



Table 29: Convex subsets for $\neg x_1 \wedge \neg x_2 \wedge x_3$
Convex Winners Losers
6 7 3 1 2 4 5
5 7 1 2 6 3 4
4 7 1 2 6 3 5
4 5 7 1 2 6 3
3 6 4 5 1 2 7
3 5 1 2 4 7 6
3 4 1 2 5 7 6
3 4 5 1 2 7 6
2 6 3 1 4 5 7
2 6 7 3 1 4 5
2 5 1 6 7 3 4
2 5 7 1 6 3 4
2 4 1 6 7 3 5
2 4 7 1 6 3 5
2 4 5 1 6 7 3
2 4 5 7 1 6 3
1 6 3 2 4 5 7
1 6 7 3 2 4 5
1 5 2 6 7 3 4
1 5 7 2 6 3 4
1 4 2 6 7 3 5
1 4 7 2 6 3 5
1 4 5 2 6 7 3
1 4 5 7 2 6 3
1 2 6 3 4 5 7
1 2 6 7 3 4 5
1 2 5 6 7 3 4
1 2 5 7 6 3 4
1 2 4 6 7 3 5
1 2 4 7 6 3 5
1 2 4 5 6 7 3
1 2 4 5 7 6 3



Table 30: Convex subsets for $x_1 \vee x_2$
Convex Winners Losers
4 5 3 1 2
3 5 1 2 4
2 4 5 1 3
2 3 1 4 5
1 4 5 2 3
1 3 2 4 5
1 2 4 5 3
1 2 3 4 5



Table 31: Convex subsets for $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$
Convex Winners Losers
8 9 6 1 2 3 4 5 7
7 9 2 4 6 8 1 3 5
6 9 1 2 3 4 8 5 7
5 9 1 3 6 8 2 4 7
4 8 5 1 2 3 6 7 9
4 7 2 5 8 9 1 3 6
4 6 1 2 3 5 8 7 9
4 5 1 3 8 9 2 6 7
3 8 7 1 2 4 5 6 9
3 7 2 4 8 9 1 5 6
3 6 1 2 4 7 8 5 9
3 5 1 7 8 9 2 4 6
2 8 5 1 3 4 6 7 9
2 7 4 5 8 9 1 3 6
2 6 1 3 4 5 8 7 9
2 5 1 3 8 9 4 6 7
2 4 8 5 1 3 6 7 9
2 4 7 5 8 9 1 3 6
2 4 6 1 3 5 8 7 9
2 4 5 1 3 8 9 6 7
1 8 7 2 3 4 5 6 9
1 7 2 4 8 9 3 5 6
1 6 2 3 4 7 8 5 9
1 5 3 7 8 9 2 4 6
1 3 8 7 2 4 5 6 9
1 3 7 2 4 8 9 5 6
1 3 6 2 4 7 8 5 9
1 3 5 7 8 9 2 4 6


Spanning Trees
The spanning trees found are listed on the following pages. MIN denotes the edges in the minimum spanning tree, and MAX is the listing of edges in the maximum spanning tree.


Table 32: Spanning Trees for $x_1 \wedge x_2$
MIN (2,4),(4,1),(1,3),(4,5)
MAX (3,5),(5,4),(3,2),(3,1)



Table 33: Spanning Trees for $( x_1 \wedge x_2 ) \vee x_3$
MIN (2,6),(6,1),(1,5),(1,4),(6,3),(6,7)
MAX (4,7),(7,5),(7,6),(4,3),(4,2),(4,1)



Table 34: Spanning Trees for $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$
MIN (2,8),(8,4),(8,3),(8,1),(1,7),(2,5),(3,6),(8,9)
MAX (6,9),(9,5),(9,7),(9,8),(6,4),(6,2),(6,1),(6,3)



Table 35: Spanning Trees for $\neg x_1 \wedge x_2$
MIN (2,3),(2,4),(3,1),(3,5)
MAX (4,5),(5,3),(4,1),(4,2)



Table 36: Spanning Trees for $\neg x_1 \wedge x_2 \wedge x_3$
MIN (3,6),(6,2),(2,4),(2,5),(6,1),(6,7)
MAX (5,7),(7,4),(7,6),(5,1),(5,3),(5,2)



Table 37: Spanning Trees for $\neg x_1 \wedge \neg x_2 \wedge x_3$
MIN (3,5),(3,4),(3,6),(5,1),(5,2),(5,7)
MAX (6,7),(7,4),(7,5),(6,1),(6,2),(6,3)



Table 38: Spanning Trees for $\neg x_1 \vee x_2$
MIN (1,3),(1,4),(3,2),(3,5)
MAX (4,5),(5,3),(4,2),(4,1)



Table 39: Spanning Trees for $x_1 \vee x_2$
MIN (2,3),(3,1),(2,4),(3,5)
MAX (4,5),(5,3),(4,1),(4,2)



Table 40: Spanning Trees for $( x_1 \vee x_2 ) \wedge x_3$
MIN (3,4),(3,6),(6,1),(1,5),(5,2),(5,7)
MAX (4,7),(7,6),(7,5),(5,3),(4,2),(4,1)



Table 41: Spanning Trees for $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$
MIN (3,7),(7,1),(7,4),(7,2),(2,5),(7,9),(3,8),(2,6)
MAX (6,9),(9,8),(6,1),(6,3),(6,4),(6,2),(9,5),(9,7)


Directed Vertex Covers

The following pages contain all directed vertex covers found for the neural networks examined in this research.


Table 42: Directed Vertex Covers for $x_1 \wedge x_2$
\{1,2\}, \{1,2,3\}, \{1,2,3,4\}, \{1,2,3,4,5\}, \ldots, \{2,4\}, \{2,4,5\}, \{3,4\}, \{3,4,5\}



Table 43: Directed Vertex Covers for $\neg x_1 \wedge x_2$
\{1,2,3,4,5\}, \{1,2,3,5\}, \{1,2,4,5\}, \{1,2,5\}, \{1,3,4,5\}, \{1,3,5\}, \{1,4,5\}, \{1,5\}



Table 44: Directed Vertex Covers for $( x_1 \wedge x_2 ) \vee x_3$
\{1,2,3\}, \{1,2,3,4\}, \{1,2,3,4,5\}, \ldots, \{1,2,3,6\}, \{1,2,3,6,7\}, \{1,2,3,7\}



Table 45: Directed Vertex Covers for $( x_1 \vee x_2 ) \wedge x_3$
\{1,2\}, \{1,2,3\}, \{1,2,3,4\}, \{1,2,3,4,5\}, \ldots, \{4,5,6\}, \{4,5,6,7\}, \{4,6\}, \{4,6,7\}



Table 46: Directed Vertex Covers for $( x_1 \vee \neg x_2 ) \wedge ( \neg x_1 \vee x_2)$



Table 47: Directed Vertex Covers for $\neg x_1 \wedge x_2 \wedge x_3$
\{1,2,3,4,5,6,7\}, \{1,2,3,4,5,7\}, \{1,2,3,4,6,7\}, \ldots, \{1,5,6,7\}, \{1,5,7\}, \{1,6,7\}, \{1,7\}



Table 48: Directed Vertex Covers for $\neg x_1 \wedge \neg x_2 \wedge x_3$
\{1,2,3\}, \{1,2,3,4\}, \{1,2,3,4,5\}, \ldots, \{4,6\}, \{4,6,7\}, \{5,6\}, \{5,6,7\}



Table 49: Directed Vertex Covers for $\neg x_1 \vee x_2$
\{1,2\}, \{1,2,3\}, \{1,2,3,4\}, \{1,2,3,4,5\}, \ldots, \{2,4,5\}, \{2,5\}, \{3,4\}, \{3,4,5\}



Table 50: Directed Vertex Covers for $x_1 \vee x_2$
\{1,2\}, \{1,2,3\}, \{1,2,3,4\}, \{1,2,3,4,5\}, \ldots, \{2,4\}, \{2,4,5\}, \{3,4\}, \{3,4,5\}



Table 51: Directed Vertex Covers for $( x_1 \wedge \neg x_2 ) \vee ( \neg x_1 \wedge x_2)$
\{1,2,3,4,5,6,7,8\}, \{1,2,3,4,5,6,7,8,9\}, \ldots, \{6,7,8\}, \{6,7,8,9\}, \{6,8\}, \{6,8,9\}