EXPLANATION OF NEURAL NETWORK CLASSIFICATION

Matija Drobnič, Viljem Križman and Borut Korenjak
Institut Jožef Stefan

Informatica 4/92

Keywords: machine learning, neural networks, decision trees

We introduced explanation in human-readable form into a neural network classifier. The neural network was upgraded with an inductive learning system, which generated a decision tree to explain the way the neural network classifies new examples. The learned decision tree was compared to the neural network itself and to the inductive learning system with respect to both transparency and classification accuracy.

1. Introduction

Artificial neural network models were introduced as an attempt to describe the way the human brain copes with data, especially in pattern recognition [1]. They are, in principle, based on our understanding of the structure of the human brain. Their computational power rests on the massive parallelism of simple elements and on their dense interconnection.

Many different types of neural networks (NN) have been introduced in recent years. In the field of digital pattern recognition, single-layer networks are mostly used [2], whereas three-layer feedforward networks can serve as general classifying systems for data described in an attribute-value language [3]. In this field, their adaptability and classification accuracy make them a very useful tool. Their main drawback is the lack of transparency to the human user, who cannot deduce much from the values of the NN weights.

Inductive learning (IL) is another approach to the classification task (e.g. [4], [5]). Given the examples, an IL system tries to generate a classification function in the form of a decision tree (DT) or of IF-THEN rules. Its main advantage is the acquisition of knowledge in a form suitable for expert systems, where transparency of the results is strictly required. The introduction of statistical methods into the knowledge acquisition process also yields classification accuracy comparable to that of classical statistical methods.

In this paper, we try to combine the advantages of both approaches. The main idea is to use IL methods to extract the information hidden in the NN weights. The DT obtained in this way provides an insight into how new examples are classified.

2. Explanation in the neural network

As an example of an NN classifier, the growing neural network (GNN) was chosen. It is a single-layer NN whose neurons are weight vectors with attached classes, belonging to the same space as the learning examples. Its classification is based on the nearest-neighbour method. To generate the GNN classifier, a slightly modified version of the unsupervised learning algorithm proposed by Kohonen [6] was used.
It can be put as follows:

    normalise_vectors;
    net = first_example_vector;
    repeat
        x = next_example_vector;
        y = nearest_vector_from_network(x);
        if class(x) = class(y) then
        begin
            y = y + α(x - y);
            update_vector(y, net);
        end
        else
            add_vector(x, net);
    until no_more_examples;

For each new learning example x, we find the nearest vector y in the network according to the ||·||₂ (Euclidean) norm. If their classes match, y is slightly rotated towards x (in our experiments, we set α = 0.2); otherwise, x is added to the network as a new vector. Once the network is built, the classification process is simple: given an example, find the nearest vector in the network according to the ||·||₂ norm and use its class to classify the example. An implementation of the above algorithm in the C language is given in [7].

As the IL system, ASSISTANT Professional [8] was chosen. It is a tool for the induction of decision trees from examples in the attribute-value language, based on the ID3 algorithm [4] and improved by binarisation of the attributes, a mechanism for dealing with incomplete data, and tree pruning.

Several ways of combining the NN and IL methods have been proposed recently [9]. One can classify all the learning examples with the NN learned from them, obtaining their new classes, and then feed them to the IL system as input. Another possibility is to generate artificial examples, classify them with the NN, and again use them as input to the IL system. We chose a different way: we took the original weight vectors from the GNN and used them as learning examples. In the case of the GNN this simple scheme makes sense, since the GNN uses its weight vectors as examples for nearest-neighbour classification.

3. Experimental results

3.1. Experimental setup

In our experiments, a medical domain describing the condition of coronary arteries after a bypass operation was used. The domain contains 112 examples, each belonging to one of the following classes: deteriorated, unchanged or improved condition. The data are described with 30 attributes, 14 numerical and 16 logical. The numerical attribute values were normalised using the ||·||∞ norm; the logical ones were coded as 0 and 1. Before being loaded into ASSISTANT Professional, the numerical values were discretised into 5 equal intervals.

For the cross-validation, the examples were randomly divided 10 times into a training set (70%, or 80 examples) and a testing set (30%, or 32 examples). For each distribution, a GNN was built from the learning examples (GNN). Then a DT was learned from the neural network (IL_GNN). As a reference, another DT was learned from the original learning examples (IL). All three methods were compared with respect to the transparency of the classification process. In the next step, we used the trees learned from the neural networks as standalone classifiers, and all three methods were then also compared with respect to classification accuracy.

3.2. Transparency of the classification

In this section, we compare the three methods with respect to the transparency of their classification process to the human user.

First, let us examine the GNN classifiers. The algorithm described in Section 2 generates networks containing about 15 neurons; the results are shown in Table 1. Every neuron contains 30 real-valued weights and an attached class. A typical network (distribution 2) is presented in Figure 1.
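To make the learning and classification procedure from Section 2 concrete, the following is a minimal sketch in Python. It is an illustration under our own naming, not the original implementation (which is written in C [7]); we assume the example vectors are already normalised and given as (vector, class) pairs.

    import math

    ALPHA = 0.2  # learning rate used in our experiments

    def distance(x, y):
        # Euclidean (||.||_2) distance between two vectors
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

    def nearest(net, x):
        # index of the weight vector in the network nearest to x
        return min(range(len(net)), key=lambda i: distance(net[i][0], x))

    def learn(examples):
        # examples: list of (vector, class_label) pairs
        net = [examples[0]]  # the network starts as the first example
        for x, cx in examples[1:]:
            i = nearest(net, x)
            y, cy = net[i]
            if cx == cy:
                # classes match: rotate y slightly towards x
                net[i] = ([yi + ALPHA * (xi - yi)
                           for xi, yi in zip(x, y)], cy)
            else:
                # classes differ: add x to the network as a new vector
                net.append((x, cx))
        return net

    def classify(net, x):
        # the class of the nearest weight vector classifies x
        return net[nearest(net, x)][1]

Each neuron is represented simply as a (weight vector, class) pair, mirroring the fact that GNN neurons live in the same space as the learning examples.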
            Class 1   Class 2   Class 3      Σ
      0        1         0        12        13
      1        3         0        13        16
      2        2         1        13        16
      3        0         1         8         9
      4        1         1        13        15
      5        2         0        12        14
      6        3         1        13        17
      7        3         0        16        19
      8        2         1        13        16
      9        4         1        13        18
      x̄       2.1       0.6      12.6      15.3
      σ        1.1       0.5       1.9       2.7

Table 1: Number of neurons in GNNs.

[Figure 1: An example of a neural network. Neurons are shown as rows of weights 1 to 30, grouped by classes 1, 2 and 3.]

Every classification of a new example requires 16 × 30 = 480 subtractions, multiplications and additions to compute the Euclidean distances to all the vectors in the neural network, and additionally 16 comparisons. Even if a human user is capable of using the GNN, it remains a black box, returning the result without any explanation.

In the next stage, the neurons of the GNNs were used as learning examples for the IL system. The DTs learned from the GNNs contained about 4-5 nodes and about 2-3 leaves. An example of such a decision tree (distribution 2) is shown in Figure 2.

[Figure 2: An example of a DT learned from the network.]

The difference between the classifiers of Figures 1 and 2 is obvious: in the second case, only two comparisons are required during classification in the worst case. Furthermore, the DT is simple enough to be understood by humans and can easily be used even without a computer. This is certainly not true of the neural network, where the knowledge is hidden in 480 real-valued weights.

For further comparison, we also ran ASSISTANT Professional with the original examples as input. The DTs learned in this way were much bigger than the ones learned from the neural networks: they typically contained about 18 nodes and about 9 leaves. The numbers of nodes and leaves for both methods are presented in Table 2. An example of a DT learned from the learning examples (distribution 2) is shown in Figure 3.

            DT learned from examples      DT learned from GNN
            nodes   leaves   NULL         nodes   leaves   NULL
      0      21      11       2             3       2       0
      1      21      11       4             3       2       0
      2      17       9       2             5       3       0
      3      19      10       3             3       2       1
      4      17       9       4             5       3       0
      5      15       8       2             3       2       0
      6      13       7       3             5       3       1
      7      19      10       3             5       3       0
      8      19      10       3             5       3       0
      9      17       9       3             5       3       0
      x̄     17.8     9.4     2.9           4.2     2.6     0.2

Table 2: Sizes of the DTs learned from the original examples and from the GNNs.
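To make the contrast concrete: a DT of the size typically learned from the GNNs boils down to two attribute tests. The sketch below, in the same Python setting as above, shows a tree of that shape; the attribute names, thresholds and leaf classes are invented for illustration, since the actual tests are those of Figure 2.

    def classify_dt(example):
        # Hypothetical two-test tree of the shape learned from the GNNs;
        # attribute names and thresholds are placeholders, not the real
        # tests from Figure 2.
        if example["attribute_a"] <= 0.5:
            return "improved"
        if example["attribute_b"] <= 0.5:
            return "unchanged"
        return "deteriorated"

Compared with the 480 arithmetic operations and 16 comparisons needed by the GNN, such a classifier can be read, checked and applied by hand.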