INFORMATICA 4/1960 AN EX PERIMENT IN AUTOMATIC LEARNING OF DIAGNOSTIC RULES I. BRATKO, P. MULEC UDK:681.3:616-071 FACULTV OF ELECTRICAL ENG. AND J. STEFAN INSTITUTE E. KAROELJ UNIVERSITV, LJUBLJANA' VUGOSLAVIA The paper reports on an experiment in automatic learning of classification rules for medical diagnoais. The input to the learning process is a set of exampleaf i.e. already diagnosed patienta. The output is a diagnostic rule, in the form of a decision tree, for diagnosing unknown examples. As a learning method we employed a slightly modified Quinlan's algorithm ID3. The lymphograpbic investigation served as a problem-domain for the experiment. We used the data about 150 patients, each o£ them described by a set of 18 discrete attributes and classified into one of 9 alternative diagnoses. Tbe average precision of automatically derived rulea obtained in a series of experiments was about 80% when diagnosing unlcnown patients, which comparea favourably to the estimated precision of human diagnosticians. This is between 60 and 85% depending on experience. POSKUS Z AVTOMATSKIM UČI5NJEM DIAGNOSTIČNIH PRAVTL. Članek opisuje poskus z avtomatskim učenjem diagnostičnih pravil za diagnosticiranje v medicini. Vhod v proces llčenja je množica primerov, to je pacientov z znanimi diagnozami. Izhod je diagnostično pravilo v obliki oAločitvenega drevesa za diagnosticiranje neznanih priraerov. Kot metodo učenja srao uporabili nekoliko modificiran Quinlanov algoritem ID3, kot problemsko področje za naš poskus pa je služila lirafografska preiskava. Uporabili smo podatke 0 150 pacientih, opisanih z 18 diskretnimi atributi in klasificiranih v 9 možnih al,ternativnih diagnoz. Povprečna natančnost diagnostičnih pravil, avtomatsko generiranih v zaporednih poskusih, je bila okrog 80% pri diagnosticiranju neznanih primerov. Ocenjena natančnost diagnostika - zdravnika leži med 60 in 85%. Introduction One problem arising in the development of computer applications such as expert information systems is: How to get the problern- domain knowledge into the system? The usual way is that the human domain-expert himself describes his or her own knowledge in some suitable formal language. It often turna that this is a difficult task since the knowledge used by the expert is often intuitive, not systematic, and/or poorly formalised. Examples of problem-domains in whioh human experts typically use nonformalised knowledge are: medical diagnosis, economic forecaats, playing chess etc. Another, attractive way of getting the knowledge into the system is based on the use of automatic learning frora examples and counter-examples. The domain-expert's task here becomes simpler as he is no more requested to systematically formalise his entire know- ledge, but only to provide tho system with an adequate set of examples. This set should, hopefully, be sufficient for the system to autonomously recognise the regulacities :underlying the exampl3S. In this paper we report on an experiraent in automatic learning of medical diagnoais. The diagnostic domain chosen for the experiment was lymphographic inveetigation. As examples Xor learning we used sild raedioal data with known correct diagnoaes. The result of the learning process was a diagnoatic rule in the form of a decision tree. This decision tree definea a mapping between lymphographic data and the corresponding diagnoaia, and can thua be used for automatic diagnosis. Our learning algorithm was baaed on the Quinlan's automatic learning program ID3 (e.g. Quinlan 1979, Quinlan 1980), which had to be generalised to classification into any number of claaaea (ID3 could originally deal with two clasaea only). The reaulta of the experi- ment indicated that the preciaion of the 19 automatically learned diagnostic rule super- seded that of an average physician - practi- tioner in this field, and that it is only slightly worse than the precision of best specialists for lymphographic investigation The learninp; alftorithm The algorithra used in our experiment is a version of Quinlan's ID3 system, which is based on Hunfs CLS (Concept Learning System, Hunt et. al. 1966). The input to the algorithm are examples to- gether with their class membership. Each exaraple is described by a set of discrete attributes. Each attribute has typically a few values. All examples are specified by the values of all the attributes (i.e. each example is completely specified), and by the olass to which the example belongs. Quinlan's original algorithm works with two classes only. As our problem of lymphographic diagnosis requi- red 9 classes, IDJ had to be modified accord- ingly. The appropriate generalisation from 2 to N classes of ID3»s information-theoretic evaluation function was straightforv/ard. The output of the algorithm is a decision tree. The nodes of this tree correspond to tests o£ attributes. The arcs stemming from nodes in the tree correspond to the values of the attribute corresponding to the node. Each leaf of the tree is assigned a class in such a way that this class conta ins all the examples which, according to their attribute values, fall irito this leaf. The algorithm £ov constructing a decision tree JTrom examples is very simple and efficient. First, a subset, called a "window", of the example set is chosen. A decision tree which "explains" this wiridow is constructed. Then this tree is tested against the •whole example set. If the tree explains the whole set (i.e. correctly classifies all the examples in the set) then this tree is the final reault of the learning process. If not, then the window is modified by the inclusion of some exaraples which contradict the ourront deci-sion tree, whereby possibly deleting sorae of the members of the old window. A new decision tree is constructed for the new window, then tested against the complete example set, etc. A decision tree for a given window is constructed in a top-down fashion. First, one of the attributes is selected to become the root of the tree. This attribute partitions the window into "subwindows", so that each subwindow contains examples with the sarae value of this attribute. Then, subtrees are constructed for all the subwindows. The sub- trees are connected to corresponding arcs stemming from the root. Attributes to become roots of the (sub)trees are chosen by a heuristic criterion: that attribute is chosen which most reduces the information content of the (sub)window. An implementation of this algorithm is in more detail documented in Mulec 1980. The problem of Lymphop;raphic diaRnosis In the lymphographic investigation, 18 symptoms are considered. Symptoms correspond to attri- butes, as referred to in the previous section. There are 9 possible alternative diagnoses; t]iat is: each example is classified into one of 9 classes. Table 1 shows a form which is to be filled in by a physician when diagnosing a lymphograph. The data in this form defines one example for our learning algorithm. Experiment and results In the experiment, we used the archive data about 150 patients who were lymphographically investigated at the Institute of Oncology, Ljubljana, over a 3 year period. Fig. 1 shows the diagnostic rule produced by the learning algorithm if all 150 samples were used as training examples. By the defini-tion o.f the Quinlan's algorithm, the diagnostic rule has to correctly diagnose all the examples used for training. It is interesting, however, how successfully this diagnostic rule classifies unknown samples. To investigate this question empirically, we randomly permuted all 150 examples, then used the first 100 examples as a training set for the derivation of a diagnostic rule, and then tested the rule on thefremaining 50 samples as unknown cases. To eliminate the risk of pathological permutations, this experiment was repeated 10 times, each time with another 20 (*utt M»J^Y^ ^»vtio^ H«,W>M«*> H