INFORMATICA 4/1980
AN EXPERIMENT IN AUTOMATIC LEARNING
OF DIAGNOSTIC RULES
I. BRATKO, P. MULEC
UDK: 681.3:616-071
FACULTY OF ELECTRICAL ENG. AND J. STEFAN INSTITUTE,
E. KARDELJ UNIVERSITY, LJUBLJANA, YUGOSLAVIA
The paper reports on an experiment in automatic learning of classification rules for medical
diagnosis. The input to the learning process is a set of examples, i.e. already diagnosed
patients. The output is a diagnostic rule, in the form of a decision tree, for diagnosing
unknown examples. As a learning method we employed a slightly modified version of Quinlan's
algorithm ID3. Lymphographic investigation served as the problem domain for the experiment.
We used data on 150 patients, each described by a set of 18 discrete attributes and classified
into one of 9 alternative diagnoses. The average precision of the automatically derived rules,
obtained in a series of experiments, was about 80% when diagnosing unknown patients. This
compares favourably with the estimated precision of human diagnosticians, which is between
60% and 85% depending on experience.
AN EXPERIMENT IN AUTOMATIC LEARNING OF DIAGNOSTIC RULES. The article describes an experiment
in automatic learning of diagnostic rules for diagnosis in medicine. The input to the learning
process is a set of examples, that is, patients with known diagnoses. The output is a
diagnostic rule in the form of a decision tree for diagnosing unknown examples. As the learning
method we used a slightly modified version of Quinlan's algorithm ID3, and lymphographic
investigation served as the problem domain for our experiment. We used data on 150 patients,
described by 18 discrete attributes and classified into 9 possible alternative diagnoses. The
average precision of the diagnostic rules, automatically generated in successive experiments,
was about 80% when diagnosing unknown examples. The estimated precision of a physician
diagnostician lies between 60 and 85%.
Introduction
One problem arising in the development of
computer applications such as expert
information systems is: how to get the
problem-domain knowledge into the system? The
usual way is for the human domain expert to
describe his or her own knowledge in some
suitable formal language. This often turns out
to be a difficult task, since the knowledge
used by the expert is often intuitive,
unsystematic, and/or poorly formalised. Examples
of problem domains in which human experts
typically use non-formalised knowledge are
medical diagnosis, economic forecasts, and
playing chess.
Another, attractive way of getting the
knowledge into the system is based on
automatic learning from examples and
counter-examples. The domain expert's task
then becomes simpler, as he is no longer required
to formalise his entire knowledge
systematically, but only to provide the system
with an adequate set of examples. This set should,
hopefully, be sufficient for the system to
recognise autonomously the regularities
underlying the examples.
In this paper we report on an experiment in
automatic learning of medical diagnosis. The
diagnostic domain chosen for the experiment
was lymphographic investigation. As examples
for learning we used old medical data with
known correct diagnoses. The result of the
learning process was a diagnostic rule in
the form of a decision tree. This decision tree
defines a mapping between lymphographic data
and the corresponding diagnosis, and can thus
be used for automatic diagnosis.
Our learning algorithm was based on
Quinlan's automatic learning program ID3 (e.g.
Quinlan 1979, Quinlan 1980), which had to be
generalised to classification into any number
of classes (ID3 could originally deal with
two classes only). The results of the
experiment indicate that the precision of the
automatically learned diagnostic rule
exceeds that of an average physician
practising in this field, and that it is only
slightly worse than the precision of the best
specialists in lymphographic investigation.
The learning algorithm
The algorithm used in our experiment is a
version of Quinlan's ID3 system, which is
based on Hunt's CLS (Concept Learning System,
Hunt et al. 1966).
The input to the algorithm is a set of examples
together with their class membership. Each
example is described by a set of discrete
attributes. Each attribute typically has a
few values. Every example is specified by the
values of all the attributes (i.e. each example
is completely specified) and by the class to
which the example belongs. Quinlan's original
algorithm works with two classes only. As
our problem of lymphographic diagnosis
required 9 classes, ID3 had to be modified
accordingly. The appropriate generalisation of
ID3's information-theoretic evaluation function
from 2 to N classes was straightforward.
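The paper does not print the generalised evaluation function. One natural reading, assumed here rather than taken from the text, is the Shannon entropy of the class frequencies, which is defined for any number of classes and reduces to the two-class measure when N = 2. The function name `class_entropy` is hypothetical.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels.

    Assumed form of the N-class information measure; the paper only
    says the generalisation from 2 to N classes was straightforward.
    """
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Two equally frequent classes: the original two-class case.
two_class = class_entropy(["pos", "neg", "pos", "neg"])
# Nine equally frequent classes, as many as in the lymphography problem.
nine_class = class_entropy(list(range(1, 10)))
```

With equally frequent classes the measure equals log2 of the number of classes, so it grows from 1 bit for two classes to about 3.17 bits for nine.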
The output of the algorithm is a decision tree.
The nodes of this tree correspond to tests
of attributes. The arcs stemming from a node
correspond to the values of the attribute
tested at that node. Each leaf
of the tree is assigned a class in such a way
that this class contains all the examples
which, according to their attribute values,
fall into this leaf.
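As an illustration of this structure, a decision tree can be held as nested tuples and dictionaries and applied to a patient by following arcs from the root to a leaf. The representation, the attribute names and the tiny tree below are assumptions made for this sketch; the tree is not the rule of Fig. 1 and has no medical meaning.

```python
# A leaf is a plain string (the class); an inner node is a pair
# (attribute_name, {attribute_value: subtree}).
# Hypothetical toy tree, invented for illustration only.
toy_tree = (
    "block_of_lymphatic_system",
    {
        "no": "normal",
        "yes": (
            "lymph_nodes_enlarged",
            {0: "fibrosation", 1: "metastases", 2: "malignant lymphoma"},
        ),
    },
)


def classify(tree, example):
    """Follow the arcs matching the example's attribute values down to a leaf."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


patient = {"block_of_lymphatic_system": "yes", "lymph_nodes_enlarged": 1}
diagnosis = classify(toy_tree, patient)
```

Each test of an attribute consumes one dictionary lookup, so diagnosing a patient costs at most as many lookups as there are attributes on the path.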
The algorithm for constructing a decision tree
from examples is very simple and efficient.
First, a subset of the example set, called a
"window", is chosen. A decision tree which
"explains" this window is constructed. Then
this tree is tested against the whole example
set. If the tree explains the whole set (i.e.
correctly classifies all the examples in the
set), then this tree is the final result of the
learning process. If not, the window is
modified by including some examples
which contradict the current decision tree,
possibly deleting some of the members
of the old window. A new decision tree is
constructed for the new window, then tested
against the complete example set, and so on.
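The window iteration described above can be sketched as follows. This is a simplified reading, not the authors' implementation: the function name, the `max_new` limit and the random initial window are assumptions, and the optional deletion of old window members is left out. The tree builder and the classifier are passed in as parameters; a trivial table-lookup learner stands in for them in the usage lines.

```python
import random


def learn_with_window(examples, build_tree, classify, window_size, max_new=10, seed=0):
    """Build a rule for a window, test it on all examples, grow the window.

    `examples` is a list of (attributes, label) pairs. The builder is assumed
    to explain its own window, so only examples outside it are re-tested.
    """
    rng = random.Random(seed)
    window = rng.sample(examples, min(window_size, len(examples)))
    while True:
        tree = build_tree(window)
        exceptions = [(attrs, label) for attrs, label in examples
                      if (attrs, label) not in window
                      and classify(tree, attrs) != label]
        if not exceptions:
            return tree, window
        # Simplification: contradicting examples are added, none are deleted.
        window.extend(exceptions[:max_new])


# Stand-in learner for demonstration: memorise the window in a table.
def build_table(window):
    return {tuple(sorted(attrs.items())): label for attrs, label in window}


def lookup(table, attrs):
    return table.get(tuple(sorted(attrs.items())), "null")


data = [({"a": i % 2, "b": i % 3}, "c%d" % (i % 3)) for i in range(12)]
rule, final_window = learn_with_window(data, build_table, lookup, window_size=2)
```

The loop always terminates, because every pass that finds contradictions adds at least one new example to the window.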
A decision tree for a given window is
constructed in a top-down fashion. First, one
of the attributes is selected to become the
root of the tree. This attribute partitions
the window into "subwindows", so that each
subwindow contains examples with the same
value of this attribute. Then, subtrees are
constructed for all the subwindows. The
subtrees are connected to the corresponding arcs
stemming from the root.
Attributes to become roots of the (sub)trees
are chosen by a heuristic criterion: that
attribute is chosen which most reduces the
information content of the (sub)window.
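A minimal sketch of this top-down construction follows. It assumes that "information content" means the entropy of the class distribution, so the chosen attribute is the one that leaves the least expected entropy in the subwindows. All names and the toy data are hypothetical, and leaves for attribute values that never occur in a subwindow are not generated here.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def remaining_information(window, attribute):
    """Expected entropy of the subwindows after splitting on `attribute`."""
    groups = {}
    for attrs, label in window:
        groups.setdefault(attrs[attribute], []).append(label)
    return sum(len(g) / len(window) * entropy(g) for g in groups.values())


def build_tree(window, attributes):
    """Top-down construction; `window` is a list of (attrs_dict, label) pairs."""
    labels = [label for _, label in window]
    if len(set(labels)) == 1:
        return labels[0]          # pure subwindow: a leaf with that class
    if not attributes:
        return "search"           # same attribute values, different classes
    # Least remaining information means the largest reduction.
    best = min(attributes, key=lambda a: remaining_information(window, a))
    rest = [a for a in attributes if a != best]
    subwindows = {}
    for attrs, label in window:
        subwindows.setdefault(attrs[best], []).append((attrs, label))
    return (best, {value: build_tree(sub, rest)
                   for value, sub in subwindows.items()})


# Invented toy data, not taken from the lymphography archive.
toy = [
    ({"bypass": "no", "nodes_enlarged": 1}, "normal"),
    ({"bypass": "no", "nodes_enlarged": 3}, "normal"),
    ({"bypass": "yes", "nodes_enlarged": 1}, "fibrosation"),
    ({"bypass": "yes", "nodes_enlarged": 3}, "metastases"),
]
tree = build_tree(toy, ["bypass", "nodes_enlarged"])
```

On the toy data the split on `bypass` leaves 0.5 bit of expected entropy and the split on `nodes_enlarged` leaves 1 bit, so `bypass` becomes the root.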
An implementation of this algorithm is
documented in more detail in Mulec (1980).
The problem of lymphographic diagnosis
In the lymphographic investigation, 18 symptoms
are considered. Symptoms correspond to
attributes, as referred to in the previous section.
There are 9 possible alternative diagnoses;
that is, each example is classified into one
of 9 classes. Table 1 shows a form which is
to be filled in by a physician when diagnosing
a lymphograph. The data in this form defines
one example for our learning algorithm.
Experiment and results
In the experiment, we used archive data
on 150 patients who were lymphographically
investigated at the Institute of Oncology,
Ljubljana, over a 3-year period. Fig. 1 shows
the diagnostic rule produced by the learning
algorithm when all 150 samples were used as
training examples.
[Figure 1: A diagnostic rule for lymphographic investigation.]

Lymphographic attributes

 1. Lymphangia:
      0 normal
      1 curved
      2 deformations
      3 displacement
 2. Stop on afferent lymphangia:
      1 no
      2 yes
 3. Stop on chain of lymphangia:
      1 no
      2 yes
 4. Block of lymphatic system:
      1 no
      2 yes
 5. By-pass:
      1 no
      2 yes
 6. Extravasations:
      1 no
      2 yes
 7. Regeneration lymphangia:
      1 no
      2 yes
 8. Early uptake in lymph nodes:
      1 no
      2 yes
 9. Lymph nodes diminished:
      0
      1
      2
      3
10. Lymph nodes enlarged:
      0
      1
      2
      3
11. Shape of lymph nodes:
      1 bean-like
      2 oval
      3 spherical
12. Various filling defects:
      1 no
      2 follicular
      3 big central
      4 small defects
13. Lacunar filling defects:
      1 no
      2 lacunar
      3 lacunar marginal
      4 central
14. Structural alterations:
      1 no
      2 grains
      3 small droplets
      4 coarse droplets
      5 diluted
      6 grid
      7 stripes
      8 obscure
15. Special structure and form:
      1 glass
      2 bladder
16. Dislocation of lymph nodes:
      1 no
      2 yes
17. No uptake in lymph nodes:
      1 no
      2 yes
18. Number of abnormal lymph nodes:
      0-9
      10-19
      20-29
      30-39
      40-49
      50-59
      more than 59

Diagnoses

  1 normal
  2 reactive hyperplasia
  3 metastases suspected
  4 malignant lymphoma suspected
  5 metastases
  6 malignant lymphoma
  7 Brill-Symmers
  8 fibrosation
  9 other diseases

Table 1: Symptoms and diagnoses in lymphographic investigation.

By the definition of Quinlan's algorithm,
the diagnostic rule has to diagnose correctly
all the examples used for training. The
interesting question, however, is how successfully
this diagnostic rule classifies unknown samples.
To investigate this question empirically, we
randomly permuted all 150 examples, used
the first 100 examples as a training set for
the derivation of a diagnostic rule, and then
tested the rule on the remaining 50 samples as
unknown cases. To eliminate the risk of
pathological permutations, this experiment was
repeated 10 times, each time with another
random permutation of the data.
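The testing procedure (permute, train on the first 100 examples, test on the remaining 50, repeat 10 times) can be sketched as below. The function and parameter names are hypothetical, and a majority-class learner on synthetic data stands in for the real tree learner and the patient archive.

```python
import random
from collections import Counter


def repeated_holdout(examples, learn, evaluate, n_train=100, repeats=10, seed=0):
    """Permute, train on the first n_train examples, test on the rest; repeat."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = examples[:]          # keep the caller's list untouched
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        rule = learn(train)
        scores.append(evaluate(rule, test))
    return scores


# Stand-in learner: always predict the most frequent training class.
def learn_majority(train):
    return Counter(label for _, label in train).most_common(1)[0][0]


def percent_correct(rule, test):
    return 100.0 * sum(label == rule for _, label in test) / len(test)


# Synthetic data with 150 examples, the size of the archive used in the paper.
data = [(i, "a" if i % 3 else "b") for i in range(150)]
scores = repeated_holdout(data, learn_majority, percent_correct)
```

Fixing the seed makes the ten permutations reproducible, which helps when two learning methods are to be compared on identical splits.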
Diagnostic rules were evaluated in two ways:
by "absolute precision" and by "relative
precision". The relative precision was based
on the physician's judgement of the seriousness
of particular errors in diagnosis. Thus each
possible case of misclassification was
assigned a penalty value according to the
physician's feeling of how serious the
difference between the wrong and the correct
diagnosis was.
Absolute precision is the percentage of
successfully diagnosed samples. The
following cases were counted as unsuccessful
diagnoses:
- the patient falls into a leaf of the
decision tree labelled by another diagnosis;
- the patient falls into a leaf of the
decision tree labelled "null" (that is,
a leaf which did not match any example in
the training set, and therefore the class
of this leaf was not known);
- the patient falls into a leaf labelled
"search" (this means that in this case the
attributes are insufficient for an unambiguous
diagnosis; this situation arises if patients
with the same symptoms in the training set
were diagnosed differently).
The last case above indicates a sort of
insufficiency or inconsistency of the training
set. It never occurred in our set of 150
patients.
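The counting rule can be sketched as follows, reading absolute precision as the share of test samples that reach a leaf carrying their true diagnosis. The function name and the class labels in the example are hypothetical; the counts (50 test samples, 5 "null" leaves, 6 wrong diagnoses) are those of experiment 2 in Table 3.

```python
def absolute_precision(predicted, actual):
    """Percentage of samples whose leaf label equals the true diagnosis.

    Leaves labelled "null" or "search" never equal a diagnosis, so all
    three unsuccessful cases are counted as failures by the same test.
    """
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Counts as in experiment 2 of Table 3; the labels themselves are invented.
actual = ["metastases"] * 50
predicted = ["metastases"] * 39 + ["null"] * 5 + ["malignant lymphoma"] * 6
precision = absolute_precision(predicted, actual)
```

With 39 of 50 samples diagnosed correctly the result is 78%, the value Table 3 reports for that experiment.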
The relative precision is computed so that
each incorrect diagnosis (the first of the
above three cases) is penalised by a penalty
value between 0 and 1. For example, to
diagnose a "normal" patient as "metastases" is
considered to be a most serious error and is
therefore penalised by 1. On the other hand,
the interchange of the diagnoses "metastases"
and "metastases suspected" is a small mistake
(penalty 0.1). Table 2 is the penalty matrix
for our experiment, as proposed by a physician
specialised in lymphographic diagnosis.
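The paper does not give the exact formula for relative precision. The sketch below assumes it is 100 times one minus the mean penalty over the test samples, with correct diagnoses costing 0. How "null" and "search" leaves are scored is not stated either; a penalty of 1 is assumed here. Only three entries of Table 2 are copied in, and since the matrix as printed is symmetric, unordered pairs are used as keys. All names are hypothetical.

```python
# Excerpt of the penalty matrix (Table 2), keyed by unordered diagnosis pairs.
PENALTY = {
    frozenset(["normal", "metastases"]): 1.00,
    frozenset(["metastases suspected", "metastases"]): 0.10,
    frozenset(["normal", "reactive hyperplasia"]): 0.33,
}


def relative_precision(predicted, actual, penalty=PENALTY, unknown_penalty=1.0):
    """100 * (1 - mean penalty); an assumed formula, not quoted from the paper.

    Assumption: leaves labelled "null" or "search" cost `unknown_penalty`.
    """
    total = 0.0
    for p, a in zip(predicted, actual):
        if p == a:
            continue                      # correct diagnosis, no penalty
        if p in ("null", "search"):
            total += unknown_penalty
        else:
            total += penalty[frozenset([p, a])]
    return 100.0 * (1.0 - total / len(actual))


actual = ["normal", "metastases", "metastases", "normal"]
predicted = ["metastases", "metastases suspected", "metastases", "null"]
score = relative_precision(predicted, actual)
```

In this example the penalties are 1.0, 0.1, 0 and 1.0, so the mean penalty is 0.525 and the relative precision is about 47.5%, whereas the absolute precision of the same predictions would be 25%.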
Table 3 contains some characteristics of the
learnt diagnostic rules for all 10 experiments.
Columns in the table correspond to the
experiments. Each experiment is described by
the following parameters:
- the size of the diagnostic rule, i.e. the
number of nodes in the decision tree;
- the necessary size of the data-base, i.e.
the number of examples in the window which
was sufficient for the construction of a
decision tree explaining all 100 examples
in the training set;
- the number of unknown testing samples which
matched a leaf labelled "null";
- the number of unknown testing samples which
matched a leaf labelled "search" (this was
always 0 as our example set was
"consistent");
- the number of incorrectly diagnosed
samples (case 1 above);
- absolute precision (percentage);
- relative precision (percentage).
The comparatively poor precision in the first
experiment can be explained by the fact that
the examples in this experiment were not
randomly permuted. They were chronologically
ordered, covering a period of a few years. During
this period, the human diagnostician's criteria
for recognising some of the symptoms were
probably changing, which made the symptom
patterns of patients distant in time
incompatible to some extent. The average
absolute precision was about 80%, the average
relative precision was 88%.
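The quoted averages can be checked against the two precision rows of Table 3. The short calculation below uses the values as printed there; the variable names are hypothetical, and the second pair of means, which leaves out the unpermuted first experiment, is an extra check that the paper itself does not report.

```python
# Precision rows of Table 3, experiments 1 to 10, in percent.
absolute = [56, 78, 76, 86, 80, 76, 88, 76, 84, 84]
relative = [80, 85, 88, 91, 90, 82, 93, 85, 90, 91]

mean_absolute = sum(absolute) / len(absolute)
mean_relative = sum(relative) / len(relative)

# Same means without experiment 1, whose examples were not permuted.
mean_absolute_permuted = sum(absolute[1:]) / len(absolute[1:])
mean_relative_permuted = sum(relative[1:]) / len(relative[1:])
```

Over all ten experiments the means are 78.4% and 87.5%, in line with "about 80%" and 88%. Without the first experiment they rise to roughly 80.9% and 88.3%.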
Discussion
To evaluate the above results let us compare
the precision of our automatically learned
diagnostic rules to that attained by the
physicians in practice, and to that of
another learning method.
The absolute precision of lymphographic
diagnosis attained by physicians practising
in the field is between 60% and 85%,
depending on how experienced the
diagnostician is. The 80% average precision of our
system compares quite favourably with this
60 - 85% interval.
M. Soklič carried out, at the Institute of
Oncology, another experiment in automatic
learning using the same medical data and
employing his own learning method based on
quasi-spherical partitioning of the pattern
space (Raziskovalna skupnost Slovenije, 1978).
The precision obtained by that method was:
absolute 62%, relative 70%.
These comparisons indicate that our
automatically derived diagnostic rule could be
successfully applied in the practice of
lymphographic diagnosis. Unfortunately, a
straightforward use of our decision tree by the
physician would still require considerable
knowledge of lymphographic investigation
on the physician's part. This knowledge is necessary
for the recognition of symptoms (i.e.
attribute values) in lymphographs. It seems
that a really helpful application to this
diagnostic problem would need a much more
sophisticated system. Such a system should
also guide the user in recognising particular
symptoms, or should itself be capable of
recognising visual patterns.
Acknowledgement
The authors would like to thank Dr. G.
Klanjšček and Dr. M. Soklič for advice
and for the medical data used in our experiment,
and Dr. M. Zwitter for his suggestions in the
preparation of this paper.
References
Hunt, E.B., Marin, J., Stone, P. (1966)
Experiments in Induction. Academic Press.
Mulec, P. (1980) Algorithms for automatic
learning (Undergraduate thesis). Ljubljana:
Faculty of Electrical Eng. (in Slovenian).
Quinlan, J.R. (1979) Discovering rules by
induction from large collections of examples,
in Expert Systems in the Microelectronic Age
(ed. D. Michie). Edinburgh: University Press.
Quinlan, J.R. (1980) Semiautonomous
knowledge acquisition, in Expert Systems.
London: Infotech.
Diagnosis                            1     2     3     4     5     6     7     8     9
1 normal                             -    0.33  0.66  0.66  1.00  0.85  0.66  0.50  0.66
2 reactive hyperplasia              0.33   -    0.10  0.33  0.66  0.50  0.10  1.00  0.33
3 metastases suspected              0.66  0.10   -    0.50  0.10  0.50  0.50  0.85  0.33
4 malignant lymphoma suspected      0.66  0.33  0.50   -    0.75  0.10  0.15  0.15  0.50
5 metastases                        1.00  0.66  0.10  0.75   -    0.75  0.66  0.50  0.50
6 malignant lymphoma                0.85  0.50  0.50  0.10  0.75   -    0.33  0.15  0.33
7 Brill-Symmers                     0.66  0.10  0.50  0.15  0.66  0.33   -    0.85  0.50
8 fibrosation                       0.50  1.00  0.85  0.15  0.50  0.15  0.85   -    0.66
9 other diseases                    0.66  0.33  0.33  0.50  0.50  0.33  0.50  0.66   -

Table 2: Seriousness of errors in diagnosis.
Index of experiment        1   2   3   4   5   6   7   8   9  10
Rule size                 88  80  74  68  53  78  53  58  64  68
Data-base size            82  75  63  62  68  68  65  62  56  62
Null                       0   5   3   1   0   6   1   4   4   3
Search                     0   0   0   0   0   0   0   0   0   0
Wrong diagnosis           22   6   9   6  10   6   5   8   4   4
Absolute precision (%)    56  78  76  86  80  76  88  76  84  84
Relative precision (%)    80  85  88  91  90  82  93  85  90  91

Table 3: Results in repeated experiments.
APPENDIX: Symptoms and diagnoses in lymphographic investigation
Original form as used at the Institute of Oncology, Ljubljana (in Slovenian)

Limfografski simptomi

 1. Mezgovnice:
      0 normalno
      1 loki
      2 deformacije
      3 odriv
 2. Blok dovodnih mezgovnic:
      1 ga ni
      2 je
 3. Blok mezgovnic verige:
      1 ga ni
      2 je
 4. Blok limfatičnega sistema:
      1 ga ni
      2 je
 5. Obvoz - by pass:
      1 ni
      2 je
 6. Ekstravazati - jezerca:
      1 jih ni
      2 so
 7. Regeneracijske mezgovnice:
      1 jih ni
      2 so
 8. Zgodnje kopičenje v bezgavkah:
      1 ga ni
      2 je
 9. Velikost bezgavk - zmanjšanje:
      0
      1
      2
      3
10. Velikost bezgavk - povečanje:
      0
      1
      2
      3
11. Sprememba oblike bezgavk:
      1 fižol
      2 ovalna
      3
12. Polnitveni defekti različni:
      1 jih ni
      2 folikularni
      3 veliki centralni
      4 drobci
13. Polnitveni defekti lakunarni:
      1 jih ni
      2 lakunarni
      3 lakunarni marginalni
      4 lakunarni centralni
14. Sprememba strukture kopičenja:
      1 je ni
      2 zrnata
      3 drobno kapljasta
      4 grobo kapljasta
      5 razredčena
      6 mrežasta
      7 progasta
      8 zabrisana
15. Posebna struktura in oblika:
      1 kelih
      2 mehur
16. Dislokacija - odriv bezgavk:
      1 ga ni
      2 je
17. Izpad kopičenja bezgavk:
      1 ga ni
      2 je
18. Število prizadetih bezgavk:
      0-9
      10-19
      20-29
      30-39
      40-49
      50-59
      več kot 59

Diagnoze

  1 normalni izvid
  2 reaktivna hiperplazija
  3 sumljiv na metastaze
  4 sumljiv na maligni limfom
  5 metastaze
  6 maligni limfom
  7 Brill-Symmers
  8 fibrozacija
  9 ostale bolezni