The Modified Diagonalization Method
for Analysing Clusters
within Economies
Henryk Gurgul
Pawel Majdosz
In this paper a modification of the diagonalization method, originally
put forward by Hoen (2002), is suggested which is aimed at uncovering
clusters of sectors within an input-output framework. Our interest in
this subject was largely motivated by the fact that the preceding method
appears to be incapable of providing us with an accurate representation
of the real cluster structure that exists in an economy, as a consequence
of missing the position at which a given inter-sectoral flow stands in
the hierarchy of the purchasing industry and the supplying industry.
By making a distinction between an internal and external relationship,
when it comes up at the moment of deciding whether each pair of in-
dustries is categorized as belonging to the same or different clusters,
the proposed alternative, which will be referred to as the modified di-
agonalization method, seems to be superior to its predecessor. Such
a conclusion is supported by the results of comparison of the relative
performance of the rival methods (i. e. the original and modified diag-
onalization method) which show, among other things, that the average
value of flows between industries grouped into clusters is higher in the
case of the proposed method.
Key Words: internal and external interindustrial relationships,
diagonalization method, clusters
JEL Classification: B41
Introduction
Industry clusters are nowadays an intrinsic element of the economic
landscape of almost every country in the world. Cluster-related
problems have been viewed from various perspectives (spatially,
inter-industrially and intra-industrially) and in varying contexts (see e. g.
Munroe and Hewings 2000). Theoretical interest in the concept of
clustering is first and foremost associated with classical work on
agglomeration, in which the process of clustering is typically explained by the
presence of externalities such as economies of scale and scope, which
give economic advantages to firms agglomerating in a certain locality
(see e. g. Hoover 1937; 1948; Marshall 1890; Ohlin 1933). Over the
decades, the literature on this subject has developed considerably; a
comprehensive review is provided by Bekele and Jackson (2006), to whom
the interested reader is referred for more details.
Dr Henryk Gurgul is a Professor at the Department of Economics
and Econometrics, University of Science and Technology, Poland.
Dr Pawel Majdosz is an Assistant Professor at the Department
of Economics and Econometrics, University of Science
and Technology, Poland.
Managing Global Transitions 6 (1): 53-73
These theoretical approaches have been accompanied by empirical in-
vestigations of the clustering phenomenon which have found evidence
supporting the presumption that an industry cluster allows enterprises
to reduce costs, uncertainties and risks (see e. g. Antonelli 1999; Krugman
1991; Krugman and Venables 1996; Porter 1998). Re-orienting economies
towards a knowledge-based model has entailed a growing awareness that
competitive success increasingly involves innovations and continuous
quality improvements. The role of industry clusters has again appeared
to be of crucial importance, this time, as a mechanism that enables in-
novations to spread throughout the economy (see e. g. Hauknes 1998;
Martin and Sunley 2003).
Despite the importance of cluster analysis in explaining and exploring
real economic structure, the effort which has been undertaken to im-
prove the methods of cluster identification hardly gives an answer to the
most urgent questions and there are still many problems unsolved in the
face of which researchers may rely only on their own intuition. Overall,
the practical application of cluster analysis poses many difficulties, espe-
cially since there is no uniform definition of what we should consider as a
cluster. Definitions used in practice frequently depend on the particular
ends to which a given study is subordinated. On the basis of the applied
definition of cluster, one then decides which of the alternative methods
of grouping sectors into clusters is employed. Confining our primary in-
terest to the input-output context exclusively, it might be noticed that if
great emphasis is placed in the definition upon intermediate deliveries
between sectors within the economy, it is likely that the cluster identifi-
cation method will be based on the matrix of intermediate deliveries. If
attention is focused on the inputs of the ith product which are required
for a one-unit increase in the production of the jth sector, the matrix of
input coefficients may be selected as a base. If what is more important
are inputs of the ith product required to meet the demand of (all) other
sectors associated with a one-unit increase in the final demand of the jth
sector, rather than those necessary for such an increase in the production
of the jth sector, the Leontief inverse will probably be used in
place of the matrix of input coefficients. Finally, if the cluster definition
stresses the extent to which the production of the ith sector is used
by the jth sector in order to contribute to further production, the ma-
trix of output coefficients may appear to the researcher to be particularly
useful for dealing with the cluster identification problem.
The existence of many methods that can be used in practice for identi-
fying inter-industrial relationships composing clusters contributes to the
emergence of serious difficulties with the comparability of results and could lead to
a misinterpretation of the obtained evidence. This is particularly
controversial when one of the rules underlying the technique employed
is that the resulting clusters must have an economic meaning. Under this
principle, any potential cluster produced by the formal procedure in the
first stage can be discarded by the researcher as lacking a precise
economic meaning. Wherever economic meaning is a valid rule of the
overall cluster identification procedure, the most important question
is to what extent the obtained results describe the actual cluster
structure within the economy or, to put it more plainly, whether the
rule does not in fact serve only to support the presupposed statements
and beliefs of the researcher. As long as researchers are tempted to
base their work on the questionable principle of the economic meaning of
identified clusters, any effort to improve and develop rigorous methods
for uncovering cluster structure, which could restrict the scope for
arbitrariness in the empirical studies undertaken in this important field,
should be welcome.
In this paper a method designed to identify industry clusters within
the economy is suggested. The main advantage of the method stems
from the fact that it relies on a three-level classification of interindus-
trial linkages, thereby making it possible to discern between internal (or
intra-cluster) relationships and external ones which connect industries
belonging to different clusters, or industries within clusters with those
outside. The suggested method is presented in the context of its predecessors,
such as the diagonalization method (DM) proposed by Hoen (2002),
and an extensive comparison using data from different countries and
dates is carried out to enable the relative performances of these meth-
ods to be ascertained. The major conclusion that can be drawn from the
results is that the proposed method tends to produce structures of in-
dustry clusters which seem to be more plausible and, at the same time,
the level of internal relationships measured by the average value of flows
among industries grouped into clusters is even higher than that of those
obtained by means of the DM.
The remainder of the paper is organized as follows. In the next section
several customarily used methods of identifying industry clusters are
outlined and the alternative, i. e. the modified diagonalization method
(MDM) is suggested. The third section contains a brief description of
the data used in comparing the relative performances of the alternative
methods. The empirical results are reported in the fourth section, while
the fifth section concludes the paper.
Cluster Identification Methods
To facilitate a comparison of the proposed method with those hitherto-
existing, we start with an outline of the latter, thereby giving a back-
ground against which the method suggested in this paper is formed as a
remedy for some of their shortcomings. As was mentioned in the previ-
ous section, one characteristic that may make one method for identifying
clusters different from another is that it may use the input-output matrix
of a different sort as the base for computation. Let x denote the vector of
gross output and Z be the matrix of intermediate deliveries whose ele-
ments show the amounts of output of the ith sector that are sold to the
jth industry to maintain its own production. Then, the related matrices,
i. e. the matrix of input coefficients (A), the Leontief inverse (D), and the
matrix of output coefficients (B), can be derived, in turn, as follows:
$$A = Z\hat{x}^{-1} \tag{1}$$
$$D = (I - A)^{-1} = L^{-1} \tag{2}$$
$$B = \hat{x}^{-1}Z \tag{3}$$
where a hat is used to denote the diagonal matrix with elements of the
vector on its main diagonal and all other entries equal to zero.
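
As an illustration only, equations (1)-(3) can be sketched in Python with NumPy. The function name `io_matrices` and the two-sector numbers are assumptions made for this example and do not come from the data used in the paper.

```python
import numpy as np

def io_matrices(Z, x):
    """Return (A, D, B) following equations (1)-(3).

    Z is the matrix of intermediate deliveries, x the vector of gross output.
    """
    Z = np.asarray(Z, dtype=float)
    x = np.asarray(x, dtype=float)
    x_hat_inv = np.diag(1.0 / x)      # inverse of the diagonal matrix built from x
    A = Z @ x_hat_inv                 # input coefficients: a_ij = z_ij / x_j
    L = np.eye(len(x)) - A            # Leontief matrix
    D = np.linalg.inv(L)              # Leontief inverse
    B = x_hat_inv @ Z                 # output coefficients: b_ij = z_ij / x_i
    return A, D, B

# Hypothetical two-sector economy (illustrative numbers only).
Z = np.array([[10.0, 20.0],
              [30.0, 5.0]])
x = np.array([100.0, 200.0])
A, D, B = io_matrices(Z, x)
```

Note that A scales the columns of Z by gross output while B scales its rows, which is why the two matrices can rank the same flow differently.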
The simplest algorithm of grouping sectors into clusters, and the most
frequently used in practice, the so-called method of maximization (MM),
involves finding the largest off-diagonal element of a selected matrix
(suppose that this matrix is A) and joining the sectors with the largest
amount of intermediate deliveries until the number of clusters identified
in such a way reaches a previously fixed threshold. The only advantage
of this algorithm arises from its simplicity. However, it suffers from the
so-called mega cluster problem and the fact that the obtained solution is
sensitive to the matrix chosen as a basis of computation. In other words,
although there are relationships among the matrices (as expressed by (1)-
(3)), each of them embodies different information, and it may happen
that the clusters yielded by this algorithm will be completely different
when we use, for example, the Z in place of A or D instead of the B ma-
trix.
Similar to the above-described method of uncovering clusters within
the economy is the method of restricted maximization (RM). Roughly
speaking, whilst the former takes into account all off-diagonal elements
of the respective matrix to form the clusters, the latter focuses only on
those which are large enough to satisfy the imposed restrictions. Such
restrictions may be expressed in different ways but, without knowledge
of the distribution of the matrix elements, they are usually formulated as
a multiple of the matrix average. This approach is more flexible for
several reasons. It allows two or even all of the matrices under
consideration to be used simultaneously, by imposing a conjunction of the
restrictions for the individual matrices. Nor does the method require
the number of clusters to be determined at the beginning of the
investigation; instead one can adjust the value of the multiplication
factor to obtain the same effect. It does not, however, deal with the
possibility that the clusters vary when different matrices are
used as the basis for calculation.
Hoen (2002) put forward the DM, which appears to be superior to the
methods described above. The first stage of this approach involves
formulating a binary matrix (R), holding ones and zeros, whose elements are
given as:
$$r_{ij} = \begin{cases}
1 & \text{if } a_{ij} > q(A, 1 - \alpha) \wedge b_{ij} > q(B, 1 - \alpha) \\
0 & \text{otherwise}
\end{cases} \tag{4}$$
where q(X, p) stands for the quantile of order p of the elements of X.
Then, by permuting its rows and columns, an attempt is made to
transform matrix R into a block-diagonal matrix so that each of its blocks
represents one cluster. The reasoning underlying this method is that a
cluster should encompass all sectors which are connected to each other
and, at the same time, detached from the rest of the economy. By changing
the level of significance α, we decide which intermediate deliveries,
input coefficients and output coefficients are regarded as significant and
which are insignificant and to be fixed at zero. The higher the level of α,
the more linkages there are among the sectors within the economy, but
more clusters are not always identified, as Hoen himself would wish. This
issue will be given more attention later on.
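
The two stages just described can be sketched as follows. This is a minimal illustration under stated assumptions: the block-diagonalization is replaced here by a search for connected groups of sectors, which yields the same blocks when rows and columns are permuted together but is not claimed to be Hoen's own algorithm. The names `significance_matrix` and `blocks` are hypothetical, and B is set equal to A in the example purely for brevity.

```python
import numpy as np

def significance_matrix(A, B, alpha):
    """Binary matrix R of equation (4)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    qa = np.quantile(A, 1.0 - alpha)   # quantile over all elements of A
    qb = np.quantile(B, 1.0 - alpha)   # quantile over all elements of B
    return ((A > qa) & (B > qb)).astype(int)

def blocks(R):
    """Groups of sectors linked, directly or indirectly, by a 1 in R.

    Stand-in for block-diagonalization; sectors without links are omitted.
    """
    R = np.asarray(R)
    n = R.shape[0]
    linked = (R + R.T) > 0             # a link in either direction joins two sectors
    np.fill_diagonal(linked, False)    # a sector's link to itself forms no cluster
    seen, found = set(), []
    for start in range(n):
        if start in seen or not linked[start].any():
            continue
        stack, group = [start], set()
        while stack:                   # depth-first search from `start`
            i = stack.pop()
            if i in group:
                continue
            group.add(i)
            stack.extend(int(j) for j in np.flatnonzero(linked[i]) if int(j) not in group)
        seen |= group
        found.append(sorted(group))
    return found

# Hypothetical five-sector coefficients (illustrative only).
A_demo = np.array([[0.00, 0.30, 0.01, 0.02, 0.01],
                   [0.02, 0.00, 0.01, 0.01, 0.02],
                   [0.01, 0.02, 0.00, 0.25, 0.01],
                   [0.01, 0.01, 0.02, 0.00, 0.01],
                   [0.02, 0.01, 0.01, 0.02, 0.00]])
R_demo = significance_matrix(A_demo, A_demo, alpha=0.1)
clusters_demo = blocks(R_demo)
```

In this example only the two dominant flows survive the quantile test, so sectors 0 and 1 form one block, sectors 2 and 3 another, and sector 4 remains outside any cluster.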
As Hoen (2002) documented, the DM solves the problem of changes
in the composition of clusters with respect to the matrix used as a basis,
and facilitates a better insight into the structure of an economy by not
requiring the number of existing clusters to be specified at the beginning
of the investigation. However, this approach also has several drawbacks.
One is that the respective quantiles are obtained on the basis of all matrix
elements, including those located on the principal diagonal. When this
method is used for comparative purposes, i. e. to examine how stable the
composition of clusters is over time, it seems reasonable to fix
parameter α at the same level for the whole period analysed.
But two sets of results obtained in this way for different points in
time are comparable only if there is no change in the proportion of
elements placed on the principal diagonal to the off-diagonal elements.
This is probably not a problem in the case of developed countries, where
the off-diagonal elements of the respective matrices dominate those
located on the principal diagonal, indicating that each sector uses only a
small portion of its own current output as an input to its further
production, but for transitional countries this ratio is reversed (see e. g.
Cmiel and Gurgul 2002; Gurgul and Majdosz 2005). Within a few years
of market-oriented reforms the share of off-diagonal flows within the
economies of the countries in transition is expected to increase, and the
problem will vanish when it reaches a level typical of developed
countries. Until then, however, it is recommended that
all elements on the diagonal be set at zero before calculating the
respective quantiles.
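
The effect of the diagonal on the quantile can be seen in a small sketch. The matrix below is hypothetical and deliberately diagonal-heavy, as the text suggests may be the case for a transitional economy; the helper name `quantile_threshold` is an assumption of this example.

```python
import numpy as np

def quantile_threshold(X, alpha, zero_diagonal=True):
    """Quantile of order 1 - alpha, optionally after zeroing the diagonal."""
    X = np.array(X, dtype=float)       # work on a copy
    if zero_diagonal:
        np.fill_diagonal(X, 0.0)
    return float(np.quantile(X, 1.0 - alpha))

# Hypothetical input coefficients with large own-use entries on the diagonal.
A_trans = np.array([[0.60, 0.05, 0.02],
                    [0.03, 0.70, 0.04],
                    [0.01, 0.06, 0.50]])
with_diag = quantile_threshold(A_trans, alpha=0.25, zero_diagonal=False)
without_diag = quantile_threshold(A_trans, alpha=0.25, zero_diagonal=True)

off_diag = ~np.eye(3, dtype=bool)
n_links_with = int((A_trans[off_diag] > with_diag).sum())
n_links_without = int((A_trans[off_diag] > without_diag).sum())
```

With the diagonal included, the threshold is pushed up to the level of the own-use entries and no inter-sectoral flow qualifies as significant; once the diagonal is zeroed, the same α admits the two largest off-diagonal flows.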
As mentioned above, using the DM does not require the number of
clusters to be determined by the researcher. Instead, the level of
significance α can be adjusted to the same end, i. e. to obtain a
predetermined number of clusters identified within the economy. However,
there is no unambiguous relationship between the level of significance
α and the number of uncovered clusters. A higher value of parameter α
means only that more entries in the matrix of interest emerge as
important linkages. Under some particular circumstances, this
may lead to the inclusion within clusters of sectors previously excluded
at the lower level of significance α, or even to the identification of new
clusters. Nonetheless, it should be stressed that the relationship 'the
higher parameter α, the more clusters uncovered' is not automatically
true. Strictly speaking, such a relationship holds true for some ranges
of values of parameter α, but not for others. To see this, suppose
that parameter α is equal to zero, implying that matrix R consists
exclusively of zeros. Without significant entries in the respective matrix, no
cluster will be identified. In contrast, when parameter α equals 100%,
matrix R is formed with ones. Since every sector now has significant
linkages with every other, all the sectors are included within a
single cluster. It follows that, for some threshold k, the number of
identified clusters is a non-decreasing function of α when α belongs to
the range from 0 to k, and a non-increasing function of α when α belongs
to the range from k to 1. For the sake of simplicity, we abstract here
from the situation where the number of clusters is alternately a
non-decreasing and a non-increasing function of α for
α ∈ (0, k_1) ∪ (k_1, k_2) ∪ ... ∪ (k_n, 1).
Therefore, k is a threshold in excess of which the number of
uncovered clusters within the economy can only diminish. This is accompanied
by a tendency for clusters to merge, thus obscuring the real cluster
structure we want to explain and explore. In a simplified
example like this, it is possible to determine the threshold k, not
analytically of course but by trial and error, and to select a
level of significance α at or below this threshold to
avoid blurring the economy's structure. In practice, however, when the
number of clusters is likely to be alternately a non-decreasing and a
non-increasing function of α, we would have to estimate n thresholds k_i
(where n is the number of changes in direction). Even if the values of the
thresholds k_i are known for every i, it remains an open question which
of the n thresholds should be used to maximize the transparency of
the identified relationships among the sectors operating within the
real-world economy. In the light of these difficulties, an effort
should clearly be undertaken to develop another approach
which, being aimed at identifying clusters of sectors within the economy,
distinguishes between linkages of sectors belonging to the same cluster,
relationships between two sectors from different clusters, and the
linkages of sectors within clusters with the remaining sectors classified as
outside the clusters.
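
The trial-and-error search for the threshold can be illustrated by sweeping α and counting the clusters at each value. The sketch below is an assumption-laden example, not the authors' procedure: clusters are counted as connected groups of sectors, the diagonal is zeroed before the quantiles are taken, and the four-sector matrix is hypothetical, with B set equal to A for brevity.

```python
import numpy as np

def count_clusters(A, B, alpha):
    """Number of groups of sectors joined by significant links at level alpha."""
    A = np.array(A, dtype=float)
    B = np.array(B, dtype=float)
    np.fill_diagonal(A, 0.0)
    np.fill_diagonal(B, 0.0)
    R = (A > np.quantile(A, 1.0 - alpha)) & (B > np.quantile(B, 1.0 - alpha))
    parent = list(range(R.shape[0]))   # union-find forest over sectors

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    linked = set()
    for i, j in zip(*np.nonzero(R)):
        i, j = int(i), int(j)
        if i != j:
            parent[find(i)] = find(j)  # merge the groups of the two sectors
            linked.update((i, j))
    return len({find(i) for i in linked})

def sweep(A, B, alphas):
    """Pairs (alpha, number of clusters) for each alpha in alphas."""
    return [(float(a), count_clusters(A, B, a)) for a in alphas]

# Hypothetical coefficients: two strong pairs plus weaker cross-links.
A_sweep = np.array([[0.00, 0.09, 0.01, 0.01],
                    [0.08, 0.00, 0.01, 0.02],
                    [0.01, 0.01, 0.00, 0.07],
                    [0.02, 0.01, 0.06, 0.00]])
profile = sweep(A_sweep, A_sweep, [0.0, 0.25, 0.5, 1.0])
```

In this example the number of clusters first rises from zero to two and then falls to one as the weaker cross-links become significant and merge the two pairs, which is the pattern described above.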
To define the problem, the main shortcoming
of the DM stems from its inability to classify sectors which may
potentially be assigned to more than one cluster. Without a rule for grading
linkage strength, when a given sector has relationships with
two different clusters, the DM blurs the economy's structure by joining
these clusters into one. Hence, the method proposed by us, the MDM,
should first and foremost be provided with an operational principle
which enables the alternative allocations of a given sector to be ranked
with respect to the strength of the inter-industry linkages associated with
each of them.
In seeking a solution to this problem, it becomes immediately apparent
that the simple categorization of inter-industrial relationships applied
by the DM, according to their magnitudes, as significant or not is no
longer sufficient, and that the significant relations among industries have
to be further broken down if the proposed method is to offer any advantage
to the researcher who must otherwise choose the threshold k so as to
maximize the number of clusters uncovered without even the slightest
guidance as to how this should be done in practice. We would gain
little, if anything, by basing a division of the significant (in terms
of magnitude) inter-industrial relationships on their absolute values,
since this amounts to introducing two unrelated thresholds, instead of
the single one used under the original diagonalization method, and would
hardly avoid the tendency to amalgamate different clusters into one as α
increases. What we propose in this paper is to look at a given connection
from the perspective of the two industries it links. Each inter-sectoral
relationship expressing the actual flow of goods and services which takes
place between two industries can be considered from at least two
different viewpoints, namely, that of the industry that sells its output
and that of the industry that purchases it. In the case of the former,
the question we have to answer is what position the flow occupies
in the hierarchy of the selling industry. In the case of the latter, we
are instead concerned with the degree to which that same flow is crucial for
the purchasing industry.
Suppose that the element located in the ith row and the jth column of
the matrix Z, i. e. z_ij, is deemed significant, no matter what the term
'significant' exactly means here. Then the ith sector is a supplier (seller)
whilst the jth sector represents the demand side (buyer). Note that under
the DM, the sectors connected by the element z_ij would automatically
be considered as composing the same cluster. Now, however, such a
conclusion is valid if, and only if, no element of the ith row is greater
than z_ij and, at the same time, no element of the jth column exceeds
z_ij. This means that we focus our attention on the relative position of
the flow at hand, on the list of deliveries of the selling
industry as well as on the list of purchases of the buying industry. Only
if the considered flow is ranked first on both lists is the
situation equivalent to that presupposed by the DM, and the
corresponding relationship will be referred to as internal or primary (from a
given cluster's point of view). On the other hand, if at least one of
the above requirements is violated, i. e. if there is a flow larger than
z_ij in the ith row or in the jth column, then the relationship, while
remaining significant, will be termed external or secondary.
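
The ranking test described above can be sketched as a mask over the matrix Z. The function name `ranked_first` and the three-sector flows are hypothetical; the question of whether a flow is significant in the first place is left aside here, and ties are treated as ranked first, following the wording 'no element is greater'.

```python
import numpy as np

def ranked_first(Z):
    """True where z_ij is the largest off-diagonal flow in both row i and column j."""
    Z = np.array(Z, dtype=float)       # work on a copy
    np.fill_diagonal(Z, 0.0)           # own-use flows are ignored
    row_max = Z.max(axis=1, keepdims=True)   # largest delivery of each seller
    col_max = Z.max(axis=0, keepdims=True)   # largest purchase of each buyer
    return (Z >= row_max) & (Z >= col_max) & (Z > 0)

# Hypothetical flows between three sectors (illustrative only).
Z_rank = np.array([[0.0, 50.0, 10.0],
                   [5.0, 0.0, 40.0],
                   [30.0, 45.0, 0.0]])
primary = ranked_first(Z_rank)
```

Here the flow of 45 from sector 2 to sector 1 is the largest delivery of its seller, but sector 1 buys even more (50) from sector 0, so that flow would be external if significant, whereas the flows of 50 and 40 qualify as internal.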
Before going on to outline the procedure of the suggested approach,
which is based on the distinction between internal and external (or
primary and secondary) inter-industrial relationships, it is necessary to
point out one rather practical issue. Consider a situation in which z_ij
and z_ji are both significant and z_ij is larger than z_ji. It might well
happen that z_ij is categorized as a secondary relation and z_ji as a
primary one, since nothing guarantees that both relationships will be
classified in the same way, as internal or as external; this
may occur only by chance. In order to prevent such an ambiguous finding,
it is necessary to introduce an additional convention: when z_ij and
z_ji are both significant, only the larger of the two, i. e. z_ij in our
simple example, is subject to further consideration.
The proposed approach draws upon several elements that have been
utilized under the original diagonalization method. The first step con-
sists of creating the restriction matrix (Q), but unlike its predecessor
which yields a binary matrix, a three-value coding is used here to
allow external and internal linkages to be distinguished. Assuming,
without loss of generality, that the matrix of deliveries (Z) is chosen
as the basis for calculation, we can express the restriction
matrix as:
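$$q_{kl} = \begin{cases}
2 & \text{if } a_{kl} > q(A, 1 - \alpha) \wedge b_{kl} > q(B, 1 - \alpha)
    \wedge z_{kl} > z_{kj} \wedge z_{kl} > z_{il}, \ \forall i \neq k, \ j \neq l \\
1 & \text{if } a_{kl} > q(A, 1 - \alpha) \wedge b_{kl} > q(B, 1 - \alpha) \\
0 & \text{otherwise}
\end{cases} \tag{5}$$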
Note that all the input-output matrices involved in (5), i. e. A, B, and
also Z in our example, should first be prepared in such a way that
the elements located on their main diagonals are all set to zero. This is
a prerequisite for any further calculation, because without neutralizing
the effect of the main-diagonal elements of the respective matrices, the
method might produce erroneous results. The reason is not only that
including the main-diagonal elements in the calculation of the quantiles
in (5) may lead to an overestimation of their values, but also that large
entries on the main diagonal of the matrix used as a base can change what
we regard as an internal or an external relation.
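
A minimal sketch of the restriction matrix (5) is given below, under assumptions that should be read as such: the name `restriction_matrix` is hypothetical, A and B are derived from the zero-diagonal Z, the convention of keeping only the larger of two mutually significant flows is applied before coding, and the ranking within rows and columns is taken over all flows in Z.

```python
import numpy as np

def restriction_matrix(Z, x, alpha):
    """Three-valued matrix Q: 2 internal, 1 external, 0 not significant."""
    Z = np.array(Z, dtype=float)       # work on a copy
    x = np.asarray(x, dtype=float)
    np.fill_diagonal(Z, 0.0)           # neutralize the main diagonal first
    A = Z / x[np.newaxis, :]           # a_kl = z_kl / x_l
    B = Z / x[:, np.newaxis]           # b_kl = z_kl / x_k
    significant = (A > np.quantile(A, 1.0 - alpha)) & (B > np.quantile(B, 1.0 - alpha))
    # Convention: if z_kl and z_lk are both significant, keep only the larger.
    both = significant & significant.T
    significant = significant & ~(both & (Z < Z.T))
    # Strictly largest entry of its row and of its column, as in (5).
    is_row_max = Z == Z.max(axis=1, keepdims=True)
    is_col_max = Z == Z.max(axis=0, keepdims=True)
    first_in_row = is_row_max & (is_row_max.sum(axis=1, keepdims=True) == 1)
    first_in_col = is_col_max & (is_col_max.sum(axis=0, keepdims=True) == 1)
    internal = significant & first_in_row & first_in_col
    return significant.astype(int) + internal.astype(int)

# Hypothetical three-sector table with non-zero own-use entries.
Z_q = np.array([[20.0, 50.0, 10.0],
                [5.0, 30.0, 40.0],
                [30.0, 45.0, 25.0]])
x_q = np.array([200.0, 200.0, 200.0])
Q = restriction_matrix(Z_q, x_q, alpha=0.5)
```

In this example the flow from sector 0 to sector 1 is coded 2, the two significant deliveries of sector 2 are coded 1, and the flow from sector 1 to sector 2 is dropped by the convention because the reverse flow is larger.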
The second step is almost identical to the corresponding step under
the DM, with one exception. Starting with the restriction matrix and using
Hoen's algorithm (see Hoen 2002), we then try to transform it into a
block-diagonal matrix with respect to the internal inter-industrial rela-
tionships only (the entries with a '2' digit) and allow the elements repre-
senting the external relations (with a '1' digit) to change their positions
freely with row-wise and columnwise permutations that are necessary to
complete Hoen's algorithm. In other words, while transforming the re-
striction matrix into a block-diagonal one, we treat all the entries which,
according to (5), are equal to one as if they are set to zero, but without
losing information concerning their positions once the algorithm is ter-
minated.
The transformed restriction matrix can be interpreted in terms of
external and internal relationships as follows. Each block, as discovered by
Hoen's algorithm, represents a single cluster of industries among which
only internal relationships occur. The elements of the restriction
matrix pertaining to external relationships therefore indicate either an
inter-cluster linkage or a connection of a given cluster with the rest of
the economy, which is composed of the industries not assigned to any
cluster.
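
The second step and the interpretation of its outcome can be sketched as follows. As before, connected groups of sectors stand in for the blocks that Hoen's algorithm would produce, which is an assumption of this example; the function name, the link labels and the matrix `Q_demo` are likewise hypothetical.

```python
import numpy as np

def clusters_and_external_links(Q):
    """Clusters formed by entries equal to 2, plus a label for every entry equal to 1."""
    Q = np.asarray(Q)
    n = Q.shape[0]
    internal = (Q == 2) | (Q == 2).T   # internal links, direction ignored
    label = [-1] * n                   # cluster index per sector, -1 = outside
    clusters = []
    for start in range(n):
        if label[start] != -1 or not internal[start].any():
            continue
        cid = len(clusters)
        label[start] = cid
        stack, members = [start], []
        while stack:                   # collect everything reachable via 2-entries
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(internal[i]):
                j = int(j)
                if label[j] == -1:
                    label[j] = cid
                    stack.append(j)
        clusters.append(sorted(members))
    links = []
    for k, l in zip(*np.nonzero(Q == 1)):
        k, l = int(k), int(l)
        if label[k] != -1 and label[l] != -1 and label[k] != label[l]:
            kind = "inter-cluster"
        elif (label[k] == -1) != (label[l] == -1):
            kind = "cluster-to-rest"
        else:
            kind = "other"             # both in the same cluster or both outside
        links.append((k, l, kind))
    return clusters, links

# Hypothetical restriction matrix for five sectors.
Q_demo = np.array([[0, 2, 0, 0, 0],
                   [0, 0, 0, 1, 0],
                   [0, 0, 0, 2, 0],
                   [0, 0, 0, 0, 1],
                   [0, 0, 0, 0, 0]])
found_clusters, external_links = clusters_and_external_links(Q_demo)
```

The entries equal to one play no part in forming the clusters, yet their positions are retained, so the link between the two clusters and the link to the unassigned sector 4 both remain visible.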
What do we gain from using the suggested method? If, for instance, a
certain sector is significantly connected with two other sectors
belonging to different clusters, it will be assigned to the cluster with
which its linkage is stronger. Information about the existence of a
significant, although relatively weaker, linkage with the sector belonging
to the other cluster is not lost, however, because such a linkage is
automatically classified as an external one. In this way the proposed
method prevents an unreasonable joining of clusters without omitting
significant relationships among the clusters (or between the clusters and
the rest of the economy) and, therefore, offers a better insight into the
real structure of the economy. These benefits would, however, prove
illusory if the method did worse in terms of other properties such as, for
example, its robustness to the selection of the base matrix for
calculation, or the ratio of average flows among the
sectors included within clusters to the analogous value for sectors
outside the clusters. Therefore, in the following sections we empirically test
the properties of the suggested method and compare the obtained results
with those produced by its predecessor, i. e. the DM.
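
One possible reading of the ratio mentioned above is sketched below. The exact definition is not spelled out at this point of the text, so the following is an assumption: the average off-diagonal flow between sectors of the same cluster is divided by the average of all remaining off-diagonal flows. The function name and the numbers are hypothetical.

```python
import numpy as np

def within_to_outside_ratio(Z, clusters):
    """Average intra-cluster flow divided by the average of all other flows."""
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    same = np.zeros((n, n), dtype=bool)
    for members in clusters:
        idx = np.asarray(members)
        same[np.ix_(idx, idx)] = True  # pairs of sectors sharing a cluster
    off_diag = ~np.eye(n, dtype=bool)  # own-use flows are left out
    within = Z[same & off_diag]
    outside = Z[~same & off_diag]
    return float(within.mean() / outside.mean())

# Hypothetical flows and a hypothetical two-cluster solution.
Z_ratio = np.array([[0.0, 9.0, 1.0, 1.0],
                    [8.0, 0.0, 1.0, 2.0],
                    [1.0, 1.0, 0.0, 7.0],
                    [2.0, 1.0, 6.0, 0.0]])
ratio = within_to_outside_ratio(Z_ratio, [[0, 1], [2, 3]])
```

A higher ratio indicates that the identified clusters capture the strongest flows; the sketch assumes that at least one cluster has been identified.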
Data Description
When illustrating the MDM's capacity for uncovering the cluster
structure as compared with that of the DM, a sufficiently high level of
disaggregation of the input-output tables used as the basis for computation
is of prime importance. Although the initial number of distinct sectors
within an economy differs significantly across empirical investigations
concerning the problem of identifying clusters of industries, such
tables almost always distinguish no fewer than a hundred
sectors (see e. g. Hauknes 1998). In order to assess the relative
performance of the methods discussed above, we therefore used several national
input-output tables for different dates which cover at least one
hundred different sectors and which, of course, were available to us. To be
more precise, tables from three countries were included in the
sample, namely the US tables for various dates, the Danish tables for various
dates, and the UK tables for 1995. In order to keep the presentation of
results short, and also because the main conclusions about the promising
nature of the MDM remain basically the same irrespective of which
country and which date are selected, we decided not to present all outcomes
but only those obtained by using the UK input-output tables for
1995. Other results are available from the authors on request.
The above-mentioned tables, derived directly from National Statistics,
are evaluated at current prices from the seller's point of view (basic
prices) and are fully consistent with the European System of Accounts
1995 (ESA 95). The original statistics provide coverage of the economy
as a whole, broken down into 138 industries/products using the Standard
Industrial Classification 1992 (SIC 92). The last fifteen entries of the
respective tables, however, arose from dividing some industries/products
associated with Government and non-profit institutions serving
households into market and non-market components. In doing so, the tables give
a better insight into the inter-industrial relationships, taking account of
the differences in proportions of inputs when an industry's products are
sold on market principles as opposed to the case where market
mechanisms do not apply. Taking into account the purpose of our investigation,
however, we decided not to distinguish market and non-market
components and to aggregate the tables into 123 industries/products (see table 1
for the list of sectors), avoiding in this way the zero-row or zero-column
problems which would have to be corrected if we did not reduce the
number of sectors to the above-mentioned 123.
Table 1 Industry classification
1	Agriculture
2	Forestry
3	Fishing
4	Coal extraction
5	Oil and gas extraction
6	Metal ores extraction
7	Other mining and quarrying
8	Meat processing
9	Fish and fruit processing
10	Oils and fats
11	Dairy products
12	Grain milling and starch
13	Animal feed
14	Bread, biscuits, etc
15	Sugar
16	Confectionery
17	Other food products
18	Alcoholic beverages
19	Soft drinks and mineral waters
20	Tobacco products
21	Textile fibres
22	Textile weaving
23	Textile finishing
24	Made-up textiles
25	Carpets and rugs
26	Other textiles
27	Knitted goods
28	Wearing apparel and fur products
29	Leather goods
30	Footwear
31	Wood and wood products
32	Pulp, paper and paperboard
33	Paper and paperboard products
34	Printing and publishing
35	Coke ovens, refined petroleum and nuclear fuel
36	Industrial gases and dyes
37	Inorganic chemicals
38	Organic chemicals
39	Fertilisers
40	Plastics and synthetic resins etc
41	Pesticides
42	Paints, varnishes, printing ink etc
43	Pharmaceuticals
44	Soap and toilet preparations
45	Other chemical products
46	Man-made fibres
47	Rubber products
48	Plastic products
49	Glass and glass products
50	Ceramic goods
51	Structural clay products
52	Cement, lime and plaster
53	Articles of concrete, stone etc
54	Iron and steel
55	Non-ferrous metals
56	Metal castings
57	Structural metal products
58	Metal boilers and radiators
59	Metal forging, pressing, etc
60	Cutlery, tools etc
61	Other metal products
62	Mechanical power equipment
63	General purpose machinery
64	Agricultural machinery
65	Machine tools
66	Special purpose machinery
67	Weapons and ammunition
68	Domestic appliances nec
69	Office machinery and computers
70	Electric motors and generators etc.
71	Insulated wire and cable
72	Electrical equipment nec
73	Electronic components
74	Transmitters for TV, radio and phone
75	Receivers for TV and radio
76	Medical and precision instruments
77	Motor vehicles
78	Shipbuilding and repair
79	Other transport equipment
80	Aircraft and spacecraft
81	Furniture
82	Jewellery and related products
83	Sports goods and toys
84	Miscellaneous manufacturing nec and recycling
85	Electricity production and distribution
86	Gas distribution
87	Water supply
88	Construction
89	Motor vehicle distribution and repair, automotive fuel retail
90	Wholesale distribution
91	Retail distribution
92	Hotels, catering, pubs etc.
93	Railway transport
94	Other land transport
95	Water transport
96	Air transport
97	Ancillary transport services
98	Postal and courier services
99	Telecommunications
100	Banking and finance
101	Insurance and pension funds
102	Auxiliary financial services
103	Owning and dealing in real estate
104	Letting of dwellings
105	Estate agent activities
106	Renting of machinery etc
107	Computer services
108	Research and development
109	Legal activities
110	Accountancy services
111	Market research, management consultancy
112	Architectural activities and technical consultancy
113	Advertising
114	Other business services
115	Public administration and defence
116	Education
117	Health and veterinary services
118	Social work activities
119	Sewage and sanitary services
120	Membership organisations nec
121	Recreational services
122	Other service activities
123	Private households with employed persons
The original statistics enable the analysis to be carried out on either
a commodity-by-industry or a commodity-by-commodity basis. With
commodity-specific technologies being the more persuasive assumption
in an input-output framework, the latter form of tables is used
throughout the study. It should, however, be stressed that the findings do
not differ significantly between the two techniques of deriving input-output
tables in their square form.
Empirical Results
Neither of the methods outlined above requires the number of distinct clusters to be specified in advance. Instead, the desired number of identified clusters is reached by changing the level of significance α. Comparing the relative performance of the methods at a fixed level of significance may, however, be rather confusing, because each method is likely to have a different range of α over which it produces reasonable results. Another, and perhaps more revealing, way of illustrating their implementation is to impose a fixed number of identified clusters without concern for the level of significance at which each method achieves it. Following the latter approach, we fixed the number of clusters to be identified by each method at 16 and then adjusted the level of significance until that number was reached. While the DM gives the desired number of clusters at a level of significance of 0.5%, the corresponding value for the MDM is 1.32%. It should be remembered here that α (the level of significance at which clusters are evaluated) is simply the complement of the quantile order, that is, one minus the order of the quantile calculated on the basis of the matrix elements (see equations (4) and (5)). Note also that the input-output matrices involved in the computation were first adjusted for main-diagonal effects by imposing a zero-diagonal principle (all main-diagonal entries were set to zero), so that only the off-diagonal elements of the respective matrices form the basis for calculating the quantiles.
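The procedure described above (ignore the main diagonal, take the quantile of order 1 - α of the off-diagonal elements, and adjust α until the required number of clusters appears) can be illustrated with a minimal Python sketch. It is not the authors' implementation: equations (4) and (5) are replaced here by a simplifying assumption, namely that two industries fall into the same cluster whenever a flow between them exceeds the quantile threshold. The function names, the simple empirical quantile rule and the grid of candidate α values are all hypothetical.

```python
def off_diagonal(matrix):
    """Collect the off-diagonal entries (zero-diagonal principle)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def quantile_threshold(matrix, alpha):
    """Value at quantile order 1 - alpha of the off-diagonal entries.

    Assumption: a simple 'lower' empirical quantile without interpolation.
    """
    values = sorted(off_diagonal(matrix))
    k = int((1.0 - alpha) * (len(values) - 1))
    return values[k]


def clusters_at(matrix, alpha):
    """Group industries linked by flows strictly above the threshold.

    Assumption: clusters are the connected components of the graph of
    significant flows; industries with no significant flow are left out.
    """
    n = len(matrix)
    cut = quantile_threshold(matrix, alpha)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if i != j and matrix[i][j] > cut:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]


def alpha_for_target(matrix, target, alphas):
    """Return the first candidate alpha giving exactly `target` clusters."""
    for alpha in alphas:
        if len(clusters_at(matrix, alpha)) == target:
            return alpha
    return None
```

Because the number of clusters need not change monotonically with α (lowering the threshold can both create new clusters and merge existing ones), the sketch scans a grid of candidate values rather than bisecting.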
Figure 1 depicts the cluster structure of the UK economy for 1995 based on the DM. It makes immediately obvious why as many as 16 clusters need to be identified in order to compare the alternative approaches: of the 16 uncovered clusters, only 7 are composed of three or more industries. Surprisingly, as will be shown later, the same proportion of so-called mini-clusters, in which a single inter-industry relationship constitutes the whole cluster, applies to the results obtained by means of the MDM. Such mini-clusters are of little interest from the practitioner's point of view, though they have some informational content. Searching for significance levels at which the problem of mini-clusters does not arise, although theoretically possible, would reduce the transparency of the results and compromise the overall comparison of the alternative methods of identifying clusters, which is the main aim of this study. Preferring to preserve the transparency of our results as far as possible, we decided not to adjust for mini-clusters, but hardly any attention will be given to them because of their minimal economic importance for a practitioner. The full results, including mini-clusters, are nevertheless presented to give interested readers an insight into the real economic landscape of the economy under study. One important conclusion which can be drawn from the relatively high ratio of mini-clusters to all identified clusters is that this problem remains unsolved no matter which of the methods is used.
Figure 1 Clusters in the UK economy for 1995 based on the Diagonalization Method
The largest cluster (ii) consists of as many as 28 industries which are rather diverse in terms of their activities, ranging from Forestry through Textiles and Distribution (both wholesale and retail) to Construction and Transportation. The high level of diversification in this mega-cluster poses some difficulties when one tries to give it a name. It could, for example, be referred to as the Wood-textile-construction cluster, because all of these kinds of activities are strongly represented within its structure, although this name is admittedly rather ungainly. Whatever name this cluster is given, the more important point is that it obscures the actual relationships among the industries constituting the cluster structure, since some sub-clusters could sensibly be singled out. One such sub-cluster might include Forestry (2), Wood and wood products (31) and Furniture (81). Another might consist of Textile weaving (22), Textile finishing (23) and Other textiles (26). Furthermore, Sugar (15) and Confectionery (16), Cement, lime and plaster (52) and Articles of concrete (53), as well as Plastics and Synthetic resins (40) and Plastic products (48), are further examples of pairs of industries which should probably be considered sub-clusters of the Wood-textile-construction mega-cluster.
Figure 2 Clusters in the UK economy for 1995 based on the Modified Diagonalization Method
Other clusters suggested by the DM, excluding the mini-clusters mentioned above, seem to be better defined. The Agro cluster (i) includes Agriculture (1), Meat processing (8), Dairy products (11) and Animal feed (13), as well as Fertilisers (39) and Pesticides (41). The Energy cluster (iii) comprises only three industries, namely Oil and gas extraction (5), Coke ovens and refined petroleum (35), and Gas distribution (86). As large as the Agro cluster is the Metal and Machinery cluster (iv), with six industries, which is followed by the Connection and Financial cluster (xiv) with four industries: Banking and finance (100), Insurance and pension funds (101), Auxiliary financial services (102) and Postal and courier services (98). Two other clusters identified by the DM are the Paper cluster (xiii) and the Weapons and Shipbuilding cluster (xi), each of which consists of three industries.
Figure 2 shows the clusters obtained by means of the MDM. As mentioned above, the number of clusters identified is the same as with the DM, but the two approaches differ substantially in the composition of the individual clusters. Note also that, unlike the DM, the MDM provides information about inter-industry relationships of two kinds. To distinguish them, a full line is used to denote internal (intra-cluster) relationships among industries, whereas a dotted line denotes external relationships, in which the two intertwined industries belong to different clusters or one of them lies outside the clusters. There are only three such external relationships in the figure.
The largest cluster is now the Metal and Machinery cluster (b), with nine intertwined industries. It is useful to state at the outset that, with the exception of the Energy cluster (c), whose component industries are exactly the same as those obtained by means of the DM, the component industries of the respective clusters differ under the MDM even where the same labels (names) are used as before. The current Metal and Machinery cluster includes, among others, Metal forging and pressing (59), Mechanical power equipment (62) and Motor vehicles (77), which were previously classified as belonging to cluster iv, as well as Iron and steel (54) and Miscellaneous manufacturing and recycling (84), which the DM grouped into the two-element cluster ix. It also deserves to be emphasized that some industries in this cluster, notably Other metal products (61), General purpose machinery (63), Special purpose machinery (66), and Aircraft and spacecraft (80), were set aside when using the DM.
The Wood and paper cluster (a) ranks second in size. Of the five industries included in it, three (Forestry (2), Wood and wood products (31), and Furniture (81)) were originally in cluster ii, whereas the other two (Pulp, paper and paperboard (32) and Paper and paperboard products (33)) were previously grouped into cluster xiii. The Textiles cluster (d), on the other hand, is formed partly from cluster vii (Textile fibres (21) and Knitted goods (27)) and partly from the mega-cluster, in which Textile weaving (22) was previously classified.
Interestingly, the Transportation and telecommunication cluster (e) is formed entirely by breaking down the mega-cluster identified by the DM. One can see, however, that Railway transport (93) and Ancillary transport services (97) are connected here via an external bi-directional relationship, contrary to what the DM suggests. Mini-cluster i likewise emerged from the same mega-cluster, but there is now an external relation between this cluster and cluster h, which is formed entirely by breaking down the Connection and Financial cluster (xiv).
The last two clusters that will be given attention are clusters f and g. The former emerged as a part of the Agro cluster (i), whereas the latter consists of Grain milling and starch (12) and Bread and biscuits (14), previously grouped into the mega-cluster, together with Animal feed (13), which previously belonged to the Agro cluster and is now revealed to be externally interrelated with Agriculture (1) from cluster f.
Another way of dealing with externally interrelated clusters is to treat all the individual clusters among which external relationships exist as a single cluster with sub-clusters. Following this approach results in joining the pairs of clusters f and g as well as h and i, previously treated separately.
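The joining of externally interrelated clusters can be sketched in a few lines of Python. This is only an illustration under stated assumptions: clusters are represented by their letter labels, an external relationship is given as a pair of cluster labels, and the helper name merge_linked_clusters is hypothetical. Links to industries outside all clusters are simply ignored here.

```python
def merge_linked_clusters(labels, external_links):
    """Join clusters connected by external relationships.

    labels         -- iterable of cluster labels, e.g. "a" to "i"
    external_links -- pairs of labels linked by an external relationship
    Returns a sorted list of groups; each group lists the sub-clusters
    that would be treated as a single cluster.
    """
    parent = {label: label for label in labels}

    def find(label):
        while parent[label] != label:
            parent[label] = parent[parent[label]]  # path compression
            label = parent[label]
        return label

    for left, right in external_links:
        # Skip links whose endpoint is not a known cluster label.
        if left in parent and right in parent:
            parent[find(left)] = find(right)

    groups = {}
    for label in parent:
        groups.setdefault(find(label), []).append(label)
    return sorted(sorted(group) for group in groups.values())
```

Called with the labels a to i and the links (f, g) and (h, i), the function returns seven groups, of which two contain a pair of sub-clusters.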
A desirable feature of any method for identifying clusters of industries is that it should produce the same results regardless of which of the alternative matrices is used as the basis for calculations. As mentioned above, the DM has this feature, which becomes immediately obvious when the manner of forming the R matrix (see (4)) is taken into account. It turned out, however, that the MDM also always forms the same clusters for the intermediate deliveries, input coefficient and output coefficient matrices. Only the Leontief inverse matrix produces different clusters, and the differences were sometimes substantial. Nevertheless, this disadvantage of the MDM still appears to be outweighed by its benefits in uncovering clusters of better transparency than those of the DM. A more important question, however, is whether both methods perform comparably in terms of the strength of the identified relationships among industries within and outside the clusters. To answer this question, descriptive statistics were computed for the intermediate deliveries matrix. The results are reported in table 2.
Table 2 Comparison of the strength of identified relationships under the two alternative methods
	Diagonalization Method			Modified Diagonalization Method
	(1)	(2)	(3)	(1)	(2)	(3)
Q1	0.009	0.000	0.000	0.013	0.000	0.000
Mean	20.336	8.920	5.813	25.339	10.111	7.772
Median	1.721	0.790	0.547	2.141	0.925	0.700
Q3	102.227	21.292	18.453	102.831	27.020	50.767
Std. deviation	796.375	93.281	96.200	486.315	117.746	744.978
Note: all figures are in £ million.
Because the MDM assesses the significance of inter-industry relationships more carefully, classifying some of them as external, the relationships within clusters should be expected to be stronger than those suggested by the DM. As one can see, this expectation is fully confirmed by the figures in table 2. The mean flow within clusters under the DM is just over £20 million, whereas under the MDM it is a further £5 million greater. In addition, the dispersion around the average value, measured by the standard deviation, is for the MDM approximately 60% of that obtained under the DM. On the other hand, and for the same reasons, the average values of flows and their standard deviations, both between industries within clusters and those outside them and among industries outside clusters, are greater in the case of the MDM than in that of the DM.
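The kind of summary reported in table 2 can be reproduced along the following lines. The sketch is hedged in several respects: it assumes that columns (1), (2) and (3) refer to flows within clusters, flows between industries inside and outside clusters, and flows among industries outside clusters, which is how the text reads; flows between two different clusters are placed in the second group as a simplification; and the quantile rule and the population form of the standard deviation are assumptions, since the paper does not state which were used. The function names are hypothetical.

```python
import statistics


def split_flows(matrix, cluster_of):
    """Split off-diagonal flows into three groups.

    cluster_of[i] is the cluster label of industry i, or None if the
    industry lies outside all clusters.
    """
    within, mixed, outside = [], [], []
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # zero-diagonal principle
            ci, cj = cluster_of[i], cluster_of[j]
            if ci is not None and ci == cj:
                within.append(matrix[i][j])
            elif ci is None and cj is None:
                outside.append(matrix[i][j])
            else:
                mixed.append(matrix[i][j])
    return within, mixed, outside


def flow_summary(flows):
    """Quartiles, mean and standard deviation of a list of flows."""
    q1, median, q3 = statistics.quantiles(flows, n=4, method="inclusive")
    return {
        "Q1": q1,
        "Mean": statistics.fmean(flows),
        "Median": median,
        "Q3": q3,
        "Std. deviation": statistics.pstdev(flows),
    }
```

Applying flow_summary to each of the three lists returned by split_flows, once for the DM clusters and once for the MDM clusters, yields a table of the same layout as table 2.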
Conclusions
This paper aims at improving methods designed to identify industry clusters by explicitly distinguishing between internal and external relationships, depending on whether two intertwined industries are grouped into the same cluster, grouped into different clusters, or whether one of them is classified outside the clusters. The most interesting finding of this study is that the MDM appears to produce a cluster structure which is superior in some respects to that of the alternative method. In particular, the cluster structure under the MDM seems to be more transparent and more easily interpretable. Furthermore, as our experiment has shown, using this method does not entail worse performance in terms of the strength of the identified relationships among industries within and outside the clusters. On the contrary, these relationships even appear to be stronger.
We are, however, aware that the proposed method still leaves unsolved many other problems that one can encounter in investigating industry clusters in a real-world economy. These include, for example, the so-called mini-cluster problem and the inconvenience rooted in the fact that the choice of a suitable level of significance under the MDM may still be regarded as rather arbitrary. Perhaps further theoretical and empirical efforts in this field will help to overcome the common drawbacks of methods of identifying industry clusters and contribute to reducing the extent of arbitrary decisions in this kind of analysis.
References
Cmiel, A., and H. Gurgul. 2002. Application of maximum entropy prin-
ciple in key sector analysis. Systems Analysis Modelling Simulation
42:1361-76.
Antonelli, C. 1999. The microdynamics of technological change. London:
Routledge.
Bekele, G. W., and R. Jackson. 2006. Theoretical perspectives on industry
clusters. Research Paper 2006-5, West Virginia University.
Gurgul, H., and P. Majdosz. 2005. Key sector analysis: A case of the tran-
sited Polish economy. Managing Global Transitions 3 (1): 95-111.
Hauknes, J. 1998. Norwegian input-output clusters and innovation pat-
terns, step report R-15.
Hoen, A. 2002. Identifying linkages with a cluster-based methodology.
Economic Systems Research 14 (2): 131-45.
Hoover, E. M. 1937. Location theory and the shoe and leather industries.
Cambridge, ma: Harvard University Press.
Hoover, E. M. 1948. The location of economic activity. New York: McGraw
Hill.
Krugman, P. 1991. Geography and trade. Cambridge, ma: mit Press.
Krugman, P., and A. J. Venables. 1996. Integration, specialization, adjust-
ment. European Economic Review 40:959-68.
Marshall, A. 1890. Principles of economics. London: Macmillan.
Martin, R., and P. Sunley. 2003. Deconstructing clusters: chaotic concept
or policy panacea? Journal of Economic Geography 3 (1): 5-35.
Munroe, D. K., and G. J. D. Hewings. 2000. The role of intraindustry trade
in interregional trade in the midwest of the us. Discussion Paper 99-
t-7, University of Illinois.
Ohlin, B. 1933. Interregional and international trade. Cambridge, ma: Har-
vard University Press.
Porter, M. E. 1998. On competition. Boston: Harvard Business Review
Press.