The Modified Diagonalization Method for Analysing Clusters within Economies

Henryk Gurgul
Pawel Majdosz

In this paper a modification of the diagonalization method, originally put forward by Hoen (2002), is suggested which is aimed at uncovering clusters of sectors within an input-output framework. Our interest in this subject was largely motivated by the fact that the original method appears unable to provide an accurate representation of the real cluster structure that exists in an economy, because it ignores the position that a given inter-sectoral flow occupies in the hierarchy of the purchasing industry and of the supplying industry. By distinguishing between internal and external relationships when deciding whether a pair of industries belongs to the same or to different clusters, the proposed alternative, which will be referred to as the modified diagonalization method, seems to be superior to its predecessor. This conclusion is supported by a comparison of the relative performance of the rival methods (i.e. the original and the modified diagonalization method), which shows, among other things, that the average value of flows between industries grouped into clusters is higher in the case of the proposed method.

Key Words: internal and external interindustrial relationships, diagonalization method, clusters
JEL Classification: B41

Dr Henryk Gurgul is a Professor at the Department of Economics and Econometrics, University of Science and Technology, Poland.
Dr Pawel Majdosz is an Assistant Professor at the Department of Economics and Econometrics, University of Science and Technology, Poland.
Managing Global Transitions 6 (1): 53-73

Introduction

Industry clusters are nowadays an intrinsic element of the economic landscape of almost every country. Cluster-related problems have been viewed from various perspectives (spatially, inter-industrially and intra-industrially) and in varying contexts (see e.g. Munroe and Hewings 2000). Theoretical interest in the concept of clustering is first and foremost associated with classical work on agglomeration, in which the process of clustering is typically explained by the presence of externalities such as economies of scale and scope, which give economic advantages to firms agglomerating in a certain locality (see e.g. Hoover 1937; 1948; Marshall 1890; Ohlin 1933). Over the course of decades, the literature on this subject has developed tremendously; a comprehensive review is provided by Bekele and Jackson (2006), where the interested reader can find more details.

These theoretical approaches have been accompanied by empirical investigations of the clustering phenomenon, which have found evidence supporting the presumption that an industry cluster allows enterprises to reduce costs, uncertainties and risks (see e.g. Antonelli 1999; Krugman 1991; Krugman and Venables 1996; Porter 1998). Re-orienting economies towards a knowledge-based model has entailed a growing awareness that competitive success increasingly involves innovations and continuous quality improvements. The role of industry clusters has again appeared to be of crucial importance, this time as a mechanism that enables innovations to spread throughout the economy (see e.g. Hauknes 1998; Martin and Sunley 2003).

Despite the importance of cluster analysis in explaining and exploring real economic structure, the effort which has been undertaken to improve the methods of cluster identification hardly answers the most urgent questions, and many problems remain unsolved, in the face of which researchers may rely only on their own intuition.
Overall, the practical application of cluster analysis poses many difficulties, especially since there is no uniform definition of what we should consider a cluster. Definitions used in practice frequently depend on the particular ends which a given study serves. On the basis of the applied definition of a cluster, one then decides which of the alternative methods of grouping sectors into clusters is employed. Confining our interest to the input-output context exclusively, the following correspondences can be noted. If the definition places great emphasis upon intermediate deliveries between sectors within the economy, it is likely that the cluster identification method will be based on the matrix of intermediate deliveries. If attention is focused on the inputs of the ith product which are required for a one-unit increase in the production of the jth sector, the matrix of input coefficients may be selected as a base. If what matters more are the inputs of the ith product required to meet the demand of (all) other sectors associated with a one-unit increase in the final demand of the jth sector, rather than those necessary for such an increase in the production of the jth sector, the Leontief inverse will probably be used in place of the matrix of input coefficients. And finally, if the cluster definition stresses the extent to which the production of the ith sector is used by the jth sector in order to contribute to further production, the matrix of output coefficients may appear to the researcher to be particularly useful for dealing with the cluster identification problem.

The existence of many methods that can be used in practice for identifying the inter-industrial relationships composing clusters contributes to the emergence of serious difficulties with the comparability of results and could lead to a misinterpretation of the obtained evidence.
The matter is particularly controversy-prone when one of the rules underlying the technique employed is that the resulting clusters must have an economic meaning. Under this principle, any potential cluster produced by the formal procedure in the first stage can then be discarded by the researcher as lacking a precise economic meaning. The most important question that must be answered in all those cases where economic meaning is a valid rule of the overall cluster identification procedure is to what extent the obtained results describe the actual cluster structure within the economy or, to put it more plainly, whether the real goal is not, in fact, to support the presupposed statements and beliefs of the researcher. As long as researchers are tempted to base their work on the questionable principle of the economic meaning of identified clusters, any effort to improve and develop rigorous methods for uncovering cluster structure, which could restrict the scope of arbitrariness in the empirical studies undertaken in this momentous field, should be welcome.

In this paper a method designed to identify industry clusters within the economy is suggested. The main advantage of the method stems from the fact that it relies on a three-level classification of interindustrial linkages, thereby making it possible to distinguish between internal (or intra-cluster) relationships and external ones, which connect industries belonging to different clusters, or industries within clusters with those outside. The suggested method is presented in the context of its predecessors, such as the diagonalization method (DM) proposed by Hoen (2002), and an extensive comparison using data from different countries and dates is carried out so that the relative performance of these methods can be ascertained.
The major conclusion that can be drawn from the results is that the proposed method tends to produce structures of industry clusters which seem to be more plausible and, at the same time, the level of internal relationships, measured by the average value of flows among industries grouped into clusters, is even higher than that obtained by means of the DM.

The remainder of the paper is organized as follows. In the next section several customarily used methods of identifying industry clusters are outlined and the alternative, i.e. the modified diagonalization method (MDM), is suggested. The third section contains a brief description of the data used in comparing the relative performance of the alternative methods. The empirical results are reported in the fourth section, while the fifth section concludes the paper.

Cluster Identification Methods

To facilitate a comparison of the proposed method with the existing ones, we start with an outline of the latter, thereby giving a background against which the method suggested in this paper is formed as a remedy for some of their shortcomings. As was mentioned in the previous section, one characteristic that may distinguish one method of identifying clusters from another is the sort of input-output matrix it uses as the base for computation. Let x denote the vector of gross output and Z be the matrix of intermediate deliveries, whose elements show the amounts of output of the ith sector that are sold to the jth industry to maintain its own production. Then the related matrices, i.e. the matrix of input coefficients (A), the Leontief inverse (D), and the matrix of output coefficients (B), can be derived, in turn, as follows:

$$A = Z\hat{x}^{-1} \tag{1}$$

$$D = (I - A)^{-1} = L^{-1} \tag{2}$$

$$B = \hat{x}^{-1}Z \tag{3}$$

where a hat is used to denote the diagonal matrix with the elements of the vector on its main diagonal and all other entries equal to zero.
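As a minimal sketch of equations (1)-(3), the three matrices can be computed with NumPy as shown below. The function name `io_matrices` and the two-sector figures are made up for illustration and do not come from the paper's data.

```python
import numpy as np

def io_matrices(Z, x):
    """Return (A, D, B) from intermediate deliveries Z and gross output x."""
    Z = np.asarray(Z, dtype=float)
    x = np.asarray(x, dtype=float)
    x_hat_inv = np.diag(1.0 / x)            # inverse of the diagonal matrix x-hat
    A = Z @ x_hat_inv                        # eq. (1): a_ij = z_ij / x_j
    D = np.linalg.inv(np.eye(len(x)) - A)    # eq. (2): Leontief inverse
    B = x_hat_inv @ Z                        # eq. (3): b_ij = z_ij / x_i
    return A, D, B

# Hypothetical two-sector economy (illustrative numbers only)
Z = np.array([[10.0, 20.0],
              [30.0, 5.0]])
x = np.array([100.0, 50.0])
A, D, B = io_matrices(Z, x)
```

Dividing the columns of Z by gross output gives the input coefficients, while dividing the rows gives the output coefficients, which is why the same flow z_ij can look large in one matrix and small in the other.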
The simplest algorithm for grouping sectors into clusters, and the one most frequently used in practice, is the so-called method of maximization (MM). It involves finding the largest off-diagonal element of a selected matrix (suppose that this matrix is A) and joining the sectors with the largest amount of intermediate deliveries until the number of clusters identified in this way reaches a previously fixed threshold. The only advantage of this algorithm is its simplicity. It suffers, however, from the so-called mega-cluster problem and from the fact that the obtained solution is sensitive to the matrix chosen as the basis of computation. In other words, although there are relationships among the matrices (as expressed by (1)-(3)), each of them embodies different information, and it may happen that the clusters yielded by this algorithm will be completely different when we use, for example, Z in place of A, or D instead of B.

Similar to the MM is the method of restricted maximization (RM). Roughly speaking, whilst the former takes into account all off-diagonal elements of the respective matrix to form the clusters, the latter focuses only on those which are large enough to satisfy the imposed restrictions. Such restrictions may be expressed in different ways but, without knowledge of the distribution of the matrix elements, they are usually formulated as a multiple of the matrix average. This approach appears to be more flexible for several reasons. It allows two or even all of the matrices under consideration to be involved simultaneously, by imposing a conjunction of the restrictions formulated for each single matrix. The method does not require the number of clusters to be determined at the beginning of the investigation; instead, one can adjust the value of the multiplication factor to obtain the same effect.
The RM does not, however, deal with the possibility that the clusters vary when different matrices are used as the basis for calculation.

Hoen (2002) put forward the DM, which appears to be superior to the methods described above. The first stage of this approach involves formulating a binary matrix (R), holding ones and zeros, which is given as:

$$r_{ij} = \begin{cases} 1 & \text{if } a_{ij} > q(A, 1-\alpha) \,\wedge\, b_{ij} > q(B, 1-\alpha) \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

where q(X, p) stands for the quantile of order p of the elements of X. Then, by permuting its rows and columns, an attempt is made to transform matrix R into a block-diagonal matrix so that each of its blocks represents one cluster. The reasoning underlying this method is that a cluster should encompass all sectors which are connected to each other and, at the same time, detached from the rest of the economy. By changing the level of significance α, we decide which intermediate deliveries, input coefficients and output coefficients are regarded as significant and which are insignificant and to be fixed at zero. The higher the level of α, the more linkages there are among the sectors within the economy but, contrary to what Hoen himself would wish, more clusters are not always identified. This issue will be given more attention later on.

As Hoen (2002) documented, the DM solves the problem of changes in the composition of clusters with respect to the matrix used as a basis, and facilitates a better insight into the structure of an economy by not requiring the number of existing clusters to be specified at the beginning of the investigation. However, this approach also has several drawbacks. One is that the respective quantiles are obtained on the basis of all matrix elements, including those located on the principal diagonal. When this method is used for comparative purposes, i.e. to examine how far the composition of clusters remains unchanged over time, it seems reasonable to fix parameter α at the same level during the whole analysed period.
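The construction of the binary matrix in equation (4) and the subsequent grouping can be sketched as follows. Hoen's permutation algorithm is not reproduced in the text, so this sketch swaps in a different but related technique: it treats every significant entry as an undirected link and takes the connected components with at least two sectors as clusters, on the assumption that these correspond to the blocks of the finest block-diagonal form. The function name `dm_clusters` and the toy matrices are hypothetical.

```python
import numpy as np

def dm_clusters(A, B, alpha):
    """Binary matrix R of eq. (4) and the clusters it implies.

    Assumption: clusters are taken as connected components of the
    undirected graph of significant links, standing in for the
    row/column permutation to block-diagonal form.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[0]
    # Quantiles over all matrix elements, as in the original DM
    R = ((A > np.quantile(A, 1.0 - alpha)) &
         (B > np.quantile(B, 1.0 - alpha))).astype(int)

    parent = list(range(n))                  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    for i, j in np.argwhere(R).tolist():
        if i != j:                           # a diagonal entry links nothing
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]
    return R, clusters

# Hypothetical 4-sector coefficients: two strong links, the rest negligible
A = np.full((4, 4), 0.01)
A[0, 1], A[2, 3] = 0.5, 0.4
B = A.copy()
R, clusters = dm_clusters(A, B, alpha=0.2)
```

In this toy case only the two strong entries pass both quantile tests, so sectors 0 and 1 form one cluster and sectors 2 and 3 another.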
However, two sets of results obtained in this way for different points in time are comparable only if there is no change in the proportion of elements placed on the principal diagonal to the off-diagonal elements. This is probably not a problem in the case of developed countries, where the off-diagonal elements of the respective matrices dominate those located on the principal diagonal, indicating that each sector assigns only a small portion of its own current output to further production, but for transitional countries this ratio is reversed (see e.g. Cmiel and Gurgul 2002; Gurgul and Majdosz 2005). Within a few years of market-oriented reforms the share of off-diagonal flows within the economies of the countries in transition is expected to increase, and the problem will vanish when it reaches a level typical of developed countries. Until then, however, it is recommended that all elements on the diagonal be set to zero before calculating the respective quantiles.

As mentioned above, using the DM does not require the number of clusters to be determined by the researcher. Instead, the level of significance α can be adjusted for the same result, i.e. to obtain a predetermined number of clusters identified within the economy. However, there is no unambiguous relationship between the level of significance α and the number of uncovered clusters. With a higher value of parameter α, more entries in the matrix of interest emerge as important linkages, and nothing further. Under some particular circumstances, this may lead to the inclusion within clusters of sectors previously excluded at the lower level of significance, or even to the identification of new clusters. Nonetheless, it should be stressed that the relationship "the higher parameter α, the more clusters uncovered" is not automatically true.
More strictly speaking, for some range of values of parameter α such a relationship holds true, but for another it does not. To see this, suppose that α is equal to zero, implying that matrix R consists exclusively of zeros. Without significant entries in the matrix, no cluster will be identified. In contrast, when α equals 100%, matrix R consists of ones. With every sector significantly linked to every other, all the sectors are now included within a single cluster. It follows that there is a value k such that, for α in the range from 0 to k, the number of identified clusters is a non-decreasing function of α, whereas for α in the range from k to 1 it is a non-increasing function of α. For the sake of simplicity, we abstract here from the situation where the number of clusters is alternately a non-decreasing and a non-increasing function of α for α ∈ (0, k_1) ∪ (k_1, k_2) ∪ ... ∪ (k_n, 1).

Therefore, k is a threshold, in excess of which the number of uncovered clusters within the economy can only diminish. This is accompanied by a trend towards the joining of clusters, thus obscuring the real cluster structure we want to explain and explore. Theoretically, in a simplified example like this, it is possible to determine the threshold k, not analytically of course but only practically by trial and error, and to select a level of significance α which is below or equal to this threshold in order to avoid blurring the economy's structure. In practice, however, when it is likely that the number of clusters is alternately a non-decreasing and a non-increasing function of α, we would have to estimate the n thresholds k_i (where n is the number of changes in direction).
Even if the values of the thresholds k_i are known for every i, a still unanswered question is which of the n thresholds should be used to maximize the transparency of the identified relationships among the sectors operating within the real-world economy. In the light of the above-mentioned difficulties, an effort should clearly be undertaken to develop another approach which, being aimed at identifying clusters of sectors within the economy, distinguishes between linkages of sectors belonging to the same cluster, relationships between two sectors from different clusters, and the linkages of sectors within clusters with the remaining sectors classified as outside the clusters.

To define the problem, it should be stated that the main shortcoming of the DM stems from its inability to classify sectors which may potentially be assigned to more than one cluster. Without a rule for grading linkage strength, when a given sector has relationships with two different clusters, the DM blurs the economy's structure by joining these clusters into one. Hence the method proposed by us, the MDM, should first and foremost be provided with an operational principle which enables the alternative allocations of a given sector to be ranked with respect to the strength of the inter-industry linkages associated with each of them. In seeking a solution to this problem, it becomes immediately apparent that the simple categorization of inter-industrial relationships applied by the DM, which labels them as significant or not according to their magnitudes, is no longer adequate. The significant relations among industries have to be broken down further if the proposed method is to give any advantage to the researcher, who would otherwise have to choose the threshold k so as to maximize the number of clusters uncovered without even the slightest guide as to how this should be done in practice.
We would gain little, if anything, by basing a division of the inter-industrial relationships that are significant in terms of magnitude on the absolute value principle, since this amounts to introducing two unrelated thresholds, instead of the single one acting under the original diagonalization method, and would hardly avoid the tendency to amalgamate different clusters into one as α increases. What we propose in this paper is to look at a given connection from the perspective of the two industries it relates. Each inter-sectoral relationship expressing the actual flow of goods and services which takes place between two industries can be considered from at least two different viewpoints, namely that of the industry that sells its output and that of the industry that purchases it. In the case of the former, the question we have to answer is what position the flow occupies in the hierarchy of the selling industry. In the case of the latter, we are concerned with the degree to which that same flow is crucial for the purchasing industry.

Suppose that the element located in the ith row and the jth column of the matrix Z, i.e. z_ij, is deemed significant, no matter what the term 'significant' exactly means here. Then the ith sector is a supplier (seller) whilst the jth sector represents the demand side (buyer). Note that under the DM, the sectors connected by the element z_ij would automatically be considered as belonging to the same cluster. Now, however, such a conclusion will be valid if, and only if, there is no element in the ith row greater than the one in the jth column of that row and, at the same time, there is no element in the jth column exceeding the one in the ith row of that column. This means that we focus our attention on the relative position of the flow at hand, both on the list of deliveries of the selling industry and on the list of purchases of the buying industry.
Only if the considered flow is ranked first on both lists will the situation be equivalent to that presupposed by the DM, and the corresponding relationship will be referred to as internal or primary (from a given cluster's point of view). On the other hand, if at least one of the requirements mentioned above is violated, i.e. if there is a larger flow than z_ij in the ith row or in the jth column, then such a relationship, while remaining significant, will be termed external or secondary.

Before going on to outline the procedure of the suggested approach, which is based on the distinction made between internal and external (or primary and secondary) inter-industrial relationships, it is necessary to point out one rather practical issue. Consider a situation in which z_ij and z_ji are both significant and z_ij is larger than z_ji. It might well happen that z_ij would be categorized as a secondary relation and z_ji as a primary one, since nothing guarantees that both relationships will simultaneously be considered internal or external; this may occur only by chance. In order to prevent such an ambiguous finding, it is necessary to introduce the additional convention that, with z_ij and z_ji both being significant, only the larger relationship is subject to further consideration, i.e. z_ij in our simple example.

The proposed approach draws upon several elements that have been utilized under the original diagonalization method. The first step consists of creating the restriction matrix (Q) but, unlike its predecessor, which yields a binary matrix, a three-value coding is used here to allow external and internal linkages to be distinguished. Assuming the matrix of deliveries (Z) is chosen as the basis for calculation, which does not impair the generality of the method, we can express the restriction matrix as:

$$q_{kl} = \begin{cases} 2 & \text{if } a_{kl} > q(A, 1-\alpha) \,\wedge\, b_{kl} > q(B, 1-\alpha) \,\wedge\, z_{kl} > z_{kj} \,\wedge\, z_{kl} > z_{il}, \quad \forall\, i \neq k,\ j \neq l \\ 1 & \text{if } a_{kl} > q(A, 1-\alpha) \,\wedge\, b_{kl} > q(B, 1-\alpha) \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

The cases are read from top to bottom, so that a value of 1 is assigned only when the additional conditions required for a value of 2 fail. Note that all the input-output matrices involved in (5), i.e. A, B, and also Z in our example, should first be prepared in such a way that the elements located on their main diagonals are all set to zero. This is a prerequisite for any further calculation, because without neutralizing the effect of the main-diagonal elements of the respective matrices, the method might produce erroneous results. The reason is not only that including the main-diagonal elements in the calculation of the quantiles in (5) may lead to an overestimation of their values, but also that large entries placed on the main diagonal of the matrix used as a base are able to change what we regard as an internal or an external relation.

The second step is almost identical to the corresponding step under the DM, with one exception. Starting with the restriction matrix and using Hoen's algorithm (see Hoen 2002), we try to transform it into a block-diagonal matrix with respect to the internal inter-industrial relationships only (the entries equal to 2), and allow the elements representing the external relations (the entries equal to 1) to change their positions freely with the row-wise and column-wise permutations that are necessary to complete Hoen's algorithm. In other words, while transforming the restriction matrix into a block-diagonal one, we treat all the entries which, according to (5), are equal to one as if they were set to zero, but without losing information concerning their positions once the algorithm has terminated.

The transformed restriction matrix can be interpreted in terms of external and internal relationships as follows. Each block, as discovered by Hoen's algorithm, represents a single cluster of industries among which only internal relationships occur.
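The two steps of the proposed approach can be sketched as follows. This is only an illustration under stated assumptions: connected components over the entries equal to 2 stand in for Hoen's block-diagonalization, the convention for reciprocal flows is applied by dropping the smaller of the two entries from Q, and the name `mdm` and the four-sector figures are hypothetical.

```python
import numpy as np

def mdm(Z, x, alpha):
    """Restriction matrix Q of eq. (5), clusters built from the entries
    equal to 2, and external links read from the entries equal to 1."""
    Z = np.asarray(Z, dtype=float).copy()
    x = np.asarray(x, dtype=float)
    n = len(x)
    np.fill_diagonal(Z, 0.0)                 # neutralise main-diagonal flows
    A = Z / x[np.newaxis, :]                 # a_kl = z_kl / x_l
    B = Z / x[:, np.newaxis]                 # b_kl = z_kl / x_k
    off = ~np.eye(n, dtype=bool)             # off-diagonal positions
    significant = off & (A > np.quantile(A[off], 1.0 - alpha)) \
                      & (B > np.quantile(B[off], 1.0 - alpha))

    # z_kl must strictly exceed every other entry of row k and of column l
    row_max = Z.max(axis=1, keepdims=True)
    col_max = Z.max(axis=0, keepdims=True)
    row_top = (Z == row_max) & ((Z == row_max).sum(axis=1, keepdims=True) == 1)
    col_top = (Z == col_max) & ((Z == col_max).sum(axis=0, keepdims=True) == 1)

    Q = significant.astype(int) + (significant & row_top & col_top).astype(int)
    # Assumed reading of the convention: of two reciprocal significant
    # flows, only the larger one is kept
    Q[significant & significant.T & (Z < Z.T)] = 0

    parent = list(range(n))                  # union-find over internal links

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for k, l in np.argwhere(Q == 2).tolist():
        parent[find(k)] = find(l)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]
    external = [tuple(p) for p in np.argwhere(Q == 1).tolist()]
    return Q, clusters, external

# Hypothetical flows: sector 1 sells 50 to sector 0 and 30 to sector 2,
# while sector 3 sells 40 to sector 2; all other flows are negligible
Z = np.ones((4, 4))
Z[1, 0], Z[1, 2], Z[3, 2] = 50.0, 30.0, 40.0
x = np.full(4, 100.0)
Q, clusters, external = mdm(Z, x, alpha=0.25)
```

Here the flow from sector 1 to sector 2 is significant but ranks second both among the deliveries of sector 1 and among the purchases of sector 2, so it is coded 1. The sketch therefore keeps the clusters {0, 1} and {2, 3} separate and reports (1, 2) as an external link, whereas treating all three significant flows alike would merge the four sectors into one cluster.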
The elements of the restriction matrix pertaining to external relationships therefore indicate either an inter-cluster linkage or a connection of a given cluster with the rest of the economy, which is composed of the industries not assigned to any cluster.

What do we gain from using the suggested method? For instance, if a certain sector is significantly connected with two other sectors belonging to different clusters, it will be assigned to the cluster with which its linkage is stronger. Information about the existence of a significant, although relatively weaker, linkage with the sector belonging to the other cluster is not lost, however, because such a linkage is automatically classified as an external one. In this way the proposed method prevents an unreasonable joining of clusters without omitting significant relationships among the clusters (or between the clusters and the rest of the economy) and, therefore, offers a better insight into the real structure of the economy. These benefits would, however, prove illusory if the method did worse in terms of other properties such as, for example, its robustness with respect to the matrix selected as the basis for calculation, or the ratio of average flows among the sectors included within clusters to the analogous value for sectors outside the clusters. Therefore, in the following sections we empirically test the properties of the suggested method and compare the obtained results with those produced by its predecessor, i.e. the DM.

Data Description

When illustrating the MDM's capacity for uncovering the cluster structure as compared with that of the DM, a sufficiently high level of disaggregation of the input-output tables used as the basis for computation is of prime importance.
Although the initial number of distinct sectors within an economy differs significantly across empirical investigations concerned with identifying clusters of industries, such tables almost always distinguish no fewer than a hundred sectors (see e.g. Hauknes 1998). In order to assess the relative performance of the methods discussed above, we therefore used several national input-output tables for different dates which cover at least one hundred different sectors and which, of course, were available to us. To be more precise, the sample included tables from three countries, namely the US tables for various dates, the Danish tables for various dates, and the UK tables for 1995. In order to keep the presentation of results short, and also because the main conclusions about the promising nature of the MDM remain basically the same irrespective of which country and which date are selected, we decided not to present all outcomes but only those obtained by using the UK input-output tables for 1995. Other results are available from the authors on request.

The above-mentioned tables, derived directly from National Statistics, are evaluated at current prices from the seller's point of view (basic prices) and are fully consistent with the European System of Accounts 1995 (ESA 95). The original statistics provide coverage of the economy as a whole, broken down into 138 industries/products using the Standard Industrial Classification 1992 (SIC 92). The last fifteen entries of the respective tables, however, arose from dividing some industries/products associated with Government and non-profit institutions serving households into market and non-market components.
Doing so, the tables give a better insight into the inter-industrial relationships, taking account of the differences in the proportions of inputs when an industry's products are sold on market principles as opposed to the case where market mechanisms do not apply. Taking into account the purpose of our investigation, however, we decided not to distinguish market and non-market components and aggregated the tables into 123 industries/products (see table 1 for the list of sectors), thereby avoiding the zero-row and zero-column problems which would have to be corrected if we did not reduce the number of sectors to the above-mentioned 123.

Table 1 Industry classification
1 Agriculture
2 Forestry
3 Fishing
4 Coal extraction
5 Oil and gas extraction
6 Metal ores extraction
7 Other mining and quarrying
8 Meat processing
9 Fish and fruit processing
10 Oils and fats
11 Dairy products
12 Grain milling and starch
13 Animal feed
14 Bread, biscuits, etc
15 Sugar
16 Confectionery
17 Other food products
18 Alcoholic beverages
19 Soft drinks and mineral waters
20 Tobacco products
21 Textile fibres
22 Textile weaving
23 Textile finishing
24 Made-up textiles
25 Carpets and rugs
26 Other textiles
27 Knitted goods
28 Wearing apparel and fur products
29 Leather goods
30 Footwear
31 Wood and wood products
32 Pulp, paper and paperboard
33 Paper and paperboard products
34 Printing and publishing
35 Coke ovens, refined petroleum and nuclear fuel
36 Industrial gases and dyes
37 Inorganic chemicals
38 Organic chemicals
39 Fertilisers
40 Plastics and synthetic resins etc
41 Pesticides
42 Paints, varnishes, printing ink etc
43 Pharmaceuticals
44 Soap and toilet preparations
45 Other chemical products
46 Man-made fibres
47 Rubber products
48 Plastic products
49 Glass and glass products
50 Ceramic goods
51 Structural clay products
52 Cement, lime and plaster
53 Articles of concrete, stone etc
54 Iron and steel
55 Non-ferrous metals
56 Metal castings
57 Structural metal products
58 Metal boilers and radiators
59 Metal forging, pressing, etc
60 Cutlery, tools etc
61 Other metal products
62 Mechanical power equipment
63 General purpose machinery
64 Agricultural machinery
65 Machine tools
66 Special purpose machinery
67 Weapons and ammunition
68 Domestic appliances nec
69 Office machinery and computers
70 Electric motors and generators etc.
71 Insulated wire and cable
72 Electrical equipment nec
73 Electronic components
74 Transmitters for TV, radio and phone
75 Receivers for TV and radio
76 Medical and precision instruments
77 Motor vehicles
78 Shipbuilding and repair
79 Other transport equipment
80 Aircraft and spacecraft
81 Furniture
82 Jewellery and related products
83 Sports goods and toys
84 Miscellaneous manufacturing nec and recycling
85 Electricity production and distribution
86 Gas distribution
87 Water supply
88 Construction
89 Motor vehicle distribution and repair, automotive fuel retail
90 Wholesale distribution
91 Retail distribution
92 Hotels, catering, pubs etc.
93 Railway transport
94 Other land transport
95 Water transport
96 Air transport
97 Ancillary transport services
98 Postal and courier services
99 Telecommunications
100 Banking and finance
101 Insurance and pension funds
102 Auxiliary financial services
103 Owning and dealing in real estate
104 Letting of dwellings
105 Estate agent activities
106 Renting of machinery etc
107 Computer services
108 Research and development
109 Legal activities
110 Accountancy services
111 Market research, management consultancy
112 Architectural activities and technical consultancy
113 Advertising
114 Other business services
115 Public administration and defence
116 Education
117 Health and veterinary services
118 Social work activities
119 Sewage and sanitary services
120 Membership organisations nec
121 Recreational services
122 Other service activities
123 Private households with employed persons

The original statistics enable the analysis to be carried out on either a commodity-by-industry or a commodity-by-commodity basis. With commodity-specific technologies being a more persuasive assumption in an input-output framework, the latter form of tables is used throughout the study. It should, however, be stressed that the findings do not differ significantly between the two techniques of deriving input-output tables in their quadratic form.

Empirical Results

Neither of the above-outlined methods of identifying clusters requires the number of distinct clusters to be specified in advance. Instead, changing the level of significance α gives the same result, that is, the desired number of identified clusters is achieved. But comparing the relative performance of the methods at a fixed level of significance may be rather confusing, because each of these methods is likely to have a different range of α over which reasonable results are produced. Another, and perhaps more revealing, way of illustrating their implementation is to impose a fixed number of identified clusters without concern for the level of significance at which it is achieved by each of the methods.
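The idea of fixing the number of clusters and looking for a level of significance that delivers it can be sketched as a simple scan over candidate values of α. The paper does not describe its search procedure, so the grid scan, the function names `count_clusters` and `alphas_for_target`, and the six-sector coefficients below are all assumptions made for illustration; the cluster count uses the original binary rule with zeroed diagonals and connected components in place of Hoen's algorithm.

```python
import numpy as np

def count_clusters(A, B, alpha):
    """Number of clusters (components with at least two sectors) implied by
    the binary rule of eq. (4), with quantiles over off-diagonal entries."""
    A = np.asarray(A, dtype=float).copy()
    B = np.asarray(B, dtype=float).copy()
    np.fill_diagonal(A, 0.0)
    np.fill_diagonal(B, 0.0)
    n = A.shape[0]
    off = ~np.eye(n, dtype=bool)
    R = off & (A > np.quantile(A[off], 1.0 - alpha)) \
            & (B > np.quantile(B[off], 1.0 - alpha))

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in np.argwhere(R).tolist():
        parent[find(i)] = find(j)
    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sum(1 for s in sizes.values() if s > 1)

def alphas_for_target(A, B, target, grid):
    """All grid values of alpha that yield exactly `target` clusters."""
    return [float(a) for a in grid if count_clusters(A, B, a) == target]

# Hypothetical 6-sector coefficients: three strong pairs plus two weaker
# bridging links between them
A = np.full((6, 6), 0.01)
np.fill_diagonal(A, 0.0)
A[0, 1], A[2, 3], A[4, 5] = 0.9, 0.8, 0.7
A[1, 2], A[3, 4] = 0.6, 0.5
B = A.copy()

grid = [0.02, 0.05, 0.09, 0.12, 0.16, 0.30]
counts = [count_clusters(A, B, a) for a in grid]
hits = alphas_for_target(A, B, target=2, grid=grid)
```

In this toy case the cluster count first rises and then falls as α grows, so two different grid values produce two clusters. A scan is used rather than a bisection precisely because the count need not be monotone in α.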
Following the latter approach, we fixed the number of clusters to be identified by each of the methods under consideration at 16, and then adjusted the level of significance until that number was reached. While the DM gives the desired number of clusters at a level of significance of 0.5%, the corresponding value for the MDM is 1.32%. It should be recalled here that α (the level of significance at which clusters are evaluated) is simply the complement of the order of the quantile calculated from the matrix elements (see equations (4) and (5)). Note also that the input-output matrices involved in the computation were first adjusted for main-diagonal effects by imposing a zero-diagonal principle (all diagonal entries were set to zero), so that the quantiles are calculated from the off-diagonal elements of the respective matrices only.

Figure 1 depicts the cluster structure of the UK economy for 1995 based on the DM. It makes immediately obvious why as many as 16 clusters need to be identified in order to compare the alternative approaches: of the 16 uncovered clusters, only 7 are composed of three or more industries. Surprisingly, as will be shown later, the same proportion of so-called mini-clusters, in which a single inter-industry relationship constitutes the cluster, applies to the results obtained by means of the MDM. Such mini-clusters are of little interest from the practitioner's point of view, though they have some informational content. Searching for significance levels at which the mini-cluster problem does not arise, although theoretically possible, would reduce the transparency of the results and compromise the overall comparison of the alternative methods of identifying clusters, which is the main aim of this study.
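The thresholding step described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the authors' implementation: the function names, the toy matrix, the use of a linear-interpolation quantile, and the reading of a cluster as a connected group of significantly linked industries are all assumptions made for illustration. The sketch ignores the main diagonal, takes the quantile of order one minus the level of significance over the off-diagonal flows, and scans a grid of candidate levels until a target number of clusters is reached.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, 0 <= q <= 1."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])


def significant_links(flows, alpha):
    """Return the (i, j) pairs whose off-diagonal flow exceeds the cutoff."""
    n = len(flows)
    # Only off-diagonal entries enter the quantile (zero-diagonal principle).
    off_diag = [flows[i][j] for i in range(n) for j in range(n) if i != j]
    cutoff = quantile(off_diag, 1.0 - alpha)
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and flows[i][j] > cutoff}


def count_clusters(n, links):
    """Count connected groups of linked industries (assumed notion of cluster)."""
    neighbours = {i: set() for i in range(n)}
    for i, j in links:
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, clusters = set(), 0
    for start in range(n):
        if start in seen or not neighbours[start]:
            continue  # already visited, or an industry without significant links
        clusters += 1
        stack = [start]
        seen.add(start)
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return clusters


def alpha_for_target(flows, target, grid):
    """Return the first level in `grid` that yields `target` clusters, else None."""
    for alpha in grid:
        if count_clusters(len(flows), significant_links(flows, alpha)) == target:
            return alpha
    return None


# Hypothetical 5-industry flow matrix; the diagonal entries are ignored.
toy = [
    [9, 8, 0, 0, 0],
    [1, 9, 0, 0, 0],
    [0, 0, 9, 7, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 9],
]
links = significant_links(toy, alpha=0.10)
print(sorted(links), count_clusters(len(toy), links))
```

Scanning a grid is the simplest way to mimic the adjustment of the level of significance; a finer grid or a bisection search would serve equally well, since the number of significant links can only grow as the level increases.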
Preferring to preserve the transparency of our results as far as possible, we decide not to adjust for mini-clusters, but hardly any attention will be given to them owing to their minimal economic importance for a practitioner. Nevertheless, the full results, including mini-clusters, are presented to give interested readers an insight into the real economic landscape of the economy under study. One important conclusion that can be drawn from the relatively high ratio of mini-clusters to all identified clusters is that this problem remains unsolved no matter which of the methods is used.

Figure 1 Clusters in the UK economy for 1995 based on the Diagonalization Method

Figure 2 Clusters in the UK economy for 1995 based on the Modified Diagonalization Method

The largest cluster (II) consists of as many as 28 industries which are rather diverse in terms of their activities, ranging from Forestry through Textiles and Distribution (both Wholesale and Retail) to Construction and Transportation. The high level of diversification in this mega-cluster poses some difficulties when one tries to give it a name. It could, for example, be referred to as the Wood-textile-construction cluster, because all of these kinds of activities are strongly represented within its structure, although almost everyone will agree that this name is rather ungainly. Whatever name this cluster is given, the more important point is that it obscures the actual relationships among its constituent industries, since some sub-clusters could sensibly be singled out. One such sub-cluster might include, for example, Forestry (2), Wood and wood products (31) and Furniture (81). Another might consist of Textile weaving (22), Textile finishing (23) and Other textiles (26).
Furthermore, Sugar (15) and Confectionery (16), Cement, lime and plaster (52) and Articles of concrete (53), as well as Plastics and synthetic resins (40) and Plastic products (48), are further examples of pairs of industries which should probably be considered sub-clusters of the mega Wood-textile-construction cluster.

The other clusters suggested by the DM, excluding the mini-clusters mentioned above, seem to be better defined. The Agro cluster (I) includes Agriculture (1), Meat processing (8), Dairy products (11) and Animal feed (13), as well as Fertilisers (39) and Pesticides (41). The Energy cluster (III) comprises only three industries, namely Oil and gas extraction (5), Coke ovens and refined petroleum (35), and Gas distribution (86). As large as the Agro cluster is the Metal and Machinery cluster (IV) with six industries, which is followed by the Connection and Financial cluster (XIV) with four industries: Banking and finance (100), Insurance (101), Auxiliary financial services (102) and Postal and courier services (98). Two other clusters identified by the DM are the Paper cluster (XIII) and the Weapons and Shipbuilding cluster (XI), each of which consists of three industries.

Figure 2 shows the clusters obtained by means of the MDM. As mentioned above, the number of clusters identified is the same as with the DM, but there are substantial differences in the composition of each of them when the two alternative approaches are compared. Note also that, unlike the DM, the MDM provides information about inter-industry relationships of two kinds. To distinguish them, a full line is used to denote internal (intra-cluster) relationships among industries, whereas a dotted line denotes external relationships, in which the two intertwined industries belong to different clusters or one of them lies outside the clusters. There are only three such external relationships in the figure.
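Once each industry has been assigned to a cluster or left outside all clusters, the distinction between full and dotted lines reduces to a simple labelling rule. The Python sketch below illustrates that rule only; it does not reproduce how the modified diagonalization method arrives at the assignment itself. The function name `classify_links`, the mapping `cluster_of`, the industry numbers and the cluster labels are hypothetical, and the separate "outside" label for a link between two industries that both lie outside the clusters is an assumption, since the text does not discuss that case.

```python
def classify_links(links, cluster_of):
    """Label each significant link as internal, external or outside.

    cluster_of maps an industry to its cluster label; industries missing
    from the mapping are treated as lying outside all clusters.
    """
    labels = {}
    for i, j in links:
        ci, cj = cluster_of.get(i), cluster_of.get(j)
        if ci is None and cj is None:
            labels[(i, j)] = "outside"   # neither industry is in a cluster
        elif ci == cj:
            labels[(i, j)] = "internal"  # same cluster: drawn as a full line
        else:
            labels[(i, j)] = "external"  # different clusters, or one industry
                                         # outside: drawn as a dotted line
    return labels


# Hypothetical assignment: industries 1-2 form cluster "x", 3-4 form
# cluster "y", and industry 5 is outside all clusters.
cluster_of = {1: "x", 2: "x", 3: "y", 4: "y"}
links = [(1, 2), (3, 4), (2, 3), (4, 5)]
labels = classify_links(links, cluster_of)
for link, label in labels.items():
    print(link, label)
```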
One can see that the largest cluster is now the Metal and Machinery cluster (B), with nine intertwined industries. It is useful to state at the outset that, with the exception of the Energy cluster (C), whose component industries are exactly the same as those obtained by means of the DM, the composition of the respective clusters differs under the MDM, even where the same labels (names) are used as before. The current Metal and Machinery cluster includes, among others, Metal forging and pressing (59), Mechanical power equipment (62) and Motor vehicles (77), which were previously classified as belonging to cluster IV, as well as Iron and steel (54) and Miscellaneous manufacturing and recycling (84), which the DM grouped into the two-element cluster IX. It also deserves emphasis that some industries in this cluster, namely Other metal products (61), General purpose machinery (63), Special purpose machinery (66) and Aircraft and spacecraft (80), were left outside the clusters when using the DM.

The Wood and paper cluster (A) ranks second with respect to size. Of the five industries included within it, three (Forestry (2), Wood and wood products (31), and Furniture (81)) were originally in cluster II, whereas the other two (Pulp, paper and paperboard (32) and Paper and paperboard products (33)) were previously grouped into cluster XIII. The Textiles cluster (D), on the other hand, is formed partly from cluster VII (Textile fibres (21) and Knitted goods (27)) and partly from the mega cluster, in which Textile weaving (22) was previously classified.

Interestingly, the Transportation and telecommunication cluster (E) is entirely formed by breaking down the mega cluster identified by the DM.
One can, however, see that Railway transport (93) and Ancillary transport services (97) are here connected via an external bi-directional relationship, which differs from what the DM suggests. In the case of mini-cluster I, we likewise find that it emerged from the same mega cluster, but that there now exists an external relationship between this cluster and cluster H, which is entirely formed by breaking down the Connection and Financial cluster (XIV). The last two clusters that will be given attention are clusters F and G. The former emerged as a part of the Agro cluster (I), whereas the latter consists of Grain milling and starch (12) and Bread and biscuits (14), previously grouped into the mega cluster, as well as Animal feed (13), which previously belonged to the Agro cluster and is now revealed to be externally interrelated with Agriculture (1) from cluster F.

Another way of dealing with externally interrelated clusters is to treat all the individual clusters among which external relationships exist as a single cluster with sub-clusters. Following this approach results in joining the pairs of clusters F and G as well as H and I, which were previously treated separately.

Table 2 Comparison of the strength of identified relationships under the two alternative methods

                  Diagonalization Method          Modified Diag. Method
                  (1)       (2)       (3)         (1)       (2)       (3)
Q1                0.009     0.000     0.000       0.013     0.000     0.000
Mean              20.336    8.920     5.813       25.339    10.111    7.772
Median            1.721     0.790     0.547       2.141     0.925     0.700
Q3                102.227   21.292    18.453      102.831   27.020    50.767
Std. deviation    796.375   93.281    96.200      486.315   117.746   744.978

Note: all figures are in £ million.

A desirable feature of any method for identifying clusters of industries is that it should produce the same results independently of which of the alternative matrices is used as a basis for the calculations. As mentioned above, the DM has this feature, which becomes immediately obvious when the manner of forming the R matrix (see (4)) is taken into account.
It turned out, however, that the MDM also forms the same clusters for the intermediate deliveries matrix and for the input and output coefficient matrices. Only the Leontief inverse matrix produces different clusters, and the differences were sometimes substantial. Nevertheless, this disadvantage appears to be outweighed by the benefit of uncovering more transparent clusters than those produced by the DM. A more important question, however, is whether both methods perform comparably in terms of the strength of the identified relationships among industries within and outside the clusters. To answer this question, some descriptive statistics were computed for the intermediate deliveries matrix. The results are reported in table 2.

Because the MDM assesses the significance of inter-industry relationships more carefully, in that some of them are classified as external, the intra-cluster relationships should be expected to be stronger than those suggested by the DM. As one can see, this expectation is fully confirmed by the figures in table 2. The mean intra-cluster flow under the DM is just over £20 million, whereas under the MDM it is about £5 million greater. In addition, the dispersion around the average value, measured by the standard deviation, is for the MDM approximately 60% of that obtained under the DM. On the other hand, and for the same reasons, we find that the average values of flows and their standard deviations, both between industries within clusters and those outside them and among industries beyond clusters, are greater in the case of the MDM than under the DM.
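The comparison drawn from table 2 can be retraced with a few lines of Python. The sketch below is illustrative only: `summarise` shows one way the five statistics in the table could be computed from a list of flows, and its name, the choice of the sample standard deviation and the inclusive quartile method are assumptions, since the text does not state which conventions were used. The two dictionaries simply copy the mean and standard deviation reported in column (1) of table 2 for each method.

```python
import statistics


def summarise(flows):
    """Return the five descriptive statistics reported in table 2."""
    q1, _, q3 = statistics.quantiles(flows, n=4, method="inclusive")
    return {
        "Q1": q1,
        "Mean": statistics.mean(flows),
        "Median": statistics.median(flows),
        "Q3": q3,
        "Std. deviation": statistics.stdev(flows),  # sample std (assumed)
    }


# Column (1) of table 2, in £ million, for the original and modified methods.
dm = {"Mean": 20.336, "Std. deviation": 796.375}
mdm = {"Mean": 25.339, "Std. deviation": 486.315}

mean_gain = mdm["Mean"] - dm["Mean"]
std_ratio = mdm["Std. deviation"] / dm["Std. deviation"]

print(f"difference in means: {mean_gain:.3f}")        # 5.003
print(f"ratio of std. deviations: {std_ratio:.2f}")   # 0.61

# Hypothetical list of flows, only to show the shape of the summary.
print(summarise([0, 1, 2, 3, 4]))
```

The difference in means of about £5 million and the ratio of roughly 0.61 between the standard deviations match the figures quoted in the text.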
Conclusions

This paper aims at improving methods designed to identify industry clusters by explicitly distinguishing between internal and external relationships, depending on whether two intertwined industries are grouped into the same cluster or into different clusters, or whether one of the intertwined industries is classified outside the clusters. The most interesting finding of this study is that the MDM appears to produce a cluster structure which is superior in some respects to that of the alternative method. In particular, the cluster structure under the MDM seems to be more transparent and more easily interpretable. Furthermore, as our experiment has shown, using this method does not necessarily entail worse performance in terms of the strength of the identified relationships among industries within and outside the clusters. On the contrary, the results even appear better.

We are, however, aware that the proposed method still leaves unsolved many other problems that one can encounter in investigating industry clusters in a real-world economy. These include, for example, the so-called mini-cluster problem, and the inconvenience that the choice of a suitable level of significance under the MDM may still be regarded as rather arbitrary. Perhaps further theoretical and empirical efforts in this field will help to overcome the common drawbacks of methods of identifying industry clusters, and contribute to reducing the extent of arbitrary decisions in this kind of analysis.

References

Ćmiel, A., and H. Gurgul. 2002. Application of maximum entropy principle in key sector analysis. Systems Analysis Modelling Simulation 42:1361-76.
Antonelli, C. 1999. The microdynamics of technological change. London: Routledge.
Bekele, G. W., and R. Jackson. 2006. Theoretical perspectives on industry clusters. Research Paper 2006-5, West Virginia University.
Gurgul, H., and P. Majdosz. 2005. Key sector analysis: A case of the transited Polish economy. Managing Global Transitions 3 (1): 95-111.
Hauknes, J. 1998. Norwegian input-output clusters and innovation patterns. STEP report R-15.
Hoen, A. 2002. Identifying linkages with a cluster-based methodology. Economic Systems Research 14 (2): 131-45.
Hoover, E. M. 1937. Location theory and the shoe and leather industries. Cambridge, MA: Harvard University Press.
Hoover, E. M. 1948. The location of economic activity. New York: McGraw Hill.
Krugman, P. 1991. Geography and trade. Cambridge, MA: MIT Press.
Krugman, P., and A. J. Venables. 1996. Integration, specialization, adjustment. European Economic Review 40:959-68.
Marshall, A. 1890. Principles of economics. London: Macmillan.
Martin, R., and P. Sunley. 2003. Deconstructing clusters: chaotic concept or policy panacea? Journal of Economic Geography 3 (1): 5-35.
Munroe, D. K., and G. J. D. Hewings. 2000. The role of intraindustry trade in interregional trade in the Midwest of the US. Discussion Paper 99-T-7, University of Illinois.
Ohlin, B. 1933. Interregional and international trade. Cambridge, MA: Harvard University Press.
Porter, M. E. 1998. On competition. Boston: Harvard Business Review Press.