Metodoloski zvezki, Vol. 11, No. 1, 2014, 1-20

Symbolic Covariance Matrix for Interval-valued Variables and its Application to Principal Component Analysis: a Case Study

Katarina Kosmelj,1 Jennifer Le-Rademacher2 and Lynne Billard3

Abstract

In the last two decades, principal component analysis (PCA) has been extended to interval-valued data; several adaptations of the classical approach are known from the literature. Our approach is based on the symbolic covariance matrix Cov for interval-valued variables proposed by Billard (2008). Its crucial advantage over other approaches is that it fully utilizes all the information in the data. The symbolic covariance matrix can be decomposed into a within part CovW and a between part CovB. We propose a further insight into the PCA results: the proportion of variance explained by the within information and the proportion explained by the between information can be calculated. Additionally, we suggest performing PCA on CovB and on CovW to obtain deeper insight into the data under study. In the case study presented, the information gain when performing PCA on the intervals instead of the interval midpoints (conditionally the means) is about 45%. It turns out that, for these data, the uniformity assumption over intervals does not hold, and so an analysis of the data represented by histogram-valued variables is suggested.

1 Introduction

1.1 Principal component analysis for classical data

Principal component analysis (PCA) was first described by Pearson (1901) as an analogue of the principal axes theorem in mechanics; it was later independently developed and named by Harold Hotelling in the 1930s. It is a very popular exploratory tool for classical multivariate data (see e.g., Chatfield and Collins, 1980; Johnson and Wichern, 2002).
Its major objective is to reduce the dimension of the variable space: the original p random variables $X = (X_1, X_2, \dots, X_p)$ are transformed into s random variables $Y = (Y_1, Y_2, \dots, Y_s)$, called principal components, where $s \le p$ and the Y variables are uncorrelated. This transformation is defined in such a way that the first principal component (PC1) accounts for as much of the variability, i.e., variance, in the data as possible, and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (i.e., uncorrelated with) the preceding components.

1 Biotechnical Faculty, University of Ljubljana, Slovenia; katarina.kosmelj@bf.uni-lj.si
2 Medical College of Wisconsin, Milwaukee, USA; jlerade@mcw.edu
3 University of Georgia, Athens, USA; lynne@stat.uga.edu

The solution of the problem described above is given by the eigenvalues and eigenvectors of the covariance matrix of $X_1, X_2, \dots, X_p$. Principal components are linear combinations of the original variables, defined by the eigenvectors of this covariance matrix. From basic linear algebra, there are p ordered eigenvalues, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$; the eigenvalues of the covariance matrix are the variances of the principal components. The eigenvalues add up to the sum of the diagonal elements, i.e., to the trace of the covariance matrix. This means that the sum of the variances of the principal components is equal to the sum of the variances of the original variables. The i-th principal component accounts for $\lambda_i / \sum_{j=1}^{p} \lambda_j$ of the total variance in the original data. Once the reduced dimension s has been chosen, the proportion of variance accounted for by the first s principal components is $\sum_{j=1}^{s} \lambda_j / \sum_{j=1}^{p} \lambda_j$. Because the covariance of standardized variables equals their correlation, PCA on standardized variables uses the eigenvalues and eigenvectors of the correlation matrix.
It is recommended to perform PCA on standardized variables when the original variables are measured on scales with different ranges.

1.2 Principal component analysis for symbolic data

In the second part of the 20th century, the need to analyze massive datasets emerged. Symbolic data analysis started as a response to that demand; see Bock and Diday (2000), Billard and Diday (2003, 2006), among others. Symbolic analytical methods are often generalizations of their classical counterparts. A symbolic method should give the same results as its classical counterpart when applied to classical data (Billard, 2011; Le-Rademacher and Billard, 2012).

In the last two decades, PCA was adapted for symbolic data, first in the context of interval-valued data. A number of approaches were proposed. Le-Rademacher and Billard (2012) give a short overview of the historical development, which we review briefly. Cazes et al. (1997) proposed the first adaptations of PCA, known as the centers method and the vertices method; see also Douzal-Chouakria et al. (2011). Zuccolotto (2007) applied the vertices method to a dataset on job satisfaction. Lauro and Palumbo (2000) introduced a Boolean matrix to account for the interdependency of the vertices. Palumbo and Lauro (2003) and Lauro et al. (2008) proposed the midpoint-radii method, which treats interval midpoints and interval radii as two separate variables. Gioia and Lauro (2006) proposed a PCA version based on an interval algebra approach. Le-Rademacher and Billard (2012) describe these approaches in detail and discuss their characteristics in the context of symbolic data analysis: each of them fails, in a different way, to utilize the entire information contained in the interval-valued data.

These deficiencies can be avoided when the symbolic covariance matrix Cov is used. Its calculation in the interval setting was first presented in Billard (2008).
The crucial advantage of this symbolic covariance matrix is that it fully utilizes all the information in the data; it has also been shown that the symbolic covariance matrix can be decomposed into a within part CovW and a between part CovB. Two papers on this topic (Le-Rademacher and Billard, 2012 and Billard and Le-Rademacher, 2012) also provide a new approach to constructing the observations in PC space, allowing for a better visualization of the results. Le-Rademacher and Billard (2013a) propose an approach to construct histogram values from the principal components of interval-valued observations. Le-Rademacher (2008) and Le-Rademacher and Billard (2013b) extend these ideas to histogram-valued observations.

In a different direction, Giordani and Kiers (2006) consider fuzzy data, which is a different domain from symbolic data and so is outside the purview of the present work. Likewise, the PCA of time series data of Irpino (2006) belongs to a different domain.

1.3 Objective of this study

We want to compare PCA results obtained on different data types. To enable the comparison of the results, the data were aggregated from the same dataset. For each observation and each variable, we aggregated the data in two different ways:

• the mean value;
• the [min, max] interval, which is based on the minimal and maximal value under observation.

The main objective of this study is to quantify the information gain obtained when analyzing the [min, max] interval instead of the mean value.

In the next section, some well-known characteristics of interval-valued data are summarized. Covariance in the interval setting will be illustrated and compared to the covariance in the classical setting. For PCA on interval-valued variables, a simple measure of the information gain will be introduced and additional PCA analyses will be suggested. These approaches allow for a deeper insight into the dataset under study.
The third section presents a simple case study. It consists of seven meteorological stations in Slovenia described by seven variables; the data are from the 40-year period 1971-2010. The results of different PCA analyses will be presented and compared. To facilitate the comparison of the results, the dataset is kept very small; however, the stations are chosen according to subject-matter knowledge. The last section gives some conclusions and suggestions for further work.

2 Interval-valued variables

Let us first note that an interval-valued random variable is simply a random variable whose values are intervals. Let $X = (X_1, X_2, \dots, X_p)$ be a p-dimensional random variable taking values in $\mathbb{R}^p$. Let $X_j$ be an interval-valued random variable; its data exist for a random sample of size n and are of the form $X_{ij} = [a_{ij}, b_{ij}]$, $a_{ij} \le b_{ij}$, $i = 1, \dots, n$. In the case $a_{ij} = b_{ij}$, for any $i = 1, 2, \dots, n$, $X_{ij}$ has a classical value. Each observation described by a p-dimensional interval-valued variable can be visualized as a hypercube in $\mathbb{R}^p$.

2.1 Mean and variance

The mean and the variance for an interval-valued variable are based on the assumption that the distribution of the values within each interval is uniform. They were first defined by Bertrand and Goupil (2000). The sample variance of $X_j$ is

$$S_j^2 = \frac{1}{3n} \sum_{i=1}^{n} \left( a_{ij}^2 + a_{ij} b_{ij} + b_{ij}^2 \right) - \bar{X}_j^2, \tag{2.1}$$

where the sample mean $\bar{X}_j$ is the average of the interval midpoints,

$$\bar{X}_j = \frac{1}{2n} \sum_{i=1}^{n} \left( a_{ij} + b_{ij} \right). \tag{2.2}$$

Billard (2008) showed that (2.1) can be rewritten as

$$S_j^2 = \frac{1}{3n} \sum_{i=1}^{n} \left[ (a_{ij} - \bar{X}_j)^2 + (a_{ij} - \bar{X}_j)(b_{ij} - \bar{X}_j) + (b_{ij} - \bar{X}_j)^2 \right], \tag{2.3}$$

and proved that the Total Sum of Squares SST can be decomposed into a within part SSW and a between part SSB:

$$n S_j^2 = SST_j = SSW_j + SSB_j. \tag{2.4}$$

The Within Sum of Squares SSW measures the internal variation and can be expressed as follows:

$$SSW_j = \frac{1}{3} \sum_{i=1}^{n} \left[ \left( a_{ij} - \tfrac{a_{ij} + b_{ij}}{2} \right)^2 + \left( a_{ij} - \tfrac{a_{ij} + b_{ij}}{2} \right) \left( b_{ij} - \tfrac{a_{ij} + b_{ij}}{2} \right) + \left( b_{ij} - \tfrac{a_{ij} + b_{ij}}{2} \right)^2 \right] = \sum_{i=1}^{n} \frac{(b_{ij} - a_{ij})^2}{12}. \tag{2.5}$$

Thus, as expected, SSW is based on an implicit assumption that the distribution of values within each observed interval is uniform, $X_{ij} \sim U(a_{ij}, b_{ij})$, $i = 1, 2, \dots, n$. Other distributions are also relevant; e.g., Billard (2008) presents the formulae for SSW and SST when observations within each interval follow a triangular distribution.

The Between Sum of Squares SSB describes the between variation, i.e., the variation of the interval midpoints:

$$SSB_j = \sum_{i=1}^{n} \left( \frac{a_{ij} + b_{ij}}{2} - \bar{X}_j \right)^2, \tag{2.6}$$

and is independent of the distribution within the intervals.

2.2 Covariance

Let $X_{j_1}$ and $X_{j_2}$ be two interval-valued random variables with pairwise observations $X_{ij_1} = [a_{ij_1}, b_{ij_1}]$ and $X_{ij_2} = [a_{ij_2}, b_{ij_2}]$ on a random sample of size n, where $a_{ij} \le b_{ij}$ for $j = j_1, j_2$ and $i = 1, 2, \dots, n$. The Total Sum of Products SPT is decomposed into two components, the Sum of Products Within, SPW, and the Sum of Products Between, SPB; it is connected to the covariance Cov by

$$n \, Cov_{j_1 j_2} = SPT_{j_1 j_2} = SPW_{j_1 j_2} + SPB_{j_1 j_2}. \tag{2.7}$$

The Sum of Products Within SPW and the Sum of Products Between SPB are related to CovW and CovB, respectively, which are expressed as follows:

$$CovW_{j_1 j_2} = \frac{SPW_{j_1 j_2}}{n} = \frac{1}{n} \sum_{i=1}^{n} \frac{(b_{ij_1} - a_{ij_1})(b_{ij_2} - a_{ij_2})}{12}, \tag{2.8}$$

$$CovB_{j_1 j_2} = \frac{SPB_{j_1 j_2}}{n} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{a_{ij_1} + b_{ij_1}}{2} - \bar{X}_{j_1} \right) \left( \frac{a_{ij_2} + b_{ij_2}}{2} - \bar{X}_{j_2} \right). \tag{2.9}$$

Note that the entries of the CovW matrix are always non-negative, and positive unless the intervals are degenerate. Their magnitudes depend on the ranges, $R_{ij} = b_{ij} - a_{ij}$, $j = j_1, j_2$: the greater the ranges of the two variables, the greater the entry of CovW. It should be pointed out that CovW is not a true covariance matrix on the ranges; the terms for the true covariance matrix on the ranges would be $(R_{ij_1} - \bar{R}_{j_1})(R_{ij_2} - \bar{R}_{j_2})$. However, the CovW matrix incorporates information on the size of the rectangles.
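The decomposition in (2.7)-(2.9) can be sketched numerically. The following is a minimal NumPy illustration, not the authors' R script; the function name `symbolic_cov_parts` and the toy intervals are assumptions made for this example only. It builds CovW and CovB from the interval endpoints and compares the diagonal of their sum with the variance formula (2.1).

```python
import numpy as np

def symbolic_cov_parts(a, b):
    """Return (cov_w, cov_b) for interval data under the uniformity assumption.

    a, b: arrays of shape (n, p) with the lower and upper interval endpoints.
    (Hypothetical helper, written for illustration only.)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    ranges = b - a                          # R_ij = b_ij - a_ij
    mid = (a + b) / 2.0                     # interval midpoints
    centered = mid - mid.mean(axis=0)       # midpoints minus the sample mean (2.2)
    cov_w = ranges.T @ ranges / (12.0 * n)  # within part, equation (2.8)
    cov_b = centered.T @ centered / n       # between part, equation (2.9), divisor n
    return cov_w, cov_b

# Toy intervals (made up): 3 observations, 2 variables; the third
# observation of the first variable is a classical value (a = b).
a = np.array([[1.0, 10.0], [2.0, 8.0], [4.0, 3.0]])
b = np.array([[3.0, 14.0], [5.0, 9.0], [4.0, 7.0]])

cov_w, cov_b = symbolic_cov_parts(a, b)
cov = cov_w + cov_b                         # symbolic covariance, equation (2.7) divided by n

# Variances computed directly from equation (2.1) for comparison.
n = a.shape[0]
xbar = ((a + b) / 2.0).mean(axis=0)
var_21 = (a**2 + a * b + b**2).sum(axis=0) / (3.0 * n) - xbar**2

print(np.diag(cov), var_21)
```

In this sketch the diagonal of `cov` agrees with (2.1), and `cov_b` coincides with the classical covariance (divisor n) of the midpoints.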
The entries of CovB are classical covariances (divided by n, not by n - 1) on the interval midpoints. When, instead of the intervals [a, b], PCA is performed on the interval midpoints, [(a + b)/2, (a + b)/2], CovW is zero and Cov = CovB; in this case, the symbolic PCA results are the same as for a classical PCA on the interval midpoints.

Billard (2008) showed that the covariance between two interval-valued variables $X_{j_1}$ and $X_{j_2}$ can be calculated directly, using the following expression:

$$Cov_{j_1 j_2} = \frac{1}{6n} \sum_{i=1}^{n} \left[ 2 (a_{ij_1} - \bar{X}_{j_1})(a_{ij_2} - \bar{X}_{j_2}) + (a_{ij_1} - \bar{X}_{j_1})(b_{ij_2} - \bar{X}_{j_2}) + (b_{ij_1} - \bar{X}_{j_1})(a_{ij_2} - \bar{X}_{j_2}) + 2 (b_{ij_1} - \bar{X}_{j_1})(b_{ij_2} - \bar{X}_{j_2}) \right]. \tag{2.10}$$

Two special cases are easily checked: a) the covariance of two identical variables equals the variance (2.3); b) the covariance of two classical variables equals the well-known classical covariance (with divisor n).

Figure 1 gives some insight into the calculation of the covariance in the classical and in the interval setting. Covariance in the classical setting is based on the position of the points; in the interval setting it is based on the rectangles: the location of the midpoints determines the between part, and the size of the rectangles determines the within part, which is always positive. Covariance is calculated on the number of clear days (D.Clear) and the number of cloudy days (D.Cloud) for seven meteorological stations (for details, see next section).

Figure 1: Calculation of the covariance in the classical setting (upper part) is based on the position of the points; in the interval setting (lower part) it is based on the rectangles: position of the midpoints and size of the rectangles determine its value.
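Expression (2.10) and its special cases can be checked numerically. The sketch below is an illustration only: the function name `symbolic_cov_direct` and the toy intervals are assumptions, not material from the paper. The toy data are chosen so that the midpoints are negatively related while the intervals are wide, which shows how a large within part can turn a negative between covariance into a positive symbolic covariance.

```python
import numpy as np

def symbolic_cov_direct(a, b):
    """Symbolic covariance matrix computed directly from equation (2.10).

    a, b: arrays of shape (n, p) with lower and upper interval endpoints.
    (Hypothetical helper, written for illustration only.)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    xbar = ((a + b) / 2.0).mean(axis=0)   # sample means, equation (2.2)
    da = a - xbar                          # lower endpoints minus the mean
    db = b - xbar                          # upper endpoints minus the mean
    return (2 * da.T @ da + da.T @ db + db.T @ da + 2 * db.T @ db) / (6.0 * n)

# Toy intervals (made up): midpoints (10, 14), (12, 12), (14, 10), all ranges equal to 12.
a = np.array([[4.0, 8.0], [6.0, 6.0], [8.0, 4.0]])
b = np.array([[16.0, 20.0], [18.0, 18.0], [20.0, 16.0]])

cov = symbolic_cov_direct(a, b)

# Between part: classical covariance (divisor n) of the midpoints.
mid = (a + b) / 2.0
cov_b = np.cov(mid, rowvar=False, bias=True)
# Within part from equation (2.8).
cov_w = (b - a).T @ (b - a) / (12.0 * a.shape[0])

# Special case b): for classical data (a == b) the result is the classical covariance.
cov_classical = symbolic_cov_direct(mid, mid)

print(cov[0, 1], cov_b[0, 1], cov_w[0, 1])
```

For these toy intervals the between covariance is negative, the within term is 12, and the symbolic covariance is 28/3, so the sign of the covariance changes once the interval widths are taken into account.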
Figure 1 (upper part) illustrates the classical covariance: the position of the points suggests that the covariance is negative, as would also be expected from subject-matter knowledge; the value obtained on the means is Cov(D.Cloud, D.Clear) = -212.0. However, the covariance in the interval setting is positive, Cov(D.Cloud, D.Clear) = +202.2. This is due to a large within-interval component, CovW(D.Cloud, D.Clear) = 411.7; the between component, i.e., the classical covariance on the interval midpoints, is CovB(D.Cloud, D.Clear) = -209.5 (Table 3), so that -209.5 + 411.7 = 202.2; see Figure 1 (lower part).

2.3 Principal component analysis in the context of interval-valued data

A crucial advantage of the symbolic covariance matrix Cov is that it fully utilizes all the information in the data. It can be decomposed into a within part CovW and a between part CovB. This decomposition allows for a deeper insight into the PCA results from the traces of these matrices. Since the trace of a matrix is a linear operator, the following holds:

$$\mathrm{tr}(Cov) = \mathrm{tr}(CovW) + \mathrm{tr}(CovB). \tag{2.11}$$

Hence, we can assess the proportion of variance explained by the within information and the proportion of variance explained by the between information. The information gain when performing PCA on the intervals instead of the interval midpoints (conditionally the means) is due to the within information.

An additional PCA can be done on CovB; its results are equivalent to the classical PCA results on the interval midpoints. A PCA can also be performed on CovW; the interpretation of these results may shed light on some aspects of the within information.

3 A case study

We consider yearly data for Slovenia from the period 1971-2010; the data were collected by the Slovenian Environment Agency (http://meteo.arso.gov.si/met/sl/archive/) and are shown in the Appendix.
The following variables are taken into account: number of cold days (D.Cold), number of warm days (D.Warm), number of days with storms (D.Storm), number of days with precipitation (D.Prec), number of days with snow cover (D.SnCov), number of clear days (D.Clear), and number of cloudy days (D.Cloud). According to meteorological definitions, on a cold day the minimal daily air temperature is below 0 °C, and on a warm day the maximal daily temperature is above 25 °C; a clear day has under 20% cloudiness, a cloudy day has over 80%. Hence, D.Cold and D.Warm are based on the same variable, i.e., air temperature; the same holds for D.Clear and D.Cloud, which are based on cloudiness.

For illustrative simplicity, only seven meteorological stations are chosen for this case study. They are: Bilje (Bilje), Črnomelj (Crnom), Ljubljana (Ljubl), Maribor (Marib), Murska Sobota (MurSo), Portorož-airport (Porto), and Rateče (Ratec). Their location is shown in Figure 2. Portorož-airport is situated at sea level (elevation 2 m), Rateče is in the Alps (elevation 864 m), and the other stations have elevations from 55 m to 299 m. The dataset is slightly incomplete: data for Portorož-airport start in 1975, and for Bilje, Črnomelj, Maribor and Murska Sobota data for some years are missing.

Figure 2: Geographical position of seven meteorological stations under study: Bilje (Bilje), Črnomelj (Crnom), Ljubljana (Ljubl), Maribor (Marib), Murska Sobota (MurSo), Portorož-airport (Porto), and Rateče (Ratec); elevation (in meters) is pinned to each station.

As already stated, we want to compare PCA results obtained on different data types which were aggregated from the same dataset. For each station and each variable, we aggregated the data in two different ways: the mean value and the [min, max] interval, which is based on the minimal and maximal values in the period under observation.
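The two aggregations can be sketched as follows. This is a minimal illustration with made-up yearly counts, not the ARSO data; the station names, the function name `aggregate` and the choice to skip missing years with NaN-aware reductions are assumptions of this example (the text does not state how missing years were handled).

```python
import numpy as np

# Hypothetical yearly counts (NOT the case-study data): for each station,
# rows are years and columns are variables; np.nan marks a missing year.
yearly = {
    "StationA": np.array([[60.0, 70.0], [66.0, 55.0], [90.0, 64.0]]),
    "StationB": np.array([[10.0, 95.0], [np.nan, 100.0], [18.0, 90.0]]),
}

def aggregate(records):
    """Aggregate yearly records into the mean and the [min, max] interval.

    Missing years (NaN) are skipped; this handling is an assumption.
    """
    means = np.nanmean(records, axis=0)   # classical aggregate
    lower = np.nanmin(records, axis=0)    # interval lower endpoint a
    upper = np.nanmax(records, axis=0)    # interval upper endpoint b
    return means, lower, upper

summary = {name: aggregate(rec) for name, rec in yearly.items()}

for name, (means, lower, upper) in summary.items():
    midpoints = (lower + upper) / 2.0     # generally differs from the mean
    print(name, means, lower, upper, midpoints)
```

The interval midpoint generally differs from the mean (for the first variable of StationA the mean is 72 while the midpoint is 75), which is why the midpoints stand in for the means only conditionally.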
3.1 PCA on the Means

In Table 1, the classical covariance matrix calculated on the means is presented; the sum of variances (3891.8) is given below the matrix. Dominant variances are as follows: Var(D.SnCov) = 1449.4, Var(D.Cold) = 1284.6; dominant covariances are: Cov(D.SnCov, D.Cold) = 1232.5 (positive), Cov(D.SnCov, D.Warm) = -679.3 and Cov(D.Warm, D.Cold) = -558.2 (negative).

In Table 2, the PCA results are given. The first two principal components explain about 92% of total variance, the first three around 97%. The loads for the first three principal components are also presented; we shall interpret the first two principal components only. For the first principal component (PC1), D.Cold and D.SnCov are dominant; for the second principal component (PC2), D.Clear and D.Cloud stand out. We can deduce that PC1 is positively correlated with low air temperature and PC2 with the surplus of cloudy over clear days.

Table 1: Covariance matrix calculated on the means. The sum of the variances (the diagonal entries) is given below the matrix.

Variable   D.Cold   D.Warm  D.Storm   D.Prec  D.SnCov  D.Clear  D.Cloud
D.Cold     1284.6   -558.2   -182.0    308.1   1232.5   -262.3    204.8
D.Warm     -558.2    356.7     45.1    -96.1   -679.3     70.1    -28.3
D.Storm    -182.0     45.1     47.3    -29.3   -109.4     51.4    -29.5
D.Prec      308.1    -96.1    -29.3    212.9    337.8   -148.6    195.5
D.SnCov    1232.5   -679.3   -109.4    337.8   1449.4   -216.3    177.6
D.Clear    -262.3     70.1     51.4   -148.6   -216.3    296.8   -212.0
D.Cloud     204.8    -28.3    -29.5    195.5    177.6   -212.0    244.1
Sum of variances = 3891.8

Table 2: PCA on the means, results for the first three principal components: cumulative percentage of variance explained and principal component loads.

Variable                  PC1      PC2      PC3
Cum. % of var. exp.      79.2     92.2     96.7
D.Cold                  0.625    0.014    0.661
D.Warm                 -0.307    0.274    0.192
D.Storm                -0.071   -0.057   -0.391
D.Prec                  0.172    0.385   -0.322
D.SnCov                 0.670   -0.218   -0.473
D.Clear                -0.138   -0.598   -0.090
D.Cloud                 0.113    0.607   -0.193
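The route from a covariance matrix to the quantities reported for a PCA can be sketched with an eigendecomposition. The block below is an illustration in NumPy, not the R script used in the paper; it takes the rounded entries of Table 1 as input, so the cumulative percentages it prints should be close to, but need not exactly match, those in Table 2. The sign of each eigenvector is arbitrary, so loads may come out with flipped signs.

```python
import numpy as np

labels = ["D.Cold", "D.Warm", "D.Storm", "D.Prec", "D.SnCov", "D.Clear", "D.Cloud"]

# Covariance matrix on the means, transcribed from Table 1 (rounded values).
cov_means = np.array([
    [1284.6, -558.2, -182.0,  308.1, 1232.5, -262.3,  204.8],
    [-558.2,  356.7,   45.1,  -96.1, -679.3,   70.1,  -28.3],
    [-182.0,   45.1,   47.3,  -29.3, -109.4,   51.4,  -29.5],
    [ 308.1,  -96.1,  -29.3,  212.9,  337.8, -148.6,  195.5],
    [1232.5, -679.3, -109.4,  337.8, 1449.4, -216.3,  177.6],
    [-262.3,   70.1,   51.4, -148.6, -216.3,  296.8, -212.0],
    [ 204.8,  -28.3,  -29.5,  195.5,  177.6, -212.0,  244.1],
])

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov_means)
order = np.argsort(eigvals)[::-1]          # reorder to descending eigenvalues
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]                # columns are the principal component loads

# Cumulative percentage of variance explained by the first s components.
cum_pct = 100.0 * np.cumsum(eigvals) / eigvals.sum()

print("sum of eigenvalues:", round(float(eigvals.sum()), 1))
print("cumulative % (first three):", np.round(cum_pct[:3], 1))
for name, load in zip(labels, np.round(eigvecs[:, 0], 3)):
    print(name, load)
```

The sum of the eigenvalues reproduces the trace of the matrix, i.e., the sum of variances reported below Table 1.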
Figure 3 presents the seven stations in the space of PC1 by PC2. There is a positive trend with low air temperature along PC1: Portorož-airport has few days with low air temperature and snow cover, Rateče the opposite. This is consistent with the fact that Portorož-airport is located near the Adriatic Sea, while Rateče is located in the Alps. There is a positive trend in the surplus of cloudy over clear days along PC2; here, Portorož-airport has the lowest surplus (it has more clear than cloudy days), while Ljubljana and Črnomelj have the highest (there are more cloudy than clear days).

Figure 3: PCA on the means: presentation of seven stations in two-dimensional space of PC1 by PC2; 92.2% of total variance is explained. PC1 reflects positive impact of low air temperature, PC2 reflects positive impact of surplus of cloudy over clear days.

3.2 Symbolic PCA on interval-valued variables

3.2.1 Symbolic covariance matrix and its decomposition

The symbolic covariance matrix Cov for the intervals is given in Table 3; also shown is the decomposition into CovB and CovW. The term CovB is identical to the classical covariance matrix on the interval midpoints. Values of CovW reflect the internal variability and are all positive. Consequently, the terms in Cov are always larger than the corresponding terms in CovB; thus, there are fewer negative terms in Cov than in CovB.

The sum of symbolic variances is 5754.2; the between component accounts for 3171.0 (55.1%), and the remaining 2583.2 (44.9%) is due to the within component. In this case, we can conclude that the gain in information, when we analyze the intervals instead of the interval midpoints, is large: it is nearly 45%. Let us find out the corresponding impact on the PCA results.

3.2.2 PCA on symbolic covariance matrix

Table 4 shows the PCA results based on the symbolic covariance matrix.
The first two principal components explain 86.4% of variance, the first three 95.1%. For PC1, the loads for D.Cold and D.SnCov are dominant; for PC2 the dominant loads are D.Warm and D.Clear (positive); for PC3 it is D.Clear (negative). Hence, PC1 is positively correlated with low air temperature, as in the PCA on the means; however, the other results are different: PC2 is positively correlated with D.Warm and D.Clear, and PC3 is negatively correlated with D.Clear.

Visualisation of these PCA results in two-dimensional space is based on the approach presented in Le-Rademacher and Billard (2012). For each station, a 7-dimensional polytope is obtained. Figure 4 (upper plot) presents the projection of these polytopes onto the PC1 by PC2 plane. Considerable overlapping is evident. The plot shows that the variability in PC1 (D.Cold and D.SnCov) is dominant, and it is the greatest for Rateče; however, variability in PC2 (D.Warm and D.Clear) is comparable for all stations. Only two pairs of stations do not overlap: Portorož-airport and Rateče, Bilje and Rateče. The polytopes for the two extreme stations, Portorož-airport and Rateče, are presented in Figure 4 (lower plot). From these plots, it is observed that the internal variability for Rateče is greater than it is for Portorož-airport.

Table 3: Covariance matrix Cov for interval-valued variables (variances are on the diagonal); it is decomposed into CovB and CovW (below). The respective sum of variances is presented below the corresponding matrix.

Cov        D.Cold   D.Warm  D.Storm   D.Prec  D.SnCov  D.Clear  D.Cloud
D.Cold     1363.7    -74.1     30.8    537.4   1313.6    105.0    630.2
D.Warm      -74.1    720.3    224.4    262.1    -53.4    399.3    407.2
D.Storm      30.8    224.4    153.9    146.1    127.6    265.9    191.4
D.Prec      537.4    262.1    146.1    443.2    615.6    163.6    557.6
D.SnCov    1313.6    -53.4    127.6    615.6   1595.7    187.5    750.6
D.Clear     105.0    399.3    265.9    163.6    187.5    724.6    202.2
D.Cloud     630.2    407.2    191.4    557.6    750.6    202.2    752.9
Sum of variances = 5754.2

CovB       D.Cold   D.Warm  D.Storm   D.Prec  D.SnCov  D.Clear  D.Cloud
D.Cold     1056.0   -435.8   -144.9    258.0    941.1   -233.8    251.5
D.Warm     -435.8    286.8     19.2    -67.0   -482.5      8.8    -39.2
D.Storm    -144.9     19.2     46.3    -18.4    -68.8     60.1    -24.8
D.Prec      258.0    -67.0    -18.4    183.1    282.0   -148.7    209.8
D.SnCov     941.1   -482.5    -68.8    282.0    997.7   -196.6    270.7
D.Clear    -233.8      8.8     60.1   -148.7   -196.6    322.3   -209.5
D.Cloud     251.5    -39.2    -24.8    209.8    270.7   -209.5    278.9
Sum of between variances = 3171.0

CovW       D.Cold   D.Warm  D.Storm   D.Prec  D.SnCov  D.Clear  D.Cloud
D.Cold      307.7    361.7    175.8    279.4    372.5    338.8    378.7
D.Warm      361.7    433.5    205.2    329.1    429.2    390.5    446.5
D.Storm     175.8    205.2    107.6    164.5    196.3    205.9    216.1
D.Prec      279.4    329.1    164.5    260.2    333.6    312.2    347.8
D.SnCov     372.5    429.2    196.3    333.6    598.0    384.1    479.9
D.Clear     338.8    390.5    205.9    312.2    384.1    402.2    411.7
D.Cloud     378.7    446.5    216.1    347.8    479.9    411.7    474.1
Sum of within variances = 2583.2

Table 4: PCA on the intervals, results for the first three principal components: cumulative percentage of variance explained and principal component loads.

Variable                  PC1      PC2      PC3
Cum. % of var. exp.      62.0     86.4     95.1
D.Cold                  0.569   -0.277   -0.102
D.Warm                  0.081    0.663    0.279
D.Storm                 0.079    0.261   -0.119
D.Prec                  0.309    0.172    0.271
D.SnCov                 0.636   -0.220   -0.180
D.Clear                 0.127    0.512   -0.767
D.Cloud                 0.384    0.275    0.452
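The trace decomposition (2.11) and the resulting information gain can be reproduced from the diagonals of Table 3. The sketch below is an illustration in NumPy; the variable names are chosen for this example, and because the table entries are rounded to one decimal, the sums agree with the printed totals only up to rounding.

```python
import numpy as np

labels = ["D.Cold", "D.Warm", "D.Storm", "D.Prec", "D.SnCov", "D.Clear", "D.Cloud"]

# Diagonal entries (variances) transcribed from Table 3.
var_total   = np.array([1363.7, 720.3, 153.9, 443.2, 1595.7, 724.6, 752.9])  # Cov
var_between = np.array([1056.0, 286.8,  46.3, 183.1,  997.7, 322.3, 278.9])  # CovB
var_within  = np.array([ 307.7, 433.5, 107.6, 260.2,  598.0, 402.2, 474.1])  # CovW

tr_between = var_between.sum()             # tr(CovB)
tr_within = var_within.sum()               # tr(CovW)
tr_sum = tr_between + tr_within            # tr(Cov) by equation (2.11)

share_between = tr_between / tr_sum        # proportion due to the between information
share_within = tr_within / tr_sum          # proportion due to the within information (gain)

# Share of each variable's symbolic variance that is due to the within part.
within_by_var = var_within / (var_between + var_within)

print(round(float(tr_sum), 1), round(float(var_total.sum()), 1))
print(round(100 * float(share_between), 1), round(100 * float(share_within), 1))
for name, share in zip(labels, np.round(100 * within_by_var, 1)):
    print(name, share)
```

The per-variable shares are an extra diagnostic not reported in the text; they indicate which variables contribute most of the within information.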
3.2.3 PCA on CovB and CovW

We proceed with PCA on CovB, which is identical to the classical PCA on the interval midpoints; the results are in Table 5 (left) and are plotted in Figure 5 (upper plot). They are consistent with the PCA results on the means.

Since CovW depicts the within-interval information, PCA on CovW allows an insight into the variability within the interval variables; see Table 5 (right) and Figure 5 (lower plot). In this case, PC1 explains 93.2% of the variance and PC2 an additional 5.4%. The loads for PC1 have similar magnitude for all variables, while for PC2 the dominant load is D.SnCov; accordingly, PC1 is positively related to all the variables and PC2 is positively related to D.SnCov. The scores are calculated using the midpoints. The stations are located along the diagonal, from Portorož-airport at the lower end to Rateče at the upper end, revealing the increase of interval variability from the lower to the upper end. This result is consistent with the fact that Portorož-airport has tighter intervals, while Rateče has larger intervals.

3.2.4 Programs used

Algorithms for deriving the PCA results on the symbolic covariance matrix, along with the corresponding polytopes, are available in Le-Rademacher and Billard (2012, Supplementary material - online version). Their R script (R Core Team, 2013) was extended with PCA on CovW and CovB and adapted for our case study.

3.3 Other PCA approaches for interval-valued variables

Other PCA approaches on interval data are described in the literature. As stated before, Le-Rademacher and Billard (2012) give a detailed insight into these methods. We shall limit ourselves to only some of them.

Figure 4: Projection of 7-dimensional polytopes onto 2-dimensional space of PC1 by PC2; upper plot: for all 7 stations; lower plot: for Portorož-airport and Rateče. PC1 explains 62.0% of variance, it reflects the positive impact of D.Cold and D.SnCov; PC2 explains 24.4% of variance, it reflects the positive impact of D.Warm and D.Clear.