Metodološki zvezki, Vol. 2, No. 1, 2005, 115-123

Dimensionality Reduction Methods

Luigi D'Ambra (1), Pietro Amenta (2), and Michele Gallo (3)

(1) Department of Mathematics and Statistics, University of Naples "Federico II", Italy; dambra@unina.it
(2) Department of Analysis of Economic and Social Systems, University of Sannio, Italy; amenta@unisannio.it
(3) Department of Social Science, University of Naples "L'Orientale", Italy; mgallo@iuo.it

Abstract

When one or more sets of variables are available, the use of dimensional reduction methods may be necessary. In this context, after a review of the link between shrinkage regression methods and dimensional reduction methods, the authors provide a different multivariate extension of Garthwaite's (1994) PLS approach, in which a simple linear regression coefficients framework can be given for several dimensional reduction methods.

1 Introduction

When the number of variables is very large, as well as in the presence of more than one set of variables playing a logically asymmetrical role (explanatory and response variables), it may be advantageous to find for each set a linear combination of the variables (latent variables) having some properties in terms of correlation, covariance or variance. The criterion for an appropriate new basis depends, of course, on the application. One way of approaching this problem is to project the data onto the subspace of maximum data variation, i.e. the subspace spanned by the largest principal components (Principal Component Analysis, PCA). Nevertheless, the study of multivariate prediction can also be faced with several other approaches, for example Constrained Principal Component Analysis (CPCA) (D'Ambra and Lauro, 1982). In customer satisfaction evaluation, where the relationships between expectations and perceptions are taken into account, an analysis can be developed by looking for the subspace maximizing the covariance between the projected scores of both sets. This subspace is provided by the largest singular values of the covariance matrix between expectation and perception data (D'Ambra et al., 1999). Finally, when the goal is to predict a dependent variable as well as possible in terms of least squares error, an appropriate model is Reduced Rank Regression (RRR).

In general, when the goal is to predict several dependent variables by substituting the set of observed explanatory variables with a shorter sequence of orthogonal latent variables, Dimensional Reduction Methods (DRM) should be applied. The most commonly used DRMs are Principal Component Regression (PCR), Canonical Correlation Regression (CCR), RRR and Partial Least Squares (PLS; Wold, 1966). These methods, together with the shrinkage ones, play an important role in overcoming the collinearity problem.

The paper is organized into five sections. In Section 2 the basic notation is given. Section 3 briefly presents the link between shrinkage regression methods and dimensional reduction methods; in this section we also propose an extension of Principal Covariates Regression (de Jong and Kiers, 1992) in order to find a continuum among the DRMs. The main focus of the paper is Section 4: following Garthwaite's (1994) PLS approach, we show how a simple linear regression coefficients framework can be given for the DRMs considered. The last section includes some concluding remarks on the proposed methodology, as well as topics for further research.
2 Notation

Let $Q_1, \ldots, Q_k, \ldots, Q_K$ be the sizes of $K$ response variable groups observed on $N$ statistical units and collected in a matrix $Y^* = [Y_1 | \ldots | Y_k | \ldots | Y_K]$ of order $(N, \sum_{k=1}^K Q_k)$, where $Y_1 (N \times Q_1), \ldots, Y_k (N \times Q_k), \ldots, Y_K (N \times Q_K)$ are $K$ different matrices. The $k$-th matrix, with generic element $y_{ikq}$ ($i = 1, \ldots, N$; $q = 1, \ldots, Q_k$), contains the value of the $q$-th criterion variable observed on the $i$-th statistical unit for the $k$-th response variable group. Moreover, let $X (N \times J)$ be a matrix of independent variables with $\mathrm{rank}(X) = S < \min(N, J)$. Its generic element $x_{ij}$ ($i = 1, \ldots, N$; $j = 1, \ldots, J$) is the value of the $j$-th independent variable observed on the same $i$-th statistical unit. In this paper we assume that all variables have zero mean with respect to the diagonal weight metric $D$, whose general term is $1/N$. Let $P_X = X(X^TX)^{-1}X^T$ be the orthogonal projector onto the subspace spanned by the columns of $X$, with $X^T$ the transpose of $X$. Finally, let $T^{(S)}$ be an orthogonal matrix of order $(N \times S)$ containing $S$ latent variables, so that the fitted response matrix is obtained as $\hat{Y}^{(S)} = T^{(S)}(T^{(S)T}T^{(S)})^{-1}T^{(S)T}Y = XB^{(S)}$, with $L^{(S)} = X^T T^{(S-1)}(T^{(S-1)T}XX^T T^{(S-1)})^{-1}T^{(S-1)T}X$. Let $\tilde{X}$ denote the standardized $X$ matrix.

3 Shrinkage regression and dimensional reduction methods for multivariate analysis

In the literature many shrinkage regression methods have been proposed; PCR, PLS, RRR and Continuum Regression (CR) are only some of the best known (Stone and Brooks, 1990; Frank and Friedman, 1993; Brown, 1993; Brooks and Stone, 1994). These methods should be used when a small singular value (i.e. a large condition index) is associated with two or more independent variables having large variance decomposition proportions. Such variables may determine collinearity problems, with unrealistic and unstable ordinary least squares coefficients $b_{OLS} = (X^TX)^{-1}X^T y_k^q$ ($k = 1, \ldots, K$; $q = 1, \ldots, Q_k$).

An approach to solve the collinearity problem consists in replacing the factor $(X^TX)^{-1}$ in the expression of $b_{OLS}$ with a better-conditioned matrix $G$. In PCR, the matrix $G$ is obtained from the spectral decomposition of $X^TX$, $X^TX = \sum_{s=1}^S \lambda_s v_s v_s^T$, where $S < \min(N, J)$ is the rank of $X$. Differently, PLS looks for a vector $c$ ($\|c\| = 1$) such that the scalar product $y^TXc$ is maximal and $b \propto c$. This leads to the predictor $b_{PLS} \propto X^T y_k$, i.e. to replacing $(X^TX)^{-1}$ with a better-conditioned matrix $G \propto I_p$. Finally, Hoerl (1962) and Hoerl and Kennard (1970) recommend the use of ridge regression, with $b_{RR} = (X^TX + \lambda I_p)^{-1}X^T y_k$ and $\lambda > 0$. Table 1 reports the conditioned matrices for the different techniques.

Table 1: Several conditioned matrices $G$ (general solution $b = G X^T y_k^q$).
  - OLS: $G = (X^TX)^{-1}$; predictor $(X^TX)^{-1}X^T y_k^q$
  - PCR: $G = \sum_j \lambda_j^{-1} v_j v_j^T$; predictor $(\sum_j \lambda_j^{-1} v_j v_j^T)X^T y_k^q$
  - PLS: $G \propto I_p$; predictor $\propto X^T y_k^q$
  - RR: $G = (X^TX + \lambda I_p)^{-1}$; predictor $(X^TX + \lambda I_p)^{-1}X^T y_k^q$
  - A matrix (Section 4): $G = A$; predictor $AX^T y_k^q$

When there is only one dependent variable ($y_k$ with $K = Q_k = 1$), OLS, PLS and PCR can be considered as particular cases of CR (Stone and Brooks, 1990). The coefficient $b$ is determined by a simple regression of $y$ on the one-dimensional variable $Xc$, where the coefficient vector $c$ is chosen by maximizing different criteria: the squared correlation coefficient $r^2(y, Xc)$, the covariance $\mathrm{Cov}(y, Xc)$ and the variance $\mathrm{Var}(Xc)$, respectively. Stone and Brooks (1990) suggest a general principle to determine the coefficient vector $c$ for a fixed continuum parameter $\delta \ge 0$: $c$ is obtained by maximizing

$$T(\delta, c) = (y^TXc)^2 \, \|Xc\|^{2(\delta - 1)} \propto r^2(y, Xc) \, \|Xc\|^{2\delta}$$

subject to the constraint $\|c\| = 1$, where for $\delta = 0$, $\delta = 1$ and $\delta \rightarrow \infty$ we obtain the continuum solution corresponding to OLS, PLS and PCR, respectively.
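As a numerical illustration of Table 1, the following Python sketch (an illustrative fragment, not part of the original proposal; the function name, the number of retained components `n_comp` and the ridge constant `lam` are our own choices) computes $b = G X^T y$ for the different conditioned matrices $G$.

```python
import numpy as np

def conditioned_predictors(X, y, n_comp=2, lam=1.0):
    """Compute b = G X'y for several choices of the conditioned matrix G (Table 1).

    X: (N, J) centred predictors; y: (N,) centred response.
    n_comp (components kept in PCR) and lam (ridge constant) are illustrative
    tuning choices, not values prescribed by the paper.
    """
    XtX = X.T @ X
    Xty = X.T @ y
    J = X.shape[1]

    # OLS: G = (X'X)^{-1}
    b_ols = np.linalg.solve(XtX, Xty)

    # PCR: G = sum_j lambda_j^{-1} v_j v_j', restricted to the leading eigenpairs
    lam_j, V = np.linalg.eigh(XtX)               # eigenvalues in ascending order
    keep = np.argsort(lam_j)[::-1][:n_comp]      # indices of the n_comp largest
    G_pcr = (V[:, keep] / lam_j[keep]) @ V[:, keep].T
    b_pcr = G_pcr @ Xty

    # PLS-type direction: G proportional to the identity, so b is proportional to X'y
    b_pls = Xty / np.linalg.norm(Xty)

    # Ridge regression: G = (X'X + lam * I)^{-1}
    b_rr = np.linalg.solve(XtX + lam * np.eye(J), Xty)

    return b_ols, b_pcr, b_pls, b_rr
```

Swapping one $G$ for another changes only the conditioning of the system, not the general form $b = G X^T y_k^q$, which is the point of Table 1.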
Many of these shrinkage regression methods can be seen in a more general multivariate framework based on a common objective function for the DRMs (Abraham and Merola, 2001). All the DRM objective functions are measures of association between couples of unit-norm latent variables which are linear combinations of the dependent variables ($u_j = Y_k d_j$) and of the independent ones ($t_j = X a_j$). These measures are expressed in terms of the squared covariance between the latent variables $t_j$ and $u_j$ as well as of their variances (Table 2). When $X^TX$ is nearly singular, the "PCR smoothing" of this matrix can also be used within the other approaches, obtaining mixed DRMs; at the same time, the "PCR smoothing" can be obtained through mixed DRM approaches (e.g. in CPCA the solution matrix becomes $Y_k^T X (\sum_j \lambda_j^{-1} v_j v_j^T) X^T Y_k$, which is equivalent to the PCR one).

Table 2: Objective functions of the DRMs.
  - PCA: objective function $\max (a_j^T X^T X a_j)$; solution matrix $X^TX$
  - CCR: objective function $\max [(a_j^T X^T Y_k d_j)^2 / (\|t_j\|^2 \|u_j\|^2)]$; solution matrix $(X^TX)^{-1} X^T Y_k (Y_k^T Y_k)^{-1} Y_k^T X$
  - RRR: objective function $\max [(a_j^T X^T Y_k d_j)^2 / \|t_j\|^2]$; solution matrix $(X^TX)^{-1} X^T Y_k Y_k^T X$
  - CPCA*: objective function $\max (d_j^T Y_k^T P_X Y_k d_j)$; solution matrix $Y_k^T X (X^TX)^{-1} X^T Y_k$
  - SIMPLS: objective function $\max (a_j^T X^T Y_k d_j)^2$; solution matrix $(I - L_j)^{-1} X^T Y_k Y_k^T X$
  * with the constraints $a_j^T a_j = d_j^T d_j = 1$ and $a_j^T X^T X a_i = 0$, $j > i$.

3.1 A different approach to Principal Covariates Regression

In the literature there is a trade-off between the aims of RRR and PCR: the former tries to maximize the variance of the criterion variables retained by the latent subspace of the predictors, while the latter tries to maximize only the variance of the predictors, with PLS regarded as a compromise. A similar continuum can be obtained with an extension of Principal Covariates Regression (PCovR), or "weighted maximum overall redundancy" (de Jong and Kiers, 1992; Abraham and Merola, 2001). In order to find a low-dimensional subspace of the predictor space spanned by the columns of $X$ accounting for the maximum variation of $X$ and $Y_k$, we propose to consider the model

$$T = XW, \qquad X = TZ_X + E_X, \qquad Y_k = TZ_{Y_k} + E_{Y_k} \qquad (3.1)$$

where $T$ contains the scores on $S$ components, $W$ is the $J \times S$ matrix of component weights, and $Z_X$ and $Z_{Y_k}$ are loading matrices, of order $(S \times J)$ and $(S \times Q_k)$, containing the regression parameters that relate the predictors and the response variables to the components in $T$, respectively. Following de Jong and Kiers (1992), we propose to minimize the least-squares loss function

$$\alpha \|X - TZ_X\|^2 + \mu \|X^TY_k - Z_X^T Z_{Y_k}\|^2 + (1 - \alpha - \mu) \|Y_k - TZ_{Y_k}\|^2 \qquad (3.2)$$

with $T^TT = I$ and $T^TE_X = T^TE_{Y_k} = 0$. If $X$ spans the complete space and $T$ contains the scores on all components, the least-squares solutions are given by the first $S$ eigenvectors of the matrix $\alpha XX^T + (1 - \alpha - \mu)\hat{Y}_k\hat{Y}_k^T + \mu XX^T\hat{Y}_k\hat{Y}_k^T$, with $\hat{Y}_k = X(X^TX)^-X^TY_k$. $W$ may be computed by regression of $T$ on $X$ if $X^TX$ has full rank, otherwise as $W = X^-T$, where $X^-$ is any generalized inverse of $X$. We introduce two parameters ($\alpha$ and $\mu$), both varying between 0 and 1, so that $\mu$ tells how much the model is PLS-like and $(1 - \alpha - \mu)$ determines its Multiple Linear Regression (MLR) nature. We highlight some special cases (a numerical sketch of the general solution follows this list):

- for $\alpha = 0$ and $\mu = 0$, if $S = \min[\mathrm{rank}(X), \mathrm{rank}(Y_k)]$ the solution leads to MLR, with an emphasis on fitting $Y_k$, otherwise to RRR if $S < \min[\mathrm{rank}(X), \mathrm{rank}(Y_k)]$;
- for $\alpha = 1$ and $\mu = 0$, the solution puts an emphasis on reconstructing $X$, with a PCA of $X$, or with PCR if the principal components are used as predictors for $Y_k$;
- for $\alpha = 0$ and $\mu = 1$, the solution leads to Partial Least Squares of $X$ and $Y_k$;
- finally, for $\mu = 0$ and any admissible value of $\alpha$, we obtain the original PCovR solution; in particular, for $\alpha = 1/2$, de Jong and Kiers (1992) find a compromise situation comparable to PLS regression.
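The fragment below is a minimal sketch of the solution just described, under our reading of criterion (3.2): it builds the matrix $\alpha XX^T + (1 - \alpha - \mu)\hat{Y}_k\hat{Y}_k^T + \mu XX^T\hat{Y}_k\hat{Y}_k^T$ (working with its symmetric part when $\mu > 0$, a practical choice of ours), extracts its first $S$ eigenvectors as $T$, and recovers $W$, $Z_X$ and $Z_{Y_k}$. The function name and argument names are illustrative.

```python
import numpy as np

def pcovr_extension(X, Yk, alpha=0.5, mu=0.0, n_comp=2):
    """Sketch of the extended PCovR solution of (3.1)-(3.2).

    X: (N, J) centred predictors; Yk: (N, Qk) centred responses.
    alpha, mu in [0, 1] with alpha + mu <= 1; n_comp plays the role of S.
    """
    # fitted responses: Yk_hat = X (X'X)^- X' Yk (projection onto the column space of X)
    Yk_hat = X @ (np.linalg.pinv(X) @ Yk)

    # matrix whose leading eigenvectors give the component scores T
    M = (alpha * X @ X.T
         + (1.0 - alpha - mu) * Yk_hat @ Yk_hat.T
         + mu * X @ X.T @ Yk_hat @ Yk_hat.T)
    M = 0.5 * (M + M.T)   # symmetric part; M is symmetric when mu = 0 (original PCovR)

    vals, vecs = np.linalg.eigh(M)
    T = vecs[:, np.argsort(vals)[::-1][:n_comp]]   # first S eigenvectors, T'T = I

    # loadings by least squares (using T'T = I): X ~ T Z_X, Yk ~ T Z_Yk; weights with T = X W
    Z_X = T.T @ X
    Z_Yk = T.T @ Yk
    W = np.linalg.pinv(X) @ T
    return T, W, Z_X, Z_Yk
```

With $\mu = 0$ and $\alpha = 1/2$ this reduces to the PCovR compromise; $\alpha = 1$, $\mu = 0$ gives a PCA of $X$; $\alpha = 0$, $\mu = 1$ the PLS-like case, mirroring the special cases listed above.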
4 Simple linear regression coefficients approach to DRM

In order to investigate the dependence structure between $X$ and the $Y_k$, we consider the matrix $Y^* = [Y_1 | \ldots | Y_k | \ldots | Y_K]$ of order $(N, \sum_{k=1}^K Q_k)$. The fitted $q$-th column of the $k$-th matrix $Y_k$ is given by $\hat{y}_k^q = \sum_{j=1}^J f_j x_j b_j^{kq}$, i.e. by the weighted sum of the simple linear regressions with slope coefficients $b_j^{kq} = (x_j^T x_j)^{-1} x_j^T y_k^q$, weights $f_j$ and intercepts equal to zero. For these weights Garthwaite (1994) suggests $f_j = 1/J$ or $f_j = x_j^T x_j$, according to different weighting policies. The matrix $Y^* = [Y_1 | \ldots | Y_k | \ldots | Y_K]$ can then be approximated as

$$\hat{Y}^* = XFB = \sum_{j=1}^J f_j P_{x_j} Y^*$$

with $M_X = \mathrm{diag}(x_1^T x_1, \ldots, x_J^T x_J)$, $F = \mathrm{diag}(f_1, \ldots, f_J)$, $B = M_X^{-1} X^T Y^*$ and $P_{x_j} = x_j (x_j^T x_j)^{-1} x_j^T$. The dependence structure between $X$ and $Y^*$, in a best approximation subspace, can be displayed on the principal axes $t_s$ obtained as

$$\min_{t_s} \sum_{j=1}^J \sum_{k=1}^K \sum_{q=1}^{Q_k} \left\| f_j P_{x_j} y_k^q - f_j t_s t_s^T P_{x_j} y_k^q \right\|^2 \qquad (4.1)$$

subject to the constraints $t_s^T t_s = 1$ and $t_{s'}^T t_s = 0$ for $s' \ne s$. This leads to the extraction of the eigenvalues $\lambda_s$ and eigenvectors $t_s$ of the eigensystem $\hat{Y}^* \hat{Y}^{*T} t_s = \lambda_s t_s$.

Table 3: Special cases of the proposed approach ((†) first solution).
  Variance criteria:
  - PCA($\hat{Y}^*$) is equivalent to multiple PCR($X$) (†)
  - MCOA($Y_1, \ldots, Y_K$) $\Leftrightarrow$ COA($Y_1, \ldots, Y_K, \hat{Y}^*$) (†)
  - OMCOA-PLS($Y_1, \ldots, Y_K, \hat{Y}^*$) (†)
  Covariance criteria:
  - $\mathrm{Cov}(Y^*a, Y^*b)$ is equivalent to PLS($Y^*, X$)
  - PLS($X, Y^*$) with $X$ metric equal to $M$
  - $\sum_{k=1}^K \mathrm{Cov}^2(Y^*a, Y_k d_k)$ is equivalent to OMCOA-PLS

The analysis of the $Y_k$ and $X$, based on the above mentioned criteria, leads to well known techniques and interesting properties (Table 3), where MCOA stands for Multiple Coinertia Analysis (Chessel and Hanafi, 1996), COA stands for Concordance Analysis (Lafosse and Hanafi, 1997), OMCOA stands for Orthogonal Multiple Coinertia Analysis (Vivien, 1999), and OMCOA-PLS is the acronym for Orthogonal Multiple Coinertia Analysis - Partial Least Squares (Vivien and Sabatier, 2000). This approach highlights an equivalence between the variance and the covariance criteria in Table 3. Moreover, this can also be shown following two different approaches. The former (B matrix approach) is based on the matrix $B$ of regression coefficients: an uncentred PCA on the matrix $B$ is equivalent to PLS($Y^*, X$), while an uncentred PCA on $B'$ leads to COA, OMCOA, OMCOA-PLS, Multiblock-PLS (Wangen and Kowalski, 1988) and Generalized Constrained Principal Component Analysis (Generalized CPCA; Amenta and D'Ambra, 2001).
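Before turning to the second approach, the sketch below makes the construction of $\hat{Y}^* = XFB$ and of the eigensystem associated with (4.1) concrete. It is an illustrative fragment (the function and argument names are ours), with $f_j = x_j^T x_j$ or $f_j = 1/J$ as the two weighting policies suggested by Garthwaite (1994).

```python
import numpy as np

def simple_regression_axes(X, Ystar, weights="norm", n_axes=2):
    """Sketch of the simple linear regression coefficients construction of Section 4.

    X: (N, J) centred predictors; Ystar: (N, sum_k Qk) juxtaposed centred responses.
    weights: "norm" for f_j = x_j'x_j, "uniform" for f_j = 1/J (Garthwaite, 1994).
    """
    N, J = X.shape
    xnorm2 = np.sum(X**2, axis=0)                    # x_j'x_j, j = 1, ..., J
    f = xnorm2 if weights == "norm" else np.full(J, 1.0 / J)

    F = np.diag(f)
    B = np.diag(1.0 / xnorm2) @ X.T @ Ystar          # B = M_X^{-1} X' Y*: simple regression slopes
    Yhat_star = X @ F @ B                            # weighted sum of the simple regression fits

    # principal axes t_s from the eigensystem Yhat* Yhat*' t_s = lambda_s t_s
    vals, vecs = np.linalg.eigh(Yhat_star @ Yhat_star.T)
    order = np.argsort(vals)[::-1][:n_axes]
    return vecs[:, order], vals[order], B
```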
The latter (crossed regression approach) can be performed by using the $(\sum_k Q_k) \times J$ simple linear regressions of each generic $q$-th column of the $k$-th matrix $Y_k$ on each $x_j$ (D'Ambra et al., 1998, 2001). We can write $\sum_k Q_k$ matrices $XB_g$ ($g = 1, \ldots, \sum_k Q_k$), with $B_g$ a diagonal matrix containing the $J$ weighted regression coefficients $b_j^g$. In order to analyze the common structure of these $\sum_k Q_k$ matrices we consider the MCOA approach with a generic metric $M_g$.

Briefly, MCOA is a technique that enables the simultaneous analysis of $Z$ tables. For the $Z$ subsets of $p_g$ variables ($g = 1, \ldots, Z$), MCOA considers $Z$ statistical triplets $(X_g, M_g, D)$, with $M_g$ a positive definite symmetric matrix (metric) of dimension $(p_g \times p_g)$ and $X_g$ of dimension $(N \times p_g)$. It optimizes the variance within each table and the correlation between the scores of each individual table and synthetic scores providing a reference structure. MCOA first searches for a set of $M_g$-normalized vectors $u_g^{(1)}$, maximizing the projected variance of $X_g$ on $u_g^{(1)}$, and an auxiliary $D$-normalized vector $v^{(1)}$, maximizing the projected variance of $X_g^T$ on $v^{(1)}$, such that the squared covariance between them is maximized:

$$\max \sum_{g=1}^Z \pi_g \left( X_g M_g u_g^{(1)} \mid v^{(1)} \right)_D^2,$$

where $\pi_g$ is a weight assigned to each $X_g$. This weight can be uniform, the inverse of the global inertia, or the inverse of the largest eigenvalue of each table. The first order solutions $u_g^{(1)}$ and $v^{(1)}$ are given by a PCA of the weighted table $X^{(1)} = [\sqrt{\pi_1} X_1 | \ldots | \sqrt{\pi_Z} X_Z]$, according to the eigendecomposition of the matrix $X^{(1)} Q X^{(1)T}$ with $Q = \mathrm{diag}(M_1, \ldots, M_Z)$. In a similar way, for the solution of order 2, MCOA searches for $M_g$-normalized vectors $u_g^{(2)}$ and an auxiliary $D$-normalized vector $v^{(2)}$ by using the same optimization criterion with the additional orthogonality constraints $u_g^{(1)T} M_g u_g^{(2)} = v^{(1)T} D v^{(2)} = 0$. The solutions of order 2 are given by the first order PCA solution of the juxtaposed residual matrix $[X_1 - X_1 P_1^{(1)} | \ldots | X_Z - X_Z P_Z^{(1)}]$, with $P_g^{(1)}$ the $M_g$-orthogonal projection operator onto the subspace spanned by the vector $u_g^{(1)}$. The successive solutions are found in a similar way.

By applying the MCOA approach to the $Z = \sum_k Q_k$ matrices $XB_g$ ($g = 1, \ldots, \sum_k Q_k$), the first order solutions $u_g^{(1)}$ and $v^{(1)}$ are given by a PCA of the weighted table $X^{(1)} = [\sqrt{\pi_1} XB_1 | \ldots | \sqrt{\pi_{\sum_k Q_k}} XB_{\sum_k Q_k}] = XM$ with $M = [\sqrt{\pi_1} B_1 | \ldots | \sqrt{\pi_{\sum_k Q_k}} B_{\sum_k Q_k}]$, that is by the eigendecomposition of the matrix $X^{(1)} Q X^{(1)T} = XMQM^TX^T$ with $Q = \mathrm{diag}(M_1, \ldots, M_{\sum_k Q_k})$. The solutions of order 2 are given by the first order PCA solution of the juxtaposed residual matrix $X^{(2)} = [XB_1 - XB_1 P_1^{(1)} | \ldots | XB_{\sum_k Q_k} - XB_{\sum_k Q_k} P_{\sum_k Q_k}^{(1)}]$.

We remark that if $M_g = I$ then the first solution of the PCA of $X^{(1)}$ is equivalent to the first solution of a PCA of the matrix $X$ with a diagonal metric containing the weighted sums of the variances explained by each $x_j$. If $M_g = \mathrm{diag}(1/y_g^Ty_g)$ and $f_j = x_j^Tx_j$, then this approach is equivalent to a PCA of the matrix $X$ with the diagonal metric $(\sum_g \pi_g B_g M_g B_g^T)$ of the weighted sums of the coefficients of determination $r_g^2$, i.e. to the eigenanalysis of $X(\sum_g \pi_g B_g M_g B_g^T)X^T$. We highlight that this approach can be considered as an asymmetrical extension of MCOA of the $K$ response variable groups $Y_k$ ($k = 1, \ldots, K$) with respect to a set of predictor variables $X$. Moreover, the weighted sum of the variances explained by each $x_j$ can be used as a weight within Garthwaite's univariate approach as well as within Multiple Coinertia Analysis. In this sense, it is interesting to note the role played by the regression coefficients $b_j^g$ within the different approaches proposed here; it is also easy to show that all the proposals are linked by transition formulae. Obviously, this approach also works with a single dependent variable $y$ as well as with a single group of variables ($K = 1$). This proposal also provides suitable conditioned matrices $G$ for the shrinkage regression methods (see Table 1).
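The following fragment is a minimal sketch of the MCOA first order step described above, under the simplifying assumptions of identity metrics $M_g = I$ (so $Q = I$) and weights $\pi_g$ equal to the inverse of the largest eigenvalue of each table, one of the options mentioned; the function name is illustrative.

```python
import numpy as np

def mcoa_first_order(tables):
    """Sketch of the MCOA first order step for a list of Z centred tables X_g.

    Simplifying assumptions: identity metrics M_g = I (hence Q = I) and weights
    pi_g = 1 / (largest eigenvalue of X_g'X_g), one of the options mentioned above.
    Returns the auxiliary vector v1, the unit-norm vectors u_g and the weights.
    """
    # pi_g: inverse of the greatest eigenvalue of each table
    pis = [1.0 / np.linalg.eigvalsh(Xg.T @ Xg)[-1] for Xg in tables]

    # weighted juxtaposed table X^(1) = [sqrt(pi_1) X_1 | ... | sqrt(pi_Z) X_Z]
    X1 = np.hstack([np.sqrt(p) * Xg for p, Xg in zip(pis, tables)])

    # first order PCA: leading eigenvector of X^(1) Q X^(1)' (here Q = I)
    vals, vecs = np.linalg.eigh(X1 @ X1.T)
    v1 = vecs[:, -1]          # unit Euclidean norm; rescale by sqrt(N) for D-normalization

    # with M_g = I, each u_g^(1) is proportional to X_g' v1, normalized to unit length
    us = [(Xg.T @ v1) / np.linalg.norm(Xg.T @ v1) for Xg in tables]
    return v1, us, pis
```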
The approach based on the $y_k$'s as sums of orthogonal projections onto the single-rank subspaces spanned by the $x_j$'s also leads us to consider the covariance between the $x_j$'s and the $y_k$'s. In this case we have $\mathrm{cov}(X, Y^*) = AX^TY^*$, where $A$ is a matrix of order $(J \times J)$ whose general element is the weighted pairwise regression coefficient among the $x_j$'s: $a_{j,j'} = f_j \, \mathrm{cov}(x_j, x_{j'})/\mathrm{var}(x_{j'})$ ($j, j' = 1, \ldots, J$). Referring to the $q$-th column of $Y_k$, we obtain the predictor $b = AX^Ty_k^q$. In this way the matrix $A$ can be considered as an alternative conditioned matrix for the collinearity problem in Table 1. We remark that this approach tries to recover the relationships among the predictor variables, which are lost in simple linear regression.

5 Conclusions

The main aim of this paper is to establish the link between several multidimensional techniques, such as MCOA, PLS, OMCOA-PLS, COA, OMCOA, Multiblock-PLS and Generalized CPCA, within a simple linear regression framework. At the same time, new methodological proposals are made. These results are particularly important when the matrix of explanatory variables has rank lower than $\min(N, J)$, which could lead to stability problems. Another advantage of this approach is that it can be performed without specialized software. An extension of this framework to several matrices of explanatory and dependent variables will appear in a forthcoming paper. An extension to categorical variables is also under investigation.

Acknowledgements

The present paper is financially supported by the Cofin04 fund (responsible: prof. L. D'Ambra) and by the Cofin04 fund (responsible: prof. P. Amenta).

References

[1] Abraham, B. and Merola, G. (2001): Dimensionality reduction approach to multivariate prediction. In Esposito Vinzi, V. et al. (Eds.): PLS and Related Methods, CISIA, 3-17.

[2] Amenta, P. and D'Ambra, L. (2001): Generalized constrained principal component analysis. In Borra, S., Rocci, R., Vichi, M., and Schader, M. (Eds.): Advances in Classification and Data Analysis, Springer, 137-144.

[3] Brooks, R. and Stone, M. (1994): Joint continuum regression for multiple predictands. JASA, 89, 1374-1377.

[4] Brown, P.J. (1993): Measurement, Regression and Calibration. Oxford: Oxford University Press.

[5] Chessel, D. and Hanafi, M. (1996): Analyse de la co-inertie de K nuages de points. Revue de Statistique Appliquée, XLIV, 35-60.

[6] D'Ambra, L. and Lauro, N.C. (1982): Analisi in componenti principali in rapporto ad un sottospazio di riferimento. Rivista di Statistica Applicata, 15, 1-25.

[7] D'Ambra, L., Amenta, P., Rubinacci, F., Gallo, M., and Sarnacchiaro, P. (1999): Multidimensional statistical methods based on co-inertia for the customer satisfaction evaluation. International Quality Conference IQC, 8-11 December 1999, Bangkok, Thailand.

[8] D'Ambra, L., Sabatier, R., and Amenta, P. (1998): Analisi fattoriale delle matrici a tre vie: sintesi e nuovi approcci. (Invited lecture) Atti XXXIX Riunione SIS, Sorrento.

[9] D'Ambra, L., Sabatier, R., and Amenta, P. (2001): Three way factorial analysis: synthesis and new approaches. Italian Journal of Applied Statistics, 13, 101-117.

[10] de Jong, S. and Kiers, H.A.L. (1992): Principal covariates regression. Part I. Theory. Chemometrics and Intelligent Laboratory Systems, 14, 155-164.

[11] Frank, I.E. and Friedman, J.H. (1993): A statistical view of some chemometrics regression tools. Technometrics, 35, 109-148.

[12] Garthwaite, P.H. (1994): An interpretation of partial least squares. JASA, 89, 122-127.

[13] Hoerl, A.E. (1962): Application of ridge analysis to regression problems. Chemical Engineering Progress, 58, 54-59.

[14] Hoerl, A.E. and Kennard, R.W. (1970): Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12, 55-67.
[15] Lafosse, R. and Hanafi, M. (1997): Concordance d'un tableau avec K tableaux: définition de K+1 uples synthétiques. Revue de Statistique Appliquée, 45(4), 111-126.

[16] Stone, M. and Brooks, R.J. (1990): Continuum regression: cross-validated sequentially constructed prediction embracing ordinary least squares, partial least squares and principal components regression. J. Royal Stat. Soc., B, 52, 237-269.

[17] Vivien, M. (1999): Nouvelles approches en analyse multitableaux. Mémoire de stage de DEA, Université Montpellier II.

[18] Vivien, M. and Sabatier, R. (2000): Une extension multi-tableaux de la régression PLS. Revue de Statistique Appliquée, 49, 31-54.

[19] Wangen, L.E. and Kowalski, B.R. (1988): A multiblock partial least squares algorithm for investigating complex chemical systems. Journal of Chemometrics, 3, 3-20.

[20] Wold, H. (1966): Estimation of principal components and related models by iterative least squares. In Krishnaiah, P.R. (Ed.): Multivariate Analysis, New York: Academic Press.