ANNALES KINESIOLOGIAE • 1 • 2010 • 2, pp. 123–130

THE ROBUST ALTERNATIVE TO STANDARD VALIDITY APPROACH

Ksenija BOSNAR 1, Franjo PROT 1
1 University of Zagreb, Faculty of Kinesiology, Croatia

Corresponding author: Ksenija Bosnar, University of Zagreb, Faculty of Kinesiology, Horvaćanski zavoj 15, 10000 Zagreb, Croatia. e-mail: xenia@kif.hr

ABSTRACT

The usual approach to criterion-related validity is developed under the canonical correlation model and is based on the maximization of the correlation of test results and the chosen criteria. The standard measures of validity are canonical correlation in the case of several test results and criteria, multiple correlation in the case of several test results and one criterion, and bivariate correlation in the case of one test and one criterion. In kinesiology, as well as in some other disciplines, standard measures of validity are not always appropriate, being sensitive to the number of degrees of freedom. Therefore, the measures of validity based on the maximization of the covariance of test results and chosen criteria proposed by Momirović et al. (1983), including robust canonical correlation analysis, robust regression analysis, robust discriminant analysis and redundancy analysis, may be more appropriate. An example in favour of this method of validation is presented.

Keywords: validity, robust methods, quasi canonical analysis

THE ROBUST ALTERNATIVE TO THE STANDARD VALIDITY TESTING PROCEDURE

ABSTRACT (translated from Slovenian)

The approach to determining criterion validity under the canonical correlation analysis model is based on maximizing the correlation between test results and the selected criteria.
The standard measure of validity for several tests and several criteria is the canonical correlation; for a single criterion it is the multiple correlation; and for one predictor and one criterion it is the bivariate correlation. As in other disciplines, the standard measure of validity is not always appropriate in kinesiology because of its sensitivity to the number of degrees of freedom. Therefore, measures of validity based on maximizing the covariances between test results and criteria, proposed by Momirović and colleagues (1983) and including robust quasi canonical covariance analysis, robust regression analysis, robust discriminant analysis and redundancy analysis, may be more appropriate. An example supporting the described methodology is presented.

Keywords: validity, robust methods, quasi canonical analysis

Review article. UDC: 519.2:004. Received: 2010-10-02

INTRODUCTION

The usual approach to establishing criterion-related validity is developed under a correlation model and is based on maximizing the correlation between test results and chosen criteria (Gliner & Morgan, 2000). The standard measures of validity are canonical correlations in the case of several test results and criteria, multiple correlation in the case of several test results and one criterion, and bivariate correlation in the case of one test and one criterion. As a measure of validity, the first canonical correlation is well defined as the maximal correlation between linear combinations of two sets of variables on the given set of data. Canonical correlation is, however, sensitive to the regularity of the correlation matrices, to outliers, and to the difference between the number of entities and the number of variables.
A statistically significant canonical correlation can be obtained even if only two variables (one from each set) have a substantial product-moment correlation. Multiple correlation is a special case of canonical correlation and is subject to the same instabilities when the number of entities is small relative to the number of variables. The promoters of canonical correlation analysis as a validity technique favour this approach (Galton, 1954; Mekota & Blahuš, 1983), but there are counter-arguments that point out its inadequacy (Cohen & Cohen, 1983, 2003).

In kinesiology, as well as in some other disciplines, standard measures of validity are not always appropriate. Most often, the problem lies in the small number of participants in the research sample. In some cases, the population is so small that it is impossible to gather even a modest number of subjects for the study. Consider, for example, estimating the predictive validity of the Slovenian translation of an anxiety test for the success of Formula One drivers, of curlers while they sweep the rock down the ice, or of competitors in pole vaulting.

Momirović, Dobrić & Karaman (1983) proposed a method for analysing the relationship between two sets of quantitative data based on the maximization of covariances of not necessarily orthogonal linear combinations of two sets of variables. The method does not depend on the regularity of the correlation matrices, and it is not so sensitive to outliers. The results are not sensitive to a high correlation between a single pair of variables. The method is robust to the number of degrees of freedom and can be applied when the research sample is small (Knežević & Momirović, 1996).
The same applies to the robust linear regression analysis based on the maximization of covariance proposed by Štalec & Momirović (1983). Robust methods have proved to be much more convenient in different analyses of the relationship between two sets of data. We therefore propose using robust methods based on covariance maximization to define criterion-related validity in kinesiology.

ALGORITHM

Building on the standard canonical correlation approach (Hotelling, 1933) and Tucker's interbattery factor analysis (Tucker, 1958), Momirović et al. (1983) proposed a method for analysing the relationship between two sets of quantitative data based on the maximization of covariances of not necessarily orthogonal linear combinations of two sets of variables. To facilitate the comparison of the two methods, a procedure for simultaneous analyses was developed (Bosnar, Prot & Momirović, 1984; Knežević & Momirović, 1996). A synopsis of the algorithm that simultaneously applies canonical correlation analysis and quasi canonical covariance analysis is presented here (Prot, 2008). The SPSS matrix macro program QCCR (Momirović, 1996), an implementation of the simultaneous canonical correlation and quasi canonical covariance analysis procedures, was used, and the main results are presented.

We have a set of entities, $E$, measured on two sets of variables, $V_1$ and $V_2$. As a result we have two matrices, $Z_1$ and $Z_2$, in standard normal form (i.e. the columns of $Z_1$ and $Z_2$ are centred to mean 0, normalised to unit variance, and multiplied by $n^{-1/2}$), and unit summation vectors $e_1$ and $e_2$.
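The standard normal form described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: it uses NumPy rather than the SPSS macro QCCR, the function name `standard_normal_form` is hypothetical, and the two data matrices are synthetic random numbers, not the data analysed in this paper.

```python
import numpy as np


def standard_normal_form(X):
    """Centre each column, scale it to unit variance and multiply by n**-0.5.

    After this transformation Z.T @ Z is the correlation matrix of the columns.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centred = X - X.mean(axis=0)
    scaled = centred / centred.std(axis=0)  # population SD (ddof=0)
    return scaled / np.sqrt(n)


# Synthetic data (an assumption for illustration only): 90 entities, 4 + 4 variables.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(90, 4))
X2 = rng.normal(size=(90, 4))

Z1 = standard_normal_form(X1)
Z2 = standard_normal_form(X2)

R11 = Z1.T @ Z1  # correlations within the first set
R22 = Z2.T @ Z2  # correlations within the second set
R12 = Z1.T @ Z2  # correlations between the two sets

print(np.round(np.diag(R11), 6))  # unit diagonal
```

With this scaling the cross-products of the columns are correlations directly, which is why no further division by the number of entities is needed for matrices in standard normal form.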
In matrix notation:

$$Z_1 = E \otimes V_1, \quad Z_1 = (z_{1,iv}), \quad i = 1, \dots, n; \; v = 1, \dots, m_1; \quad Z_1^t e_1 = 0; \quad \operatorname{diag}(Z_1^t Z_1) = I$$

$$Z_2 = E \otimes V_2, \quad Z_2 = (z_{2,iw}), \quad i = 1, \dots, n; \; w = 1, \dots, m_2; \quad Z_2^t e_2 = 0; \quad \operatorname{diag}(Z_2^t Z_2) = I$$

These matrices describe the set of entities $E$ on the two sets of variables, $V_1$ and $V_2$, in standard normal form; one set contains the variables to be validated and the other serves as the set of criterion variables. Then

$$R_{11} = Z_1^t Z_1, \quad R_{22} = Z_2^t Z_2, \quad R_{12} = Z_1^t Z_2$$

are the matrix of correlations within the first set, the matrix of correlations within the second set, and the matrix of correlations between the first and second set, respectively.

Canonical correlation analysis (Hotelling, 1933)

$$k_{1p} = Z_1 x_{1p}, \quad k_{2p} = Z_2 x_{2p}, \quad \rho_p = k_{1p}^t k_{2p} = \max,$$

$$k_{1p}^t k_{1q} = k_{2p}^t k_{2q} = \delta_{pq}, \quad k_{1p}^t k_{2q} = 0 \;\; (p \neq q),$$

where $p, q = 1, \dots, m$; $m = \min(m_1, m_2)$, and $\delta_{pq}$ is the Kronecker symbol. The well-known solution (see Anderson, 1984) is obtained as an extremum of

$$f(x_{1p}, x_{2p}, \lambda_{1p}, \lambda_{2p}) = x_{1p}^t R_{12} x_{2p} - \tfrac{1}{2} \lambda_{1p} (x_{1p}^t R_{11} x_{1p} - 1) - \tfrac{1}{2} \lambda_{2p} (x_{2p}^t R_{22} x_{2p} - 1),$$

where $\lambda_{1p}$ and $\lambda_{2p}$ are Lagrange multipliers. As a result, we have two sets of canonical variates, represented by matrices $K_1$ and $K_2$ in standard normal form, for the significant canonical roots. The correlations between $K_1$ and $K_2$ are the canonical validities.

Quasi canonical covariance analysis (Momirović et al., 1983)

Let $Z_1$ $(n, m_1)$ and $Z_2$ $(n, m_2)$ be two centred data matrices, obtained as a description of the set $E$ of $n$ entities over two sets, $V_1$ and $V_2$, of quantitative, elliptically distributed variables.
Maximization of covariances between linear composites of variables belonging to the sets $V_1$ and $V_2$ is defined, under some constraints, as

$$l_{1p} = Z_1 x_p, \quad l_{2p} = Z_2 y_p, \quad \rho_p = l_{1p}^t l_{2p} \to \max,$$

$$\rho_p > \rho_{p+1}, \; p = 1, \dots, r - 1; \quad r = \min(m_1, m_2); \quad x_p^t x_q = y_p^t y_q = \delta_{pq},$$

and can be reduced to the singular value decomposition of the matrix $C_{12} = Z_1^t Z_2 \, n^{-1}$, defined as an extremum of

$$f(y_{1p}, y_{2p}, \eta_{1p}, \eta_{2p}) = y_{1p}^t R_{12} y_{2p} - \tfrac{1}{2} \eta_{1p} (y_{1p}^t y_{1p} - 1) - \tfrac{1}{2} \eta_{2p} (y_{2p}^t y_{2p} - 1),$$

where $\eta_{1p}$ and $\eta_{2p}$ are Lagrange multipliers and the criterion is the maximisation of covariance, as pointed out by Tucker (1958). In this function $y_{1p}$ and $y_{2p}$ stand for the weight vectors written $x_p$ and $y_p$ in the definition above, and $R_{12}$ coincides with $C_{12}$ when the variables are standardized. As a result, we have two sets of quasi canonical variates, represented by matrices $L_1$ and $L_2$ in standard normal form, for the important spectral values. The correlations between $K_1$ and $K_2$, and between $L_1$ and $L_2$, are the canonical and quasi canonical validities, respectively, which are the main interest of this paper.

The example

Standard measures of validity and measures based on the maximization of the covariance of appropriate linear combinations of test results and chosen criteria were compared on data collected in an attempt to define the ability of tactical thinking in team sports. Lanc (1967) proposed a measure of tactical thinking ability in team sports composed of four problem tasks from football, handball, basketball and volleyball. The convergent validation of the concept was carried out with four intelligence tests by Reuchlin & Valin (1953), measuring spatial, perceptual, verbal and numerical ability. The results, as well as the conclusions, were ambiguous, and Lanc (1967) lost interest in researching tactical thinking in team sports. The data obtained on a sample of 90 students of physical education were re-analyzed by biorthogonal canonical correlation analysis (Hotelling, 1936) and by canonical analysis of covariance (Momirović et al., 1983).
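As an illustration of the two procedures defined in the ALGORITHM section, the following Python sketch computes the first canonical correlation and the first quasi canonical correlation directly from the correlations reported in Table 1. It is a minimal sketch under stated assumptions, not the QCCR macro: it uses NumPy, the function names are hypothetical, only the first pair of variates is extracted, and no significance tests are performed.

```python
import numpy as np

# Lower triangle of the correlation matrix reported in Table 1.
# Order: spatial, verbal, perceptual, numerical,
#        football, handball, basketball, volleyball.
LOWER = [
    [1.0],
    [0.168, 1.0],
    [0.462, 0.135, 1.0],
    [0.337, 0.277, 0.522, 1.0],
    [0.115, 0.083, -0.057, 0.109, 1.0],
    [0.292, 0.063, 0.162, 0.126, 0.181, 1.0],
    [0.227, 0.031, 0.186, 0.038, 0.195, 0.502, 1.0],
    [0.193, 0.003, 0.031, 0.204, -0.007, 0.437, 0.351, 1.0],
]


def full_matrix(lower):
    """Build the symmetric matrix from its lower triangle."""
    m = len(lower)
    R = np.zeros((m, m))
    for i, row in enumerate(lower):
        R[i, : len(row)] = row
    return R + R.T - np.diag(np.diag(R))


def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T


def first_canonical_correlation(R11, R22, R12):
    """Largest singular value of R11^(-1/2) R12 R22^(-1/2)."""
    M = inv_sqrt(R11) @ R12 @ inv_sqrt(R22)
    return np.linalg.svd(M, compute_uv=False)[0]


def first_quasi_canonical_correlation(R11, R22, R12):
    """Correlation of the first pair of covariance-maximizing composites.

    The unit-length weights x and y are the first singular vectors of R12;
    the composites are rescaled to unit variance before correlating them.
    """
    U, s, Vt = np.linalg.svd(R12)
    x, y = U[:, 0], Vt[0, :]
    covariance = x @ R12 @ y  # equals the largest singular value s[0]
    return covariance / np.sqrt((x @ R11 @ x) * (y @ R22 @ y))


R = full_matrix(LOWER)
R11, R22, R12 = R[:4, :4], R[4:, 4:], R[:4, 4:]

rho_cc = first_canonical_correlation(R11, R22, R12)
rho_qc = first_quasi_canonical_correlation(R11, R22, R12)
print(f"first canonical correlation:       {rho_cc:.3f}")
print(f"first quasi canonical correlation: {rho_qc:.3f}")
```

Run on the Table 1 values, the quasi canonical correlation comes out at about 0.291, in line with the value reported for the canonical covariance analysis, and the canonical correlation can be compared with the reported 0.387. Because the canonical correlation is the maximum correlation attainable by any pair of linear composites, the quasi canonical value can never exceed it.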
The correlations of the variables are in Table 1, and the comparisons of the results of the canonical correlation analysis and the canonical covariance analysis are in Tables 2–4.

The correlations between variables of the two sets are low, ranging from near zero to a maximum of r = 0.292. Hotelling's canonical correlation analysis shows a non-significant first canonical correlation (ρ = 0.387, p = 0.109); however, the algorithm by Momirović et al. (1983) provided a smaller but statistically significant correlation (ρ = 0.291, p = 0.005). The sample of 90 subjects was too small to demonstrate the relationship by maximizing the correlation of linear composites of the two sets of variables, but large enough to obtain a significant correlation when using the algorithm based on maximizing the covariance of linear composites of the two sets of variables.

Table 1: The correlations of intelligence test results (spatial, verbal, perceptual and numerical) and results in four problem tasks of tactical thinking (football, handball, basketball and volleyball).

             Spatial  Verbal  Perceptual  Numerical  Football  Handball  Basketball  Volleyball
Spatial      1
Verbal       .168     1
Perceptual   .462     .135    1
Numerical    .337     .277    .522        1
Football     .115     .083    -.057       .109       1
Handball     .292     .063    .162        .126       .181      1
Basketball   .227     .031    .186        .038       .195      .502      1
Volleyball   .193     .003    .031        .204       -.007     .437      .351        1

Table 2: The comparison of the results of canonical correlation analysis and canonical covariance analysis: ρ denotes the correlation coefficient, ρ2 denotes the coefficient of determination, F-value is the value of the F test, df 1 and df 2 are degrees of freedom 1 and 2, respectively, and pF and pχ are the significance probabilities of the F test and the χ2 test, respectively; for both analyses n = 90.
Analysis                ρ      ρ2     F-value    df 1      df 2   pF
Canonical covariance    .291   .085   8.157      1         88     .005

Analysis                ρ      ρ2     Wilks' λ   χ2 test   df     pχ
Canonical correlation   .387   .150   .760       23.169    16     .109

There is no doubt that canonical covariance analysis is an appropriate method for samples qualified as small. As proposed by Hošek et al. (1984), it seems reasonable to perform both canonical correlation and canonical covariance analysis whenever investigating the relations of two sets of variables using samples with a modest number of entities. This example shows that taking into consideration only the results of Hotelling's canonical correlation analysis could lead to misinterpretation.

Table 3: The comparison of the results of canonical correlation analysis and canonical covariance analysis: X denotes weights, F denotes the structure of the first factor and C denotes the structure of the first cross-factor; index cc denotes the results of canonical correlation analysis and qc denotes the results of canonical covariance analysis.

Tactical thinking in team sports   Xcc    Xqc    Fcc    Fqc    Ccc    Cqc
Football                           .718   .242   .537   .334   .208   .097
Handball                           .007   .672   .217   .856   .084   .270
Basketball                         .499   .514   .134   .772   .052   .206
Volleyball                         .998   .475   .662   .687   .257   .191

Table 4: The comparison of the results of canonical correlation analysis and canonical covariance analysis: X denotes weights, F denotes the structure of the first factor and C denotes the structure of the first cross-factor; index cc denotes the results of canonical correlation analysis and qc denotes the results of canonical covariance analysis.
Intelligence tests   Xcc    Xqc    Fcc    Fqc    Ccc    Cqc
Spatial              .067   .806   .295   .864   .114   .313
Verbal               .032   .149   .104   .340   .040   .058
Perceptual           .187   .384   .298   .749   .115   .149
Numerical            .121   .425   .586   .704   .227   .165

Program

Data analysis was carried out with the general macro program QCCR (Momirović, 1997) for the simultaneous application of canonical correlation analysis and canonical analysis of covariances, more often called quasi canonical covariance analysis. QCCR, a macro program of 767 lines of code, is written in the SPSS macro language. Canonical correlation analysis is implemented according to Hotelling (1935, 1936) and Cooley and Lohnes (1971), and the significance of the canonical correlations is tested according to the procedure proposed by Bartlett (1941). Canonical analysis of covariances is implemented according to the method proposed by Momirović et al. (1983). The comparison of the results of the two methods follows the procedure proposed by Bosnar, Prot & Momirović (1984). This program is a revision of the program QCCR written by Momirović, Dobrić, Bosnar & Prot (1984) in the SS language (Zakrajšek, Momirović & Štalec, 1974).

REFERENCES

Bosnar, K., Prot, F. & Momirović, K. (1984). Neke relacije između kanoničke i kvazikanoničke korelacijske analize. In K. Momirović, J. Štalec, F. Prot, K. Bosnar, N. Viskić-Štalec, L. Pavičić, & V. Dobrić, Kompjuterski programi za klasifikaciju, selekciju, programiranje i kontrolu treninga (pp. 5–22). Zagreb: Fakultet za fizičku kulturu.

Cohen, J., & Cohen, P. (1983, 2003). Appendix 4. Set correlation as a General Multivariate Data-analytic Method. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (2nd edition) (pp. 487–518). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Gliner, J. A., & Morgan, G. A. (2000).
Research Methods in Applied Settings: An Integrated Approach to Design and Analysis. Mahwah: Lawrence Erlbaum Associates.

Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28, 321–377.

Hošek, A., Bosnar, K., & Prot, F. (1984). Comparison of the results of quasicanonical and canonical correlation analysis in various experimental situations. In V. Lužar, & M. Cvitaš (eds.), Proceedings of International Symposium 'Computer at the University' (pp. 610.1–610.7). Zagreb: University Computing Centre.

Knežević, G., & Momirović, K. (1996). Algorithm and program for analysis of relations between canonical correlation analysis and covariance canonical analysis. [Algoritam i program za analizu relacija kanoničke korelacijske analize i kanoničke analize kovarijansi, in Serbian]. In P. Kostić (ed.), Merenje u psihologiji, 2, 57–73. Beograd: Institut za kriminološka i sociološka istraživanja.

Lanc, M. (1967). Neke relacije između testova kognitivnih funkcija i taktičkih sposobnosti u sportskim igrama (magistarski rad). Zagreb: Visoka škola za fizičku kulturu.

Mekota, K., & Blahuš, P. (1983). Motoricke testy v telesne vychove. Praha: Statni pedagogicke nakladatelstvo.

Momirović, K., Dobrić, V., & Karaman, Ž. (1983). Canonical covariance analysis. In Proceedings of 5th International Symposium 'Computer at the University' (pp. 463–473). Cavtat.

Reuchlin, M., & Valin, E. (1953). Tests collectifs du Centre de Recherche BCR. Bulletin de l'Institut National d'Orientation Professionelle, 9, 1–152.

Štalec, J., & Momirović, K. (1983). Some properties of a very simple model for robust regression analysis. In Proceedings of International Symposium 'Computer at the University' (pp. 453–461). Cavtat.

Tucker, L. R. (1958). An Inter-Battery Method of Factor Analysis. Psychometrika, 23(2), 111–136.