Scientific paper

Classification of White Varietal Wines Using Chemical Analysis and Sensorial Evaluations

Katja Šnuderl,1 Jan Mocak,2* Darinka Brodnjak-Voncina3 and Bibiana Sedlackova4

1 Metrology Institute of the Republic of Slovenia, Tkalska 15, SI-3000 Celje, Slovenia
2 University of Ss. Cyril and Methodius, Faculty of Natural Sciences, Nam. J. Herdu 2, SK-91701, Trnava, Slovakia
3 Faculty of Chemistry and Chemical Engineering, Smetanova 17, SI-2000 Maribor, Slovenia
4 Strečnianska 16, 85105 Bratislava, Slovakia

* Corresponding author: E-mail: jan.mocak@ucm.sk

Received: 16-07-2008

Abstract

The application of multivariate data analysis and ANOVA to the classification of white varietal wines is demonstrated. Wine classification was performed using the following classification criteria: wine variety, year of production, wine producer, and wine quality as found by sensorial testing (bouquet, colour, and taste). Subjective wine evaluation, made by wine experts, is combined with commonly used chemical and physico-chemical properties measured in an analytical laboratory. The importance of the measured variables was determined by principal component analysis and confirmed by analysis of variance. Linear discriminant analysis enabled not only a very successful wine classification but also prediction of the wine category for unknown samples. The wine categories were defined by the three wine varieties, the two vintages, or the wine producers; the two or three categories of wine quality reflected either the total points obtained in sensorial evaluation or the points obtained for a particular quality descriptor such as colour, taste, or bouquet.

Keywords: Multivariate data analysis; Principal component analysis; Discriminant analysis; Feature selection; ANOVA; Sensory analysis

1. Introduction

Wine is among the commodities that are very frequently falsified.1,2 Wine is considered falsified when it has not been made in accordance with a specified method but is presented as a valuable product under an official trademark, or when its declared location of production is not true. It is therefore necessary to develop procedures that make wine classification and authentication possible, i.e. verification of the selected sample with regard to the wine variety.1-4 In addition, wine classification according to producer or locality as well as year of production is also frequently demanded.

Methods of multivariate (multidimensional) data analysis (MVA) use multidimensional statistics to investigate relations and interactions inside a large table of data.5,6 They are often employed in the analysis of food, natural substances, or the environment. The MVA methods are especially useful for wine classification.2,7-9 Measured or observed wine properties represent variables, which characterize the studied wine sample (generally considered as an object). Each variable can be regarded geometrically as an axis in the multidimensional space defined by all variables. Each wine sample then represents a point in this multidimensional space, and its coordinates are given by the corresponding values of the variables used.

The research presented here focused on several possible ways of classifying white varietal wines on the basis of the results of chemical analysis. Three typical kinds of Slovak white varietal wines, Welsch Riesling, Grüner Veltliner and Chardonnay, were analyzed during two consecutive years using eighteen selected chemical and physico-chemical descriptors (variables) that are most frequently used for wine characterization. In addition, sensorial analysis of all examined wine samples was carried out.
This study concerns the optimal choice of variables for effective wine classification and employs several methods of multivariate data analysis. Significant differences among the studied wine samples allow a satisfactory characterization and classification of the wines with respect to (a) variety, (b) vintage, and (c) sensorial quality. For this purpose, principal component analysis, discriminant analysis and ANOVA were used as the main chemometrical tools.

2. Experimental

2. 1. Wine Samples

Altogether 46 samples of varietal wines, namely Welsch Riesling (22 samples), Grüner Veltliner (18 samples), and Chardonnay (6 samples), of the vintages 1999 (22 samples) and 2000 (24 samples) were analyzed successively. The wines were produced by two Slovak producers, located in Bratislava and Hlohovec. Sampling was made by the Research Institute of Viticulture and Viniculture, Bratislava, Slovakia. The wine samples were characterized chemically by determining the concentrations of commonly analyzed wine components. All variables used in this study are listed in Table 1. Table 2 presents the averaged values of the physical and chemical parameters for each variety in both vintages of the wine samples used for statistics. The codes 0, 1 and 2 stated in the "Variety" column denote Veltliner, Riesling and Chardonnay, respectively.

Table 1. Investigated characteristics of wine samples representing variables in chemometrical evaluation and their corresponding codes.

Code  Variable           Code  Variable          Code  Variable
v1    SO2 free           v7    Tartaric acid     v13   Ethanol
v2    SO2 total          v8    Lactic acid       v14   Total extract
v3    Total acidity      v9    Reducing sugars   v15   Sugar-free extract
v4    Volatile acidity   v10   Glucose           v16   Ash
v5    Citric acid        v11   Fructose          v17   pH
v6    Malic acid         v12   Density           v18   Polyphenols

[Table 2: averaged values per variety and vintage; the table body is not recoverable from the extracted text.]

2. 2. Sensorial Analysis

Sensorial analysis was made by a group of seven and eight wine experts for the vintages 1999 and 2000, respectively. They assessed the following wine properties: colour, limpidity, bouquet, and taste (where the overall impression was also evaluated), using a twenty-point scale in total; the total sum of the acquired points was recorded. This overall evaluation was used as the main wine quality descriptor. The maximal number of points ascribed to colour as well as to limpidity was 2.0, that for bouquet was 4.0, and that for taste was 12.0. This way of evaluation was commonly used by the Research Institute in the study of Slovak white wines. Since the wine limpidity was almost equal for all evaluated samples and all evaluators, this particular feature was omitted from the final data table. Consequently, the sensorial criteria finally evaluated were colour, bouquet, taste and total points.

In order to obtain two or three wine categories distinguished by sensorial quality, all wine samples were sorted by their total points, and then the median as well as the lower and upper terciles were calculated. According to the median, the wine samples were categorized into two groups of 25 better ("good") and 21 worse ("bad") wines. The use of the terciles (i.e. the percentiles 0.3333 and 0.6667) resulted in three groups of the evaluated wine samples: 17 "good", 15 "medium" and 14 "bad". The unequal class sizes are due to the assignment of the border values to one of the groups. The terms "good" and "bad" are used only as labels; they denote first-class and not fully superior sensorial features of the examined wine samples, respectively.

2. 3. Analytical Methods

All analyses were carried out according to Slovak Technical Standards STN 560216, which conform to European Union Council Regulations No. 2679/90 of 17 Sept. 1990 determining methods for the analysis of wines. Iodometric titration methods were applied for the determination of free and total sulphur dioxide as well as for the determination of reducing sugars. Potentiometric methods using glass and saturated calomel electrodes were employed for the determination of total acidity, volatile acidity and pH; volatile acids were separated from the wine by steam distillation, and the distillate was then titrated with sodium hydroxide solution in a way similar to the total acid determination. A calibrated pycnometer was used for measuring density as well as for the density measurement of the distillate in the ethanol determination. The sugar-free extract was calculated as the difference between the total extract, also obtained by pycnometry, and the determined content of reducing sugars. Ash was determined by ignition of the wine extract at 550 °C followed by a gravimetric endpoint. Citric, malic, tartaric and lactic acids were determined enzymatically with a final spectrophotometric determination. Glucose and fructose were also determined enzymatically with a final spectrophotometric measurement of the reaction products. Total polyphenols were determined by the Folin-Ciocalteu assay with a spectrophotometric endpoint.

2. 4. Statistical Analysis

Statistical treatment of the obtained data was performed using the program packages SYSTAT 9 (SPSS Inc., Chicago, U.S.A.), STATGRAPHICS Plus 5.0 (Manugistics, Inc., Rockville, U.S.A.), S-PLUS v. 4.0 (Insightful Corp., Seattle, WA, U.S.A.) and Microsoft EXCEL. SYSTAT was used for the calculations of linear discriminant analysis and ANOVA. For the latter task, the general linear model (GLM) option was used, which enables various ways of ANOVA, the Kolmogorov-Smirnov test of data normality, and the Bonferroni test of means for each possible pair of factors corresponding to the selected classification criteria. Calculations of principal component analysis were performed by STATGRAPHICS. The S-PLUS package was used exclusively for bootstrapping, by which 1000 replications were generated for each of the six wine sample groups, given by the three wine varieties and the two basic classes of wine quality ("good" and "bad"). From the normal distributions generated for all six groups, seven octiles (the 0.125, 0.250, 0.375, 0.500, 0.625, 0.750 and 0.875 quantiles) were calculated and used as the computer-generated wine samples. Altogether 42 test samples were generated in this resampling procedure. MS EXCEL was used for the data preparation, percentile calculation and summarization of the results.

3. Results and discussion

3. 1. Principal Component Analysis

Principal component analysis (PCA) is a basic technique for characterizing multidimensional data; it provides a satisfactory representation of the studied objects by projecting the original data set from the high-dimensional space onto a lower-dimensional space. Often the two or three most important principal components (PCs), calculated as linear combinations of the original variables, sufficiently represent the total variability of the original data.
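
The linear combinations mentioned above can be written compactly as follows. This is only a sketch in notation introduced here, not taken from the source: Z is the data table of n samples and p variables after autoscaling (zero mean, unit variance per variable, which is an assumption since the scaling used is not stated), v_k are the loading vectors, λ_k their eigenvalues, and t_ik the score of sample i on the k-th PC.

```latex
\begin{aligned}
\mathbf{R} &= \frac{1}{n-1}\,\mathbf{Z}^{\mathsf T}\mathbf{Z}, \qquad
\mathbf{R}\,\mathbf{v}_k = \lambda_k\,\mathbf{v}_k, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \\
t_{ik} &= \sum_{j=1}^{p} z_{ij}\,v_{jk}, \qquad
\text{fraction of variability carried by the first } m \text{ PCs}
 = \frac{\sum_{k=1}^{m}\lambda_k}{\sum_{k=1}^{p}\lambda_k}.
\end{aligned}
```

With this notation, "sufficiently represent" means that the fraction on the right is acceptably close to 1 for a small m such as 2 or 3.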
This MVA technique does not need any training set of data (in which the categorization of the objects into the selected classes is known) and represents unsupervised learning.10 In our case, the first two PCs, calculated from all variables, account for 41.7% of the total data variability, as shown in Fig. 1, which displays the positions of the samples of the three wine varieties and two vintages. Even though the data categories are not involved in the PCA calculations, it was practical to mark each wine sample by the category to which it belongs, so that any natural grouping of the studied wines could be recognized. This approach was then applied to all considered classification criteria.
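
The procedure described above, computing the PCs without category information and attaching the categories only afterwards, might be sketched as follows. This is a minimal illustration in Python with NumPy, not the STATGRAPHICS calculation used in the study: the function name `pca`, the autoscaling step and the synthetic data are all assumptions made for the example, and the numbers it prints have no relation to the reported 41.7%.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of an (n_samples, n_variables) array via SVD of the autoscaled data.

    Autoscaling (zero mean, unit variance per variable) is an assumption;
    the source does not state which scaling was used.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the loading vectors, ordered by decreasing variance.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)      # fraction of total variance per PC
    scores = Z @ Vt[:n_components].T     # sample coordinates on the first PCs
    return scores, explained[:n_components]

# Synthetic stand-in data: 46 samples x 18 variables, NOT the measured wine data.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], [18, 22, 6])   # hypothetical variety codes
X = rng.normal(size=(46, 18)) + labels[:, None] * 0.5

scores, explained = pca(X)
print("variance explained by PC1 + PC2:", round(float(explained.sum()), 3))

# Labels enter only after the PCA, to mark samples when inspecting the scores.
for code in np.unique(labels):
    centroid = scores[labels == code].mean(axis=0)
    print("category", code, "centroid on PC1, PC2:", np.round(centroid, 2))
```

Because `labels` is never passed to `pca`, any separation of the category centroids in the PC1-PC2 plane reflects structure in the variables themselves, which is the sense in which the marked scatterplot reveals natural grouping.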