Scientific paper

Orthogonal Projection to Latent Structures Combined With Artificial Neural Networks in Non-destructive Analysis of Ebastine Powder

Fawzia Ahmed Ibrahim and Mary Elias Kamel Wahba*

Department of Analytical Chemistry, Faculty of Pharmacy, Mansoura University, Mansoura, 35516, Egypt

* Corresponding author: E-mail: marywahba@ymail.com

Received: 31-05-2013

Abstract

A new method, orthogonal projection to latent structures (O-PLS) combined with artificial neural networks (ANNs), is investigated for the non-destructive determination of ebastine in powder form by near-infrared (NIR) spectroscopy. Modern NIR spectroscopy is an efficient, simple and non-destructive technique that has been used for chemical analysis in diverse fields. As a preprocessing method, O-PLS provides a way to remove from an input data set X the systematic variation that is not correlated to the response set Y, without disturbing the correlation between X and Y. In this paper, O-PLS-pretreated spectral data were used to establish an ANN model of ebastine powder, with which the concentration of ebastine as the active component was determined. The degree of approximation was employed as the criterion for selecting the optimum network parameters. For comparison with the OPLS-ANN model, calibration models using first-derivative and second-derivative preprocessed spectra were also designed. Experimental results showed that the OPLS-ANN model performed best.

Keywords: Orthogonal projection to latent structures; artificial neural networks; near-infrared spectroscopy; ebastine; degree of approximation

1. Introduction

The near-infrared spectral analytical technique for quantitative and qualitative analysis is finding wide application in fields as different as agriculture, food and the chemical industry,1-6 and especially in the pharmaceutical industry,7-10 mainly owing to its advantages over other analytical techniques: it is expeditious, non-destructive and low in cost, it is adaptable to almost all kinds of samples in all states, and it requires little or no sample preparation. Frequently the objective of such characterization is to determine the concentrations of different components in the samples. Compared to conventional analytical methods, NIR spectroscopy attracts the attention not only of researchers in the pharmaceutical industry but also, with its unparalleled advantages, of researchers in other research and development areas.

However, NIR spectra often contain serious systematic variation that is unrelated to the responses Y, and the analyte of interest absorbs in only small parts of the spectral region. For solid samples this systematic variation is mainly caused by light scattering and by differences in the spectroscopic path length; moreover, baseline and slope variations may often constitute the major part of the variation of the sample spectra. Variation in X that is unrelated to Y may disturb the multivariate modeling, cause imprecise predictions for new samples, and affect the robustness of the model over time. The first step of a multivariate calibration based on NIR spectra is therefore often to preprocess the data. Preprocessing methods commonly used for NIR spectral data include smoothing, derivation, multiplicative signal correction (MSC) and standard normal variate (SNV). These signal corrections are different cases of filtering: the practical effect of the first derivative is that it removes an additive baseline, while the second derivative also removes a multiplicative baseline.
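Derivative preprocessing of this kind is commonly implemented with a Savitzky-Golay filter. The following is a minimal illustrative sketch, not the software used in this work; the window length, polynomial order and placeholder data are assumed values:

```python
import numpy as np
from scipy.signal import savgol_filter

# spectra: (n_samples, n_wavelengths) NIR matrix; placeholder data here,
# mimicking 780-1100 nm recorded at 1 nm intervals (321 points).
spectra = np.random.rand(10, 321)

# First derivative: removes an additive baseline offset.
d1 = savgol_filter(spectra, window_length=11, polyorder=3, deriv=1, axis=1)

# Second derivative: also removes a (multiplicative) baseline slope.
d2 = savgol_filter(spectra, window_length=11, polyorder=3, deriv=2, axis=1)
```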
The drawback of using derivatives, however, is the inevitable change of the shape of the spectra. SNV and MSC remove both additive and multiplicative baseline variation without altering the shape of the spectra; these methods, on the other hand, require a spectral region that carries little chemical information.11 As a generally applicable preprocessing and filtering method, O-PLS provides a way to remove from an input data set X the systematic variation not correlated to the response set Y;12 in other words, to remove variability in X that is orthogonal to Y, without disturbing the correlation between X and Y. By applying this method, model complexity is reduced while predictive ability is preserved, the interpretability of both correlated and non-correlated variation in the NIR spectra is effectively improved, and no time-consuming internal iteration is required, which makes it very fast to calculate.

Artificial neural networks have been applied more and more widely in quantitative analysis during the past several years.13-20 The main advantage of ANNs is their noise tolerance and robust non-linear transfer ability. With a proper model, ANNs yield lower calibration and prediction errors. They are an alternative for modeling non-linear data sets when the more classical multivariate calibration methods fail.

In this work, a method for the expeditious, non-destructive analysis of ebastine (Fig. 1) as the active component in ebastine powder has been developed using the O-PLS method combined with artificial neural networks. After the NIR spectra were acquired, O-PLS was applied to remove the non-correlated systematic variation, thus enhancing the chemical information in the spectra. The filtered data were then used as the input data for establishing the ANN model, with which the concentration of ebastine was determined. Subsequently, calibration models using first-derivative and second-derivative preprocessed spectra were designed for comparison with the OPLS-ANN model. Of all the optimal models, the OPLS-ANN model gave the best results.

Figure 1. Structural formula of ebastine.

2. Theory

2. 1. O-PLS Preprocessing Method

To simplify interpretation of the data, the O-PLS method uses the input data set X and the response set Y to filter out and remove the variation in X not correlated to Y; O-PLS shows no degradation of results compared to non-treated data. For an input data set X, T_ortho P_ortho^T represents the matrix of orthogonal components, where T_ortho and P_ortho are the score matrix and loading matrix of the orthogonal components, respectively, so the filtered data can be obtained by removing T_ortho P_ortho^T from X. The calculation of T_ortho P_ortho^T is therefore the main step of the O-PLS method. For a single response vector y, the O-PLS preprocessing method proceeds as follows:

1. Optionally transform, center and scale the raw data to give the matrix X and the vector y.

2. Calculate the parameters w, t, p, u and c with the normal NIPALS method for a single y:21

w^T = y^T X / (y^T y)   (1)
w = w / ||w||   (2)
t = X w / (w^T w)   (3)
c = t^T y / (t^T t)   (4)
u = y c / (c^T c)   (5)
p^T = t^T X / (t^T t)   (6)

where w is the weight vector of X, t is the score vector of X, p is the loading vector of X, u is the score vector of y and c is the loading vector of y.

3. Calculate the weight, score and loading vectors of the orthogonal variation:

w_ortho = p - [w^T p / (w^T w)] w   (7)
w_ortho = w_ortho / ||w_ortho||   (8)
t_ortho = X w_ortho / (w_ortho^T w_ortho)   (9)
p_ortho^T = t_ortho^T X / (t_ortho^T t_ortho)   (10)

where w_ortho is the weight vector, t_ortho the score vector and p_ortho the loading vector of the orthogonal variation.

4. Calculate the residual matrix E_ortho and save the found parameters:

E_ortho = X - t_ortho p_ortho^T   (11)

T_ortho = [T_ortho  t_ortho],  P_ortho = [P_ortho  p_ortho],  W_ortho = [W_ortho  w_ortho]   (12)

where T_ortho is the score matrix, P_ortho the loading matrix and W_ortho the weight matrix of the orthogonal components.

5. For additional orthogonal components, return to step 2 and set X = E_ortho; otherwise continue to the next step.

6. Obtain the filtered data:

E_O-PLS = X - T_ortho P_ortho^T   (13)

7. Filter an unknown input data set X_new of new samples:

E_new = X_new - X_new W_ortho (P_ortho^T W_ortho)^{-1} P_ortho^T   (14)

After preprocessing with the O-PLS method, the filtered data E_O-PLS no longer contain any variation orthogonal to y, so the stability of the model is greatly improved.
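For illustration, steps 1-7 can be sketched compactly in NumPy. This is an illustrative reading of Equations (1)-(14), not the O-PLS preprocessing software written in our laboratory; all function and variable names are assumptions:

```python
import numpy as np

def opls_fit(X, y, n_ortho):
    """O-PLS filtering of X against a single response y (sketch of Eqs. 1-14)."""
    x_mean = X.mean(axis=0)
    Xres = X - x_mean                      # step 1: center X
    y = y - y.mean()                       # center y
    W_o, P_o = [], []
    for _ in range(n_ortho):
        # Step 2: NIPALS parameters for a single y
        w = Xres.T @ y / (y @ y)           # Eq. (1)
        w /= np.linalg.norm(w)             # Eq. (2)
        t = Xres @ w                       # Eq. (3); w^T w = 1 after Eq. (2)
        p = Xres.T @ t / (t @ t)           # Eq. (6)
        # Step 3: weight, score and loading of the orthogonal variation
        w_o = p - (w @ p) * w              # Eq. (7)
        w_o /= np.linalg.norm(w_o)         # Eq. (8)
        t_o = Xres @ w_o                   # Eq. (9)
        p_o = Xres.T @ t_o / (t_o @ t_o)   # Eq. (10)
        # Steps 4-5: deflate X and save the found parameters
        Xres = Xres - np.outer(t_o, p_o)   # Eq. (11)
        W_o.append(w_o)
        P_o.append(p_o)                    # Eq. (12)
    W_o = np.column_stack(W_o)
    P_o = np.column_stack(P_o)
    return Xres, x_mean, W_o, P_o          # Eq. (13): Xres = X - T_o P_o^T

def opls_apply(X_new, x_mean, W_o, P_o):
    """Filter new, unknown samples, Eq. (14)."""
    Xc = X_new - x_mean
    return Xc - Xc @ W_o @ np.linalg.inv(P_o.T @ W_o) @ P_o.T
```

Note that after removing each orthogonal component the loop returns to step 2 with the deflated residual matrix, exactly as prescribed in step 5.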
2. 2. Artificial Neural Networks

The current interest in artificial neural networks is largely due to their ability to mimic natural intelligence by learning from experience.22 They learn from examples by constructing an input-output mapping without explicit derivation of a model equation. Artificial neural networks are parallel computational devices consisting of groups of highly interconnected processing elements called neurons. Neural networks are characterized by their topology, the computational characteristics of their elements, and their training rules. Traditional neural networks have neurons arranged in a series of layers. The first layer is termed the input layer, and each of its neurons receives information from the exterior, corresponding to one of the independent variables used as inputs. The last layer is the output layer, and its neurons deliver the output of the network. The layers of neurons between the input and output layers are called hidden layers. Each layer may perform its own computations and pass the results on to another layer. In feed-forward neural networks the connections among neurons are directed forward, i.e., connections are not allowed among neurons of the same layer or to a preceding layer. Networks in which neurons are connected to themselves, to neurons in the same layer, or to neurons of a preceding layer are termed feedback or recurrent networks. Of all the training algorithms for ANNs, back-propagation is perhaps the most widely used supervised training algorithm for multilayered feed-forward networks.23 A feed-forward phase is first performed on an input pattern to calculate the net error; the algorithm then uses this computed output error to change the weight values in the backward direction, and the error is slowly propagated backward through the hidden layers. Every layer is fully connected to the succeeding layer, and the outputs of the hidden layer act as the inputs of the output layer. Figure 2 shows the outline of the O-PLS method combined with artificial neural networks.

Figure 2. Outline of O-PLS combined with artificial neural network.
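As an illustration of the three-layer feed-forward/back-propagation scheme described above, a minimal sketch follows. It assumes a sigmoid hidden layer, a single linear output neuron, and batch gradient updates with momentum; it is not the NeuralWorks implementation used in this work, and the default parameter values simply mirror the optimum settings reported in Section 4:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, y, n_hidden=15, lr=0.1, momentum=0.1, n_iter=1700):
    """Three-layer back-propagation network with one output neuron."""
    n = len(X)
    W1 = rng.normal(0, 0.1, (X.shape[1], n_hidden)); dW1 = np.zeros_like(W1)
    W2 = rng.normal(0, 0.1, (n_hidden, 1));          dW2 = np.zeros_like(W2)
    for _ in range(n_iter):
        # feed-forward phase
        h = sigmoid(X @ W1)                 # hidden-layer outputs
        out = h @ W2                        # linear output neuron
        err = out - y[:, None]              # output error
        # back-propagate the error through the layers
        g2 = h.T @ err / n
        g1 = X.T @ ((err @ W2.T) * h * (1 - h)) / n
        # weight updates with momentum
        dW2 = momentum * dW2 - lr * g2; W2 += dW2
        dW1 = momentum * dW1 - lr * g1; W1 += dW1
    return W1, W2
```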
2. 3. Evaluation of Artificial Neural Networks

The usual criterion for optimizing a network is to minimize the error of the training set or of the monitoring set. However, it is then very easy to choose an over-fitted model, namely one for which the test-set error is less than the precision of the reference method; this may or may not occur at the minimum of the test-set error. Such a network is unstable when used to predict unknown samples, and this instability is usually due to an excessive number of iterations. To avoid these situations, a new evaluation criterion for the network, the degree of approximation, is employed.24 This criterion is defined by Equations (15) and (16):

e_a = (n_1 / n) e_1 + (n_c / n) e_c + |e_1 - e_c|   (15)

where e_a is the approximation error, e_1 and e_c are the relative standard errors of the training set and the monitoring set, n_1 and n_c are the numbers of samples in the training set and the monitoring set, n is the total number of known samples, and n_1/n and n_c/n are the weights contributed to the approximation error e_a by the training set and the monitoring set.

D_a = c / e_a   (16)

where D_a represents the degree of approximation and c is a constant by which D_a is scaled to give a convenient chart. Obviously, the smaller e_a and hence the larger D_a, the better the ANN model approximates the real data. Both the training set and the monitoring set are thus taken into account by this evaluation criterion.

The predictive abilities for the training set, monitoring set and test set were compared in terms of the relative standard error (RSE),25,26 defined as

RSE(%) = [ Σ_{i=1}^{n} (C_NIR,i - C_REF,i)^2 / Σ_{i=1}^{n} (C_REF,i)^2 ]^{1/2} × 100   (17)

where n is the number of samples in the set considered, and C_REF and C_NIR are the concentrations of the samples provided by the State Drug Standard method and the NIR method, respectively.
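Equations (15)-(17) translate directly into code. A small sketch follows, under the stated assumption that the cross term of Eq. (15) is |e_1 - e_c| (the printed formula is partly garbled in the original); names are illustrative:

```python
import numpy as np

def rse(c_ref, c_nir):
    """Relative standard error, Eq. (17), in percent."""
    c_ref, c_nir = np.asarray(c_ref), np.asarray(c_nir)
    return 100.0 * np.sqrt(np.sum((c_nir - c_ref) ** 2) / np.sum(c_ref ** 2))

def degree_of_approximation(e_train, e_mon, n_train, n_mon, c=1.0):
    """Degree of approximation, Eqs. (15)-(16); larger D_a = better model."""
    n = n_train + n_mon
    e_a = (n_train / n) * e_train + (n_mon / n) * e_mon + abs(e_train - e_mon)
    return c / e_a
```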
3. Experimental

3. 1. Apparatus and Software

All NIR diffuse reflectance spectra were measured with a Shimadzu UV-VIS-NIR-3100 spectrophotometer equipped with an ISR-3101 integrating sphere. Data were transferred to a microcomputer through an RS-232C interface. The extended delta-bar-delta back-propagation training routines contained in the NeuralWorks Explorer software package were used. The near-infrared spectral analysis software supplied with the spectrophotometer enables recording of the spectra and their mathematical processing by derivation. The O-PLS preprocessing software was designed in our laboratory.

3. 2. Preparation of Samples

All pharmaceutical raw materials, including ebastine as the active component and starch as the main excipient, were supplied by Meivo Pharmaceutical Company (Cairo, Egypt). Laboratory samples were prepared spanning overdosed to underdosed production samples. In this way, samples containing ebastine at three concentration levels (the nominal content and concentrations approximately 3% above and 3% below the stated value) were prepared. The average concentration of ebastine was 77.63%, and the concentration range over all samples was 65.17-91.11%. All samples were homogenized in a shaker mixer. Each sample was shaken for approximately 30 min before its NIR spectrum was recorded; this was followed by further mixing for 10 min and recording of the spectrum once more. When two consecutive spectra were identical, the sample was considered homogeneous; otherwise, the process was repeated until this condition was met.

The 156 batches of experimental powder samples of different concentrations were divided stochastically into three groups: a training set of 90 samples, a monitoring set of 42 samples and a test set of 24 samples. Table 1 shows the statistics of the reference concentrations of the three sets. The reference concentrations of ebastine were measured according to the British Pharmacopoeia.27

3. 3. Recording of NIR Spectra

Short-wavelength NIR spectra were measured for the individual powders over the wavelength range 780-1100 nm at 1 nm intervals. Each recorded spectrum was the average of 12 scans, and all measurements were obtained in the reflectance mode. The entrance slit of the NIR spectrophotometer was 20 nm. The original spectra of samples of different concentrations are shown in Fig. 3. As can be seen, the spectral baselines shift considerably; preprocessing of the input data is therefore necessary.

Figure 3. Short-wavelength NIR reflectance spectra of ebastine powder.

Table 1. Component contents of ebastine

                 Ebastine (% g/g)               Starch (% g/g)
                 Maximum  Minimum  Average      Maximum  Minimum  Average
Training set     91.11    65.17    79.78        31.83    11.89    20.22
Monitoring set   90.87    65.63    77.82        32.57    11.63    22.18
Test set         82.42    68.77    75.29        30.23    18.98    24.71

4. Results and Discussion

4. 1. Selection of the Number of Orthogonal Components

Here an eigenvalue criterion is employed to estimate the number of orthogonal components: the approach analyzes the ratio ||p - [w^T p / (w^T w)] w|| / ||p||, which becomes zero for correlated O-PLS components if no orthogonal variation is present in X. A plot of this ratio versus the number of orthogonal components gives a good indication of how many orthogonal components to extract. Fig. 4 shows this ratio for each O-PLS component; three orthogonal components were removed from X, because after three components the amplitude of the ratio is down at the noise level. After removal of the three orthogonal components from X, the residual matrix was used as the input data of the artificial neural networks.

Figure 4. Selection of number of orthogonal components.
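The ratio criterion of Fig. 4 can be computed from the quantities already defined in Eqs. (1)-(11). The sketch below mirrors the loop of the earlier hypothetical opls_fit function; it is illustrative only, and the names are assumptions:

```python
import numpy as np

def ortho_ratio_curve(X, y, max_comp=10):
    """||p - [(w^T p)/(w^T w)] w|| / ||p|| for successive O-PLS components."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    ratios = []
    for _ in range(max_comp):
        w = X.T @ y / (y @ y)
        w /= np.linalg.norm(w)
        t = X @ w
        p = X.T @ t / (t @ t)
        w_o = p - (w @ p) * w              # orthogonal part of the loading
        ratios.append(np.linalg.norm(w_o) / np.linalg.norm(p))
        # deflate X by this orthogonal component before the next pass
        w_o /= np.linalg.norm(w_o)
        t_o = X @ w_o
        p_o = X.T @ t_o / (t_o @ t_o)
        X = X - np.outer(t_o, p_o)
    return np.array(ratios)  # keep components until the curve flattens at noise level
```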
4. 2. Training and Optimization of the ANN Models

In this work a three-layer back-propagation network was used. The properties of the training-set data determine the numbers of input and output neurons. The pretreated spectral data of the samples served as the input nodes, and the number of input nodes (i.e., the wavelength interval) was varied in order to scan the data. Because there was only one active ingredient in the ebastine powder samples, the output layer contained one neuron. Neural networks were trained with different numbers of hidden neurons and numbers of cycles. At the beginning of a training run, both the momentum and the learning coefficient were initialized with trial values. During the optimization, the number of input nodes (10-80), the number of hidden nodes (4-31), the learning coefficient (0.01-0.43), the momentum (0.01-0.43) and the number of iterations (600-3300) were selected by back-propagation of the error and by the degree of approximation. When all adjustable parameters of the neural network were optimized, the network showed a high ability of generalization. The training set was used to train the network, the monitoring set was used to avoid over-fitting, and the maximal degree of approximation was used to determine the network topology parameters (numbers of input and hidden nodes, number of iterations, momentum and learning coefficient). Once the network was optimized, the test data were fed into it to evaluate the trained network.

4. 3. Establishment of the ANN Model from O-PLS Preprocessed Spectral Data

4. 3. 1. Selection of the Numbers of Input/Output Nodes and Hidden Nodes

The number of input nodes, i.e., the wavelength interval, was varied in order to sieve the data. Figure 5 shows the effect of the number of input nodes.

Figure 5. Effect of input nodes: (a) relative standard error of training set; (b) relative standard error of monitoring set; (c) degree of approximation.

The relative standard errors of both the training set and the monitoring set decreased gradually as the number of input nodes increased. With 50 input nodes (a wavelength interval of 8 nm), the network had the highest degree of approximation. When the number of input nodes exceeded 50, the relative standard error of the training set kept decreasing while that of the monitoring set increased; the degree of approximation therefore dropped visibly and the network exhibited over-fitting.

The number of hidden nodes had a great effect on the predictive result. Figure 6 shows the effect of the hidden nodes.

Figure 6. Effect of hidden nodes: (a) relative standard error of training set; (b) relative standard error of monitoring set; (c) degree of approximation.

Both curves a and b fluctuated strongly, and it was difficult to determine the optimum number of hidden nodes from them. Curve c represents the degree of approximation; since the constant c in Eq. (16) may be chosen freely to adjust the optimization, the degree of approximation magnifies the differences between settings. From the largest degree of approximation, the optimum number of hidden neurons was determined to be 15.

4. 3. 2. Selection of Momentum and Learning Coefficient

The learning coefficient and the momentum affect the stability and convergence of the ANN models. As a general rule, higher learning coefficients and momenta lead to network instability. Figures 7 and 8 show the effects of the learning coefficient and the momentum.

Figure 7. Effect of learning coefficient: (a) relative standard error of training set; (b) relative standard error of monitoring set; (c) degree of approximation.

Figure 8. Effect of momentum: (a) relative standard error of training set; (b) relative standard error of monitoring set; (c) degree of approximation.

When the learning coefficient and the momentum reached 0.1, the network models had the highest degree of approximation. When they exceeded 0.1, the RSE of the training set decreased while the RSE of the monitoring set increased, so the network exhibited over-fitting. Here the curve of the degree of approximation again displayed its advantage: it magnifies the relative standard error of the network clearly.

4. 3. 3. Selection of the Number of Iterations

The number of iterations is very important for the determinacy of the network models. Fig. 9 shows how the degree of approximation and the relative standard errors depend on the length of the learning period. The highest degree of approximation was reached at 1700 iterations.

Figure 9. Effect of number of iterations: (a) relative standard error of training set; (b) relative standard error of monitoring set; (c) degree of approximation.
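The parameter selection described in Sections 4.2-4.3.3 amounts to scanning each parameter and keeping the value that maximizes the degree of approximation on the monitoring set. A schematic sketch follows, reusing the hypothetical train_bp, rse and degree_of_approximation helpers from the earlier sketches; it illustrates the selection logic only, not the NeuralWorks workflow actually used:

```python
import numpy as np

def predict(X, W1, W2):
    """Feed-forward pass of the trained three-layer network."""
    h = 1.0 / (1.0 + np.exp(-(X @ W1)))
    return (h @ W2).ravel()

def select_hidden_nodes(X_tr, y_tr, X_mon, y_mon, candidates=range(4, 32)):
    """Pick the hidden-layer size with the largest degree of approximation."""
    best, best_da = None, -np.inf
    for n_hidden in candidates:
        W1, W2 = train_bp(X_tr, y_tr, n_hidden=n_hidden)
        e_tr = rse(y_tr, predict(X_tr, W1, W2))     # training-set RSE
        e_mon = rse(y_mon, predict(X_mon, W1, W2))  # monitoring-set RSE
        da = degree_of_approximation(e_tr, e_mon, len(y_tr), len(y_mon))
        if da > best_da:
            best, best_da = n_hidden, da
    return best  # 15 hidden neurons in the experiments reported here
```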
4. 3. 4. Design and Evaluation of the ANN Models Based on Derivative-Pretreated Spectra

For comparison with the OPLS-ANN model, calibration models using first-derivative and second-derivative preprocessed spectra were also designed by the same approach. The corresponding topology parameters selected are shown in Table 2. With all network parameters optimized, the artificial networks had a high ability to predict samples.

Table 2. Optimum parameters used for construction of ANN models

Pretreated spectra          Input/output  Hidden   Momentum  Learning     Number of
                            neurons       neurons            coefficient  iterations
First-derivative spectra    50/1          11       0.2       0.07         2000
Second-derivative spectra   40/1          13       0.15      0.1          1700
O-PLS pretreated spectra    50/1          15       0.1       0.1          1700

To evaluate the ANN models, linear regression equations between the reference concentration values and the NIR concentration values were established (Fig. 10). The intercept and slope represent the degree of linearity between the reference and NIR concentration values. The intercept, slope and correlation coefficient R of the regression equations are shown in Table 3, together with the RSE of the training set and the monitoring set.

Figure 10. Linear regression of NIR concentration values against reference concentration values.

Table 3. Linear regression parameters and errors of ANN models

Spectra                     Set             Intercept  Slope  R      RSE%
First-derivative spectra    Training set    0.0443     0.976  0.994  1.12
                            Monitoring set  0.0965     0.997  0.997  1.22
                            Test set        0.0675     0.934  0.996  1.34
Second-derivative spectra   Training set    0.0321     0.883  0.992  1.34
                            Monitoring set  0.0677     0.895  0.998  1.56
                            Test set        0.0447     0.991  0.996  1.68
O-PLS pretreated spectra    Training set    0.0941     0.945  0.998  1.08
                            Monitoring set  0.0559     0.856  0.997  1.11
                            Test set        0.0376     0.867  0.995  1.26

It can be seen that the OPLS-ANN model had the smallest RSE and the best R.

To further verify the reliability of the network models, the 24 samples of the test set were prepared and the optimal models were used to predict the concentrations of the active component. These results are also listed in Table 3. Because the test set did not take part in training the networks, it had the highest RSE and the lowest R compared with the training set and the monitoring set.

5. Conclusions

In this work, a new method in which O-PLS is combined with artificial neural networks is introduced for the non-destructive quantitative analysis of ebastine powder samples by NIR spectroscopy. Very satisfactory results were obtained with the proposed method, and the application of O-PLS simplifies interpretation of the spectral data. For comparison with the OPLS-ANN model, calibration models using first-derivative and second-derivative pretreated spectra were also designed. On the basis of the results, the ANN model based on the O-PLS method had the smallest RSE and the best R; the OPLS-ANN model is therefore the best.

6. References

1. Y. Roggo, L. Duponchel, J. P. Huvenne, J. Agric. Food Chem. 2004, 52, 1055-1061.
2. D. Cozzolino, A. Chree, J. R. Scaife, I. Murray, J. Agric. Food Chem. 2005, 53, 4459-4463.
3. C. Miralbes, J. Agric. Food Chem. 2003, 51, 6335-6339.