Informatica 38 (2014) 51-58

Bilinear Grid Search Strategy Based Support Vector Machines Learning Method

Li Lin, Zhang Xiaolong, Zhang Kai and Liu Jun
School of Computer Science and Technology, Wuhan University of Science and Technology, China
Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, China
E-mail: lilin@wust.edu.cn

Keywords: support vector machines, model selection, parameter optimization, protein structure prediction

Received: October 21, 2013

Support Vector Machines (SVM) learning can be used to construct classification models of high accuracy, but the efficiency of SVM learning still needs to be improved. This paper proposes a bilinear grid search method that achieves higher computational efficiency in choosing the kernel parameters (C, γ) of an SVM with RBF kernel. Experiments show that the proposed method retains the small number of trained SVMs of bilinear search and the high prediction accuracy of grid search. The results indicate that the bilinear grid search method (BGSM) is an effective way to train SVMs with the RBF kernel. With the application of BGSM, protein secondary structure prediction obtains better learning accuracy than related algorithms.

Povzetek: A new parameter search method for SVM is developed.

1 Introduction

Support Vector Machines (SVM) is a machine learning method based on statistical learning theory and structural risk minimization [1-3]. The core of SVM is to identify the maximal margin hyperplane: for linearly separable data, the hyperplane that classifies the data correctly while maximizing the minimum distance between the data and the hyperplane. A number of recent studies on SVM explore simple and efficient methods for solving the maximal margin problem [4-6], and many of these works study the performance of SVM learning [7-9]. Several kernel functions can be used in SVM, such as the linear function, polynomial function, RBF function, Gaussian function, MLPs with one hidden layer, and splines. SVM is used to construct accurate classification models and has been widely applied, for example to handwritten character recognition, web page/text classification and gene analysis [10]. However, there is still no widely accepted way of selecting the kernel function and its parameters in SVM learning, and parameter selection usually depends on a large-scale search.

SVM learning is a quadratic programming (QP) problem; despite its advantages, hyperparameter selection is costly because each candidate setting requires solving a QP whose matrix size depends on the training set. Therefore, this paper proposes a bilinear grid search method to compute the penalty parameter and the kernel parameter (C, γ) of an SVM with RBF kernel. The method is efficient in reducing the amount of QP training. The bilinear grid search algorithm has the advantages of both bilinear search and grid search: it covers a wide search range of (C, γ), performs SVM learning with a small number of trained models, and still constructs classification models with high accuracy.

The rest of the paper is structured as follows. Section 2 introduces SVM learning and the relevant search strategies; Section 3 proposes the bilinear grid search method for SVM learning with the RBF kernel; Section 4 reports experiments that test the efficiency and applicability of the proposed algorithm; Section 5 applies BGSM to protein secondary structure prediction; Section 6 gives concluding remarks and directions for future research.
2 Search strategy for SVM learning

SVM classification can be described as follows. Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, \dots, l$, where $x_i \in R^n$ and $y_i \in \{1, -1\}$, find the solution of

$$\min_{w, b, \xi} \ \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \qquad (1)$$

subject to $y_i (w^T z_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \dots, l$. Here the training vectors $x_i$ are mapped into a higher- (possibly infinite-) dimensional space by a function $\phi$ as $z_i = \phi(x_i)$, and $C$ ($C > 0$) is the penalty parameter of the error term. Formulation (1) is usually solved through its dual problem: minimize $F(\alpha) = \frac{1}{2} \alpha^T Q \alpha - e^T \alpha$ subject to $0 \le \alpha_i \le C$ and $y^T \alpha = 0$, where $e$ is the vector of all ones and $Q_{ij} = y_i y_j K(x_i, x_j)$ with kernel $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$. The resulting decision function is

$$\operatorname{sgn}(w^T \phi(x) + b) = \operatorname{sgn}\Big(\sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b\Big).$$

This formulation is employed to minimize the prediction error in SVM learning. Several kernel functions can be used in SVM learning, including the linear kernel, the polynomial kernel, the sigmoid kernel and the radial basis function (RBF) kernel (also called the Gaussian kernel). This paper selects the RBF kernel as the SVM kernel function, i.e., $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$. The RBF kernel nonlinearly maps the training data into a higher-dimensional space, so it can handle nonlinear relations between the class labels and the attributes. Keerthi and Lin [11] prove that a linear kernel with a penalty parameter C behaves like the RBF kernel with a suitable (C, γ) (C is the penalty parameter, γ is the kernel parameter). The application of the sigmoid kernel in SVM learning, whose parameters behave similarly to those of the RBF kernel, is discussed in [9]. It is known that the number of hyperparameters influences the complexity of model selection. For the RBF kernel, $0 < K(x_i, x_j) \le 1$, whereas the polynomial kernel has two problematic cases: when $\gamma x_i^T x_j + r > 1$ the kernel value grows without bound as the degree increases, and when $0 < \gamma x_i^T x_j + r < 1$ it shrinks towards zero. The authors of [12] note that, since the sigmoid kernel is not always an inner product of two vectors, its application has some limitations.

As mentioned above, there are two hyperparameters in model selection for the RBF kernel [11]: the penalty parameter C and the kernel width γ. SVM learning can be improved by optimizing the parameter pair (C, γ). Several methods can be used to compute these two parameters [12]; (C, γ) is usually searched on a logarithmic scale, i.e., as (log C, log γ). When searching for a good pair of log C and log γ, it is usual to form a two-dimensional uniform grid (n × n) over the training space and pick the (log C, log γ) with the smallest generalization error in SVM classification. This method is called the grid search method; it tries n² pairs of (C, γ).

Keerthi and Lin [11] propose a simple and efficient heuristic for computing (C, γ). It forms a line of unit slope that cuts through the middle of the good region and searches for a good (C, γ) along that line. Suppose $\tilde{C}$ is the optimal penalty parameter of the linear SVM; the procedure (call it the bilinear search method) is:
• Search for the best C of the linear SVM and denote it as $\tilde{C}$.
• Fix $\tilde{C}$, and search for the (C, γ) satisfying $\log \gamma = \log C - \log \tilde{C}$ using the RBF kernel.

A difficulty in [11] is deciding the range of log C used to compute $\tilde{C}$ in the first step. This paper employs an improved bilinear search method to address this problem: it searches along $\log \gamma = \log C - \log \tilde{C}_j$ with $\tilde{C}_j$ set to $0.5\tilde{C}$, $\tilde{C}$ and $2\tilde{C}$ respectively, and the best setting is taken from this enlarged range of log C. Grid search is time-consuming. Based on the bilinear search method adopted by Keerthi and Lin [11], we therefore propose an improved bilinear search method to decide (C, γ): first, identify a 'better' region (the range of log C is larger than that of [11]) and compute a pair $(C_1, \gamma_1)$; then invoke an improved grid search to obtain a pair $(C_2, \gamma_2)$ better than $(C_1, \gamma_1)$ for accurate prediction. In this way the coarse grid search is refined by an improved grid search, yielding a better (C, γ) and a more accurate SVM model.
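To make the cost of the plain grid search concrete, the following is a minimal sketch, assuming scikit-learn and the Iris data as stand-ins for the LIBSVM setup used in our experiments. It exhaustively tries all 27 × 27 = 729 (log C, log γ) pairs over the ranges described in the next section and keeps the pair with the smallest 10-fold cross-validation error; it is deliberately exhaustive and therefore slow.

```python
# Illustrative sketch of the plain (coarse) grid search; not the original scripts.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)            # stand-in for a UCI training set

best_err, best_pair = 1.0, None
for log2C in range(-10, 17):                 # 27 values of log2(C)
    for log2g in range(-15, 12):             # 27 values of log2(gamma)
        clf = SVC(C=2.0 ** log2C, gamma=2.0 ** log2g, kernel="rbf")
        err = 1.0 - cross_val_score(clf, X, y, cv=10).mean()
        if err < best_err:
            best_err, best_pair = err, (log2C, log2g)

print("tried", 27 * 27, "pairs; best (log2 C, log2 gamma) =", best_pair,
      "cv error =", round(best_err, 3))
```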
3 Bilinear grid search algorithm

In SVM learning with the RBF kernel, several methods can be applied to compute (C, γ). As described above, the $(C_1, \gamma_1)$ found by the (coarse) grid search can be refined by the improved grid search to acquire a more suitable $(C_2, \gamma_2)$ for training accurate SVM models. The bilinear search method is used to find the best parameter $\tilde{C}$ of the linear SVM; in this paper the three values $0.5\tilde{C}$, $\tilde{C}$ and $2\tilde{C}$ are used, and the corresponding γ values are computed for each of them. The advantage of determining (C, γ) with the improved bilinear search method is also reported in [13].

Because of the size of the search space, the grid search method has to try n² pairs of (C, γ), while the bilinear search method tries only about 2n. Compared with bilinear search, grid search usually achieves higher prediction accuracy. The bilinear grid search method proposed in this paper retains the advantages of both: it searches for (C, γ) with fewer training points while maintaining the accuracy of the resulting SVM models. The detailed procedure is as follows. First, compute the best C using the bilinear method and denote it as $\tilde{C}$. Then, compare $0.5\tilde{C}$, $\tilde{C}$ and $2\tilde{C}$ to find the best parameter pair $(C_{bt}, \gamma_{bt})$ among the candidates $(C_j, \gamma_j)$ produced by the improved bilinear search method. Around $(C_{bt}, \gamma_{bt})$, invoke a finer (improved) grid search with grid spacing $2^{0.25}$ over the range $[2^{-2}, 2^{2}]$ to obtain $(C_{final}, \gamma_{final})$. Take $(C_{final}, \gamma_{final})$ as the optimized (C, γ), use it to train an SVM with the RBF kernel, and obtain the objective SVM model with the highest accuracy.

Algorithm: Bilinear grid search algorithm
Input: Training examples
Output: Classification model with the best accuracy
Begin
1: Map the training data to the SVM space;
2: Select a linear kernel SVM;
3: Search for the best C of the linear SVM and call it $\tilde{C}$;
4: for $C_j = 0.5\tilde{C}, \tilde{C}, 2\tilde{C}$ do
5:   Compute $\gamma_j$ according to $\log \gamma_j = \log C - \log C_j$ using the RBF kernel;
6: Select the best $(C_{bt}, \gamma_{bt})$ from the $(C_j, \gamma_j)$;
7: For $(C_{bt}, \gamma_{bt})$, invoke the improved grid search:
8:   for $k = 2^{-2}$ to $2^{2}$ with multiplicative step $2^{0.25}$ do
9:     Compute the corresponding $(C_k, \gamma_k)$;
10: Select the best $(C_{final}, \gamma_{final})$ among the $(C_k, \gamma_k)$;
11: Train the SVM with the RBF kernel using $(C_{final}, \gamma_{final})$;
12: Obtain the classification model with the best accuracy.
End.

In this process, the accuracy of all models is evaluated with 10-fold cross-validation. For the grid search method, we uniformly discretize (log C, log γ) over the region [-10, 16] × [-15, 11], i.e., 27² = 729 training points. For the bilinear search method, we search for $\tilde{C}$ over uniformly spaced values of log C in [-10, 16]; then [-15, 11] is discretized as values of log γ and all points satisfying $\log \gamma = \log C - \log \tilde{C}$ are checked (compared with the bilinear search method, the improved bilinear search method takes all three values $0.5\tilde{C}$, $\tilde{C}$ and $2\tilde{C}$ in the bilinear equation).
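A minimal Python sketch of the whole procedure is given below, assuming scikit-learn in place of the LIBSVM tools actually used in the experiments; the Iris data and the cv_error helper are illustrative stand-ins rather than the original implementation.

```python
# Hedged sketch of the bilinear grid search: linear-SVM stage, improved bilinear
# stage, then a fine RBF grid around the best pair found.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def cv_error(clf):
    """10-fold cross-validation error of a classifier."""
    return 1.0 - cross_val_score(clf, X, y, cv=10).mean()

# Step 1: best C of the linear SVM over log2(C) in [-10, 16].
log2C_grid = np.arange(-10, 17)
errs = [cv_error(SVC(kernel="linear", C=2.0 ** c)) for c in log2C_grid]
log2C_tilde = int(log2C_grid[int(np.argmin(errs))])

# Step 2: improved bilinear search along log(gamma) = log(C) - log(C_j)
# for C_j in {0.5*C~, C~, 2*C~}; keep the best RBF pair found.
best_err, best_pair = 1.0, None
for shift in (-1, 0, 1):                       # 0.5*C~, C~, 2*C~ on the log2 scale
    log2Cj = log2C_tilde + shift
    for log2C in log2C_grid:
        log2g = log2C - log2Cj
        if not -15 <= log2g <= 11:
            continue
        err = cv_error(SVC(C=2.0 ** log2C, gamma=2.0 ** log2g))
        if err < best_err:
            best_err, best_pair = err, (int(log2C), int(log2g))

# Step 3: finer grid with spacing 2**0.25 in [2**-2, 2**2] around (C_bt, gamma_bt).
bt_C, bt_g = best_pair
for dc in np.arange(-2.0, 2.25, 0.25):
    for dg in np.arange(-2.0, 2.25, 0.25):
        err = cv_error(SVC(C=2.0 ** (bt_C + dc), gamma=2.0 ** (bt_g + dg)))
        if err < best_err:
            best_err, best_pair = err, (bt_C + dc, bt_g + dg)

print("final (log2 C, log2 gamma) =", best_pair, "cv error =", round(best_err, 3))
```

The counts match the analysis above: the linear stage evaluates one setting per log C value, the bilinear stage adds at most three line searches, and the fine 17 × 17 grid dominates; the total is still far below the 729 settings of the full grid search.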
4 Experimental results

The proposed bilinear grid search algorithm has been evaluated and compared with existing algorithms. This section presents the experimental results. Classification accuracies of grid search, bilinear search, improved bilinear search and bilinear grid search are compared. The experiments employ 10 data sets chosen from the UCI repository [15]. These data are trained with LIBSVM [16] using the four methods respectively, namely the grid search method, the bilinear search method, the improved bilinear search method and the bilinear grid search method. Table 1 presents the basic information of the 10 data sets; for example, the Breast-cancer (BC) data set has 9 attributes, 683 examples and 2 classes.

Data set                 Attributes  Examples  Classes
Breast-cancer (BC)       9           683       2
Credit-screening (CS)    15          690       2
Diabetes (DIAB)          8           768       2
Iris (IR)                4           150       3
Letter-recognition (LR)  16          20000     26
Vowel (VO)               10          528       11
Wdbc                     10          569       2
Wine                     13          768       3
Wpbc                     33          194       2
Zoo                      16          101       7

Table 1: Training data sets.

Table 2 reports the model errors of the four search algorithms. The figures inside the parentheses are the selected (log C, log γ) values; for the last column they correspond to the (C_final, γ_final) computed by the proposed bilinear grid search. Bilinear grid search is very competitive with grid search in terms of testing error. Among the 10 data sets, bilinear grid search and grid search obtain the same accuracy on 6 data sets (Breast-cancer, Iris, Vowel, Wine, Wpbc, Zoo); bilinear grid search trains more accurate models than grid search on 2 data sets (Credit-screening, Letter-recognition); on Diabetes and Wdbc, grid search obtains a slightly lower test error than bilinear grid search. On all 10 data sets, bilinear grid search learns more accurate models than both bilinear search and improved bilinear search.

Data  Grid search    Bilinear search  IB search      BG search
BC    0.027 (-3,-3)  0.030 (-4,-2)    0.030 (-4,-2)  0.027 (2.8,-3)
CS    0.130 (2,-1)   0.139 (3,-1)     0.130 (2,-1)   0.128 (2.5,-1.5)
DIAB  0.225 (0,-4)   0.244 (-3,0)     0.234 (-3,-1)  0.226 (-1.8,-2.5)
IR    0.026 (2,-3)   0.046 (-2,-2)    0.026 (0,-1)   0.026 (0,-1)
LR    0.020 (10,2)   0.019 (5,1)      0.019 (6,1)    0.019 (6,1)
VO    0.003 (3,2)    0.003 (6,1)      0.003 (6,1)    0.003 (6,1)
Wdbc  0.019 (3,-5)   0.040 (-3,-1)    0.031 (-2,-1)  0.021 (-0.5,-2.3)
Wine  0.005 (0,-2)   0.028 (-2,0)     0.016 (-2,-1)  0.005 (0,-2)
Wpbc  0.164 (6,-5)   0.201 (1,-3)     0.190 (2,-3)   0.164 (3.8,-3.5)
Zoo   0.039 (10,-9)  0.138 (-2,-3)    0.049 (0,-2)   0.039 (1.3,-3)

IB search: Improved Bilinear search. BG search: Bilinear Grid search.
Table 2: Model error comparison of bilinear grid search with the other search methods.

Table 3 shows the number of training SVMs required by the four algorithms. For all 10 data sets, grid search trains 729 SVMs, because it always uses the same 27 × 27 grid. Both bilinear search and improved bilinear search require a much smaller number of training SVMs. The number of training SVMs of the bilinear grid search algorithm is also much smaller than that of the grid search algorithm.

Data set  Grid search  Bilinear search  IB search  BG search
BC        729          47               87         376
CS        729          53               105        394
DIAB      729          46               84         373
IR        729          49               93         382
LR        729          53               105        394
Vowel     729          54               106        395
Wdbc      729          47               87         376
Wine      729          44               83         372
Wpbc      729          53               105        394
Zoo       729          50               96         385

IB search: Improved Bilinear search. BG search: Bilinear Grid search.
Table 3: Comparison of the number of training SVMs.

From Tables 2 and 3, we can see that the bilinear grid search algorithm has the best overall performance in terms of accuracy and number of training SVMs. For large data sets, the bilinear grid search algorithm is preferable to the grid search algorithm, since it checks fewer points on the (log C, log γ) plane and thus saves computing time. The experimental results also show that, with the largest number of training SVMs, the grid search method achieves higher prediction accuracy than the bilinear search method, because the latter trains far fewer SVMs.
The bilinear grid search method retains the advantages of both bilinear search and grid search: it requires far fewer training SVMs than grid search while obtaining a competitive prediction accuracy. It is therefore preferable to grid search.

5 BGSM on protein secondary structure prediction

Because of potential homology between proteins in the training and testing sets, the selection of a protein database for secondary structure prediction is delicate. Homologous proteins in the database may generate misleading results, since in some cases the learning method simply memorizes the training set. Therefore, protein chains without significant pairwise homology are used for developing our prediction model. For a fair comparison, we train and test on the same 130 protein sequences used by Rost and Sander [17] and Jung-Ying Wang [18]. These proteins, taken from the HSSP (Homology-derived Structures and Sequence alignments of Proteins) database [19], all have less than 25% pairwise similarity and more than 80 residues. We also use the same seven-fold cross-validation as Rost and Sander [17] and Jung-Ying Wang [18]. Table 4 lists the 130 protein sequences used for the seven-fold cross-validation.

The secondary structure assignment was done with the DSSP (Dictionary of Secondary Structures of Proteins) algorithm [20], which distinguishes eight secondary structure classes. The eight classes are reclassified into three: H (α-helix), I (π-helix) and G (3₁₀-helix) are classified as helix (α), E (extended strand) as β-strand (β), and all the others as coil (c). Table 5 lists the reclassification. Note that different reclassification schemes influence the prediction accuracy to some extent, as discussed by Cuff and Barton [21]. For an amino acid sequence, the objective of secondary structure prediction is to predict a secondary structure state (α, β, coil) for each residue in the sequence.

Set A: 256b_A 2aat 8abp 6acn 1acx 8adh 3ait 2ak3_A 2alp 9api_A 9api_B 1azu 1cyo 1bbp_A 1bds 1bmv_1 1bmv_2 3blm 4bp2
Set B: 2cab 7cat_A 1cbh 1cc5 2ccy_A 1cdh 1cdt_A 3cla 3cln 4cms 4cpa_I 6cpa 6cpp 4cpv 1crn 1cse_I 6cts 2cyp 5cyt_R
Set C: 1eca 6dfr 3ebx 5er2_E 1etu 1fc2_C fdl_H 1dur 1fkf 1fnd 2fxb 1fxi_A 2fox 1g6n_A 2gbp 1a45 1gd1_O 2gls_A 2gn5
Set D: 1gp1_A 4gr1 1hip 6hir 3hmg_A 3hmg_B 2hmz_A 5hvp_A 2i1b 3icb 7icd 1il8_A 9ins_B 1l58 1lap 5ldh 1gdj 2lhb 1lmb_3
Set E: 2ltn_A 2ltn_B 5lyz 1mcp_L 2mev_4 2or1_L 1ovo_A 1paz 9pap 2pcy 4pfk 3pgm 2phh 1pyp 1r09_2 2pab_A 2mhu 1mrt 1ppt
Set F: 1rbp 1rhd 4rhv_1 4rhv_3 4rhv_4 3rnt 7rsa 2rsp_A 4rxn 1s01 3sdh_A 4sgb_I 1sh1 2sns 2sod_B 2stv 2tgp_I 1tgs_I 3tim_A
Set G: 6tmn_E 2tmv_P 1tnf_A 4ts1_A 1ubq 2utg_A 9wga_A 2wrp_R 1bks_A 1bks_B 4xia_A 2tsc_A 1prc_C 1prc_H 1prc_L 1prc_M

Table 4: The 130 protein sequences used in the experiments.
(a) Eight structural classes:
Character  Name
H          α-helix
G          3₁₀-helix
I          π-helix
E          Extended strand
B          β-bridge
T          Turn
S          Bend
-          The rest

(b) Three structural classes:
Character  Name
H          Helix
E          Strand
C          The rest (coil)

(c) Reclassification from eight classes to three:
Character before conversion  Character after conversion
H, G, I                      H
E                            E
B, T, S, the rest            C

Table 5: (a) The eight structural classes and their names; (b) the three structural classes and their names; (c) the reclassification from the eight classes to the three classes.

As noted above, for a fair comparison we train and test on the same 130 protein sequences used by Rost and Sander [17] and Jung-Ying Wang [18], taken from the HSSP database, with the secondary structure assigned by DSSP and reduced from eight classes to three. Moving-window and multiple-alignment methods are used for encoding. We apply a moving window covering 17 neighbouring residues, and each window position has 21 possible values: the 20 amino acids and a null input. The number of data points therefore equals the number of residues, and each data point has 21 × 17 = 357 values.

Before testing these proteins, we employ the multiple alignment method to acquire evolutionary and protein family information. Instead of single-sequence orthogonal coding, the input vector is obtained from the alignment of the unknown sequence with similar known sequences, so that evolutionary information from homologous sequences can be exploited. Figure 1 gives an example of using evolutionary information for encoding: four proteins are aligned with the sequence to be processed. In the gray column, the base sequence has residue 'N' while the aligned sequences have 'N', 'A', 'S' and 'E' at this position. The frequencies are taken as the values of the coding, so the coding for this position is A = 0.2, S = 0.2, E = 0.2, N = 0.4.

Figure 1: An example of using evolutionary information for coding secondary structure.

Prediction is made for the central residue of each window. In order to allow the moving window to overlap the amino- or carboxyl-terminal end of the protein, a null input is used where the window extends beyond the chain; each data point thus has 21 × 17 = 357 values and can be represented as a vector. The RS130 data set consists of 24,387 data points in three classes, of which 47% are coil, 32% are helix and 21% are strand.

An important fact about prediction is that training errors are not what matters; only test errors (i.e., the accuracy on new sequences) count. It is therefore important to estimate the overall performance of a learning method. Previous research has proposed different evaluation measures; the most common one in secondary structure prediction is the overall three-state accuracy Q3, defined as the ratio of correctly predicted residues to the total number of residues in the database under consideration:

$$Q_3 = \frac{q_\alpha + q_\beta + q_{\mathrm{coil}}}{N} \times 100,$$

where N is the total number of residues in the test data sets and $q_s$ is the number of residues of secondary structure type s that are predicted correctly.
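The profile encoding and the Q3 measure just described can be made concrete with a small illustrative sketch; the helper names and toy values below are ours and are not part of the original experiments, but the highlighted column reproduces the Figure 1 example.

```python
# Illustrative sketch: 21-value profile encoding of an alignment column and the
# overall three-state accuracy Q3. Each data point concatenates 17 such columns,
# giving 21 * 17 = 357 values per residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"             # the 20 standard residues

def column_frequencies(column):
    """Frequencies of the 20 amino acids in one alignment column, plus a null input."""
    freqs = [column.count(a) / len(column) for a in AMINO_ACIDS]
    null = column.count("-") / len(column)       # null/gap input
    return freqs + [null]                        # 21 values

def encode_window(columns):
    """Concatenate the encodings of 17 columns into one 357-value data point."""
    assert len(columns) == 17
    vec = [v for col in columns for v in column_frequencies(col)]
    assert len(vec) == 21 * 17
    return vec

def q3(predicted, observed):
    """Overall three-state accuracy: correctly predicted residues / total * 100."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# The position highlighted in Figure 1: the base sequence has 'N' and the four
# aligned sequences have 'N', 'A', 'S' and 'E' at this column.
col = ["N", "N", "A", "S", "E"]
print({a: f for a, f in zip(AMINO_ACIDS, column_frequencies(col)) if f > 0})
# -> {'A': 0.2, 'E': 0.2, 'N': 0.4, 'S': 0.2}
print(q3("HHHEC", "HHEEC"))                      # -> 80.0
```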
We carry out several experiments to optimize the hyperparameters using the bilinear grid search method. The ranges of C and γ are both $\{2^{-8}, 2^{-7}, \dots, 2^{8}\}$, and the cross-validation fold is 7. Fig. 2 and Fig. 3 show the resulting command-line output and contour chart of the search. In the command-line output, the best parameter pair is (C, γ) = (1.0, 0.03125), with a classification accuracy of 70.8123%.

Figure 2: The command-line result chart of the parameter search.

Figure 3: The contour result chart of the parameter search.

Table 6 lists the accuracy of the different methods on the RS130 data set. The average accuracy of the bilinear grid search method is 70.8%, which is competitive with the methods proposed by Rost and Sander and by Jung-Ying Wang. The average accuracy of the method of Rost and Sander [17], which employs neural networks, is 68.2%; other techniques must be incorporated to raise its accuracy to 70.0%. Jung-Ying Wang [18] obtains 70.5% accuracy with a basic SVM. Our experiment used the same data set (including the type of alignment profiles) and the same secondary structure
definition (reduction from eight to three secondary structure classes) as those employed by Rost and Sander [17] and Jung-Ying Wang [18]. The same accuracy assessment is used as well, so as to ensure a fair comparison.

Method                                               Prediction accuracy (%)
Neural networks                                      68.2
Neural networks incorporated with other techniques   70.0
SVM                                                  70.5
SVM with bilinear grid search method                 70.8

Table 6: Comparison of the secondary structure prediction accuracy of different methods on the RS130 data set.

6 Conclusion

In this paper, we present an approach to parameter optimization in SVM learning. The proposed bilinear grid search method can effectively improve learning performance and enhance prediction accuracy. A comparison has been made between the grid search method, the bilinear search method and the bilinear grid search method when selecting optimal parameters for the RBF kernel. The experimental results show that the proposed algorithm retains the advantages of both the bilinear search method and the grid search method. With the application of BGSM, protein secondary structure prediction also obtains better learning accuracy compared with the other algorithms.

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 61273225, No. 61100055 and No. 31201121, and by the Natural Science Foundation of Hubei Province under Grant No. 2011CDB233.

References

[1] Vladimir N. Vapnik (1998). Statistical Learning Theory. J. Wiley & Sons, New York.
[2] C. Cortes, Vladimir N. Vapnik (1995). Support-vector networks. Machine Learning, Vol.20, No.3, pp.273-297.
[3] Vladimir N. Vapnik (2000). The Nature of Statistical Learning Theory (Second Edition). Springer.
[4] B. Scholkopf, A. J. Smola (2002). Learning with Kernels. MIT Press.
[5] Kai Zhang, I. W. Tsang, J. T. Kwok (2009). Maximum margin clustering made practical. IEEE Transactions on Neural Networks, Vol.20, No.4, pp.583-596.
[6] E. Blanzieri, F. Melgani (2008). Nearest neighbor classification of remote sensing images with the maximal margin principle. IEEE Transactions on Geoscience and Remote Sensing, Vol.46, No.6, pp.1804-1811.
[7] G. B. Huang, Q. Y. Zhu, C. K. Siew (2006). Extreme learning machine: theory and applications. Neurocomputing.
[8] S. Fine (2001). Efficient SVM training using low-rank kernel representations. Journal of Machine Learning Research, Vol.2, pp.243-264.
[9] K. M. Lin and C. J. Lin (2003). A study on reduced support vector machines. IEEE Transactions on Neural Networks, Vol.14, No.6, pp.1449-1459.
[10] Li Lin and Zhang Xiaolong (2005). Optimization of SVM with RBF kernel. Computer Engineering and Applications (in Chinese), Vol.29, No.10, pp.190-193.
[11] S. S. Keerthi, C. J. Lin (2003). Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, Vol.15, pp.1667-1689.
[12] O. Chapelle, V. Vapnik et al. (2002). Choosing multiple parameters for support vector machines. Machine Learning, Vol.46, pp.131-159.
[13] P. Wang, X. Zhu (2003). Model selection of SVM with RBF kernel and its application. Computer Engineering and Applications (in Chinese), Vol.24, pp.72-73.
[14] H. T. Lin, C. J. Lin (2003). A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Technical Report, National Taiwan University.
[15] C. Blake, C. Merz (2013). UCI Repository of Machine Learning Databases. http://www.ics.uci.edu/mlearn/MLRepository.html, Dept. of Information and Computer Science, University of California.
[16] C. C. Chang, C. J. Lin (2013).
LIBSVM: A library for support vector machines. Software available online at: http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html.
[17] B. Rost and C. Sander (1993). Prediction of protein secondary structure at better than 70% accuracy. Journal of Molecular Biology, Vol.232, No.2, pp.584-599.
[18] Jung-Ying Wang (2002). Application of Support Vector Machines in Bioinformatics. Department of Computer Science and Information Engineering, National Taiwan University, Taipei.
[19] HSSP database: http://www.cmbi.kun.nl/gv/hssp.
[20] W. Kabsch and C. Sander (1983). Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, Vol.22, No.12, pp.2577-2637.
[21] J. A. Cuff and G. J. Barton (1999). Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins: Structure, Function, and Genetics, Vol.34, pp.508-519.