Informatica 38 (2014) 367–376

Incremental Hierarchical Fuzzy Model Generated from Multilevel Fuzzy Support Vector Regression Network

Ling Wang, DongMei Fu and LuLu Wu
School of Automation & Electrical Engineering, University of Science and Technology Beijing, Beijing, 100083, China
Key Laboratory of Advanced Control of Iron and Steel Process (Ministry of Education), Beijing, 100083, China

Keywords: FCM clustering, fuzzy association rules, incremental hierarchical fuzzy model, multilevel fuzzy support vector regression network

Received: March 23, 2014

Fuzzy rule-based systems are nowadays one of the most successful applications of fuzzy logic, but in complex applications with a large set of variables the number of rules increases exponentially and the resulting fuzzy system is scarcely interpretable. Hierarchical fuzzy systems are one of the alternatives presented in the literature to overcome this problem. This paper presents a multilevel fuzzy support vector regression network (MLFSVRN) model that learns an incremental hierarchical structure based on the Takagi-Sugeno-Kang (TSK) fuzzy system, with the aim of coping with the curse of dimensionality while retaining good generalization ability. From the input-output data pairs, the TS-type rules and their parameters are learned by a combination of fuzzy clustering and linear SVR. In addition, an efficient input variable selection method for the incremental multilevel network is proposed based on FCM clustering and fuzzy association rules. To achieve high generalization ability, the consequent parameters of a rule are learned through linear SVR with a new TS-kernel. The capabilities of the MLFSVRN model are demonstrated by simulations on function approximation and chaotic time-series prediction, and the results are compared with those of the single-level counterpart FSVRN and Jang's ANFIS model.

Povzetek: Predstavljen je hierarhični mehek model, zgrajen iz večnivojske mehke regresijske mreže. (A hierarchical fuzzy model built from a multilevel fuzzy regression network is presented.)

1 Introduction

As is widely known, fuzzy rule-based systems have been successfully applied to problems such as classification, identification and control. At present, one of the important issues in fuzzy rule-based systems is how to reduce the total number of rules and the corresponding computational requirements. In a single fuzzy system, in order to maintain a specified accuracy, the number of rules grows exponentially with the number of input variables or input fuzzy sets, and the interpretability of the fuzzy system is lost. Hence, to deal with the "curse of dimensionality" and the rule-explosion problem, hierarchical fuzzy systems (HFS) were proposed [1] and have attracted much attention in recent years. Most hierarchical fuzzy systems [2-9] consist of a number of low-dimensional fuzzy systems arranged in a hierarchical form, so that the number of rules can be reduced to as low as a linear function of the number of input variables.

A HFS is made up of a set of fuzzy subsystems or modules. These modules are linked in such a way that the output of a module is an input of other modules. Fig. 1 depicts some possible hierarchical models for four input variables and two or three hierarchical levels. The incremental structure is shown in Fig. 1(a), in which the inputs of a module are the output of the previous module together with external variables [2-6].
The aggregated structure is shown in Fig. 1(b), in which the first level is made up of a set of modules receiving the input variables, and each variable is used as an input in only a single module. The outputs of the modules in the first level are the inputs of the modules in the next level, and so on [7-9].

Figure 1: Example of HFS structures.

Our research focuses on the incremental hierarchical structure. In this hierarchy, it is important to determine the input variables and their interactions at the different levels. In general, by analyzing the relative importance of the input variables, the most important ones are assigned to the lowest level and the least important ones to the highest level [12-13]. In order to assign correlated or coupled input variables to the same level of the hierarchical fuzzy system, Chung and Duan [13] introduced a correlation matrix to determine the correlated or coupled input variables. In [22], a multi-objective genetic algorithm (MOGA) is adopted to determine the input variables of each level module and the hierarchical architecture. In [6], an evolutionary algorithm (EA) is investigated to select the input variables and the topology of the HFS. However, both GA and EA involve very time-consuming search processes.

In a hierarchical fuzzy system, each single-level module serves as an individual reasoning level. As a fuzzy reasoning mechanism, the Takagi-Sugeno-Kang (TSK) type is the most widely used. The well-known ANFIS model of Jang [21], based on a layered network architecture, was proposed first. In [10] and [11], neural-fuzzy networks are also used to realize TSK fuzzy reasoning. Usually, the consequent in a Takagi-Sugeno fuzzy model is a crisp function of the antecedent variables, and recursive least squares (RLS) is used to determine the consequent parameters. However, this RLS tuning is based on empirical risk minimization and lacks high generalization ability. The idea of solving this parameter estimation problem by incorporating SVR (support vector regression) into the TSK model has recently attracted a lot of attention [23-24]. In [23], an SVR-based FNN was proposed in which the fuzzy if-then rules are generated from the extracted support vectors (SVs). Since the number of SVs in an SVR is usually very large, the size of the designed fuzzy system is equally large. In [24], fuzzy c-means (FCM) clustering is used to generate the fuzzy rules and their antecedent parameters, and the consequent parameters are obtained by SVR learning. In contrast to a fuzzy neural network, the use of SVR for TSK learning requires a smaller number of parameters, and the kernel transformation retains the good generalization ability of the SVR.

This paper proposes a multilevel fuzzy support vector regression network (MLFSVRN) model to design a fuzzy system that addresses both the dimensionality problem and generalization performance. In order to select effective input variables for each level, we adopt the FCM clustering algorithm [15] and a fuzzy association rule mining method [19] to construct an MLFSVRN model with incremental architecture. In addition, based on neuro-fuzzy systems, a novel TSK-type fuzzy reasoning system with a support vector regression learning mechanism, called the fuzzy support vector regression network (FSVRN), is applied to the modules of the hierarchical fuzzy system.
To improve the prediction performance of the hierarchical fuzzy system, the consequent parameters of a rule are learned through linear SVR with a new TS-kernel. From the single-level FSVRN modules, the overall network, the multilevel fuzzy support vector regression network (MLFSVRN), is formulated.

The paper is organized as follows: Section 2 briefly describes the data mining techniques used in this paper. Section 3 presents our research methodology, including the input selection algorithm and the construction of the multilevel support vector regression network model. Section 4 shows the experimental results. Finally, Section 5 concludes the paper with our final remarks.

2 Related knowledge

2.1 Fuzzy C-Means (FCM) clustering

The fuzzy c-means (FCM) clustering algorithm [15-16] is an iterative optimization algorithm that minimizes the cost function (1) subject to $u_{ij} \in [0,1]$ and $\sum_{j=1}^{c} u_{ij}=1$:

$$J(U,V)=\sum_{i=1}^{n}\sum_{j=1}^{c} u_{ij}^{m}\,\|x_i - v_j\|^2 \qquad (1)$$

Here, $u_{ij}$ is the degree of membership of $x_i$ in cluster $j$, defined as

$$u_{ij}=\frac{1}{\sum_{k=1}^{c}\left(\dfrac{\|x_i - v_j\|}{\|x_i - v_k\|}\right)^{2/(m-1)}} \qquad (2)$$

where $v_j$ is the center of the $j$th cluster,

$$v_j=\frac{\sum_{i=1}^{n}(u_{ij})^{m}\,x_i}{\sum_{i=1}^{n}(u_{ij})^{m}} \qquad (3)$$

$x_i$ is the $i$th data sample, $n$ is the number of data points, $c$ is the number of clusters, and $m \in (1,\infty)$ is a weighting constant. The optimal clusters are produced by minimizing the objective function.

2.2 Fuzzy association rules

For numerical data, fuzzy association rules [19] are easily understood by humans because of the fuzzy term sets. In order to mine fuzzy association rules, FCM clustering is first applied to transform each numeric variable into fuzzy sets (a fuzzy partition) with membership function $u$, and these fuzzy partitions are then used to generate fuzzy rules. Meanwhile, the centre of each fuzzy set and the maximum and minimum values of each partition of the input data are determined by FCM. By finding the centre of each partition, the partition can be labelled easily according to the data point at which the core occurs. Labelling each partition is important, as it helps greatly in the eventual generation of fuzzy association rules.

Given a set of records, where $n$ is the number of records, consider two itemsets $X=\{x_1,x_2,\dots,x_p\}$ and $Y=\{y_1,y_2,\dots,y_q\}$, where $p$ is the length of itemset $X$ and $q$ is the length of itemset $Y$. The fuzzy membership value of variable $x_j$ in the $i$th record is denoted $u(x_{ij})$. The Apriori approach is used to extract fuzzy itemsets from a fuzzy data set based on interestingness measures (support and confidence) and to generate fuzzy association rules. A fuzzy association rule is an implication of the form $(X \text{ is } A) \Rightarrow (Y \text{ is } B)$, where $A$ and $B$ are fuzzy sets that characterize $X$ and $Y$, respectively. The measures of support, confidence and correlation are fuzzified for the purpose of fuzzy association rules. The fuzzy support of $X$ is defined as

$$\mathrm{fsupport}(X)=\frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{p}u(x_{ij}) \qquad (4)$$

where $X=\{x_i \mid x_i=\{x_{ij}\},\ j=1,\dots,p\}$, $i=1,\dots,n$. The fuzzy support of $X \Rightarrow Y$ is defined as

$$\mathrm{fsupport}(X \Rightarrow Y)=\frac{1}{n}\sum_{i=1}^{n}\left(\prod_{j=1}^{p}u(x_{ij})\cdot\prod_{j=1}^{q}u(y_{ij})\right) \qquad (5)$$

The fuzzy confidence of $X \Rightarrow Y$ is defined as

$$\mathrm{fconfidence}(X \Rightarrow Y)=\frac{\mathrm{fsupport}(X \Rightarrow Y)}{\mathrm{fsupport}(X)} \qquad (6)$$

The fuzzy correlation of the association rule $X \Rightarrow Y$ is measured by

$$\mathrm{correlation}(X,Y)=\frac{P(X \cup Y)}{P(X)P(Y)}=\frac{\mathrm{fconfidence}(X \Rightarrow Y)}{\mathrm{fsupport}(Y)} \qquad (7)$$

In order to mine fuzzy association rules, the definitions of fuzzy support and fuzzy confidence are used in Fuzzy Apriori instead of their crisp counterparts used in Apriori.
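To make the alternating updates (2) and (3) concrete, the following is a minimal FCM sketch in Python with NumPy. It is an illustrative implementation rather than the authors' code; the function name `fcm`, the random initialization and the convergence tolerance are our own choices.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means sketch: alternates the center update (3)
    and the membership update (2) until the memberships stop changing."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # random initial membership matrix U (n x c), rows summing to 1
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # center update, Eq. (3)
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # squared distances ||x_i - v_j||^2
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)              # avoid division by zero
        # membership update, Eq. (2):  u_ij = (1/p_ij) / sum_k (1/p_ik)
        p = d2 ** (1.0 / (m - 1))            # ||x_i - v_j||^(2/(m-1))
        U_new = 1.0 / (p * (1.0 / p).sum(axis=1, keepdims=True))
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, V
```

For example, `U, V = fcm(data, c=3)` returns the membership matrix and cluster centers that are used later both for the Gaussian antecedent parameters and for the fuzzy partitions that feed the association rule mining.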
3 Multilevel fuzzy SVR network (MLFSVRN) modelling of incremental type

In order to determine a multilevel fuzzy SVR network model, both the model structure and the initial parameters must be determined. For an incremental structure like the one shown in Fig. 1(a), it is natural to assign the most influential variables to the first level, the less influential ones to the next level, and so on. To do so, we must find the candidate variables that play a significant role in determining the output.

3.1 Variable selection based on FCM clustering and fuzzy association rules

In this paper, fuzzy association rules are used to select the most representative variables from the original ones in order to improve the later prediction performance. The fuzzy confidence and fuzzy support values between the input variables and the output variable are examined by means of fuzzy association rules. Cross-validation with a grid search over the parameters is adopted to obtain the best rules; we only report the best rules, obtained with fuzzy confidence value 1 and fuzzy support value 0.03. These rules are further ranked by their importance,

$$\mathrm{importance}(X,Y)=\log\frac{\mathrm{fconfidence}(Y \Rightarrow X)}{\mathrm{fsupport}(X)}=\log\frac{P(X\mid Y)}{P(X)} \qquad (8)$$

In particular, according to (8), rules whose importance value is less than 0.8 are excluded. As a result, the extracted rules that relate to the important variables are obtained.

In order to determine the most influential input variables, the influence degrees of the variables are calculated. The term $ID(x_i)$ represents the influence degree (ID) of $x_i$, i.e.,

$$ID(x_i)=\frac{\mathrm{important}(x_i)+\mathrm{correlation}(x_i)}{SUM_{ID}}$$

Here, $\mathrm{important}(x_i)$ is the best value of the degree of importance over the fuzzy association rules involving $x_i$, computed by (8); similarly, $\mathrm{correlation}(x_i)$ is the best value of the degree of correlation over the fuzzy association rules involving $x_i$, computed by (7). Over all variables extracted from the fuzzy association rules, these two terms are summed to give $SUM_{ID}=\sum_{i}\bigl(\mathrm{important}(x_i)+\mathrm{correlation}(x_i)\bigr)$.
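The ranking step of Section 3.1 can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation; the function name `influence_degrees` and the structure of the per-rule dictionaries (keys such as `fsupport_x`, `fconf_y_to_x`) are hypothetical, standing in for the measures defined in Section 2.2.

```python
import math

def influence_degrees(rules):
    """Rank input variables by influence degree from mined fuzzy association rules.

    `rules` is assumed to be a list of dicts, one per extracted rule x_i => y,
    holding the fuzzified measures of Section 2.2:
      {'var': 'x1', 'fsupport_x': ..., 'fsupport_y': ...,
       'fsupport_xy': ..., 'fconf_y_to_x': ...}
    """
    best_imp, best_corr = {}, {}
    for r in rules:
        v = r['var']
        # importance, Eq. (8): log( fconfidence(Y => X) / fsupport(X) )
        imp = math.log(r['fconf_y_to_x'] / r['fsupport_x'])
        # correlation, Eq. (7): fconfidence(X => Y) / fsupport(Y)
        conf_x_to_y = r['fsupport_xy'] / r['fsupport_x']
        corr = conf_x_to_y / r['fsupport_y']
        best_imp[v] = max(best_imp.get(v, -math.inf), imp)
        best_corr[v] = max(best_corr.get(v, -math.inf), corr)
    # rules with importance below the 0.8 threshold are discarded
    kept = [v for v in best_imp if best_imp[v] >= 0.8]
    sum_id = sum(best_imp[v] + best_corr[v] for v in kept)
    # influence degree ID(x_i) = (important(x_i) + correlation(x_i)) / SUM_ID
    return {v: (best_imp[v] + best_corr[v]) / sum_id for v in kept}
```

Sorting the returned dictionary by value gives the variable ordering used to fill the levels in Section 3.2.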
3.2 Constructing an MLFSVRN model with incremental architecture

Based on the analysis method described in Section 3.1, the influential input variables can be determined and, consequently, an MLFSVRN model with incremental architecture can be constructed as shown in Fig. 2. The construction algorithm of the MLFSVRN model with incremental architecture is summarized as follows.

Step 1—Initialization: Suppose the number of levels is $H$; the identified model is called model $H$ and its output is denoted by $y^{(H)}$. All $n$ input variables are put in a set $S$. Let $SUM_{ID}=\sum_{x_i\in S}\bigl(\mathrm{important}(x_i)+\mathrm{correlation}(x_i)\bigr)$. A threshold value $T_{inc}$ is set to control the model structure (the number of levels). A large $T_{inc}$ sets the combination of the representative input variables rigorously and hence generates complicated networks, while a small value sets the combination loosely and generates networks with few sublevels. This arrangement is used to make the first level retain enough system information and to let the first level contain at least two input variables.

Step 2—Determination of level 1: 1) Choose the $n_1$ most influential inputs as the input variables of the first level and write them as $x_i^{(1)}$'s. In order to make the first level contain enough system information, the value of $n_1$ is determined by

$$\sum_{p=1}^{n_1} ID\bigl(x_p^{(1)}\bigr)=\sum_{p=1}^{n_1}\frac{\mathrm{important}\bigl(x_p^{(1)}\bigr)+\mathrm{correlation}\bigl(x_p^{(1)}\bigr)}{SUM_{ID}}\ \ge\ T_{inc}\quad\text{and}\quad n_1\ge 2 \qquad (9)$$

According to (9), the first level of the model architecture contains at least two input variables. Here, $\mathrm{important}(x_p^{(1)})$ is the best value of the degree of importance of the fuzzy association rules involving $x_p^{(1)}$, and $\mathrm{correlation}(x_p^{(1)})$ is the best value of the degree of correlation of the fuzzy association rules involving $x_p^{(1)}$. 2) Set the level index $h=2$ and remove these input variables from $S$.

Step 3—Recalculation: Recalculate $SUM_{ID}$ and the influence degrees of the variables left in $S$, i.e., $ID(x_i)=\bigl(\mathrm{important}(x_i)+\mathrm{correlation}(x_i)\bigr)/SUM_{ID}$.

Step 4—Determination of level $h$: Choose the $n_h$ most influential input variables $x_i^{(h)}$ from $S$ and assign them to level $h$. The number $n_h$ is determined by $\sum_{p=1}^{n_h} ID(x_p^{(h)})\ge T_{inc}$ and $n_h\ge 2$.

Step 5—Termination: Remove these $n_h$ input variables from $S$. If $S$ is empty, the algorithm terminates; otherwise set $h=h+1$ and go back to Step 3.

Figure 2: The most parsimonious incremental architecture for a three-level-input system.

3.3 TSK-type MLFSVRN model structure and learning

3.3.1 TSK fuzzy inference algorithm

In this section, the TSK fuzzy inference algorithm used in each single-level module of the MLFSVRN model is presented. Consider a TSK fuzzy system; the $j$th fuzzy rule in level $h$ has the form

Rule$(j,h)$: IF $\bigl(x_1^{(h)}$ is $A_{1j}^{(h)}\bigr)$ and $\dots$ and $\bigl(x_{n_h}^{(h)}$ is $A_{n_h j}^{(h)}\bigr)$ and $\bigl(y^{(h-1)}$ is $B_j^{(h-1)}\bigr)$ THEN $y_j^{(h)}=a_{0j}^{(h)}+a_{1j}^{(h)}x_1^{(h)}+\dots+a_{n_h j}^{(h)}x_{n_h}^{(h)}$

where $n_h$ is the total number of input variables of level $h$ (ordinary system inputs and the input from the previous level), $x_i^{(h)}$ $(i=1,\dots,n_h)$ are the input variables determined by the method proposed in Section 3 and assigned to this level, $y^{(h-1)}$ is the output of the previous level, and the $a_{ij}^{(h)}$ are crisp consequent parameters. The fuzzy set $A_{ij}^{(h)}$ employs the following Gaussian-type membership function:

$$M_{A_{ij}^{(h)}}(z_i)=\exp\!\left(-\frac{\bigl(z_i-m_{ij}^{(h)}\bigr)^2}{\bigl(\sigma_j^{(h)}\bigr)^2}\right) \qquad (12)$$

Here, $m_{ij}^{(h)}$ and $\sigma_j^{(h)}$ correspond to the center and width of the fuzzy set, respectively, and are determined by FCM clustering. The firing strength $\mu_j^{(h)}(z)$ of rule $j$ is calculated by

$$\mu_j^{(h)}\bigl(z^{(h)}\bigr)=\prod_i M_{A_{ij}^{(h)}}(z_i)=\exp\!\left(-\frac{\bigl\|z^{(h)}-m_j^{(h)}\bigr\|^2}{\bigl(\sigma_j^{(h)}\bigr)^2}\right) \qquad (13)$$

where $z^{(h)}$ denotes the vector collecting all inputs of level $h$ and $m_j^{(h)}=\bigl[m_{1j}^{(h)},\dots,m_{n_h j}^{(h)}\bigr]^T$. If the single-level fuzzy system contains $r$ rules, then, according to the simple weighted sum method [20], the output is

$$y^{(h)}=\sum_{j=1}^{r}\mu_j^{(h)}\bigl(z^{(h)}\bigr)\bigl(\tilde a_j^{(h)}\cdot\tilde z^{(h)}\bigr)=\sum_{j=1}^{r}\bigl(\tilde a_j^{(h)}\cdot\tilde z^{(h)}\bigr)\exp\!\left(-\frac{\bigl\|z^{(h)}-m_j^{(h)}\bigr\|^2}{\bigl(\sigma_j^{(h)}\bigr)^2}\right) \qquad (14)$$

where $\tilde a_j^{(h)}=\bigl[a_{0j}^{(h)},a_{1j}^{(h)},\dots,a_{n_h j}^{(h)}\bigr]^T$ and $\tilde z^{(h)}=\bigl[1,(z^{(h)})^T\bigr]^T$. By setting

$$\tilde a_j^{(h)}=w_j^{(h)}\tilde m_j^{(h)},\qquad \tilde m_j^{(h)}=\bigl[1,(m_j^{(h)})^T\bigr]^T \qquad (15)$$

Eq. (14) becomes

$$y^{(h)}=\sum_{j=1}^{r}w_j^{(h)}\bigl(\tilde z^{(h)}\cdot\tilde m_j^{(h)}\bigr)\exp\!\left(-\frac{\bigl\|z^{(h)}-m_j^{(h)}\bigr\|^2}{\bigl(\sigma_j^{(h)}\bigr)^2}\right) \qquad (16)$$

By adopting the kernel trick, a TS-kernel is defined as

$$K_{TS}\bigl(z^{(h)},m_j^{(h)}\bigr)=\bigl(\tilde z^{(h)}\cdot\tilde m_j^{(h)}\bigr)\exp\!\left(-\frac{\bigl\|z^{(h)}-m_j^{(h)}\bigr\|^2}{\bigl(\sigma_j^{(h)}\bigr)^2}\right)$$

so that (16) can be further rewritten as

$$y^{(h)}=\sum_{j=1}^{r}w_j^{(h)}K_{TS}\bigl(z^{(h)},m_j^{(h)}\bigr)+b^{(h)} \qquad (17)$$

Eq. (17) is the output of the TSK-type fuzzy system, which is equivalent to the output of an SVR. For each level module, the parameters $m_{ij}^{(h)}$ and $\sigma_j^{(h)}$ of the TS-kernels and the number of rules are determined by FCM clustering, while the weighting parameters $w_j^{(h)}$ and the bias $b^{(h)}$ are determined by linear SVR. Each input datum is transformed into the $r$-dimensional vector $F\bigl(z^{(h)}\bigr)=\bigl[\varphi_1(z^{(h)}),\varphi_2(z^{(h)}),\dots,\varphi_r(z^{(h)})\bigr]$, where $\varphi_j(z^{(h)})=K_{TS}(z^{(h)},m_j^{(h)})$ is the output of the $j$th TS-kernel. This vector is fed as input to a linear SVR, and the training data pairs are represented by

$$\bigl(F(z_k^{(h)}),\,y_k\bigr),\qquad k=1,\dots,N \qquad (18)$$

The optimal linear SVR function is

$$y^{(h)}=\sum_{k=1}^{N}\bigl(\alpha_k^{(h)}-\hat\alpha_k^{(h)}\bigr)\bigl(F(z_k^{(h)})\cdot F(z^{(h)})\bigr)+b^{(h)} \qquad (19)$$

where $\alpha_k^{(h)}$ and $\hat\alpha_k^{(h)}$ are solved by the SVR. Eq. (19) can be represented as

$$y^{(h)}=\sum_{k=1}^{N}\bigl(\alpha_k^{(h)}-\hat\alpha_k^{(h)}\bigr)\sum_{j=1}^{r}K_{TS}\bigl(z_k^{(h)},m_j^{(h)}\bigr)K_{TS}\bigl(z^{(h)},m_j^{(h)}\bigr)+b^{(h)}=\sum_{j=1}^{r}w_j^{(h)}K_{TS}\bigl(z^{(h)},m_j^{(h)}\bigr)+b^{(h)} \qquad (20)$$

where

$$w_j^{(h)}=\sum_{k=1}^{N}\bigl(\alpha_k^{(h)}-\hat\alpha_k^{(h)}\bigr)K_{TS}\bigl(z_k^{(h)},m_j^{(h)}\bigr) \qquad (21)$$

According to Eq. (21), the weighting parameters $w_j^{(h)}$ are obtained.
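To illustrate Eqs. (17)–(21), the sketch below builds the TS-kernel features $\varphi_j(z)=K_{TS}(z,m_j)$ from the FCM centers and widths and fits a linear SVR on them; the per-rule weights and bias then define the level output as in (20). This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes scikit-learn's `SVR` and NumPy, the widths `sigma` are taken as given (e.g., estimated from the clusters), and the hyperparameters `C` and `epsilon` are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

def ts_kernel_features(Z, centers, sigma):
    """phi_j(z) = ( [1, z] . [1, m_j] ) * exp(-||z - m_j||^2 / sigma_j^2), Eq. (17)."""
    Zt = np.hstack([np.ones((Z.shape[0], 1)), Z])               # augmented inputs  [1, z]
    Mt = np.hstack([np.ones((centers.shape[0], 1)), centers])   # augmented centers [1, m_j]
    lin = Zt @ Mt.T                                             # (z~ . m~_j)
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return lin * np.exp(-d2 / sigma[None, :] ** 2)

def fit_level(Z, y, centers, sigma, C=10.0, epsilon=0.01):
    """Learn the consequent part of one single-level module via linear SVR."""
    F = ts_kernel_features(Z, centers, sigma)
    svr = SVR(kernel="linear", C=C, epsilon=epsilon).fit(F, y)  # linear SVR on TS-kernel features
    w = svr.coef_.ravel()                                       # per-rule weights w_j, cf. Eq. (21)
    b = float(svr.intercept_)                                   # bias b^(h)
    predict = lambda Znew: ts_kernel_features(Znew, centers, sigma) @ w + b   # Eq. (17)/(20)
    return w, b, predict
```

The design choice to fit a linear SVR on the transformed features rather than a nonlinear SVR on the raw inputs is what keeps the number of weights equal to the number of rules $r$, independent of the number of support vectors.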
3.3.2 Structure of single-level modules

The structure of each single-level FSVRN module is the four-layered fuzzy neural network shown in Fig. 3, consisting of the membership function, fuzzy rule, weighted consequent and output layers; their functions are briefly described below. In the following, $O_i^{(k,h)}$ represents the output of the $i$th node in layer $k$ of level $h$, and $z_l^{(h)}$ the $l$th input of a node in the current layer of level $h$.

Layer 1: Each node in this layer corresponds to a membership function of an ordinary system input or of the input coming from the previous level [see (22) and (23)]:

$$O_i^{(1,h)}=M_{A_i^{(h)}}\bigl(x_i^{(h)}\bigr) \qquad (22)$$

and

$$O_i^{(1,h)}=M_{B^{(h-1)}}\bigl(y^{(h-1)}\bigr),\qquad \forall h\ge 2 \qquad (23)$$

Here, $M_{A_i^{(h)}}(x_i^{(h)})$ and $M_{B^{(h-1)}}(y^{(h-1)})$ are the membership values of the input (or intermediate) variable connected to the node, computed by (12). $N_1^{(h)}$ equals the total number of inputs to level $h$, including the system inputs and the intermediate input. When $h=1$, there is no intermediate input and only (22) applies.

Layer 2: The output of each node in this layer represents the firing strength of a rule:

$$O_i^{(2,h)}=\prod_{l=1}^{P_i}O_l^{(1,h)},\qquad \forall i=1,2,\dots,N_2^{(h)} \qquad (24)$$

The computation in (24) is similar to (13). $P_i$ denotes the number of preconditions of rule node $i$ and $N_2^{(h)}$ indicates the total number of rules in level $h$.

Layer 3: The output of each rule is computed in this layer. In level $h=1$, where no intermediate variable appears, the function has the form

$$O_i^{(3,1)}=O_i^{(2,1)}\left(\sum_{j=0}^{n_1}a_{ji}^{(1)}z_j^{(1)}\right),\qquad \forall i=1,2,\dots,N_3^{(1)} \qquad (25)$$

and when $h>1$

$$O_i^{(3,h)}=O_i^{(2,h)}\left(\sum_{j=0}^{n_h}a_{ji}^{(h)}z_j^{(h)}\right),\qquad \forall i=1,2,\dots,N_3^{(h)} \qquad (26)$$

where $N_3^{(1)}$ and $N_3^{(h)}$ are the total numbers of nodes in the third layer of level 1 and level $h$, respectively, $\{a_{ji}^{(h)}\}$ is the consequent parameter set in level $h$, and $z_0^{(h)}=1$.

Layer 4: The final output is computed by summing the outputs of all rules:

$$O^{(4,h)}=\sum_{j=1}^{N_R^{(h)}}O_j^{(3,h)} \qquad (27)$$

$N_R^{(h)}$ is the total number of fuzzy rules in level $h$, which equals $N_2^{(h)}$. Eqs. (14), (16), (17), (18) and (20) determine the output in (27).

Figure 3: The structure of the four-layered network.
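The forward computation through the four layers (22)–(27) for one level can be sketched as follows. This is an illustrative Python fragment, assuming Gaussian antecedents as in (12) with one shared width per rule; the function name `level_forward` and the layout of the consequent matrix `A` are our own conventions.

```python
import numpy as np

def level_forward(z, centers, sigma, A):
    """Forward pass of one single-level module, following Layers 1-4 (Eqs. 22-27).

    z       : input vector of the level (system inputs plus the intermediate
              input y^(h-1) when h > 1), shape (n_h,)
    centers : rule centers m_j from FCM, shape (r, n_h)
    sigma   : rule widths sigma_j, shape (r,)
    A       : consequent parameters a_ji with a_0i in the first row, shape (n_h + 1, r)
    """
    # Layers 1-2: Gaussian memberships and rule firing strengths, Eqs. (22)-(24)
    d2 = ((z[None, :] - centers) ** 2).sum(axis=1)
    firing = np.exp(-d2 / sigma ** 2)                 # O^(2,h), one value per rule
    # Layer 3: firing strength times the linear consequent, Eqs. (25)-(26)
    z_aug = np.concatenate(([1.0], z))                # z_0 = 1 for the bias term a_0i
    rule_out = firing * (z_aug @ A)                   # O^(3,h)
    # Layer 4: sum over all rules, Eq. (27)
    return rule_out.sum()
```

In the incremental architecture, the scalar returned for level $h$ is simply appended to the raw input variables of level $h+1$.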
3.3.3 Learning algorithm

As shown in Section 3.3.1, the consequent part of a TSK fuzzy inference rule is a linear combination of the consequent parameters, which can be reconstructed in the form of a linear SVR. The linear SVR algorithm is therefore applied to evaluate the optimal values of all consequent parameters of the TSK-type MLFSVRN model. Given a set of training data pairs, the desired output of each single-level module is set equal to the final system output. The linear SVR learning algorithm of a TSK-type MLFSVRN model with incremental architecture is listed as follows.

Step 1—Initialization: 1) Divide the input variables into $H$ subsets $\{x_1^{(h)},x_2^{(h)},\dots,x_{n_h}^{(h)}\}$, $h=1,\dots,H$, according to the variable selection method proposed in Section 3, each subset being attached to a single-level reasoning network module. 2) Set the level index $h=1$ and initialize appropriate membership function parameters based on the FCM clustering method. The membership function parameters of all intermediate variables are fixed according to the final outputs.

Step 2—Apply linear SVR to level $h$: In order to evaluate the optimal values of all unknown consequent parameters $a_{ji}^{(h)}$, (25), (26) and (27) can be rewritten in the form of a linear SVR according to Eqs. (14), (16), (17), (18) and (20). The consequent parameters can then be evaluated by the linear SVR method with (15) and (21). Here, the output value $y$ is used as the desired value of $y^{(h)}$.

Step 3—Forward computation: The output of level $h$, $y^{(h)}$, is computed using the evaluated $a_{ji}^{(h)}$'s.

Step 4—Termination: Set $h=h+1$. If $h\le H$, go to Step 2; otherwise, the training process stops.

Figure 4: MLFSVRN models with incremental architecture for (a) the six-dimensional example and (b) the Mackey-Glass time series.

4 Simulation results

In this section, the proposed method is evaluated on nonlinear system identification and on Mackey-Glass chaotic time-series prediction. Section 4.1 discusses a six-dimensional example, which is used to validate the variable selection analysis method described in Section 3. Section 4.2 discusses the Mackey-Glass chaotic time-series prediction, aiming at demonstrating the satisfactory learning behaviour and good generalization ability of the MLFSVRN model.

4.1 Six-dimensional example

The six-dimensional nonlinear system is given by the following equation:

4.2 Mackey-Glass chaotic time-series prediction

The Mackey-Glass chaotic time series is generated by

$$\frac{dx(t)}{dt}=\frac{0.2\,x(t-\tau)}{1+x^{10}(t-\tau)}-0.1\,x(t)$$

When $\tau\ge 17$, the equation shows chaotic behaviour. In our simulations, we set $\tau=30$. In this paper, $x(t-30)$, $x(t-24)$, $x(t-18)$, $x(t-12)$, $x(t-6)$ and $x(t)$ are used as input variables to predict the value of $x(t+6)$. To construct the MLFSVRN with incremental architecture, using the variable selection method proposed in Section 3, the input variables are grouped into three subsets $\{x(t-24),\dots$
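For reference, a minimal sketch of how Mackey-Glass data of the kind used in Section 4.2 can be generated and arranged into input-output pairs is given below. The paper does not specify the integration scheme or initial condition; the Euler step, the constant history `x0` and the sampling choices here are our own illustrative assumptions.

```python
import numpy as np

def mackey_glass(n_samples, tau=30, dt=0.1, x0=1.2, a=0.2, b=0.1, n=10):
    """Integrate dx/dt = a*x(t-tau)/(1 + x(t-tau)^n) - b*x(t) with a simple
    Euler scheme and return the series sampled at unit time steps."""
    steps_per_unit = int(round(1.0 / dt))
    delay = int(round(tau / dt))
    total = n_samples * steps_per_unit + delay
    x = np.empty(total)
    x[:delay + 1] = x0                      # constant history as initial condition
    for t in range(delay, total - 1):
        x_tau = x[t - delay]
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau ** n) - b * x[t])
    return x[delay::steps_per_unit][:n_samples]

def make_pairs(series, lags=(30, 24, 18, 12, 6, 0), horizon=6):
    """Inputs x(t-30), x(t-24), ..., x(t); target x(t+6), as in Section 4.2."""
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series) - horizon):
        X.append([series[t - l] for l in lags])
        y.append(series[t + horizon])
    return np.asarray(X), np.asarray(y)
```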