Modelling Dynamic Systems with Artificial Neural Networks and Related Methods

Juš Kocijan

Nova Gorica 2023

Modelling Dynamic Systems with Artificial Neural Networks and Related Methods
Original title: Modeliranje dinamičnih sistemov z umetnimi nevronskimi mrežami in sorodnimi metodami
Author: Juš Kocijan
Edition: English edition
Reviewers of original: Prof. Aleš Belič and Prof. Igor Škrjanc
English translation: Juš Kocijan
Proofreading: LINGULA, jezikovni center, d.o.o.
Text formatting and cover layout: Juš Kocijan
Publisher: University of Nova Gorica Press, Vipavska 13, SI-5000 Nova Gorica, Slovenia
Publication year: 2023
ISBN: 978-961-7025-29-3 (PDF)
Published in PDF: http://www.ung.si/sl/zalozba/ and http://www.ung.si/en/publisher/
15 February 2023
Free e-publication.

The catalogue record of the publication (CIP) was prepared by the National and University Library in Ljubljana.
COBISS.SI-ID 140589315
ISBN 978-961-7025-29-3 (PDF)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

How to cite: Kocijan, J. (2023). Modelling Dynamic Systems with Artificial Neural Networks and Related Methods. University of Nova Gorica Press. https://www.ung.si/en/publisher/

Foreword

Modelling Dynamic Systems with Artificial Neural Networks and Related Methods can be used as a textbook for the field it covers or as an introductory textbook for more advanced literature in the field. It is intended for university students, especially at the postgraduate level, who have sufficient knowledge of dynamic systems, as well as for professionals who wish to familiarise themselves with the concepts and views described. The textbook is not intended to be a detailed, theoretically based description of the subject, but rather an overview of the identification of dynamic systems with neural networks and related methods from the perspective of systems theory and, in particular, its application.
The work is intended to inform the reader about the views on this subject, which are related but treated very differently in different research fields. I used the material for the textbook as a basis for lectures at the Faculty of Electrical Engineering of the University of Ljubljana.

I would like to thank Miro Štrubelj, who created many of the figures, Dr. Gregor Gregorčič, who allowed me to use some of the pictures from his doctoral thesis, and all others who directly or indirectly influenced the creation of this work.

Ljubljana, Autumn 2007
Juš Kocijan

Addendum 2023

The English-language version of the textbook was created in 2022 for English-speaking students. The textbook follows its Slovenian original, except for some minor corrections and additions.

Ljubljana, Winter 2023
Juš Kocijan

Contents

1 Introduction to artificial neural networks  1
  1.1 Short overview of the development of the topic  1
  1.2 About artificial neural networks  2
  1.3 Multilayer perceptron  5
  1.4 Radial basis-function network  7
  1.5 Neural networks for modelling nonlinear dynamic systems  9
2 Identification of linear dynamic systems  13
  2.1 Summary on linear system identification  13
  2.2 Mechatronic-system identification example  16
3 Identification of nonlinear dynamic systems  27
  3.1 General information on the identification of nonlinear systems  27
  3.2 Example of the identification of a pH-neutralisation process  35
4 Control with artificial neural networks  41
  4.1 Neural networks in control systems  41
  4.2 Predictive control  43
5 Local-model networks and blended multiple-model systems  51
  5.1 Velocity-based linearisation  54
  5.2 Blended multiple-model systems  58
6 Design of gain-scheduling control  63
  6.1 The design with velocity-based linearisation  63
  6.2 Example of control design: a single-segment robot manipulator  66
  6.3 Example of the control design of a gas-liquid separator  67
7 Identification of nonlinear systems with Gaussian processes  73
  7.1 Gaussian Process  73
  7.2 Identification of dynamic systems with GP  77
  7.3 Example of the identification of the pH-neutralisation process  82
  7.4 Control design  83

Chapter 1

Introduction to artificial neural networks

Artificial neural networks (ANNs) are a concept and method for solving various problems that have been around for more than half a century. In this textbook, we will refer to them as 'neural networks'. Their applications are highly diverse, both in professional fields (engineering, computer science, natural sciences, social sciences, etc.) and by problem domain (regression, classification, clustering, etc.).
Neural networks are widely used for the experimental modelling of dynamic systems, especially complex and nonlinear systems, and consequently for the design of automatic control systems. In this textbook, we will highlight particular problems in this area.

It is important to bear in mind that this is an introductory overview that does not claim to be exhaustive but rather gives insight into the most common use of neural networks for the experimental modelling of nonlinear dynamic systems and some of the specific problems that arise in this context.

Neural networks are classified as one of the most commonly used methods in computational intelligence. The classification and the relationship to the methods of so-called artificial intelligence go back to the beginnings of neural networks. Artificial neural networks arose from the idea of replacing a biological neuron, as shown in Figure 1.1, with an artificial neuron. In this way, they would form the basic elements of an artificial brain.

[Figure 1.1: Biological model of an artificial neuron (cell body, dendrites, axon, synapses)]

This pattern of thinking was later surpassed by the realisation that neural networks are nothing more than basic mathematical functions for approximating an arbitrary nonlinear relation.

1.1 Short overview of the development of the topic

In this subsection, we summarise the main milestones in the development of artificial neural networks. The aim is to illustrate the evolution of the perspectives that have emerged with the developments. The overview is based on [10] from a systems perspective. It is a holistic view of the neural network and its properties in terms of inputs and outputs.

1943 McCulloch and Pitts [12] presented models of biological neurons, which were the basic elements of circuits for solving computational problems. The idea was that they would replace the functions of a biological neuron.
In the remainder of this chapter, a brief overview of the evolution of neural networks is given. This is followed by a classification of neural networks according to various criteria. Next, we will describe two types of the most commonly used neural networks: the multilayer perceptron and the radial basis-function network. We will then describe how neural networks can be used to model dynamic systems.

1949 Hebb, a psychologist by profession, published a book [7] in which he described neuron learning as the tuning of the weights of the connections between units (i.e., neurons). The various and much later developed rules for parameterising the weights in neural networks, which can be represented as units with connections, are essentially derivatives of this method.

1959 Rosenblatt [21] described a single-layer neural network consisting of elements which he called 'perceptrons'. Neural networks consisting of these elements, albeit in a modified form, are still among the most commonly used neural networks today. The basic elements of the neural network are the switching functions; as already mentioned, the network has only one layer. This means that the input signal from a particular input passes through only one layer to the output neuron. This simple structure limits the possibilities of the neural network.

1960 Widrow and Hoff [24] introduced a neural network of linear elements called Adaline (ADAptive LINear Element). This network can be represented electrically as inputs weighted by resistors and connected to a summator. In principle, a neuron is a linear function, and the Adaline network approximates an input-output mapping by using a weighted sum of linear functions. The authors also derived the first analytical method for determining the weights, which they called the delta rule. This is essentially the least-squares method for optimisation.

1963 Widrow and Smith [25] used the Adaline neural network to stabilise an inverted pendulum.

1969 In their book 'Perceptrons' [13], Minsky and Papert showed the limitations of (single-layer) perceptron neural networks. They cannot be used, for example, to separate elements that are not linearly separable. This means that a single-layer network cannot be trained to imitate the logical function 'exclusive or' (XOR). The authors also demonstrated that the limitations of single-layer networks can be overcome with a two-layer network, but they did not show how to set the weights of a multilayer network. The publication of [13] marked the end of the era of single-layer neural networks; the results led to a gap in research on neural networks.

1986 Rumelhart et al. [22] published a method for learning the weights of a multilayer network. They called this learning method 'backpropagation' because it was interpreted as error propagation from the outputs of the neural network to the inputs. With this work, they enabled the use of the multilayer perceptron for the classification of functions that are not linearly separable or for representing an arbitrary nonlinearity. This finding led to a boom in the research and use of artificial neural networks.

late 1980s Artificial neural networks began to be used for automatic control, e.g., [20].

1990 Narendra and Parthasarathy [14] used neural networks to identify and control dynamic systems.

1995 Sjoeberg et al. showed [23] that neural networks can be viewed from a systems perspective as identification using nonlinear regression. They thus removed the artificial-intelligence aspect from neural networks. These and similar methods were renamed 'methods of computational intelligence' to avoid unnecessary misunderstandings.

after 2010 Deep learning greatly extends the capabilities of neural networks, especially in speech, image and object recognition, and beyond.

1.2 About artificial neural networks

The basic building block of a neural network is a neuron. It is described by the function

y_i = f( Σ_j w_ij x_j + θ_i ),   (1.1)

where the x_j are the inputs of the neuron, which are weighted by the values w_ij and, together with the bias value θ_i and the activation function f(·), determine the output of the neuron y_i. The artificial neuron is shown in Figure 1.2.

[Figure 1.2: Representation of the basic element of a neural network - a neuron (inputs x_1, ..., x_n, weights w_11, ..., w_1n, bias θ_i, summation, activation function f(·), output y_1)]

This formal form of the neuron was defined by McCulloch and Pitts [12], and the form has essentially remained unchanged to the present day. The activation function can have different forms. Similarly, neurons can be connected in different ways. Neural networks have evolved in many directions and are classified in different ways. The most common classification criteria are the training, the topology, or the purpose of the neural networks.

Training of neural networks

The parameters of a neural network are usually understood to be the weights and biases that must be determined by an optimisation procedure. This parameterisation is called learning or training. There are three types of learning or training: supervised learning, unsupervised learning and reinforcement learning.

Supervised learning starts with a set of input data and a target set of output data, and according to the chosen cost function, a neural network is trained to map between the given input and output data. The first methods were developed for single-layer neural networks.
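As a minimal illustration, the following sketch implements the single neuron of Eq. (1.1) and the delta (Widrow-Hoff) rule discussed in this section. It is written in Python (the book's own examples are Matlab-based), and the training data, the learning rate γ = 0.1 and the linear target relation are made-up illustrative choices, not taken from the book.

```python
import math

def neuron_output(x, w, theta, f=lambda s: s):
    """Artificial neuron, Eq. (1.1): y = f(sum_j w_j * x_j + theta).
    The activation f defaults to the linear (identity) function."""
    return f(sum(wj * xj for wj, xj in zip(w, x)) + theta)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def delta_rule_step(w, x, d, gamma):
    """One delta-rule (Widrow-Hoff) update for a linear neuron:
    delta_w_j = gamma * (d - y) * x_j."""
    y = neuron_output(x, w, 0.0)
    return [wj + gamma * (d - y) * xj for wj, xj in zip(w, x)]

# Made-up, noiseless training data from the linear relation d = 2*x1 - x2.
samples = [([x1, x2], 2.0 * x1 - x2) for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]
w = [0.0, 0.0]
for _ in range(200):                  # repeated passes over the data
    for x, d in samples:
        w = delta_rule_step(w, x, d, gamma=0.1)
print(w)                              # approaches [2.0, -1.0]
print(neuron_output([0.0, 0.0], [1.0, 1.0], 0.0, sigmoid))  # sigmoid neuron: 0.5
```

Because the target here is itself a linear function of the inputs, the delta rule drives the error towards zero, which is consistent with its interpretation as a least-squares optimisation method.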
For single-layer networks, these methods are the perceptron learning rule (Hebb's rule) [7]:

Δw_ij = γ y_i x_j,

where Δw_ij is the weight change and γ is a constant that determines the learning rate, and the delta rule (Widrow-Hoff rule) [24]:

Δw_ij = γ (d_i − y_i) x_j,

where (d_i − y_i) is the difference between the desired value and the actual output value. This rule essentially turns out to be the least-squares method of optimisation.

The oldest training method for multilayer networks is the backpropagation method. This is essentially the delta rule, generalised for nonlinear problems. A closer look at the method shows that it is a first-order gradient optimisation method, specifically the steepest-descent method. Backpropagation is used to calculate the gradient of the cost function. This is applicable to all types of nonlinear systems but differs in computational complexity depending on the system. To improve its weaknesses, which are mainly slow convergence and the possibility that the optimisation becomes stuck in a local minimum, some improved versions have been developed, such as [10]:

• learning with momentum and
• a variable learning rate.

First-order gradient methods are known to be inefficient; Newton optimisation methods, which belong to second-order gradient methods, are often used to solve nonlinear optimisation problems. Among the most commonly used are the Newton optimisation methods with the

• Gauss-Newton modification and
• Levenberg-Marquardt modification.

As new optimisation methods are developed, they are also used for neural-network learning. Stochastic optimisation methods, such as searching for parameters and structures using evolutionary methods (genetic algorithms and genetic programming), are also popular.

Recently, deep learning [2] has been developed and has become a very popular training method.

Self-organised neural networks, unlike supervised networks, use unsupervised learning. Self-organised networks are trained from input data only. They are mainly used in pattern recognition for clustering, vector quantisation and dimensionality reduction. A typical example is the Kohonen network, which is also called the Self-Organising Map (SOM) [6].

Reinforcement learning is an approach of dynamic programming for training a neural network used as a feedback controller. This means that it has a specific application. The learning consists of two phases: parameter estimation and parameter evaluation. It is shown schematically in Figure 1.3.

[Figure 1.3: Reinforcement-learning schematic (critic, reinforcement signal, reinforcement-learning controller, system)]

Topology of neural networks

According to their topology, neural networks are divided into those with complete connectivity, local connectivity and layer-wise connectivity. For example, the Hopfield neural network is a neural network with complete connectivity.

The Hopfield neural network has a switching activation function or a linear function with saturation. Its characteristic feature is a feedback loop from outputs to inputs. It is called 'associative memory' and is mainly interesting from a theoretical point of view. It is less useful in practice. Boltzmann machines are a derivative of it.

Boltzmann machines are Hopfield networks with hidden layers. Their parameters (i.e., the weights) are determined using a stochastic rule of weight variation called 'simulated annealing'.

An example of a neural network with local connectivity is the Kohonen neural network. It is also an example of a network that is mainly used for data clustering. Its main feature is that it maintains topological connectivity (mapping) between data.

An example of a neural network with multilayer mapping and connectivity is the multilayer perceptron, which will be discussed in greater detail later.

Another possible division is into feedforward and recurrent neural networks (Figure 1.4). This division is based on the direction of the signal or data flow through the network.
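The distinction between the two directions of signal flow can be illustrated with a minimal sketch (Python, not from the book; the input signal and the feedback coefficient 0.5 are made-up values): a feedforward mapping responds only to the current input sample, while a recurrent mapping feeds its output back and therefore has memory.

```python
import math

def feedforward_response(inputs, f):
    """Feedforward: each output depends only on the current input sample."""
    return [f(u) for u in inputs]

def recurrent_response(inputs, f, y0=0.0):
    """Recurrent: the previous output is fed back, so outputs depend on history."""
    ys, y = [], y0
    for u in inputs:
        y = f(u, y)
        ys.append(y)
    return ys

inputs = [1.0, 0.0, 0.0, 0.0]        # a pulse followed by zeros
static = feedforward_response(inputs, lambda u: math.tanh(u))
dynamic = recurrent_response(inputs, lambda u, y: math.tanh(u + 0.5 * y))
print(static)   # returns to 0 as soon as the input does
print(dynamic)  # decays gradually: the fed-back output acts as memory
```

The fed-back output acting as an internal state is exactly the property exploited later in this chapter when static networks are used to model dynamic systems.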
Feedforward networks mainly have a direct flow of signals from input to output, while recurrent networks have full or local connectivity. Recurrent networks differ from feedforward networks, where signals flow exclusively from input to output, in that recurrent networks also contain feedback loops from the outputs.

[Figure 1.4: Example of recurrent network topology]

Purpose of neural networks

Neural networks can be divided coarsely or finely according to their purpose. At a coarser level, neural networks can be divided into those for classification and those for regression.

Classification is a process in which the data are sorted into a finite number of sets. This means that the mapping is done in a quantised output space. Regression is a process in which the output values can take on any value. In regression, an input-output mapping is performed in a continuous output space.

In this textbook, the focus is on using neural networks and similar approaches to model dynamic systems and on using these models to design automatic control. Therefore, classification and special neural networks will not be discussed. The focus will be exclusively on regression. This means that supervised learning of mainly feedforward networks will be examined.

Main properties of neural networks

A neural network is a nonlinear mapping from the input space R^n to the output space R^m of the data,

f: R^n → R^m,   (1.2)

with input-output data pairs (x, y) such that y = f(x).

Neural networks are universal approximators [8]. This means that there is always a neural network with the appropriate dimensions that approximates the nonlinear objective function with arbitrary accuracy.

It is often stated in the literature that neural networks have the ability to learn from examples. This essentially means that the parameters of the network are determined by optimisation with respect to the chosen cost function, for example

E(k) = (1/2) Σ_{j=1}^{n} (d_j(k) − y_j(k))²,   (1.3)

and with the chosen optimisation algorithm (learning rule). A neural network that learns in this way can use the learned association for prediction, which is often interpreted as the ability to generalise.

There is no single definition of neural networks. There are only common properties that unite the different models. These common properties of neural networks are:

• a structure that consists of a large number of simple elements connected to each other,
• a usually variable topology, which means that the connections change, or the values of the weights on the connections change, or the number of elements in the network changes,
• a structure that allows for the parallel processing of information,
• information that is processed according to the latent (i.e., internal) states of the network and the inputs to the network.

There are many other properties that distinguish neural networks from each other:

• the methodology of the processes,
• the types of networks,
• the learning rules,
• the application domains, and
• other features.

Artificial neural networks today

Neural networks have long ceased to be merely an area of research. They have been a magnet for research since 1985. Figure 1.5 shows the number of publications in the area of dynamic-systems control over the last century. Since then, the number of publications has continued to increase. Moreover, it is a technology that is currently used in many applications.

[Figure 1.5: The growth of the number of published papers on artificial neural networks for control, 1976-1996 (Source: G. W. Ng: Application of Neural Networks to Adaptive Control of Nonlinear Systems [16])]

Domains and applications of neural networks include:

• aviation (autopilot, fault detection, etc.),
• automotive systems (automatic steering, etc.),
• banking (document recognition, etc.),
• electronics (steering, pattern recognition, etc.),
• language (recognition, etc.),
• manufacturing (guidance, prediction, etc.),
• medicine (signal analysis, etc.),
• accounting (various analyses, etc.),
• robotics (guidance, sensing, etc.),
• telecommunications (data compression, etc.),
• transportation (diagnostic systems, etc.),
• the military industry (signal processing (radar, sonar), etc.),
• the entertainment industry (animation, special effects, etc.),
• insurance (value optimisation, etc.),
• other domains.

Have artificial neural networks fulfilled the expectations that researchers had in the initial phase of development?

In the early days, researchers were mainly interested in understanding how biological systems work with the help of mathematical models. There was a very strong connection between biological research and the research of artificial neural networks. Expectations were high. The neural network was supposed to be the beginning of artificial intelligence. As we have already said, it turned out not to be so. On the one hand, it was a disappointment, but on the other, a new technology that is very well suited to efficient methodologies for nonlinear mappings had arrived.

One of the areas where neural networks are very useful is the modelling of dynamic systems and their use for automatic control. The main application of neural networks is the identification of nonlinear dynamic systems, but there are also implementations in automatic control. We have devoted a separate chapter to this.

The extensive literature on neural networks is also followed by software that can be used to work with neural networks. One of the better-known programs for system identification is based on the Matlab software package (e.g., [15] and [17]).

1.3 Multilayer perceptron

A multilayer perceptron, shown schematically in Figure 1.6, is described by the following equation:

y_i = f_i( Σ_j w_ij f_n( Σ_k w_jk f_m( Σ_l w_kl x_l + w1_0l ) + w2_0k ) + w3_0j ).   (1.4)

[Figure 1.6: Multilayer perceptron (input layer, hidden layer, output layer, weighted links, neurons)]

It is called multilayer because it consists of at least one hidden layer (the activation functions f_n, f_m) and an output layer f_i. The multilayer perceptron is a universal approximator [8], which is true if it has one or more hidden layers.

The most common activation functions (examples in Figure 1.7) are:

• the sigmoid function,
• the hyperbolic tangent,
• the linear function,
• the Gaussian function,
• the signum function.

[Figure 1.7: Two examples of activation functions - a sigmoid function and a linear function]

The activation function in the hidden layer is often the sigmoid function

f(I) = 1 / (1 + e^(−aI)),   (1.5)

where a, I ∈ R, while the most common output function is the linear one (Figure 1.7).

The multilayer perceptron is a typical example of a feedforward neural network used for regression. Let us look at two examples of how we can approximate a nonlinear function by a multilayer perceptron.

Example of approximating a function of one variable

We approximate the function

y = 30[100(x − 0.6)²(x − 0.1)(x − 0.8) − 5e^(−5x) + 0.9 sin(20x + π/4)],   (1.6)

where x is the input variable and y is the output variable, by a multilayer perceptron with one hidden layer. The activation function of the hidden layer is a sigmoid function, while the output function is linear.

We approximate the function in several steps by first optimising the weights of the network with one neuron in the hidden layer, then with two, and so on, increasing the number of neurons in the hidden layer until we reach the desired approximation level. We use the Levenberg-Marquardt modification of the Newton optimisation method, or the Levenberg-Marquardt learning method, for training (i.e., optimisation). The results of the optimisation are shown in Figures 1.8 to 1.10. Finally, we obtain a neural network with five neurons in the hidden layer.

[Figure 1.8: A multilayer perceptron with one neuron in the hidden layer (original vs. NN model)]

[Figure 1.9: A multilayer perceptron with two neurons in the hidden layer (original vs. NN model)]

Example of approximating a function of two variables

Let the function

z(x, y) = x cos(2x) + y sin(2y) − 0.75,   (1.7)

where x and y are input variables and z is the output variable, be approximated by a multilayer perceptron with one hidden layer. Again, the activation function of the hidden layer is a sigmoid function and that of the output layer is a linear function. The structure of the hidden layer again consists of one to five neurons. The results are shown in Figures 1.11 to 1.13.
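The one-variable example can be mimicked in a few lines. The book trains with the Levenberg-Marquardt method; the Python sketch below (not from the book) uses plain steepest-descent backpropagation instead, scales the target by 1/100 so that a small constant learning rate works, and takes the grouping of terms in Eq. (1.6) as reconstructed above. The grid of 21 points, the learning rate and the random initialisation are made-up choices.

```python
import math, random

def mlp_forward(x, params):
    """One-hidden-layer perceptron (cf. Eq. (1.4)): sigmoid hidden layer, linear output."""
    v, b, w, c = params
    h = [1.0 / (1.0 + math.exp(-(vi * x + bi))) for vi, bi in zip(v, b)]
    return sum(wi * hi for wi, hi in zip(w, h)) + c, h

def train_mlp(xs, ds, n_hidden=5, rate=0.01, epochs=4000, seed=1):
    """Steepest-descent backpropagation (the book itself uses Levenberg-Marquardt)."""
    rng = random.Random(seed)
    v = [rng.uniform(-4.0, 4.0) for _ in range(n_hidden)]
    b = [rng.uniform(-4.0, 4.0) for _ in range(n_hidden)]
    w = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    c = 0.0
    for _ in range(epochs):
        for x, d in zip(xs, ds):
            y, h = mlp_forward(x, (v, b, w, c))
            e = y - d                                  # error, cf. cost (1.3)
            c -= rate * e
            for i in range(n_hidden):
                g = e * w[i] * h[i] * (1.0 - h[i])     # gradient through the sigmoid
                w[i] -= rate * e * h[i]
                v[i] -= rate * g * x
                b[i] -= rate * g
    return v, b, w, c

def target(x):
    # Eq. (1.6) as reconstructed above, scaled by 1/100 to keep the outputs small
    return 0.30 * (100.0 * (x - 0.6) ** 2 * (x - 0.1) * (x - 0.8)
                   - 5.0 * math.exp(-5.0 * x) + 0.9 * math.sin(20.0 * x + math.pi / 4.0))

xs = [i / 20.0 for i in range(21)]                     # grid on [0, 1]
ds = [target(x) for x in xs]
params = train_mlp(xs, ds)
mse = sum((mlp_forward(x, params)[0] - d) ** 2 for x, d in zip(xs, ds)) / len(xs)
print(mse)  # decreases further with more hidden neurons, epochs, or better optimisers
```

As in the book's figures, the approximation quality improves as hidden neurons are added; a single sigmoid neuron cannot follow the oscillating term at all.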
[Figure 1.10: A multilayer perceptron with five neurons in the hidden layer (original vs. NN model)]

[Figure 1.11: A multilayer perceptron with one neuron in the hidden layer (target surface is dark grey, learned surface is white)]

[Figure 1.12: A multilayer perceptron with two neurons in the hidden layer (target surface is dark grey, learned surface is white)]

[Figure 1.13: A multilayer perceptron with five neurons in the hidden layer (target surface is dark grey, learned surface is white)]

1.4 Radial basis-function network

The radial basis-function (RBF) network is the second most commonly used neural network. The network contains nonlinear transformations of the input data, which (when weighted) sum to form the output of the network, as shown in Figure 1.14.

[Figure 1.14: Radial basis-function network (inputs, hidden layer, outputs)]

The neural network in Figure 1.14 implements the approximation function f: R^n → R, which can be generalised to multiple outputs, described mathematically as follows:

y = f(x) = Σ_{i=1}^{N} w_i g_i(x, γ_i).   (1.8)

The function g is called the basis function, and in the case of this neural network it is a radial function, meaning a function whose output depends on a distance r between the inputs. Possible radial functions are:

• g(r) = r, a linear radial function,
• g(r) = r², a quadratic function,
• g(r) = exp(−r²/ρ²), a Gaussian function (Figure 1.15),
• g(r) = [1 + exp(r²/ρ²)]^(−1), a logistic function,
• g(r) = r² log(r), various spline functions,
• g(r) = (r² + ρ²)^(1/2), a polyquadratic function,
• other radial functions.

[Figure 1.15: Gaussian basis function]

For example, with multiple inputs, the Gaussian basis function would look like this:

g_i = exp[ −(1/2)( (x_1 − γ_i1)²/ρ_i1² + (x_2 − γ_i2)²/ρ_i2² + ... + (x_n − γ_in)²/ρ_in² ) ],   (1.9)

where x_j, j = 1, ..., n, are the input data, γ_ij, j = 1, ..., n, are the centres of the radial function g_i, and ρ_ij, j = 1, ..., n, are the scaling factors of the radial function g_i. The radial functions often lie in N points of the input data (e.g., γ_i = x_i, i = 1 ... n ≤ N). Alternatively, their position and the choice of the radii ρ_i are generally unknown. For an example of the distribution and an illustration of how the radial function is constructed, see Figures 1.16 and 1.17.

[Figure 1.16: Distribution of RBF network centres (state versus input; equilibrium curve)]

[Figure 1.17: Specific RBF corresponding to the given nonlinearity]

Since the weights w_i vary linearly with respect to the output, it is not difficult to find their optimal values with respect to the minimum squared error. The RBF network contains additional parameters γ_i and ρ_i, whose values are part of the optimisation process or learning. If we optimise the values of these parameters in addition to the weights, the result is a nonlinear optimisation problem, like the optimisation of the multilayer perceptron was. A successful optimisation approach in this case is the orthogonal least squares (OLS) method.

An important property of radial basis-function networks is their stability, which depends on the choice of the radial basis functions. For example, since Gaussian basis functions have the property of approaching zero as they move away from the centre, the output of the neural network is bounded whenever the weights have finite values. This BIBO (bounded-input-bounded-output) stability is so important that the basis functions of RBF networks are mainly Gaussian functions. RBF networks are also universal approximators [19].

Example of a function of two variables

Let the function

z(x, y) = sin(0.4x³ − 1.6y² + 0.5)   (1.10)

be approximated by an RBF network. We choose the Gaussian function as the basis function. We optimise the weights of the network with a different number of basis functions of width ρ = 0.1, which is increased until the approximation of the given function z(x, y) is sufficient. The results are shown in Figures 1.18 to 1.21.

[Figure 1.18: The network with three radial basis functions (the target surface is dark grey, the learned surface is white)]

[Figure 1.19: The network with five radial basis functions (the target surface is dark grey, the learned surface is white)]

[Figure 1.20: The network with eleven radial basis functions (the target surface is dark grey, the learned surface is white)]

[Figure 1.21: The network with thirty-one radial basis functions (the target surface is dark grey, the learned surface is white)]

1.5 Neural networks for modelling nonlinear dynamic systems

Neural networks, as described, are used to model static nonlinear functions. We optimise their weights to obtain the most appropriate relationship between the input and output data. When new data is given as input, the network representing the input/output mapping predicts the corresponding output values.

In dynamic systems, nonlinearities result in an output value that depends not only on the input value but also on the internal states of the system. Internal states are usually the past values of the input and output signals or their derivatives (depending on whether it is a discrete-time or a continuous-time system).

The usual representation of dynamic systems is that the memory (or the derivative) is implemented outside the approximator of the static function, in the present case the neural network. Consequently, we also bring in the delayed (or differentiated) values of the inputs and outputs. In dynamic systems, we usually consider the input/output behaviour in terms of signals or time series of data, in contrast to static systems, in which we deal with individual input and output values. When using static approximators, whether neural networks or other related methods, obtaining time sequences at the output means that the approximator performs its prediction separately at each time instant or step. This means that at the selected time instant k, the neural network is optimised to predict the value of the dynamic system at the next time instant k + 1, given the values of the inputs consisting of the selected number of delayed values and the current value of the input signal u and the output signal y. This can be illustrated by Equation (1.11):

ŷ(k + 1) = f(y(k), y(k − 1), ..., u(k), ...),   (1.11)

where ˆ denotes the prediction and k is the index of the subsequent time instants. This is called one-step prediction. We will discuss this in more detail in the following chapters.

Example of the approximation of a first-order dynamic system by an RBF network [3]

A dynamic system can be described with the following difference equation:

y(k + 1) = 0.2 tanh(y(k)) + sin(u(k)).   (1.13)

The input and output signals used for learning the neural network are each represented by 2207 samples. The dynamic system is modelled with a neural network containing 20 Gaussian basis functions with ρ = 0.3, randomly distributed in the region where the values of the input and output signals take place.
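The example can be reproduced in miniature. The Python sketch below (the book's own examples are Matlab-based) simulates the system of Eq. (1.13), places 20 Gaussian basis functions with ρ = 0.3 at randomly chosen data points, and, because the weights of Eq. (1.8) enter linearly, determines them by least squares via the normal equations. The 500-sample random input signal is a made-up choice rather than the book's 2207-sample data set, and a tiny ridge term (not in the book) is added only for numerical robustness.

```python
import math, random

def gaussian_rbf(z, centre, rho=0.3):
    """2-D Gaussian basis function, cf. Eq. (1.9), with equal widths rho."""
    return math.exp(-0.5 * sum((zj - cj) ** 2 for zj, cj in zip(z, centre)) / rho ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting for the normal equations."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_rbf_one_step(data, n_basis=20, rho=0.3, seed=0):
    """One-step-ahead model, cf. Eq. (1.11): y(k+1) ~ f(y(k), u(k)).
    The weights enter linearly (Eq. (1.8)), so least squares suffices."""
    rng = random.Random(seed)
    centres = rng.sample([(y, u) for (y, u, _) in data], n_basis)
    def features(z):
        return [gaussian_rbf(z, c, rho) for c in centres] + [1.0]  # plus a bias term
    X = [features((y, u)) for (y, u, _) in data]
    d = [t for (_, _, t) in data]
    m = len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) + (1e-6 if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(X[k][i] * d[k] for k in range(len(X))) for i in range(m)]
    w = solve(A, b)
    return lambda y, u: sum(wi * fi for wi, fi in zip(w, features((y, u))))

# Simulate the system of Eq. (1.13) with a made-up random input signal.
rng = random.Random(42)
y, data = 0.0, []
for _ in range(500):
    u = rng.uniform(-2.0, 2.0)
    y_next = 0.2 * math.tanh(y) + math.sin(u)
    data.append((y, u, y_next))
    y = y_next
model = fit_rbf_one_step(data)
rmse = math.sqrt(sum((model(y, u) - t) ** 2 for y, u, t in data) / len(data))
print(rmse)  # one-step prediction error over the training signal
```

As in the book's example, the model is only trustworthy in the region covered by the randomly placed basis functions; away from the learning data the Gaussian features decay to zero and the prediction falls back to the bias term.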
You will learn more about how to identify dynamic systems with a neural network in the following chapters. Let us take a look at how the neural network models the nonlinearity of y(k + 1) at the inputs y(k) and u(k). The results are shown in Figure 1.23. At first glance, the figure shown on the right in Figure 1.23 appears very poor, but on closer inspection, it turns out to be sufficient for the domain where the neural-network learning data was available. The example is very illustrative and shows how important it is to be aware of the limited possibilities of neural networks; it also shows the complexity of identifying dynamic systems with neural networks.

Figure 1.23: The nonlinear dynamic system (left figure) and its RBF network model (right figure) [6]

When the behaviour of a dynamic system is evaluated, it is done through multi-step prediction or simulation, meaning testing the behaviour of the network with the full signal and replacing the unknown output-signal values with their predictions. The idea of performing a simulation of a neural network describing a dynamic system is illustrated in Figure 1.22 and can be represented by Equation (1.12)

ŷ(k + 1) = f(ŷ(k), ŷ(k − 1), . . . , u(k), . . .).    (1.12)

Optimising a neural network involves the supervised learning of the feedforward network. However, the model of a dynamic system illustrated by this neural network is actually a recurrent network because of the feedback connections.

Figure 1.22: Block diagram of the simulation of the identified nonlinear dynamic system

Bibliography

[1] Artificial Neural Network. http://en.wikipedia.org/wiki/Artificial_neural_network
[2] Y. Bengio, I. Goodfellow, A. Courville (2017): Deep Learning, MIT Press, Cambridge, MA, USA.
[3] A. P. Engelbrecht (2002): Computational Intelligence: An Introduction, Wiley and Sons, Chichester.
[4] I. Grabec, W. Sachse (1997): Synergetics of Measurement, Prediction and Control, Springer Series in Synergetics, Springer-Verlag, Berlin and Heidelberg.
[5] G. Gregorčič (2004): Data-based modelling of nonlinear systems for control, PhD thesis, University College Cork, National University of Ireland, Cork.
[6] S. Haykin (1994): Neural Networks: A Comprehensive Foundation, Macmillan Publishing Company, Inc.
[7] D. O. Hebb (1949): The Organization of Behaviour, Wiley, New York.
[8] K. Hornik (1989): Multilayer feedforward networks are universal approximators, Neural Networks, Vol. 2, 359-366.
[9] I. Kononenko, M. Kukar (2007): Machine Learning and Data Mining: Introduction to Principles and Algorithms, Horwood Publishing, Chichester.
[10] B. J. A. Kröse, P. P. van der Smagt (1994): An Introduction to Neural Networks, Seventh edition, The University of Amsterdam.
[11] D. J. MacKay (2003): Information Theory, Inference and Learning Algorithms, Cambridge University Press.
[12] W. S. McCulloch, W. Pitts (1943): A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, 5, 115-133.
[13] M. Minsky, S. Papert (1969): Perceptrons: An Introduction to Computational Geometry, The MIT Press.
[14] K. S. Narendra, K. Parthasarathy (1990): Identification and control of dynamical systems using neural networks, IEEE Transactions on Neural Networks, Vol. 1, No. 1, 4-27.
[15] Matlab (1993): Matlab. MathWorks Inc.
[16] G. W. Ng (1997): Application of Neural Networks to Adaptive Control of Nonlinear Systems, UMIST Control Systems Centre Series, Vol. 4, Research Studies Press, Manchester.
[17] M. Norgaard, O. Ravn, N. L. L. Poulsen (2002): NNSYSID-toolbox for system identification with neural networks, Mathematical and Computer Modelling of Dynamical Systems, Vol. 8, No. 1, 1-20.
[18] J. Mikleš, M. Fikar (2007): Process Modelling, Identification, and Control, Springer.
[19] J. Park, I. W. Sandberg (1991): Universal approximation using radial-basis-function networks, Neural Computation, Vol. 3, No. 2, 246-257.
[20] D. Psaltis, A. Sideris, A. A. Yamamura (1988): A multilayer neural network controller, IEEE Control Systems Magazine, Vol. 8, No. 2, 17-21.
[21] F. Rosenblatt (1959): Principles of Neurodynamics, Spartan Books, New York.
[22] D. E. Rumelhart, J. L. McClelland (1986): Parallel Distributed Processing: Explorations in the Microstructure of Cognition, The MIT Press.
[23] J. Sjöberg et al. (1995): Non-linear black-box modeling in system identification: a unified overview, Automatica, Vol. 31, No. 12, 1691-1724.
[24] B. Widrow, M. E. Hoff (1960): Adaptive switching circuits, in 1960 IRE WESCON Convention Record, 96-104.
[25] B. Widrow, F. W. Smith (1963): Pattern recognizing control systems, Computer and Information Sciences (COINS) Symposium Proceedings, Spartan Books, 351-357.

Chapter 2

Identification of linear dynamic systems

2.1 Summary on linear system identification

The purpose of this chapter is to summarise the basics of linear system identification in order to continue with nonlinear system identification in the following chapter. Details on the theory and practice of linear system identification can be found, for example, in [2], [3], [4], among others.

System identification is an experimental determination of the temporal behaviour of a system based on measured signals. The temporal behaviour is defined within a class of mathematical models and represents the identified process in such a way that the differences between the system and its mathematical model are as small as possible.

Figure 2.1: A schematic representation of a dynamic system with input and output signals

The theory and practice of identifying dynamic systems is a combination of the results of scientific knowledge and engineering practice.
In addition to the technical literature, there is also a considerable amount of identification software (e.g., [5]).

The main task in system identification, which is a form of experimental system analysis, is to find an appropriate model structure that determines the class of models in which the model is sought. Estimating the parameters of this structure is the next step. A basic rule in system identification is not to identify parts of the system that are already known. Prior knowledge about the system, whether it is physical knowledge or knowledge gained from experiments, must be used. One of the classifications of mathematical models by colour code is as follows:

• 'white-box' model: the theoretical model obtained from the physical, chemical, etc. insight into the processes in a system;

• 'grey-box' model, in which there is some prior knowledge, which takes the form of
  – the full or partial structure of the model, while the parameters have to be estimated from measurements;
  – the knowledge of the combinations of measured signals or their relationships that can be used to support the modelling;

• 'black-box' model, by which the structure can be identified and the parameters estimated from the measured data, as there is no prior knowledge of the process. The system model is identified only from the input and output signals of the system (Figure 2.1).

System identification methods can be classified in many ways. Some of the classifications according to different aspects include:

• Class of mathematical models:
  – nonparametric models,
  – parametric models.

• Class of signals used:
  – continuous-time, discrete-time,
  – deterministic, random, pseudo-random.

• Error between process and model:
  – output error,
  – input error,
  – equation error or error-in-variables.

• Simultaneity of measurement and evaluation:
  – offline,
  – online.

• Data processing:
  – nonrecursive,
    ∗ direct,
    ∗ iterative,
  – recursive.

• Structure of the models:
  – linear models,
  – nonlinear models.

The complete procedure for system identification is described in the literature (e.g., [2], [4]). The basic steps of system identification can be summarised as follows:

1. identification of the purpose of the model,
2. collecting prior knowledge,
3. design of experiments,
4. carrying out experiments,
5. identification of the model, and
6. evaluation of the identified model.

The whole process is highly iterative and interactive. The steps in which the purpose of the model, the data measurements, and the model evaluation are verified are particularly important for the practice of engineering identification.

System identification methods are often classified according to model classes:

• nonparametric and
• parametric models.

Nonparametric models are primarily models of linear systems. They describe the input/output behaviour of a process in the form of value tables or curves. Typical representatives are:

• frequency responses (Bode diagrams),
• weighting functions or impulse responses, step responses,
• Fourier analysis, frequency-response analysis, correlation analysis, spectral analysis.

Parametric models are used to model both linear and nonlinear dynamic systems. These are models with explicitly expressed parameters. Most often, they are models expressed in the form of:

• differential equations,
• difference equations,
• transfer functions,
• state-space equations.

In identifying a linear model of a system, the order of the linear model must be known. This is determined by the order of the equations describing the system and by the regressors. Regressors are the quantities on which the prediction of the model depends functionally.

When we experimentally model (identify) a system, the complexity of the problem strongly depends on whether we approach it as a static (Equation (2.1)) or a dynamic (Equations (2.2) and (2.3)) system. The dynamics of the system increases the complexity of the problem.

• Static system:
F[u(t), y(t)] = 0,    (2.1)

where u(t) is the input signal and y(t) is the output signal.

• Dynamic system [3]:

F[t, u(t), u̇(t), ü(t), . . . , u^(m)(t), y(t), ẏ(t), ÿ(t), . . . , y^(n)(t)] = 0,    (2.2)

where u̇(t), ü(t), u^(m)(t) are the first, the second, and the m-th derivatives, respectively, of the continuous-time input signal u(t), and ẏ(t), ÿ(t), y^(n)(t) are the first, the second, and the n-th derivatives, respectively, of the continuous-time output signal y(t); or, in discrete time,

F[k, u(k), u(k − 1), u(k − 2), . . . , u(k − m), y(k), y(k − 1), y(k − 2), . . . , y(k − n)] = 0,

where u(k − i), i = 1 . . . m, are delayed samples of the discrete-time input signal u, and y(k − i), i = 1 . . . n, are delayed samples of the discrete-time output signal y.

The parameters are estimated with optimisation, which uses the sum of the squares of the response errors (the least-squares method) as the cost function, although other criteria are also used, for example, the maximum likelihood of the responses (the maximum-likelihood method).

Linear systems can be represented by the following equation in the z-domain [2], [3], [4]:

A(z^−1)y(z) = (B(z^−1)/F(z^−1)) u(z) + (C(z^−1)/D(z^−1)) v(z),    (2.3)

where A, B, C, D, F are polynomials of the complex variable z, and y(z), u(z), v(z) are the transforms of the discrete-time output signal y(k), the input signal u(k) and the noise signal v(k). Depending on which regressors are chosen to describe the system, some of these polynomials are equal to 0 or 1.

The methods for system identification differ depending on the regressors used and/or the optimisation cost function. In the continuation, the least-squares method is adopted
for the optimisation:

• finite-impulse-response (FIR) method (A = F = D = 1, C = 0), i.e., the regressors are only delayed input signals,

• autoregressive model with exogenous input (ARX) (F = C = D = 1), i.e., the regressors are delayed input and output signals,

• output-error (OE) method (A = C = D = 1), i.e., the regressors are delayed values of the input signal and estimates of the output signal (predictions from the past),

• autoregressive and moving-average model with exogenous input (ARMAX) (F = D = 1), where the regressors are delayed input, output and noise signals,

• the Box-Jenkins (BJ) method (A = 1), which has as regressors delayed input signals, delayed output-signal estimates, the prediction error, and the simulation error (when the output estimates are used for prediction),

• it is also possible to represent the linear system in state space [3], where the linear system is written in the form x(k) = Ax(k − 1) + Bu(k − 1), where x(k) = [x1(k), . . . , xn(k)]^T is the vector of state variables, u is the vector of input variables, and A, B are matrices of constant elements,

• other possible regressors.

In addition to the methods described above, there are a large number of variants. The methods described differ mainly in the way they incorporate noise into the model. Noise is unavoidable in real systems. Let us look at some selected models of second-order linear dynamic systems (Figures 2.2, 2.3 and 2.4):

• autoregressive model with exogenous input (ARX)

y(k) = a1y(k − 1) + a2y(k − 2) + b1u(k − 1) + b2u(k − 2) + v(k);    (2.4)

Figure 2.2: Autoregressive model with exogenous input (ARX)

• autoregressive and moving-average model with exogenous input (ARMAX)

y(k) = a1y(k − 1) + a2y(k − 2) + b1u(k − 1) + b2u(k − 2) + v(k) + c1v(k − 1) + c2v(k − 2);

Figure 2.3: Autoregressive and moving-average model with exogenous input (ARMAX)

• output-error model (OE)

y(k) = a1[y(k − 1) − v(k − 1)] + a2[y(k − 2) − v(k − 2)] + b1u(k − 1) + b2u(k − 2) + v(k).    (2.5)

Figure 2.4: Output-error model (OE)

Each of the methods also has its own recursive version for online identification [2], [3], [4].

Example of the ARX model identification

Let us look at a simple example of the ARX model identification of a first-order system. The system we want to identify is described by the difference equation

y(k) = 0.9512y(k − 1) + 0.09754u(k − 1)    (2.6)

or by the transfer function in the z-domain

H(z) = 0.09754z^−1 / (1 − 0.9512z^−1)    (2.7)

and is not disturbed by noise.

It is a first-order system, and the prediction depends on a delayed value of the output and a delayed value of the input. Thus, the two regressors are y(k − 1) and u(k − 1). The structure of the model is given by the one-step-ahead prediction equation, meaning that the values are predicted from known values of the regressors in the previous time step:

y(k) = −a1y(k − 1) + b1u(k − 1).    (2.8)

If we insert the sampled measured-signal values, we can write Equation (2.8) in matrix form, y = Ψθ:

[y(2)]   [−y(1)  u(1)]
[y(3)] = [−y(2)  u(2)] [a1]    (2.9)
[ ⋮  ]   [  ⋮      ⋮ ] [b1]

It should be noted that the order of the rows can also be permuted, as the best prediction is optimised based on the individual values of the signal and not the whole signal.

The given system of equations is solved by the method of least squares with the following analytical solution:

θ̂ = (Ψ^T Ψ)^−1 Ψ^T y.    (2.10)

The parameters have been optimised for one-step-ahead prediction, as this is a property of methods based on the equation error, but they must be validated with a simulation (multi-step-ahead prediction) to see if the model satisfactorily represents the dynamics of the system. Figures 2.5 and 2.6 present the data used and the result of the parameter estimation.

Figure 2.5: Input and output signals (left plots), coverage of the area with the measured data (right plots)

Figure 2.6: The plane showing the identified model and the identification data

In this example, we have focused solely on the system identification method. In the following example, we will show the whole identification process.

2.2 Mechatronic-system identification example

The purpose of this example is to illustrate the complete identification procedure for a linear dynamic system in practice and to highlight the problems encountered in this procedure.

Description of the mechatronic system

The dynamic process to be identified is an electromechanical motor-generator laboratory assembly manufactured by ELWE [1]. It is a device designed for experimental laboratory work and training in control design. A schematic representation of the device, together with the data-acquisition equipment, can be found in Figure 2.7.

Figure 2.7: Motor-generator laboratory setup

A series-connected 100 W DC motor is mechanically coupled to a series-connected DC generator. The load of the generator is two 40 W (220 V) light bulbs. We are interested in the model of the system that has the voltage at the terminals of the motor as the system input u and the voltage of the speed transducer as the system output y. In order to control the system by a computer, there is a thyristor converter at the input of the system. The speed sensor outputs a voltage of 1 V at 1000 rpm, which is passed through the measuring amplifier of the transducer. The thyristor converter and the amplifier are part of the additional equipment of the motor-generator system from the same manufacturer.

A PCI-20000 converter, manufactured by Burr-Brown, contains, among other things, the digital-to-analogue and analogue-to-digital converters required for the acquisition and injection of signals for system identification. The block diagram of the general measurement system for system identification is shown in Figure 2.8.

Figure 2.8: Measuring-system schematic

The measured steady-state system characteristics are shown in Figure 2.9.

Figure 2.9: Measured static characteristic

While the static model describes only the gain between the input and output signals in steady state, the dynamic model describes not only the gain in steady state but also the transient phenomena of the output variables that occur when the input variable changes and causes the output variable to change.

The speed of rotation of the system depends on the electric current through the bulbs, which varies greatly when the bulbs are ignited. This can be observed in the static characteristic curve (Figure 2.9) as a voltage drop. The operating point at which the linearised model is operated is 8.5 V. The steady-state characteristic must be measured in order to linearise the static process, which, as can be seen from Figure 2.9, is not necessary in the present case, as the steady-state characteristic is quite linear around the operating point of interest (8.5 V). To model the system over the entire operating range, it would be necessary to create a nonlinear model or settle for a linear model of low accuracy. Note that, in reality, linear systems do not exist or are extremely rare. To identify an unstable system, we would have to proceed in a slightly different way [4].

Sampling time, input signal, preprocessing

Input signal selection

Different identification methods require different input signals. Common to all methods is that the input signals must sufficiently excite the system. This means that the input signals must contain a sufficient number of frequency components in the range of interest. This can be achieved by a large number of different experiments or by the variability of a single input signal. A good model is obtained only in the part of the frequency range that is excited by the input signal. There are many suitable and less suitable input signals. It is not sufficient to use only one form of the signal. An input signal must be constructed from signals that each excite a different frequency range. Note that the choice of input signal is of paramount importance to the identification process. Usually, the response of the process to several input signals is recorded, and then the most appropriate combination of input signal and response is selected.

In our example, we have chosen a pseudo-random binary signal (PRBS) as the input signal. This signal is an approximation of white noise.

White noise [2] is a signal with statistically independent values whose power spectral density is equal to a constant. It is therefore quite suitable for system excitation because it excites over the entire frequency range. However, it is practically infeasible because its mean power is infinite [2].
The white-noise approximation with a PRBS is easy to construct and has a limited amplitude at a high power density. The PRBS takes the amplitudes +a and −a. Sign changes occur only at discrete time instants kλ, k = 1, 2, . . ., where λ is the length of the time interval, also called the duty cycle. The position of the zeros of the spectrum depends on the duty cycle λ. Therefore, λ is determined by the frequency range of interest for system identification (i.e., the estimated system dynamics). The frequency range of interest is iteratively identified until it is determined that the input signal excites the entire frequency range of interest.

The choice of the amplitude a of the input signal should be a compromise between the largest possible value, due to the signal-to-noise ratio, on the one hand, and the safety and economy of operation and the nonlinearity of the system on the other. In the present example, nonlinearity is a highly influential factor, as can be seen from the static characteristics (Figure 2.9). We have chosen the amplitude a = ±1 V around the operating point (8.5 V) because we want to remain in the range that is as linear as possible.

If the choice of the input signal is restricted by various factors (e.g., limited experimental possibilities), we can at least determine in which frequency range the given input signal excites the system and, consequently, in which frequency range our model is valid.

To validate the model, we have chosen a sequence of rectangular pulses with a duration of 1 second and an amplitude of ±1 V, which satisfactorily excites the process in a somewhat narrower frequency range.

Choice of sampling time

The rule of thumb for choosing the sampling time [2] is that it should be 10 % of the settling time of the system in response to a step signal. The response to a step with an amplitude of 1 V in the selected linear range is shown in Figure 2.10. Using the figure, we can determine the settling time, which in our example is ≈ 0.2 s, and from here we choose the sampling time Ts = 0.02 s. The sampling time determines the frequency range in which the signal can be observed (Shannon's theorem):

fmax = 1/(2Ts) = 25 Hz
ωmax = 2πfmax = 157 rad/s.    (2.11)

The PRBS input signal was generated with a digital register of seven bits and, since we chose a duty cycle of λ = 5Ts = 0.1 s, we obtain 635 samples. The total measurement time was therefore 12.7 s. The first zero of the power spectrum of the input signal was at ω = 62.8 rad/s. The most useful subrange extends up to the frequency at which the magnitude of the power spectrum drops to half of its low-frequency value, which in our case is at 31.4 rad/s, because only in this frequency subrange is the system to be modelled satisfactorily excited.

The input signal and its magnitude spectrum density (i.e., the magnitude spectrum of the aperiodic signal) are shown in Figure 2.11.

Figure 2.11: Input signal and its magnitude spectrum density

The magnitude density spectrum of the input signal in Figure 2.11 does not contain the component at 0 rad/s because the DC component (ω = 0) has been eliminated. This will be discussed later. We did not choose just one input signal. The signal described was chosen from ten signals with similar properties (PRBS signals with different lengths and duty cycles, i.e., with different magnitude spectrum densities), for which we made measurements and selected the most appropriate one based on model validation. The number of samples should be large because, the larger the number of samples, the better the result of the system identification.
Figure 2.10: Response to a step change

An offset voltage was added to the input signal generated with a computer so that its average value corresponds to the operating point of 8.5 V. The response of the system at the operating point (i.e., without the DC component) is shown in Figure 2.12.

Figure 2.12: Response and its magnitude spectrum density

An analogue filter can be implemented upstream of the analogue-to-digital converter (Figure 2.8) to prevent frequency aliasing [2]. In the present example, this was done using a high-quality measurement amplifier that has the frequency characteristic of a low-pass filter. Due to the large bandwidth of the amplifier, the aliasing effect was not completely eliminated. However, this residual effect cannot be separated from the disturbance noise.

Before the identification process, the recorded signals should be processed accordingly. We need to remove the DC component from the signals, which is due to operation at the operating point; it leads to a jump in the amplitude density spectra in Figures 2.11 and 2.12. Also, if necessary, the effects of the noise must be reduced, especially in the estimation of the parameters (this will be discussed later), to avoid biased results. This can be done by filtering, but care must be taken with the properties of the chosen system identification method. We will therefore turn to the question of filtering once we have selected the identification method.

We can identify nonparametric models, for which we do not need to make any assumptions about the structure of the model in advance other than linearity, or we can identify parametric models, where we need to determine the structure. In system identification, these modelling methods complement each other and, as we will see in our example, we can derive the information for structure identification from one model to another. Nonparametric models can also be used to evaluate parametric models.

Model-structure selection

The choice of model structure is very important in system identification because parameter-estimation methods can only identify parameter values. The choice of the model of the system depends on the purpose of the model. In our example, we want a linear parametric model that will be used for system simulation as accurately as possible around the operating point. If the model has a large order, it describes the dynamics better, but it is also more complex and has poles and zeros that converge towards each other.

We have chosen from among the methods described in [2], which have already been mentioned here. In identifying linear systems, we have to decide on the order of the model. A first estimate of the model order can be obtained by analysing the information matrix or by identifying a nonparametric model: the frequency response.

The information matrix [2] can be used as follows. We observe the value of the determinant det(Ψ^T Ψ/N), where Ψ^T Ψ is the information matrix, Ψ is the matrix of regression vectors, and N is the number of regression vectors in the matrix Ψ. The value of the determinant drops sharply when the model order exceeds the order that already describes the system sufficiently well.

The values of the determinant of the information matrix in our example were: 0.3194 for the first-order model, 0.0012 for the second-order model, 4.6501 · 10^−7 for the third-order model, 1.4037 · 10^−10 for the fourth-order model, and so on. The greatest loss of value is between the second- and third-order models. The identified system is therefore of the second order.

The second way to determine the order of the model is to use nonparametric system identification [2]. Since we have chosen an aperiodic input signal, we can determine the frequency response with several methods: Fourier analysis, correlation analysis, or spectral analysis. All methods should generally give the same results. All require an aperiodic input signal, while correlation analysis requires white noise as the input signal. As mentioned earlier, the PRBS is an approximation of white noise.

Fourier analysis is an otherwise straightforward method, but it is very sensitive to noise.
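The information-matrix order test can be sketched on an assumed second-order ARX system (illustrative coefficients, not the book's measured motor-generator data). In this noise-free sketch the determinant does not merely drop but collapses to numerical zero beyond the true order, because the extra regressors become exactly linearly dependent; with measurement noise one observes a sharp but finite drop, as in the values quoted above.

```python
import random

random.seed(1)

# Assumed second-order test system (illustrative values)
a1, a2, b1, b2 = 0.5, -0.3, 1.0, 0.5
N = 400
u = [random.choice([-1.0, 1.0]) for _ in range(N)]   # PRBS-like excitation
y = [0.0, 0.0]
for k in range(2, N):
    y.append(a1*y[k-1] + a2*y[k-2] + b1*u[k-1] + b2*u[k-2])

def det(M):
    # determinant by Gaussian elimination with partial pivoting
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for cc in range(c, n):
                M[r][cc] -= f * M[c][cc]
    return d

def info_det(order):
    # det(Psi^T Psi / N) for ARX regressors [y(k-1..order), u(k-1..order)]
    rows = [[y[k-i] for i in range(1, order+1)] +
            [u[k-i] for i in range(1, order+1)]
            for k in range(order, N)]
    m, nr = 2*order, len(rows)
    G = [[sum(r[i]*r[j] for r in rows) / nr for j in range(m)]
         for i in range(m)]
    return det(G)

dets = {n: info_det(n) for n in (1, 2, 3)}
```

For order 3, the column y(k−1) equals a1·y(k−2) + a2·y(k−3) + b1·u(k−2) + b2·u(k−3) exactly, so the information matrix is singular up to rounding error.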
Fourier transforms are calculated using a discrete or fast Fourier transform.

The frequency response can also be calculated using spectral analysis. Like Fourier analysis, spectral analysis gives very poor results in the presence of noise. The variance of the correlation-function estimates at large shifts τ is large. The solution to both problems is smoothing, for example, with the Hamming window [2].

Smoothing is a filtering method with variable weights (i.e., a frequency or time window) that can be used to remove noise. We have to be very careful in choosing the width of the smoothing window, as it has to be narrow compared to the measurement time and wide enough to preserve the essential information. The details of smoothing can be found in [2]. Figure 2.13 shows the magnitude and phase frequency response in the frequency range of interest without smoothing, obtained by Fourier analysis. The response would be the same if the spectral analysis were performed without smoothing. Also, Figure 2.13 shows both responses of the spectral analysis: without smoothing and with smoothing using a Hamming window of 35 samples.

Figure 2.13: Magnitude and phase response without smoothing (dashed curve) and with smoothing (full curve)

Note that smoothing can blur peaks in the frequency response. Nevertheless, the resulting frequency response is satisfactory for obtaining an initial estimate of the order of the system. Using the frequency response shown in Figure 2.13, we used the slope of the magnitude response and the phase response to confirm that the system can be modelled as a second-order system. The slope of the phase response can also be used to estimate the dead time.

In our case, we chose the simplest model (i.e., the nonrecursive ARX model), which was optimised using the least-squares method. This method gives biased results, except in the case of a special form of the noise filter. The bias can only be reduced by suitable filtering, which is found iteratively. Such methods are similar, in terms of bias, to other methods for parameter estimation (generalised least squares, maximum likelihood).

The least-squares method is essentially about calculating the solution to a given system of equations. When estimating the parameters of an unknown system disturbed by noise, we minimise the square of the difference between the output of the system to be identified and that of the model. The equation of the perturbed process in the z-domain is

y(z) = (B(z^−1)/A(z^−1)) z^−d u(z) + n(z),    (2.12)

where A(z^−1) is the denominator of the transfer function in the z-domain, B(z^−1) is the numerator of the transfer function in the z-domain, y is the output signal, u is the input signal, and n is the noise; correspondingly, in the time domain,

y(k) = −a1y(k − 1) − a2y(k − 2) − . . . − any(k − n)
     + b1u(k − d − 1) + . . . + bnu(k − d − n)
     + n(k) + a1n(k − 1) + a2n(k − 2) + . . . + ann(k − n).    (2.13)

The vector notation of Equation (2.13) is
y(k) = ψ^T(k) θ,   (2.14)

where

θ = [a1, ..., an, b1, ..., bn]   (2.15)

is the vector of the parameters to be estimated. The solution of the least-squares problem (the derivation and the conditions are described in detail in [2]) is

θ̂ = [Ψ^T Ψ]^-1 Ψ^T y,   (2.16)

where y is the output-signal vector and Ψ is the matrix consisting of the regression vectors ψ corresponding to the elements of the vector y.

When determining the order of the system model, we must be aware that it is an iterative process and that a final decision can only be made in the validation phase. The order of the system determined thus far can only be used as an initial value.

The selection of the dead time, except when dealing with a first-order system, can be relatively tricky. A practical approach to this problem is to estimate the parameters at different dead times and take the result where the model fits the data best. In our example, we arrived at a dead time of 0.04 s using the procedure described above. The resulting dead time was checked against the phase response.

Filtering is a method that can be used to obtain results that are as unbiased as possible. The process described by Equation (2.12) can also be written as follows:

A(z^-1) y(z) = B(z^-1) z^-d u(z) + A(z^-1) n(z).   (2.17)

The least-squares method gives unbiased results when A(z^-1) n(z) = v(z) is white noise, i.e., if n(z) is obtained from v(z) via the filter 1/A(z^-1). This condition is often not met in practice, and it was not met in the present example, so noise filtering is required.

The selection of the parameter-estimation method depends not only on the system to be identified but also on the form of the disturbance or noise. We always have to identify several models. With the validation procedure, we test the models and decide which model fits best. It is very difficult to determine in advance which of the methods should be used for parameter estimation. We usually test several methods and decide again after validating the
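The estimation step of Equations (2.14)-(2.16), together with the prefiltering of both signals by 1/Â(z^-1) described above, can be sketched in Python with NumPy and SciPy. This is only an illustrative sketch, not code from the book: the function names, the argument defaults and the ARX row convention are assumptions, and a numerically stable least-squares routine is used instead of forming [Ψ^T Ψ]^-1 explicitly.

```python
import numpy as np
from scipy.signal import lfilter

def arx_least_squares(u, y, n=2, d=0):
    """Estimate theta = [a1..an, b1..bn] of an ARX model as in Eq. (2.16).

    Each regression row is psi(k) = [-y(k-1), ..., -y(k-n),
    u(k-d-1), ..., u(k-d-n)], so that y(k) = psi(k)^T theta (Eq. (2.14))."""
    rows, targets = [], []
    for k in range(n + d, len(y)):
        psi = [-y[k - i] for i in range(1, n + 1)] \
            + [u[k - d - i] for i in range(1, n + 1)]
        rows.append(psi)
        targets.append(y[k])
    theta, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(targets), rcond=None)
    return theta

def prefilter(signal, a_hat):
    """Filter a signal with 1/A_hat(z^-1), where a_hat = [1, a1, ..., an].

    In the iterative scheme of the text, both u and y are passed through
    the same filter and the least-squares estimation is then repeated."""
    return lfilter([1.0], a_hat, signal)
```

On noise-free data, the estimator recovers the parameters essentially exactly; with correlated noise, the estimate is biased, which is what the prefiltering iteration is meant to reduce.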
models. We have chosen one of the methods described in the relevant literature [2, 4].

If we decide to filter, we must filter the input and output signals with the same filter. We divide Equation (2.17) by the filter F(z^-1) and obtain

A(z^-1) y(z)/F(z^-1) = B(z^-1) z^-d u(z)/F(z^-1) + (A(z^-1)/F(z^-1)) n(z).   (2.18)

The identification results are unbiased when the error term of Equation (2.18) (i.e., its last term) is white noise. Suppose n(z) is white noise. In this case, we obtain unbiased results when A(z^-1) = F(z^-1). This means that we filter the signals with the unknown denominator of the transfer function yet to be identified. In practice, of course, n(z) is not white noise. However, it can approximate white noise filtered by 1/A(z^-1). Therefore, we use a form of filter (e.g., the Butterworth filter) that makes the last term of Equation (2.18) as close as possible to white noise, which is checked iteratively by evaluating the results.

Following the procedure described above, we identified models of different orders. After validation, the steps of which are described in the following section, we chose a second-order model. This model had two options, which we will consider below in order to select the most suitable one for our purpose.

If we do not filter the input and output signals, the transfer function of the model is

Ĝpn(z) = ((0.0281 z^-1 + 0.0165 z^-2)/(1 - 1.6379 z^-1 + 0.6890 z^-2)) z^-2.   (2.19)

This is a second-order transfer function in the z-domain with stable poles, nonminimum-phase zeros and an additional delay of two samples (i.e., 0.04 seconds).

After an iterative procedure in which the input and output signals were filtered with 1/A(z^-1), we obtained a second-order model given by the transfer function

Ĝpf(z) = ((0.0554 z^-1 - 0.00215 z^-2)/(1 - 1.743 z^-1 + 0.7816 z^-2)) z^-2.   (2.20)

Validation is used for model selection. The main elements of validation are described in the following section.

Model validation

The part of the system-identification procedure that indicates how good the model is and what needs to be changed (order, method, input signal, sampling time) is the most important part of the procedure. This is the validation, or evaluation, of the model. It is carried out iteratively.

Prediction-error test

In the least-squares method, the prediction error should be zero-mean white noise. The following tests are available:

• The mean of the error signal e(k) should be zero. From the left-hand parts of Figures 2.14 and 2.15, we see that the error satisfies this test but does so better for the model with filtered signals.

Figure 2.14: Prediction error e(k) between the responses of the transfer function Ĝpn and the system used for identification (left figure), and the distribution of the error signal e(k) (right figure)

Figure 2.15: Prediction error e(k) between the responses of the transfer function Ĝpf and the system used for identification (left figure), and the distribution of the error signal e(k) (right figure)

• The distribution of the error signal e(k) must be normal with zero mean. This can be seen from the right-hand parts of Figures 2.14 and 2.15.

• The error signal e(k) must be independent of the past inputs (φeu(τ) = 0 for τ < 0). For open-loop operation, the error must be independent of all inputs. Figures 2.16 and 2.17 show the curves of the correlation between the error and the input signal, and we see that their value is small everywhere.

In validation, we take into account our
observations and judgement and do not just rely on the partial results that the computer provides. Which model of the system we choose depends not only on the fulfilment of the requirements but also on the purpose of the model. To test the validity of the model, we will use several different procedures. We will show the results of the validation for the selected models (2.19) and (2.20).

• The autocovariance function of the prediction-error signal e(k) must be a delta impulse. From Figures 2.18 and 2.19, we see that the autocovariance function has its highest value exactly at the shift (or delay) τ = 0 and is small at the other shifts. From this, we can deduce a similarity with the delta impulse.

Figure 2.16: Cross-correlation between the error e(k) and the input signal u(k) for the transfer function Ĝpn

Consistency of input/output behaviour

The model is validated by testing how it responds to input signals for which the system response is known. There are two possibilities:

• The validation is done with the data we used to identify the model. This is called verification.

• The validation is performed with data that we did not use to identify the model but which were collected at the same working point. This is called validation or cross-validation. This type of validation tells much more about the validity of the model than validation with data that have been used to identify it.

Figures 2.20 and 2.21 show the input-output response results for both data sets for the two selected transfer functions. Figure 2.20 shows a comparison of the time responses of the two models to the input signal used for identification, together with the error between the measured response and the simulated response of the model Ĝpn and between the measured response and the simulated response of the model Ĝpf, for the data we used for identification.
Figure 2.21 shows a comparison of the time responses of the two models to a sequence of rectangular pulses, together with the error between the measurements and the simulated response of the model Ĝpn and between the measurements and the simulated response of the model Ĝpf, for data not used for identification. From the figures, it can be seen that the model errors of Ĝpf are on average smaller than those of the model Ĝpn.

Figure 2.17: Cross-correlation between the error e(k) and the input signal u(k) for the transfer function Ĝpf

Figure 2.18: Autocorrelation function of the prediction-error signal e(k) for the transfer function Ĝpn

Figure 2.19: Autocorrelation function of the prediction-error signal e(k) for the transfer function Ĝpf

Figure 2.20: Identification signal: upper figure – the measured system response (dotted curve), the model simulation response of Ĝpn (dashed curve) and the model simulation response of Ĝpf (solid curve); bottom figure – the simulation-response error of Ĝpn (dotted curve) and the simulation-response error of Ĝpf (dashed curve)

For all the above tests, we can see from the figures that both transfer functions satisfy the conditions, but the transfer function obtained from the filtered signals is better. From the responses in Figures 2.20 and 2.21, it can also be seen that the transfer function identified from the filtered signals better describes the behaviour of the process.
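The correlation tests illustrated in Figures 2.16-2.19 can be reproduced with a few lines of NumPy. The sketch below (an illustrative helper, not code from the book) computes a normalised correlation function between two signals; it can be applied to e and u for the cross-correlation test, or to e and e for the autocorrelation test.

```python
import numpy as np

def corr(x, y, max_lag=25):
    """Normalised correlation r(tau) = sum_k x(k) y(k+tau) / norm,
    for tau = -max_lag ... max_lag, with the signal means removed."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for i, tau in enumerate(lags):
        if tau >= 0:
            r[i] = np.dot(x[:len(x) - tau], y[tau:]) / denom
        else:
            r[i] = np.dot(x[-tau:], y[:len(y) + tau]) / denom
    return lags, r
```

For a white prediction-error sequence, the autocorrelation is 1 at τ = 0 and close to zero elsewhere, which is exactly the delta-impulse behaviour required by the tests above.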
Parameter meaningfulness, model reduction, covariance matrix, parameter errors, model utility

The meaningfulness of the discrete-time model of the continuous-time system can be tested by considering the number of poles on the negative real axis in the z-domain; this number must not be odd. In our example, there are no poles on the negative real axis. Furthermore, a model must agree with the identified system in terms of stability.

The standard deviation of the estimated parameters is determined as the square root of the diagonal elements of the covariance matrix of the parameter error [2]. This is a quantitative measure of the confidence in the parameter estimates. The greater the covariance, the greater the deviations, and the lower the reliability and validity of the model. The covariance matrix is expressed as

cov[θ̂ - θ] = σe² E{[Ψ^T Ψ]^-1},   (2.21)

where σe² is the variance of the error of the model [2]. The standard deviations of the individual parameters of the transfer function Ĝpn are as follows:

a1 = -1.6379; σ = 0.0117;
a2 = 0.6890; σ = 0.0112;
b1 = 0.0281; σ = 0.0013;
b2 = 0.0165; σ = 0.0017;

Figure 2.21: Validation signal: upper figure – the measured system response (dotted curve), the model simulation response of Ĝpn (dashed curve) and the model simulation response of Ĝpf (solid curve); bottom figure – the simulation-response error of Ĝpn (dotted curve) and the simulation-response error of Ĝpf (dashed curve)

Frequency consistency

The consistency between the frequency response obtained from the Fourier analysis and the frequency response of the transfer function Ĝpf is shown in Figure 2.22. The time responses must be accompanied by a fit of the frequency response and vice versa.

For the transfer function Ĝpf, the standard deviations
are as follows:

a1 = -1.7430; σ = 0.0040;
a2 = 0.7817; σ = 0.0035;
b1 = 0.0555; σ = 0.0013;
b2 = 0.0022; σ = 0.0017.

It can be observed that the standard deviations of the model parameters obtained from the filtered signals are generally smaller than in the case of the model obtained from the unfiltered signals. From all the estimation results shown, it can be concluded that the model obtained from the filtered signals more accurately describes the behaviour of the system we have identified.

Figure 2.22: Test of the frequency response of the transfer function Ĝpf (response of Ĝpf – solid curve, frequency response for comparison – dashed curve)

In Figure 2.22, it can be seen that the negative phase angle increases in magnitude with frequency. This is due to the dead time, which is also consistent with a model that has a delay of 0.04 seconds.

We know that we cannot derive the position of the poles in the continuous-time domain from the position of the poles in the discrete-time domain. Zeros at infinity in the s-domain normally transfer to zeros on the negative real axis of the z-domain, or at least near its origin. From all of this, we might conclude that the continuous-time system that we are identifying, assuming that it is of second order, has no finite zeros. This is confirmed by Figure 2.22, which shows that the magnitude response at high frequencies has a slope of -40 dB/decade. The identified second-order model has a finite zero, so its response at high frequencies does not optimally match the frequency response.

If we repeat the identification procedure with a discrete-time model that has its zero fixed at z = 0, we obtain the following model:

Ĝp(z) = (0.04019 z^-1/(1 - 1.7040 z^-1 + 0.7494 z^-2)) z^-2.   (2.22)

The resulting frequency response (Figure 2.23) shows very good agreement. The agreement with the simulation response (Figure 2.24) is not perfect but sufficient for most control implementations.

If we want more accurate information about the model, especially about its behaviour at high frequencies, we would need to perform additional experiments or repeat the identification procedure. The required accuracy of the model primarily depends on the purpose of the model, and this assessment depends on the user's judgement. Models can be used for a variety of purposes, such as designing control algorithms, simplifying complex systems, designing simulators, designing fault-diagnosis algorithms, and similar.

In our example, we have demonstrated the identification process from a practical point of view and obtained a linear model describing the dynamics of the mechatronic system around the operating point. After its validation, the identified model is sufficiently accurate for the design of control systems as well as for the simulation of the dynamics of the mechatronic system, taking into account the standard deviations of the parameters, as well as for other purposes. The final validation of the model would be through its application for the design of control algorithms or for use as a simulator.
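The simulation of the identified model, as compared with measurements in Figure 2.23, can be sketched with SciPy. The numerator and denominator below are taken from Equation (2.22), with two extra numerator zeros implementing the additional z^-2 delay; the rectangular input is only an illustrative stand-in for the measured signal.

```python
import numpy as np
from scipy.signal import lfilter

b = [0.0, 0.0, 0.0, 0.04019]   # 0.04019 z^-1 combined with the z^-2 delay
a = [1.0, -1.7040, 0.7494]     # 1 - 1.7040 z^-1 + 0.7494 z^-2
u = np.concatenate([np.ones(100), -np.ones(100)])   # one rectangular pulse period
y = lfilter(b, a, u)           # simulated model response
dc_gain = sum(b) / sum(a)      # steady-state gain of the model
```

The first three output samples are zero (pure delay), and the response settles to the steady-state gain before the input changes sign, which is the kind of behaviour visible in Figure 2.23.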
Figure 2.23: Part of the measured system response (dashed curve) and the model simulation response of Ĝp (solid curve) to a sequence of rectangular pulses

Figure 2.24: Test of the frequency behaviour of the transfer function Ĝp (response of Ĝp – solid curve, frequency response for comparison – dashed curve)

Bibliography

[1] ELWE (1988): ELWE Educational Systems, Experimental Manual U9, ELWE, Cremlingen.
[2] R. Isermann, M. Muenchhof (2011): Identification of Dynamic Systems: An Introduction with Applications (Vol. 85), Springer, Berlin.
[3] J. Mikleš, M. Fikar (2007): Process Modelling, Identification, and Control, Springer.
[4] L. Ljung (1987): System Identification: Theory for the User, Prentice-Hall, Englewood Cliffs, NJ.
[5] L. Ljung (1992): System Identification Toolbox, User's Guide, The MathWorks, Inc., Natick, MA.
[6] J. Sjöberg et al. (1995): Non-linear black-box modeling in system identification: a unified overview, Automatica, Vol. 31, No. 12, 1691-1724.

Chapter 3

Identification of nonlinear dynamic systems

3.1 General information on the identification of nonlinear systems

As mentioned in Chapter 1, researchers in the 1990s came to the realisation that training neural networks to behave like nonlinear dynamic systems was, in fact, nonlinear system identification. With this understanding, it was possible to incorporate and take into account knowledge from system theory and practice [2].

Systems-theory terms and their neural-network or machine-learning equivalents:

– estimate, identify = learn, train;
– validate = generalise;
– model structure = network;
– estimation data, identification data = learning set, training set;
– validation data = generalisation set, test set;
– overfitting = overtraining.
In this chapter, we will look at the basic guidelines for the identification of nonlinear dynamic systems as given in [4].

The main problem in system identification, as mentioned in the previous section, is to find a suitable model structure that can describe the behaviour of a nonlinear system sufficiently well. Estimating the parameter values for the chosen structure is a minor problem in most cases. The basic rule we must always follow is that we do not identify parts of the system that we already know. This means that we have to use the knowledge about the system that we already have.

Practical aspects of the identification of nonlinear systems

The identification procedure must not and cannot be fully automated, as it involves too many subjective decisions. For a successful procedure, not only knowledge about the identification procedure is necessary but also suitable software (e.g., [6], [7]) and, above all, input and output data that contain sufficient information about the dynamic behaviour of the system to be identified.

Model colour coding is also used for nonlinear systems. This coding denotes the background knowledge used for modelling: the model is a white box in theoretical modelling, a black box in system identification, and a grey box if the modelling is one of the various combinations of the two.

The identification of nonlinear dynamic systems is a much more comprehensive problem than the identification of linear dynamic systems. The notion of nonlinearity encompasses an infinite variety of different forms of nonlinearity, and experience with one type of nonlinear dynamic system is generally not applicable to other types. It is therefore advisable to gain practical experience of modelling a selected nonlinear process by simulating similar examples with a computer. Commonly used experimental models of nonlinear dynamic
systems are artificial neural networks, fuzzy models or networks of local models, models based on Gaussian processes, wavelet models, and many other models [3]. Gaining the practical experience described above is one way of using prior knowledge about the system to be identified.

The problem of identifying nonlinear systems can be described as follows. The individual samples of the output signal of the system to be modelled can be represented as a function of the delayed samples of the input signal u and the output signal y, disturbed by noise:

y(k) = g(y(k-1), y(k-2), ..., u(k-1), u(k-2), ...) + n(k),   (3.1)

where k is the consecutive number of the sample defining a time instant. System identification means finding the function g(·):

ŷ(k|θ) = g(ψ(k), θ),   (3.2)

where

ψ(k) = ψ(u(k-1), u(k-2), ..., y(k-1), y(k-2), ...) is the vector of regressors (also called the regression vector),
θ is the vector of parameters, and
ŷ is the output of the model.

If the problem of system identification is approached in a standard statistical way, it is also necessary to harmonise the terminology, as neural networks and other similar methods have been given a terminology of their own. The most commonly used terms and their equivalents (systems-theory term = neural-network or machine-learning term) are listed at the beginning of this section.

The problem can be divided into two subproblems:

• the selection of the regressors ψ(k), and
• the selection of the nonlinear mapping g(ψ).

Among the regressor choices for nonlinear models treated in this section are also the following:

• the nonlinear Box-Jenkins (NBJ) model, which has as regressors delayed input signals, delayed output-signal estimates, the prediction error and the simulation error (if the output estimates were used for the prediction): (u(k-i), ŷ(k-i), ε(k-i), εu(k-i) = y(k-i) - ŷu(k-i));
• a state-space representation of the nonlinear system: x(k) = F(x(k-1), u(k-1));
• other possible regressors.
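The construction of the regression vectors for Equation (3.2) can be sketched in a few lines of NumPy. This is a minimal sketch, not code from the book: the helper name `narx_regressors`, the NARX choice of regressors and the default orders are illustrative assumptions.

```python
import numpy as np

def narx_regressors(u, y, n=2, m=2):
    """Build a regression matrix with rows
    psi(k) = [y(k-1), ..., y(k-n), u(k-1), ..., u(k-m)],
    i.e., the NARX choice of regressors, and the matching targets y(k)."""
    start = max(n, m)
    rows = [np.concatenate([y[k - n:k][::-1], u[k - m:k][::-1]])
            for k in range(start, len(y))]
    return np.asarray(rows), np.asarray(y[start:])
```

Other model classes (NFIR, NOE, NARMAX) differ only in which delayed signals are stacked into each row.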
Regressors

If the structure is linear, the system is described in the following form:

A(z^-1) y(z) = (B(z^-1)/F(z^-1)) u(z) + (C(z^-1)/D(z^-1)) v(z),   (3.3)

which can be written simply as

ŷ(k) = ψ^T(k) θ.   (3.4)

As described in the previous chapter, models of linear systems differ according to the regressors. Similarly, the models for nonlinear systems are named according to the different regressors ψ used to represent the nonlinearity:

ŷ(k) = g(ψ(k), θ).

Nonlinear mappings

Nonlinear mappings are often represented as a weighted sum of basis functions gk(ψ):

g(ψ(k), θ) = Σk αk gk(ψ),   (3.6)

although, as will be seen later, other forms of representation are possible when modelling with Gaussian processes. A well-known scalar example of a weighted-sum representation is the expansion of a function in a Fourier series. Typical examples of nonlinear mappings that are represented by a weighted sum are:

• wavelets,
• nearest neighbours,
• B-splines,
• ARTIFICIAL NEURAL NETWORKS:
  – the multilayer perceptron,
  – radial basis-function networks,
  – others,
• FUZZY MODELS.
(3.5)

Regressors ψ in different nonlinear models:

• the nonlinear finite impulse response (NFIR) model, in which the regressors are only delayed input signals (u(k-i));
• the nonlinear autoregressive model with exogenous input (NARX), in which the regressors are delayed input and output signals (u(k-i), y(k-i));
• the nonlinear output-error (NOE) model, in which the regressors are delayed values of the input signal and estimates of the output signal, i.e., the predictions from the past (u(k-i), ŷ(k-i));
• the nonlinear autoregressive and moving-average model with exogenous input (NARMAX), for which the regressors are delayed input, output and noise signals (u(k-i), y(k-i), ε(k-i) = y(k-i) - ŷ(k-i)).

Let us look at the notation of some well-known basis functions:

• a neural network with a sigmoidal activation function,

gk(ψ) = σ(βk ψ + γk),

where β is the dilation factor and γ is the translation factor;

• a radial basis-function network,

gk(ψ) = r(βk (ψ - γk));

• a fuzzy model,

gk(ψ) = Σj αj (Πk μA(βk (ψ - γk)));

• a recurrent network,

ψ(k) = g(ψ(k - i), θ);

• and many others.

Structure identification

First, we start with the systematic selection of the regressors, from the simplest possible combination to more complex ones:

• u(k) – static nonlinearity,
• u(k - i) – NFIR,
• u(k - i), y(k - i) – NARX,
• other.

The parameters to be estimated are usually the weights of the individual basis functions and, depending on the chosen structure, some other parameters. Parameter estimation can be carried out with any known and suitable deterministic or stochastic optimisation method. Among the deterministic methods, the Gauss-Newton algorithms are known to be efficient, while first-order gradient methods are generally time-consuming. It is possible to estimate the parameters offline or online with so-called recursive identification. The optimisation criteria usually depend on the model error.
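The weighted-sum representation of Equation (3.6) with sigmoidal basis functions can be written compactly in NumPy. The function name and the argument shapes below are illustrative assumptions (K basis functions acting on N regression vectors of dimension D):

```python
import numpy as np

def sigmoid_network(psi, alpha, beta, gamma):
    """g(psi) = sum_k alpha_k * sigma(beta_k . psi + gamma_k), Eq. (3.6),
    with sigma the logistic function.  psi is (N, D), beta is (K, D),
    alpha and gamma are length-K vectors; one output per row of psi."""
    z = psi @ beta.T + gamma            # (N, K) dilated and translated inputs
    return 1.0 / (1.0 + np.exp(-z)) @ alpha
```

Swapping the logistic function for a radial function r(·) or a product of membership functions μA(·) gives the radial basis-function and fuzzy-model cases listed above.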
When determining the order of the model or the number of sample delays, it is useful to know Takens' theorem (its form for excited systems can be found in [9]), which states how a model of a continuous dynamic system can be reconstructed from sampled input and output signals. The sufficient condition for the reconstruction of the system is

n > 2p,   (3.7)

where n is the order of the discrete model and p is the order of the original continuous system. In practice, it turns out that, depending on the system, a lower-order model is often sufficient:

p ≤ n ≤ 2p + 1.   (3.8)

Once we have chosen the model order and the regressors, we start choosing the basis functions. There are different ways of choosing, both objective and subjective. There are no concrete rules, because the basis functions we have listed are capable of representing the nonlinearity with any degree of accuracy.

Model error

The optimisation criteria differ; they depend on the purpose of the model and on the optimisation method. The cost function used to write the optimisation criterion is a quantitative measure of the quality of the model. We usually want to achieve the smallest value of the cost function, which is usually a function of the error of the model. A commonly used example is the cost function that depends on the square of the model error:

V̄(θ) = E‖y(k) - g(ψ(k), θ)‖² = σn² + E‖g0(ψ(k)) - g(ψ(k), θ)‖²,

where g0(ψ(k)) is the original system and σn² is the variance of the noise.

When optimising the model with respect to the squared error, there are generally three possible sources of error that affect the quality of the model:

• the noise,

• the bias, which can be represented as follows:
V = E‖g0(ψ(k)) - g(ψ(k), θ̂(m))‖²,

where θ̂(m) is the vector of estimated parameter values with the chosen length m,

• the variance of the parameter values, which depends on the variance of the measurement noise σn²:

V ≈ σn² dim{θ}/N.

The quality of the identified model depends on the information content of the measured input/output data. The variance of the parameter values and the bias are correlated, as shown in Figure 3.1.

Some directions for the choice of basis functions that might be considered are [4]:

• radial basis functions are chosen when dealing with a small number of regressors (e.g., wavelets for a maximum of 3 regressors);
• ridge basis functions are chosen when dealing with a larger number of regressors (e.g., sigmoidal neural networks);
• fuzzy models are used when a priori heuristic knowledge is available.

The cross-validation procedure is used to select the different structural elements. Cross-validation is a method from statistics for evaluating models in which the data are divided into at least two parts. The basic form of cross-validation is k-fold cross-validation: we use k - 1 parts for training and the remaining part for validation, and the training and validation data cross over in successive rounds.

If we choose basis functions and thus a nonlinear model structure, we must also estimate their parameters.

Tips for carrying out identification

Given the complexity of the problem of identifying nonlinear dynamic systems, it is difficult to suggest a fixed procedure. In general, the procedure follows the same steps as for the identification of linear systems, which were described in the previous chapter. However, we can give some tips that may increase efficiency.

8. Investigate how the performance depends on the number of parameters, as the model should be as simple as possible but not too simple.
9. When choosing the sampling time, the same rule as for linear systems should apply, even though it is much more difficult to follow: the sampling time should be chosen in such a way that it captures the entire dynamics of the system to be modelled.

10. If you have an input signal to choose from, choose one that has an input/output data distribution and an amplitude distribution that are as rich as possible in the range of operation that is limited by the physical constraints of the process. An example of such distributions is shown in Figures 3.2 and 3.3.

Figure 3.1: Bias-variance relationship

1. Look at the measured input/output data. From this, you can deduce whether the system is strongly nonlinear and which 'time constants' (using the technical term for linear systems) it has.

2. Try simple things first, i.e., simple structures and dimensions of models: linear models first and a small number of regressors, basis functions and parameters for the estimation.

3. Determine the physical background of the dynamic process, as this can give you an idea for the choice of regressors.

4. Divide the data into those for identification and those for model validation. In the machine-learning literature, the division of data subsets is referred to in different ways, although the purpose is the same. The data for identification are split into data for parameter estimation in nonlinear identification and data for monitoring performance in cross-validation. The parameter-estimation data in machine learning are called 'training data' and the performance-monitoring data are called 'validation data', while the data used to validate the final model are called 'test data'. It should be noted that this division is also found in the literature describing system identification.

Figure 3.2: Example of an input/output data distribution
5. Normalise the data (transform the different data types into one size class) and set the mean of the data equal to 0.

6. Monitor the bias/variance ratio throughout the optimisation by cross-validation; it indicates an unwise increase in the number of parameters.

7. When modelling a system with a neural network, use regularisation. It is described, for example, in [4].

Figure 3.3: Example of the amplitude distribution of the input signal

Model validation

Model verification and validation is, as already stated, one of the most important steps in modelling. The nonlinearity of dynamic systems does not make the validation process any easier.

As seen and described by Equation (3.2), modelling is mainly aimed at one-step prediction. The dynamic-system model is most suitable for its purpose when it also allows predictions over longer horizons and simulations. A typical example of such a purpose is the use of the model for control design or for system-performance evaluation. This means that the model has to be evaluated for one-step-ahead prediction, but also for simulation.

• One-step-ahead prediction: ŷ(k + 1) = g(y(k), y(k - 1), ..., u(k), u(k - 1), ...).

• Simulation: ŷ(k + 1) = g(ŷ(k), ŷ(k - 1), ..., u(k), u(k - 1), ...).

How the model is simulated is shown in Figure 1.22 in Chapter 1.

The analysis of the consistency of the input/output behaviour with the one-step-ahead prediction and simulation (Figure 3.4) is usually performed on the data used for identification and on the data intended for validation (test data).

Figure 3.5: Test of prediction error (autocorrelation function of the simulation error; cross-correlation function of u1 and the simulation error)
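The distinction between the two evaluation modes defined above can be made concrete with a small sketch for a first-order model with regressors y(k) and u(k). The function names and the linear stand-in model g used in the usage example are illustrative assumptions:

```python
import numpy as np

def one_step_predictions(g, y, u):
    """One-step-ahead prediction: y_hat(k+1) = g(y(k), u(k)),
    always fed with the *measured* outputs."""
    return np.array([g(y[k], u[k]) for k in range(len(u) - 1)])

def simulate(g, y0, u):
    """Simulation: y_hat(k+1) = g(y_hat(k), u(k)),
    feeding the model's own predictions back into the regressors."""
    y_hat = [y0]
    for k in range(len(u) - 1):
        y_hat.append(g(y_hat[-1], u[k]))
    return np.array(y_hat)
```

A model with a small one-step-ahead error can still drift badly in simulation, which is why both tests are needed.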
Figure 3.6: Histogram of prediction errors

In addition to the consistency test of the input/output behaviour, various statistical tests can also be used for nonlinear systems. These include different correlations to test whether the prediction or simulation error is independent of all input and output signals. An example is testing the simulation error with auto- and cross-correlation and analysing the histograms of the identification and validation errors (examples in Figures 3.5 and 3.6).

Other forms of statistical tests are estimates of the average prediction error. These include estimating the error with various criterion functions, such as the mean-square or the Akaike criterion function. For more details on estimators with statistical tests, see [5].

Model reduction or pruning [5] is also part of validation. An example of a neural network after pruning can be found in Figure 3.7. The procedures for reduction depend on the structure and the modelling method. More on these procedures can be found in the literature describing the individual procedures (e.g., specific neural networks) and in various software. We will not go into detail about these procedures; more details can be found in, for example, [1].

Figure 3.4: Test of consistency of input/output behaviour (model output – dashed curve, simulated output – solid curve)

The most important test of the dynamic model, regardless of its structure and whatever the method of modelling, is always to test the model in terms of its purpose. The model is good enough if it is useful for the purpose for which it was developed.

Figure 3.7: Example of a pruned artificial-neural-network structure

Figure 3.8: 3D representation of the nonlinearity of the system

This was only an overview of modelling.
Better insight can be found in the literature that describes neural-network identification in more detail (e.g., [5]).

Example of identification of a nonlinear autoregressive model with exogenous input (NARX)

We will illustrate the identification process with an example of a first-order dynamic nonlinear system with an artificial neural network. The mathematical model of the process is described by the following nonlinear difference equation:

  y(k) = y(k − 1) − 0.5 tanh(y(k − 1) + u³(k − 1)),   (3.9)

where u is the input and y is the output of the system. This is one of the discrete-time equivalents of the continuous-time system

  ẏ = − tanh(y + u³)   (3.10)

for a sampling time of 0.5 s.

Since this is a first-order system, the nonlinearity can be represented graphically in three dimensions (Figure 3.8). The system was excited with a random signal with variable amplitude in the range between −1.2 and +1.2. The identification signal and the response of the system are shown in Figure 3.9.

[Figure 3.9: Identification signal and response of the system]

The validation signal must be different from the identification signal so that we can be certain that the model fits. A randomly varying signal in the same amplitude range as the identification signal is chosen. The input signal for validation and the system response are shown in Figure 3.10.

[Figure 3.10: Signal for validation and system response]

It is common for a signal to be divided into the identification part (the training and validation set) and the validation part (the test set). The length of the signals can be arbitrary, but it is useful to have a longer identification signal so that it contains as much information as possible about the dynamic behaviour of the process.
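The data-generation step can be sketched directly from Equation (3.9). The piecewise-constant random excitation below is an assumption, as the text only specifies the amplitude range.

```python
import math
import random

def simulate(u, y0=0.0):
    """Response of y(k) = y(k-1) - 0.5*tanh(y(k-1) + u(k-1)^3)."""
    y = [y0]
    for k in range(1, len(u)):
        y.append(y[k - 1] - 0.5 * math.tanh(y[k - 1] + u[k - 1] ** 3))
    return y

def random_steps(n, hold, rng):
    """Random amplitude in [-1.2, 1.2], held for `hold` samples."""
    u = []
    while len(u) < n:
        u += [rng.uniform(-1.2, 1.2)] * hold
    return u[:n]

rng = random.Random(1)
u_id, u_val = random_steps(200, 5, rng), random_steps(200, 5, rng)
y_id, y_val = simulate(u_id), simulate(u_val)
```

Two independently drawn signals of the same character give the identification and validation records; the response stays bounded because the map contracts towards the equilibrium y = −u³.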
The histograms of the amplitude distributions of the selected input signals are shown in Figure 3.11. It can be seen that the amplitude distribution in the whole selected region is not perfect, but the signal for identification has a better amplitude distribution.

[Figure 3.11: Histogram of amplitude distributions of input signals for identification and validation]

How the identification input/output data are distributed according to the nonlinearity of the process is shown in Figure 3.12. For a denser distribution of samples, more samples are needed, which also means a longer signal. It is very important to check what input/output data are available for identification and validation so that what can be achieved by model identification can be estimated. If there are no data for identification, the model cannot learn the system behaviour.

[Figure 3.12: Input/output data distribution according to the nonlinearity of the system to be identified]

In the next step, we will select the structure of the neural network and the regressors and estimate the parameters of the network.

For the system identification we used the software NNSYSID Toolbox for Matlab [7]. We chose a neural network with a multilayer perceptron, but we could have chosen any other neural network or model that is a universal approximator or can adequately represent this system. The regressors chosen were y(k − 1) and u(k − 1). The structure selected was the NARX and the least-squares optimisation method, respectively. Following the cross-validation procedure, we chose a neural network with one hidden layer and five neurons in it. The structure of the network is shown schematically in Figure 3.13.

[Figure 3.13: Network structure]

The parameters (weights) of the network were determined using the Levenberg-Marquardt method, which we chose because of its high efficiency in nonlinear optimisation problems. The parameter values described by Equation (1.4) can be fitted with a hidden-layer matrix W1 = [wjk | w0k] and the output-layer vector W2 = [wij | w0j]. When the optimisation was complete (i.e., the error was so small that the weights no longer changed noticeably), we obtained the following results:

  W1 = [ −0.5588  −2.0621  −1.9530
          0.5155   0.0499  −0.8670
         −1.5149   0.3190   0.4768
          0.3366  −1.2029   1.8379
          0.8411   1.3841   1.7123 ],

  W2 = [ 1.2054  1.7784  0.0810  1.1704  1.4048  −0.0580 ].

The weight matrix W2 has one more element than there are connections between the hidden and output layers because it contains the value of the output offset or bias in the last position w0j.

Let us take a look at the model validation that must be performed in the identification process every time we receive a new model and want to evaluate its quality of fit. We run the identification process until the model is good enough for its final purpose. We will only show the results for the final selected model. In our case, the model served to illustrate the identification process.

Figure 3.14 shows how the neural network represents the nonlinearity of the system based on the identification data. The prediction error between the nonlinearity of the system and the nonlinearity of the model (also called residuals) can be seen in Figure 3.15.
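The identified network can be evaluated directly from the quoted weights. The sketch below assumes the usual convention for this network type — tanh hidden layer, linear output, bias as the last element of each weight row/vector — which should be checked against Equation (1.4); the regressor ordering (y before u) is likewise an assumption.

```python
import math

W1 = [[-0.5588, -2.0621, -1.9530],
      [ 0.5155,  0.0499, -0.8670],
      [-1.5149,  0.3190,  0.4768],
      [ 0.3366, -1.2029,  1.8379],
      [ 0.8411,  1.3841,  1.7123]]
W2 = [1.2054, 1.7784, 0.0810, 1.1704, 1.4048, -0.0580]

def net(y1, u1):
    """One-step NARX prediction from regressors y(k-1), u(k-1)."""
    x = [y1, u1, 1.0]                      # regressors plus bias input
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
              for row in W1]
    # zip truncates to the 5 hidden outputs; W2[-1] is the output bias
    return sum(w * h for w, h in zip(W2, hidden)) + W2[-1]

# one prediction; how closely it matches the true system of Eq. (3.9)
# depends on the convention assumed above
y_hat = net(0.5, 0.8)
```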
[Figure 3.14: One-step-ahead prediction: system (left), model (right)]
[Figure 3.15: Validation of prediction error]

If the model structure is suitable, the model response after optimisation differs from the process response only by white noise. The amplitude distribution of the prediction error for the validation signal, which should be normal, is shown in Figure 3.16. It can be interpreted as a rather narrow Gaussian curve.

[Figure 3.16: Distribution of model prediction error on validation signal]

Another statistical tool that can be used to check the statistical properties of the prediction error is the autocorrelation of the error (Figure 3.17), which confirms the previous observation, and the cross-correlation between the prediction error and the input signal (Figure 3.18), which shows a low correlation between the error and the input signal.

[Figure 3.17: Prediction-error autocorrelation]
[Figure 3.18: Cross-correlation of the prediction error with the input signal]

One of the more informative steps is usually a qualitative or visual check of the input/output behaviour by comparing the responses of the system and model simulation with the validation signal. In Figure 3.19, which shows such a comparison, there is a clear correspondence between the two responses. Our goal of illustrating the identification process of a nonlinear process was satisfactorily achieved with the obtained model, so we completed the highly iterative process.
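The "narrow Gaussian" reading of the error histogram can also be checked numerically. The synthetic residuals below stand in for the validation error; for a normal distribution roughly 95 % of the samples fall within two standard deviations of the mean.

```python
import random

rng = random.Random(42)
e = [rng.gauss(0.0, 0.05) for _ in range(5000)]   # narrow residuals

n = len(e)
mean = sum(e) / n
std = (sum((x - mean) ** 2 for x in e) / (n - 1)) ** 0.5
inside = sum(1 for x in e if abs(x - mean) <= 2 * std) / n
```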
[Figure 3.19: Consistency of input/output behaviour on validation signal — simulation response on validation signal]

3.2 Example of the identification of a pH-neutralisation process

In the illustrative example in the previous section, we demonstrated the procedure for identifying a system. The demonstration was made much easier by the fact that we identified a first-order system, whose nonlinearity can be shown graphically. For higher-order systems, this type of evaluation is more difficult. In the case of the identification of the pH-neutralisation process, we will demonstrate the identification of a higher-order system on a simulated system that reflects a realistic chemical process.

A simplified schematic of the pH-neutralisation process, which is often used in the literature as a benchmark for various modelling and design methods, is shown in Figure 3.20. The purpose of this process is to chemically neutralise the input fluid. The process has three input flows: acid (Q1), input liquid (Q2) and base (Q3), which are mixed in a vessel T1. Before mixing, the acid enters the vessel T2, which gives the system additional dynamics.

[Figure 3.20: Schematic of the pH-neutralisation process]

The flow of the acid and the base is controlled by automatic control valves, while the incoming liquid is measured only by a flow meter, in particular a rotameter. The measured output variable is the acidity, meaning the pH of the mixture. Since the pH sensor is located at the outlet of the tank T1, there is also some dead time in measuring the acidity. In this example, the input to the process is the flow rate of the base, not the flow rate of the input liquid. A more detailed description of the process can be found in [8].

A theoretical dynamic model of the pH-neutralisation process was derived from chemical equilibrium equations. The derived model also includes the dynamics of the sensors and valves as well as the dynamics of the hydraulics of the outlet flow. The modelling assumed ideal mixing of the liquids, their constant density, and the complete solubility of the ions. The theoretical model of the pH neutralisation of the chemical process is described by the system of Equations (3.11) and (3.12) and the data given in Table 3.1. Instead of performing the measurements on the inaccessible process, we used the simulated data to identify a surrogate model. The theoretical model used contains several nonlinearities, including the implicitly calculated and strongly nonlinear titration curve, which is a typical element of all pH-neutralisation processes.

  ẋ = f(x) + g(x) Q3 + p(x) Q2
  c(x, y) = 0                                                (3.11)

  f(x) = [ (q1/(A1 x3)) (Wa1 − x1),  (q1/(A1 x3)) (Wb1 − x2),  (1/A1) (q1 − Cv4 (h1 + z)^n) ]ᵀ

  g(x) = [ (1/(A1 x3)) (Wa3 − x1),  (1/(A1 x3)) (Wb3 − x2),  1/A1 ]ᵀ

  p(x) = [ (1/(A1 x3)) (Wa2 − x1),  (1/(A1 x3)) (Wb2 − x2),  1/A1 ]ᵀ

  c(x, y) = 0 = x1 + 10^(y−14) − 10^(−y) + x2 (1 + 2·10^(y−pK2)) / (1 + 10^(pK1−y) + 10^(y−pK2))

  pH = y                                                     (3.12)

Table 3.1: Parameters of the pH-neutralisation process at the operating point [8]

  |q2| = 0.03 M NaHCO3                          q2  = 0.55 ml/s
  |q3| = 0.003 M NaOH, 0.0005 M NaHCO3          q3  = 15.6 ml/s
                                                q1e = 16.6 ml/s
  A1 = 207 cm²                                  q4  = 32.8 ml/s
  A2 = 42 cm²                                   Wa1 = 3·10⁻³ M
  z = 11.5 cm                                   Wb1 = 0 M
  p = 0.607                                     Wa2 = −0.03 M
  Ka1 = 4.47·10⁻⁷                               Wb2 = 0.03 M
  Ka2 = 5.62·10⁻¹¹                              Wa3 = 3.05·10⁻³ M
  Kw = 1.00·10⁻¹⁴                               Wb3 = 5.00·10⁻⁵ M
  ∆t = 15 s                                     h1  = 14 cm
  ∆tc = 1 s                                     h2  = 3 cm
  τpH = 15 s                                    Wa4 = 4.32·10⁻⁴ M
  τh = 15 s                                     Wb4 = 5.28·10⁻⁴ M
  τv = 6 s                                      pH  = 7.0
  θ = 10 s

Figure 3.21 shows the simulation scheme for the pH-neutralisation process.

[Figure 3.21: Simulink simulation scheme of the pH-neutralisation process]

The nonlinearity of the process can also be shown by plotting the responses of the system to the same input signal in different operating regions. One such example is a sequence of rectangular pulses with increasing amplitude (Figure 3.22). The response of our nonlinear system to this input signal is shown in Figure 3.23.

[Figure 3.22: Input signal to demonstrate the nonlinearity of the system]
[Figure 3.23: Response of the nonlinear system (Ts = 25 s)]

The selected identification signal, the validation signal, and the system responses to them are shown in Figure 3.24. A comparison of the signals in Figure 3.24 and the amplitude distributions in Figures 3.25 and 3.26 shows that the identification signal is more dynamic and has a richer distribution of amplitudes.
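The implicit part of the model can be made concrete with a small sketch that solves the titration relation c(x, y) = 0 of Equation (3.12) for the pH by bisection, given the two ion invariants x1 and x2. The constants are taken from Table 3.1; the bisection helper itself is an assumption, not the book's Simulink implementation.

```python
import math

pK1 = -math.log10(4.47e-7)      # ~6.35, from Ka1
pK2 = -math.log10(5.62e-11)     # ~10.25, from Ka2

def c(x1, x2, y):
    """Implicit titration-curve residual from Eq. (3.12)."""
    return (x1 + 10 ** (y - 14) - 10 ** (-y)
            + x2 * (1 + 2 * 10 ** (y - pK2))
                  / (1 + 10 ** (pK1 - y) + 10 ** (y - pK2)))

def ph(x1, x2, lo=0.0, hi=14.0, tol=1e-10):
    """Bisection on y; c increases with y for the x2 >= 0 used here."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c(x1, x2, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

neutral = ph(0.0, 0.0)    # no acid, no buffer: pure water, pH 7
acidic = ph(1e-3, 0.0)    # excess strong acid pushes the pH down
```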
This also generally means that the response to the identification signal contains a greater amount of information about the dynamics of the chemical process.

[Figure 3.24: Identification signal and validation signal and their responses]

As in the illustrative example in the previous section, we used the same software and selected the multilayer-perceptron neural network for model identification.

Using an iterative procedure, we identified the regressors that yielded the best model according to the cost function and with the most favourable relationship between the quality of the model and the smaller number of regressors that determines the complexity of the model. We chose a model with the following regressors: y(k − 1), y(k − 2), y(k − 3), y(k − 4), u(k − 1), u(k − 2), u(k − 3), u(k − 4). For the structure, we chose the NARX model structure and a hidden layer with 10 neurons using cross-validation.

[Figure 3.25: Histogram of input-signal amplitudes for identification]
[Figure 3.26: Histogram of input-signal amplitudes for validation]
[Figure 3.27: Network structure]

By simulating the system and model responses to the validation signal in Figure 3.30, we can confirm what has already been shown by the statistical tools in Figures 3.28 and 3.29. We have succeeded in approximating the dynamic behaviour of the pH-neutralisation process relatively well with a neural network.
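The regressors chosen above translate into a data matrix in a mechanical way; the helper below (an assumed utility, not part of NNSYSID) builds one row per sample from four past outputs and four past inputs.

```python
def narx_regressors(y, u, na=4, nb=4):
    """Return (inputs, targets) for one-step-ahead NARX training."""
    start = max(na, nb)
    X, t = [], []
    for k in range(start, len(y)):
        row = [y[k - i] for i in range(1, na + 1)] \
            + [u[k - i] for i in range(1, nb + 1)]
        X.append(row)
        t.append(y[k])
    return X, t

y = list(range(10))               # toy data, just to show the shapes
u = [0.1 * k for k in range(10)]
X, t = narx_regressors(y, u)
# first row: y(3), y(2), y(1), y(0), u(3), u(2), u(1), u(0) -> y(4)
```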
[Figure 3.28: Output-error autocorrelation]

Note that the number of hidden layers (usually one for a so-called shallow network), the number of neurons, and the regressors are not chosen sequentially, but simultaneously. This is a cyclical process (i.e., cross-validation) in which the individual elements are changed and the cost function is validated: for the NARX model, this is the sum of the squares of the error between the process response and the model response. The parameters were optimised using the Levenberg-Marquardt method. After completing the optimisation, we also used the pruning method built into the software to remove superfluous weights, of which there were very few. A schematic of the network structure is shown in Figure 3.27.

Figures 3.28 and 3.29 show the autocorrelation of the prediction error (residuals) between the process and model responses to the validation signal and the cross-correlation between the prediction error of the responses and the input signal. In particular, the model is evaluated by simulation even if it is essentially identified for one-step prediction.

[Figure 3.29: Cross-correlation of input signal and output error]
[Figure 3.30: Consistency of input-output behaviour on validation signal]

Bibliography

[1] A. P. Engelbrecht (2002): Computational Intelligence: An Introduction, Wiley and Sons, Chichester.

[2] M. Gevers (2006): A personal view of the development of system identification: A 30-year journey through an exciting field, IEEE Control Systems Magazine, Vol. 26, No. 6, 93-105.

[3] G. B. Giannakis, E. Serpedin (2001): A bibliography on nonlinear systems identification, Signal Processing, Vol. 81, No. 3, 533-580.

[4] J. Sjöberg et al. (1995): Non-linear black-box modeling in system identification: a unified overview, Automatica, Vol. 31, No. 12, 1691-1724.

[5] M. Nørgaard, O. Ravn, N. K. Poulsen, L. K. Hansen (2000): Neural Networks for Modelling and Control of Dynamic Systems, Springer, London.

[6] Deep Learning Toolbox for Matlab, http://www.mathworks.com/products/deep-learning.html

[7] The NNSYSID Toolbox for use with Matlab, https://www.mathworks.com/matlabcentral/fileexchange/87-nnsysid, described in M. Nørgaard (1995): Neural network based system identification toolbox user's guide, Technical Report 95-E-773, Institute of Automation, Technical University of Denmark, Lyngby.

[8] M. A. Henson, D. E. Seborg (1994): Adaptive nonlinear control of a pH neutralization process, IEEE Trans. Control Systems Technology, Vol. 2, No. 3, 169-183.

[9] J. Stark, D. S. Broomhead, M. E. Davies, J. Huke (2003): Delay embeddings of forced systems: II. Stochastic forcing, Journal of Nonlinear Science, Vol. 13, No. 6, 519-577.

Chapter 4

Control with artificial neural networks

4.1 Neural networks in control systems

Neural networks are, at their core, experimentally derived mappings from input data to output data, from which the physical or other backgrounds of these mappings are not apparent. As already noted, they are representatives of black-box models. You should also bear this in mind when considering their role in the development of the automatic control of dynamic systems. Their role is multifaceted, although not as much as the role of theoretical models of dynamic systems. Neural networks are used as models of whole systems or only subsystems, as substitutes for certain nonlinearities, for control, as a model of the controller or the inverse of the system, and similar [5].

A systematic breakdown of the various uses of neural networks in the design and implementation of automatic control systems can be found in a review article [1], which also contains references to literature describing various applications. We will not go into detail but only list the applications described in [1]. The breakdown of the applications is shown in Figure 4.1.

The applications are divided into the use of neural networks as a tool for control design and as part of a controller.

[Figure 4.1: Classification of artificial-neural-network (ANN) uses in control [1]]

Neural network only as a tool

The roles of the neural network are:

• Neural network as a modelling aid:
  – is based on an indirect target;
    ∗ neural network identifies unknown parts in the model (Figure 4.2),
      · for control with feedback linearisation (Figure 4.3),
      · for control with instantaneous linearisation (Figure 4.4),
    ∗ neural network predicts the controlled variable,
      · neural network model for controllers that do not need a process model but only a prediction of the controlled variable;
  – based on the control objective;
    ∗ neural network is a model of the controlled variable (Figure 4.5),
    ∗ neural network is a model of the reference system,
    ∗ neural network is the model of the cost function or performance index,
    · neural network helps to optimise conventional controllers;
• Neural network as a supervisory aid:
  – signal combinations;
• Neural network as a control-implementation tool:
  – neural network maps unknowns in the controller;
    ∗ variable controller parameters modelled by a neural network,
    ∗ control with the inverse model of part of the process,
    ∗ predictive control (the neural network provides a model of the system to be controlled);
  – the neural network represents the solution of the implicit control law (Figure 4.6);
    ∗ the neural network replaces the solving of computationally intensive operations (e.g., solving the Riccati equation).

[Figure 4.2: Neural network identifies unknown parts in the model [1]]
[Figure 4.3: Control with feedback linearisation]
[Figure 4.4: Control with instantaneous linearisation]
[Figure 4.5: Controlled-variable model based on the control goal [1]]
[Figure 4.6: The neural network solves the implicit control law [1]]

Neural network as controller

The roles of the neural network are:

• Learning the neural network based on u:
  – neural network mimics a human operator;
    ∗ mapping the actions of the human operator;
  – neural network mimics another controller;
    ∗ derived goal: mapping the characteristics of conventional controllers (Figure 4.7);
  – neural network uses open-loop data (Figure 4.8);
    ∗ selected inputs: adaptive control with the inverse model;
• Learning based on the control goal:
  – learning the neural network by simulation on a model;
  – learning on a plant;
    ∗ learning based on the gradient of the cost function (numerical, analytical),
    ∗ learning without the cost-function gradient (search algorithms).

[Figure 4.7: Imitating the characteristics of conventional controllers [1]]
[Figure 4.8: The neural network uses open-loop data [1]]

Frequently used control methods

In the literature describing the use of neural networks to control dynamic systems, we note that the most commonly encountered applications are:

• various forms of predictive control (which are quite useful in engineering practice), and
• various forms of adaptive control (which are somewhat less accepted in engineering practice).

Because of their applicability, especially for various black-box models, we will examine predictive control in more detail in the next subsection, in particular how neural networks can be used in predictive control.

4.2 Predictive control

Basic concept of predictive control

The description of the basic concept is taken from [4]. Model-Based Predictive Control (MPC, MBPC) is one of
u can switch or selected between: model inputs Neural u Controller Another 2 network Plant u y -u1 y ^ controller -u2 -u Analytical 3 NN controller u -u +u 2 3 in application gradient Figure 4.5: Controlled-variable model based on control Selected inputs NN -y -u goal [1] controller - y - u in training -past y,u Frequently used control methods Figure 4.8: The neural network uses open-loop data [1] In the literature review describing the use of neural networks to control dynamic systems, we note that the most detail in the next subsection, in particular at how neural commonly encountered applications are: networks can be used in predictive control. • various forms of predictive control (which are quite useful in engineering practice), and 4.2 Predictive control • various forms of adaptive control (which are somewhat less accepted in engineering practice). Basic concept of predictive control Because of their applicability, especially for various black-The description of the basic concept is taken from [4]. box models, we will examine predictive control in more Model-Based Predictive Control (MPC, MBPC) is one of 43 Control with artificial neural networks the few advanced control methods that have gained ac- • the computational complexity of the methods can ceptance in industrial practice, especially in the field of become problematic when controlling faster systems. process control. The reasons for this can be found in the simple and easy-to-understand principle of the method, The basic principles of predictive control are reflected in with the simple integration of constraints on the control the following steps (Figure 4.9): and controlled variables, the simple design and tuning of both univariable and multivariable control for linear and nonlinear systems. Predictive control is based on the explicit use of a process model to predict the future output of the process. 
The control signal is based on minimis-u ing a cost function of the difference between the predicted output and the reference trajectory for a given horizon in the future. The field of predictive control includes many w different algorithms with similar control principles and different models and minimisation mechanisms of cost functions derived from a model (e.g., Model Algorithmic Con-r y trol (MAC), Dynamic Matrix Control (DMC), Generalised Predictive Control (GPC), Predictive Functional Control (PFC), Unified Predictive Control (UPC), etc.). The general notation when using nonlinear models is Nonlinear Model-based Predictive Control (NMPC). Regarding the control design, the main advantages of us-k k+N k+N k+N 1 u 2 ing predictive control methods are the following: past present future • it is suitable for the control of systems with more Figure 4.9: Basic concept of predictive control. w - set-complex dynamics, point trajectory, r - reference trajectory, u - control signal, • it is suitable for systems possessing dead time or ˆy - model response, N1, N2 - start and end of prediction minimum phase, horizon, Nu - control horizon • its general concept allows the control of both univariable and multivariable processes, • Prediction of the output signal of the system based • it allows feedforward compensation of measurable on the system model. disturbances, At each time instant k, we calculate the trajectory of the output signal y(k +j) for the horizon in the fu- • captures system constraints in the design process, ture j = (N1, . . . , N2). N1 and N2 denote the lower • can incorporate size constraints and rate of change and upper values of the prediction horizon, which constraints, determines the coincidence horizon within which we want the output signal to conform to the prescribed • considerable freedom in design, with design param-behaviour. 
The predicted values of the output sig-eters considered as control specifications, nal of the system, expressed in terms of the system model, denoted ˆ y(k + j | k), represent the j-step • a predictive controller can form the control signal in prediction of the model. The predicted values de-advance if the future setpoint curve is known, pend on the future control scenario u(k + j | k); j = • the methods do not include explicit signal derivation 0, . . . , Nu−1, which we want to use from time instant so that measurement noise does not cause problems, k onwards. • the methods do not contain explicit integration so • Creation of a reference trajectory. that there is no problem of integral wind-up, By defining a reference trajectory r(k + j | k); j = N1, . . . , N2, we determine the desired time course of • the principle is easy to understand. the system from the current value y(k) to the given setpoint value w(k). The two main disadvantages of using predictive methods • Determination of future control signal. are: The vector of the future control signal u(k + j | k); j = 0, . . . , Nu − 1; Nu ≤ N2 is calculated by • a good model of the system to be controlled is nec-minimising the corresponding cost function by which essary because the quality of the control depends we minimise the error between the r(k + j | k) and directly on the quality of the model, ˆ y(k+j | k). The determination of the future signal is 44 4.2 Predictive control based on the application of the open loop optimality Different predictive algorithms with a nonlinear model can criterion in a certain interval in the future. be divided according to the way the nonlinear control problem is solved: • Use of the first element of the control signal vector to control the system. Only the first element u(k | k) • The direct nonlinear approach writes the control of the optimal vector of the control signal u(k + j | problem in the form of a nonlinear programming k); j = 0, . . . , Nu − 1 is used. 
problem and solves it using iterative optimisation methods. This approach follows directly from the At the next time instant, when we have a new measure-idea of using a nonlinear model in prediction. How-ment of system output, we repeat the whole process. This ever, due to the solution of the nonconvex optimisa-principle is called ‘the receding-horizon strategy’. Many tion problem with additional constraints, it is com-different solutions are proposed for each step, which makes putationally intensive. the different predictive control methods different from each other. • The linearisation approach is more complex due to the simplification of the optimisation problem and The basic scheme of a closed-loop predictive control sys-the use of linear predictive control algorithms for tem is shown in Figure 4.10. While the basic model may the linearised system. The linearisation methods used are feedback linearisation or inverse nonlinear n mapping, online model linearisation at the operating + w Reference r Optimisation u y Plant point, using a set of local linear models, and some generator + + algorithm _ + _ other methods. + y u y Model Model _ All predictive design methods assume knowledge of the trajectory of the setpoint signal in the future, but this assumption is not always met. Usually, we specify how the system output approaches the setpoint signal, which Figure 4.10: Common concept of proposed predictive con-is determined by forming the reference trajectory r(k + j | trol principles k); j = N1, . . . , N2. The reference trajectory can be considered as the internal reference of the predictive con-already include a model of measurable disturbances, the troller, which determines the desired closed-loop behaviour feedback loop is incorporated into the algorithm to elim-and from which the vector of the future control signal is inate one-step-ahead model prediction errors, due to un-determined. 
When forming a reference trajectory in rela-modeled dynamics and other (nonmeasurable) disturbances tion to the knowledge of the setpoint signal, two situations in the system. The error is expressed as the difference be-are distinguished: tween the system output and the model output at the time of sampling: 1. The setpoint signal w(k + j) is known in advance e(k) = y(k) − ˆ y(k). (4.1) for all j = 1, . . . , N2 . Due to the principle of pre-When predicting the future output of a system, the error dictive control, which predicts future behaviour, the e(k) is added to the model prediction ˆ y(k + j | k), for each predictive controller initiates the appropriate control j = N action before the setpoint signal changes. This com-1, . . . , N2. Assuming a constant error e(k) for the entire prediction horizon, we compensate for the errors of pensates for dead times and large time delays in the the steady-state model and the constant disturbances in system. In robotic systems, tracking systems, batch the system. The prediction algorithm is based on the de-processes, and similar, the setpoint signal may be viations between the model output and the measurements known in advance. and estimates the future effects of unmeasurable distur-2. The setpoint signal w(k+j) is not known in advance. bances acting on the process. Ideally, the model matches As the best possible prediction of the setpoint signal, the process and no disturbances are present. In this case, we take its current value w(k + j) = w(k). the feedback loop is not functional and the control is an example of open-loop optimal control. Given the starting point of the reference trajectory, we To predict the output of the system, the use of the sys-distinguish two situations: tem model is essential. It can be used to calculate the prediction of the output signal several steps into the fu-1. We use the current measured value of the output ture. In principle, we can use any linear or nonlinear signal r(k) = y(k). 
In this way, we introduce an ad-model. This includes both theoretical and experimen-ditional feedback loop into the system, the effect of tal models. Examples of such models of systems include which on the overall closed-loop system is difficult to neural networks, fuzzy models, Gaussian-process models, evaluate because it is not the result of optimising the among others. The models used are usually, but not ex-cost function. In some cases, it may even destabilise clusively, written in the discrete-time domain. the control loop. 45 Control with artificial neural networks 2. We use the current value of the reference trajectory as a result. In practice, we always choose to structure the r(k). vector of the control signal by introducing relationships between the elements of the vector. Such a decision also The open-loop-optimal calculation of the system’s response increases the robustness of the predictive control. Due to at a certain interval in the future is the main feature of pre-the principle of the receding horizon and the use of only dictive control. In this respect, there are similarities with the first element of the vector of the control signal, the the linear-quadratic (LQ) controller [4], but the LQ con-actual control signal is not limited by the introduction troller uses an infinitely long prediction horizon. Choosing of structuring. The most commonly used techniques for the appropriate cost function is the first step in deter-structuring the control signal [4] are: mining the vector of the future control signal u(k + j | k); j = 0, . . . , Nu − 1. Here we tend to choose the cost • The introduction of a control horizon after the tran-functions that are easy to compute and whose minimum sient response assumes a constant control signal as-can be found analytically or by optimisation algorithms. suming a constant setpoint signal in the future. 
Con-Since the desired behaviour of the system in the future is trol horizon N determined by the reference trajectory, the logical choice u; Nu ≤ N2 represents the time from which the control signal will remain constant. This of the cost function is the difference between the predicted reduces the number of optimisation variables, specif-response of the system and the reference trajectory, for ex-ically the elements of the vector of the control signal. ample: The simplest and most frequently used value in prac-N2 X J = (r(k + j) − ˆ y(k + j))2. (4.2) tice is Nu = 1, which also provides good results with a varying setpoint signal. j=N1 The parameters N1 and N2 determine the fitting hori- • The clustering technique assuming Nu = N2 divides zon in which we want the predicted output of the system the entire prediction horizon into a fixed number of to match the reference trajectory as closely as possible. segments within which it assumes a constant control By increasing the parameter N1, we omit the influence signal. of control errors in the near future, especially for phase-minimal or dead-time systems, resulting in a more steady • The basis function representation algorithm struc-and smoother control. The extended form of the cost func-tures the control signal as a linear combination of tion is described by equation: independent, predetermined basis functions (linear, polynomial, etc.). The choice of basis functions de-N2 X J = α pends on the system and the setpoint signal. The j (r(k + j) − ˆ y(k + j))2. (4.3) calculation of the control signal is thus reduced to j=N1 the calculation of the selected parameters of the ba-Weight vector α = [αN , . . . , α ] can be used to further sis functions. 1 N2 influence the significance of the errors at each time instant in the prediction horizon. It is also common to include a measure of the variation in the control signal in the cost After selecting the form of the cost function and the struc-function. 
Through this measure, we seek to reduce the ture of the vector of the future input signal, control de-variation of the control signal by the cost of increasing the sign is followed by the determination of the values of the deviation between the predicted output of the system and elements of this vector. The solution is found by deter-the reference trajectory: mining the minimum of the convex optimisation problem when linear models are used, without using constraints. N2 Nu The solution can often be determined analytically, mean-X X J = (r(k + j) − ˆ y(k + j))2 + β(∆u(k + j))2. (4.4) ing that the output of the process depends linearly on the j=N1 j=0 past values of the inputs and outputs of the system, or by The above cost function contains the vector of changes one-step optimisation methods, for example, the method in the control signal u(k). In many methods, we use the of least squares. Imposing constraints on the minimisation optimal vector of variations of the control signal ∆u(k + of the cost function using linear models requires finding the j | k); j = 0, . . . , N minimum of the cost function using iterative optimisation u − 1 instead of the absolute values u(k + j|k); j = 0, . . . , N algorithms. The optimisation problem remains convex in u − 1. this case. The use of nonlinear models usually results in The predicted future signal ˆ y(k + j|k); j = 1, . . . , N2 de- the optimisation problem not being convex; consequently, pends on the predicted vector of the control signal in the time-consuming and costly iterative algorithms have to be future u(k + j|k); j = 0, . . . , Nu − 1; Nu ≤ N2. In gen-used. Convergence to the true solution is no longer guar-eral, the elements of the vector are arbitrary and inde-anteed within the prescribed time or number of optimi-pendent of each other, which increases the computational sation iterations. 
The additional inclusion of constraints complexity and also the time consumption of the optimisa-for system variables in the chosen optimisation algorithm tion enormously by increasing Nu. The control signal can can even lead to the minimum of the cost function being also become rich in unwanted high-frequency components indeterminate with respect to the constraints (e.g., in the 46 4.2 Predictive control case of unpredictably large disturbances). Both possibil-Step 1 - identification of the system in the operating re-ities are unacceptable for the implementation of nonlin-gion region: Since this is the same system that we ear predictive control in real systems. Several approaches have already identified in Chapter 3, here only the have been proposed in the literature when such a situa-most important features are summarised: tion occurs [4]. Such optimisation methods, for example, interior point methods [4], are concerned with taking into • regressors: y(k − 1), u(k − 1), account time constraints and constraints on the result in • structure: NARX model, each optimisation step and providing at least an approxi- • a hidden layer with five neurons, multilayer per-mate solution. If necessary, the optimisation process may ceptron, be terminated prematurely due to time constraints or the optimisation process may not find a solution to the opti- • Levenberg-Marquardt optimisation method. misation problem. Model validation was also shown in Chapter 3. This An interesting possibility is to use ‘soft’ constraints, by time, we only consider the contour plot of the one-which we assume that the constraints represent limits that step ahead error predictions in Figure 4.11, which should not be exceeded. Using the structure of the cost shows the operating range in which the model is use-function, we introduce a mechanism that allows this to be ful. done in an emergency. To do this, it is necessary to define two sets of constraints: Residuals 3 1. 
constraints imposed for physical, safety and similar 2 reasons (e.g., the range and rate of change of the control variable are limited by the actuator used) 1 and must not be softened, 0 y(k) 2. other constraints that may be softened (e.g., in the cases in which we may violate the limits that define -1 acceptable product quality). -2 Soft constraints are implemented by adding an additional -3 -3 -2 -1 0 1 2 3 soft constraint function to the criterion used, which is not u(k) zero only in the case where the constraints are violated. In this way, the optimisation algorithm only violates constraints when an emergency situation occurs (e.g., in the Figure 4.11: Response-error-validation (one-step-ahead case of unforeseen large disturbances [4]). prediction) Again, only the first element u(k|k) of the optimal vector of the control signal u(k + j|k); j = 0, . . . , Nu − 1 is al-Step 2 - Design of a predictive control: ways implemented. Due to the receding horizon strategy and the finite length of the prediction, the response of the • We have selected Predictive Functional Control control system generally does not correspond to the opti- (PFC) as one of the simplest forms of MPC [4]. mal open-loop response based on which the optimal vector A feature of most PFC methods is the reduction of the control signal was determined. of the prediction horizon to a few points. In the case of a single point, the prediction horizon (also called the coincidence horizon) is called the coincidence point. A more detailed descrip-Illustrative example of control with a pre-tion of the method used PFC can be found, for dictive controller example, in [4] and [3]. • The PFC cost function we use is: The following is an example of control with a predictive controller that uses a neural network for prediction. J = min [r(k + P ) − ˆ y(k + P )]2 . 
(4.5) U(k) The controller is defined for a system mathematically described by the nonlinear difference equation (3.10): • As random points of coincidence, we have experimentally determined the value of eight sam-y(k) = y(k − 1) − 0.5 tanh(y(k − 1) + u3(k − 1)), ples in advance. The reference trajectory was chosen so that the value of the control signal ap-where u is the input and y is the output of the system. proaches the reference value exponentially with The design process can be summarised in three steps: a time constant of 10 samples. 47 Control with artificial neural networks • Some important properties to consider when developing a predictive control: – Computational complexity, – Robustness of closed-loop control to differ-Closed-loop response (full line) and set-point (dotted line) 4 ences between the controlled system and its model, 2 – consideration of the constraints of the dif-0 ferent variables we are dealing with. -2 0 20 40 60 80 100 Step 3 - Evaluation of the designed control system: the Time [s] Control signal response of the closed-loop control system and the 2 corresponding control signal are shown in Figure 4.12. 1 From this, it can be seen that the predictive con-0 trol agrees with the reference curve trajectory, as -1 long as the closed-loop system operates in the range in which the model describes the controlled system -2 0 20 40 60 80 100 Time [s] well. However, at the last step change, the closed-loop system falls outside the range of the good description of the model, resulting in a response that does not correspond to the specified requirements. Figure 4.12: Response of predictive control based on neu-This can be avoided by constraining the control to ral network model the range restricted to the range in which the model is sufficient. 48 Bibliography [1] M. Agarwal (1997): A systematic classification of [4] J. M. 
Maciejowski (2002): Predictive control with neural-network-based control, IEEE Control Systems constraints, Pearson Education Limited, Harlow. Magazine, Vol. 17, No. 2, 75-93. [5] M. Nørgaard, O. Ravn, N. K. Poulsen, L. K. Hansen [2] W. S. Levine (1996): The control handbook, CRC (2000): Neural networks for modelling and control of Press, Boca Raton, Fl. dynamic systems, Springer, London. [3] J. Kocijan, R. Murray-Smith, C. E. Rasmussen, B. [6] M. Nørgaard (1995): Neural network based control Likar (2003): Predictive control with Gaussian pro-system design toolkit user’s guide, Technical Report cess models. In The IEEE Region 8 EUROCON 2003. 96-E-830, Institute of Automation, Technical Univer-Computer as a Tool. (Vol. 1, pp. 352-356). IEEE. sity of Denmark,Lyngby. https://www.mathworks.com/matlabcentral /fileexchange/86-nnctrl 49 BIBLIOGRAPHY 50 Chapter 5 Local-model networks and blended multiple-model systems ‘Multiple-model systems’ is a common term for various modelling and control design methods for nonlinear sys-x w z w 1 w 0 1 w 0 1 1 w w tems composed of less complex subsystems. They are x 2 2 S z 2 y z 2 S w m w h called different names depending on the field of research x z m h from which they originate. The best-known multiple-model systems are fuzzy Takagi-Sugeno models, blended multiple-model systems, multiple-model switching systems, Markov Figure 5.1: Multilayer perceptron mixtures of experts, among others. An overview of multiple-model modelling systems can be found in [15], and some later developments in the special issue of a journal dedi-gressors: cated to the topic [7].   m 1 X (xj − γij)2 z − , (5.2) Since we have dealt with neural networks in the previ-i = exp   2 ρ2 j=1 ij ous chapters, let us now start from the point of view of neural networks, from which can look at multiple-model where ρij is the scaling factor of the radial function. systems as networks of local models, which we will tenta-tively describe in the following chapter. 
The description is the same for other forms of multiple-model systems. d x z w 1 1 w c 1 z w 2 y x | . | z 2 2 S w h x z m h Hierarchical networks Figure 5.2: Radial basis-function networks As we have seen, neural networks can be divided into those with a ridge structure and those with a radial structure. A typical representative of a neural network with a These centres are more or less evenly distributed across ridge structure is the multilayer perceptron (Figure 5.1), the area of interest (Figure 1.16). They are typically also in which the nonlinearity of the modelled system is ap-located in the operating points of the nonlinear system, proximated by a ridge basis function: where there usually is a lot of measurement data available to identify the system.   m X zi = fi w , (5.1)  ij xj + w0j  j=1 Local-model networks and blended multiple-model systems where xj is the jth input, wij is the weight at node ij, w0j is the bias for the jth input, fi is a ridge basis function for The centres of the radial basis functions are usually se-the ith output, zi is the ith output. lected as the individual points defined by the input regressors, which are the centres of the measured data sets. Representative for neural networks with radial structures These data sets can be replaced by local models obtained are radial basis-function networks (Figure 5.2), where the from these data. In this way, the amount of data for nonlinear hyperplane of the modelled system is approx-modelling can be significantly reduced without sacrificing imated by differently weighted basis-function centres γ, information content. Instead of weighing the data cen-specifically by points scattered over the space of the re-tres to obtain the best fit of the modelled nonlinearity, 51 Local-model networks and blended multiple-model systems we weigh the local sub-models and combine them into a y ref + global model of the modelled nonlinearity. 
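As a small illustration of the radial activation in equation (5.2), the following sketch (plain Python; the function and variable names are mine, not the book's) evaluates a single radial basis unit with centre gamma and scaling factors rho:

```python
import math

def rbf_activation(x, gamma, rho):
    """Radial basis-function activation as in equation (5.2):
    z = exp(-0.5 * sum_j ((x_j - gamma_j)^2 / rho_j^2))."""
    s = sum((xj - gj) ** 2 / rj ** 2 for xj, gj, rj in zip(x, gamma, rho))
    return math.exp(-0.5 * s)

# The activation equals 1 exactly at the centre and decays with distance:
center = [0.5, -1.0]
print(rbf_activation(center, center, [1.0, 1.0]))      # 1.0
print(rbf_activation([3.0, 2.0], center, [1.0, 1.0]))  # close to 0
```

The scaling factors rho control how quickly the unit's influence fades, which is what determines how strongly neighbouring centres (or, later, local models) overlap.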
This structure is called a 'Local-Model Network' (LMN). We optimise the parameters, usually the weights, in a similar way to other neural networks. Local models, which can be obtained in any way, are combined in different ways to form a global model. We can weight the outputs of the individual local models or, if the local models have the same structure, we can weight the parameters of the local models into a single structure. The basic principle of a local-model network is shown in Figure 5.3.

Figure 5.3: A principle of a local-model network with weights on the local-model outputs

Dynamic systems

While the described principle works well for describing static systems, care must be taken when modelling dynamic systems.

A typical approach for both modelling and designing the control of dynamic systems is the so-called 'divide and conquer' approach. This means that the problem (a nonlinear dynamic system being modelled or the global nonlinear controller being designed) is usually divided into linear subproblems, because linear methods are well established. We then solve the individual linear subproblems, which are valid only in a small domain, and combine them into a global nonlinear solution. Figure 5.4 shows a possible implementation of control based on a network of local models.

Figure 5.4: Block diagram illustrating a closed-loop adaptive controller in the form of blended local controllers whose parameters vary according to the variation of the local system models

Linearisation of the nonlinear system is the core of the modelling method described. The usual type of linearisation is the first-order Taylor expansion around the operating point or identification around this point. We should be aware that such an approach is only valid in the neighbourhood of the chosen operating point.

The standard method for modelling with local-model networks is to treat linearised models around a representative number of equilibrium points. The equilibrium points (Figure 5.5) are those to which the nonlinear dynamic system tends. In the time domain, these points often represent constant equilibrium states in which a nonlinear system can settle. The operating points are usually, but not necessarily, the equilibrium points.

Figure 5.5: Model of a nonlinear system as a family of linear systems; linearisations only in equilibrium points

The question arises as to how we divide the entire nonlinear domain of a nonlinear system into subdomains. There are several possibilities. They are systematically described in [15], and a description of the solutions can also be found in [2], [18], [14].

One possible approach is to evenly divide the domain of action or the nonlinearity being modelled. This is useful for problems of lower complexity and is illustrated in Figures 5.6 and 5.7.

It is also possible to divide the workspace into working subspaces according to the operating regimes of the nonlinear system.
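The weighting of local-model outputs sketched in Figure 5.3 can be illustrated as follows (a minimal sketch with two illustrative local affine models and normalised Gaussian validity functions; all names and numeric values are mine, not the book's):

```python
import math

def gaussian_weight(phi, center, sigma):
    """Unnormalised Gaussian validity function of the scheduling variable phi."""
    return math.exp(-0.5 * ((phi - center) / sigma) ** 2)

def lmn_output(phi, local_models, centers, sigma=0.5):
    """Blend local-model outputs with normalised weights (Figure 5.3 principle).
    local_models: list of functions mapping phi to a local-model output."""
    w = [gaussian_weight(phi, c, sigma) for c in centers]
    s = sum(w)
    return sum(wi / s * m(phi) for wi, m in zip(w, local_models))

# Two illustrative local affine models, valid around phi = 0 and phi = 2:
m1 = lambda phi: 1.0 + 0.5 * phi     # local model around phi = 0
m2 = lambda phi: 2.5 - 0.25 * phi    # local model around phi = 2
y = lmn_output(1.0, [m1, m2], centers=[0.0, 2.0])
# Halfway between the centres both models contribute equally:
print(y)  # (m1(1) + m2(1)) / 2 = 1.875
```

Near either centre the corresponding local model dominates, while between the centres the global output is a smooth interpolation; this is exactly the trade-off discussed below for dynamic systems.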
This is particularly useful when modelling, for example, an industrial system that has different operating regimes and shows different dynamic behaviours in each of them. Again, this approach is only useful for a limited number of applications and is illustrated in Figures 5.8 and 5.9.

Figure 5.6: Modelling a nonlinear dynamic system with uniformly distributed local linear models

Figure 5.7: Modelling a nonlinear dynamic system with uniformly distributed local linear models - detail from Figure 5.6

Figure 5.8: Division of the operating region of a nonlinear system into operating sub-regions according to similar operating characteristics

Figure 5.9: Alternative partition of the operating region of the nonlinear system

Various ways of modelling with local-model networks are described, in addition to the publications already mentioned, in [1] and [4], with the software described in [5], the control applications in [6], [17], [19], and many others.

Topics of interest

When linearising a nonlinear system ẋ = F(x, u), where x is the vector of system states and u is the vector of inputs, with the Taylor expansion at the operating point (x0, u0), one does not obtain a linear system but an affine system that is linear in the parameters. The system

ẋ = F(x, u)

is linearised into

ẋ = F(x0, u0) + ∇x F(x0, u0)(x − x0) + ∇u F(x0, u0)(u − u0) + higher-order terms.

The constant element defining the operating point of the system, F(x0, u0), means that the superposition condition is not satisfied, because the system is not linear even if reformulated to look linear. This constant element can be very large, which means that the linearised model cannot be treated as a linear system with a small constant disturbance. When we move from one operating point to the next, the constant element changes. This means that it is connected to the dynamics of the system; therefore, it cannot be treated as an external disturbance or even neglected.

This is just one of the problems that arise when modelling nonlinear dynamic systems with a local-model network.

Another problem is the description of the dynamics of a system in nonequilibrium regions [8], [16], which results from the 'tendency' of the states of stable dynamic systems to converge to equilibrium regions. The system therefore spends only a short time in regions away from the equilibrium curve (Figure 5.10), and only a small amount of data exists to describe these regions. The problem would not be so great if the systems could be arbitrarily excited in practice to obtain information describing the entire operating region. However, due to the nature of the system and other constraints on the excitation signal, the design of the experiment is usually limited.

Figure 5.10: Switching from one local model to another

The next problem is to optimise the parameters that determine the selected dynamics to be described by the LMN. Two approaches exist [16]:

• The first option is to describe the global dynamics of the unknown system. In this case, we optimise all the parameters of the local models that make up the LMN jointly and at once, using all the available training data.

• Another option is to describe the local dynamics by optimising the parameters of the individual local models of the LMN. We use only the data describing the relevant domain.

The optimisation of the global dynamics usually leads to a globally better description of the system, as the free parameters of the local models of nonequilibrium regions can be adjusted to increase the validity of the local models over a wider range. The implication of better global behaviour is the loss of information about the local dynamics, since the parameters of the local models no longer reflect it [8], [9], [16]. In contrast, if the local models are optimised in the second way (i.e., locally), they represent the local dynamics and are thus more transparent and useful for the analysis and design of the control system. However, at the same time, the local models are only valid in the domain in which they have been identified, and this is usually small. Consequently, some areas of system operation may remain unmodelled, leading to the problem of modelling regions away from the equilibrium curve.

The problem of some regions not being modelled can also be avoided by using a smaller vector of scheduling variables [8]. Scheduling variables are those that determine the position of a local model in the nonlinear hyperspace. The smaller the vector of scheduling variables, or the shorter the scheduling vector, the larger the range of validity of the local models (Figure 5.11), but there is a problem with the blending of the local models in areas 'far' from the equilibrium curve. The blending can become uneven or even discontinuous (Figures 5.12 and 5.13), which can also lead to unstable control loops when used for control design [3].

Figure 5.11: The interpretation of the vector of scheduling variables

Figure 5.12: Original system [2]

Figure 5.13: Model with a reduced and complete vector of scheduling variables [2]

5.1 Velocity-based linearisation

In the previous section, we mentioned that the Taylor expansion is not an adequate method to obtain a suitable linear partial model. Better suited is its derivative: velocity-based linearisation.

The velocity-based linearisation is a generalisation of the usual linearisation around the operating point: a local linear system belongs to each operating point of the whole space of operation of the nonlinear system (Figure 5.14), and not only to the equilibrium points. The theory of velocity-based linearisation is discussed in [10] and [11], and demonstration software can be found in [13].

Figure 5.14: Validity regions of local models

Figure 5.15: The effect of velocity-based linearisation out of equilibrium

Velocity-based linearisation

Let us look at how a known nonlinear system is described by a velocity-based linearisation. A nonlinear dynamic system is described by a system of nonlinear differential equations

ẋ = F(x, u)
y = G(x, u),  (5.3)

where x ∈ Rⁿ, u ∈ Rᵐ, y is the vector of outputs, and F, G are matrices of function dependences. This system can equivalently be described in a form in which the linear and nonlinear parts are separated:

ẋ = Ax + Bu + f(ρ)
y = Cx + Du + g(ρ)
ρ = ρ(x, u), with ∇x ρ, ∇u ρ constant,  (5.4)

where A, B, C, D are correspondingly large constant matrices. The nonlinear functions f(·) and g(·) depend on the vector of scheduling variables ρ = ρ(x, u) ∈ R^q, q ≤ m + n, which expresses the nonlinear dependence of the dynamics of the system on its state and input, with constants ∇x ρ, ∇u ρ. When this system is differentiated, we obtain

ẋ = w
ẇ = (A + ∇f(ρ)∇x ρ)w + (B + ∇f(ρ)∇u ρ)u̇
ẏ = (C + ∇g(ρ)∇x ρ)w + (D + ∇g(ρ)∇u ρ)u̇.

If the result is 'frozen' at a value of the scheduling vector ρ1, we obtain a form based on the velocity-based linearisation at that particular point, which is a linear dynamic system:

ẋ = w
ẇ = (A + ∇f(ρ1)∇x ρ)w + (B + ∇f(ρ1)∇u ρ)u̇
ẏ = (C + ∇g(ρ1)∇x ρ)w + (D + ∇g(ρ1)∇u ρ)u̇.

The velocity-based linearisation at any point, which may be arbitrarily far from equilibrium given the value of the scheduling vector, is equal to the velocity-based linearisation at one of the equilibrium points determined by the same value of the scheduling vector. This fact is shown in Figure 5.15.

The basic features of velocity-based linearisation can be summarised in the following points:

• Velocity-based linearisation is associated with each operating point (both equilibrium and nonequilibrium).

• The properties of the model obtained by velocity-based linearisation are a completely accurate representation of the properties of the nonlinear system in the immediate vicinity of the relevant operating point.

• The family of models obtained by velocity-based linearisation around all operating points has no loss of information with respect to the nonlinear dynamics and is an alternative representation of the nonlinear system.
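The claim that the differentiated (velocity-based) form reproduces the nonlinear dynamics at every operating point can be checked numerically. The sketch below is my own illustration, not an example from the text: it takes the scalar system ẋ = −x³ + u with u(t) = sin t, simulates it directly and in its velocity-based form ẋ = w, ẇ = −3x²w + u̇, using simple Euler integration, and compares the two state trajectories, which agree up to integration error:

```python
import math

def simulate(dt=1e-3, t_end=2.0, x0=0.5):
    """Compare direct Euler simulation of x' = -x^3 + u with its
    velocity-based form x' = w, w' = -3 x^2 w + u', for u(t) = sin(t).
    Returns the largest difference between the two state trajectories."""
    x_dir = x0                       # direct nonlinear simulation state
    x_vel = x0                       # velocity-based simulation state
    w = -x0 ** 3 + math.sin(0.0)     # w(0) = x'(0)
    t, max_err = 0.0, 0.0
    while t < t_end:
        u, du = math.sin(t), math.cos(t)
        # direct simulation of the nonlinear system
        x_dir += dt * (-x_dir ** 3 + u)
        # velocity-based form: integrate x' = w alongside the w-dynamics
        x_vel, w = x_vel + dt * w, w + dt * (-3 * x_vel ** 2 * w + du)
        t += dt
        max_err = max(max_err, abs(x_dir - x_vel))
    return max_err

print(simulate())  # small; both notations track the same trajectory
```

Note that the input derivative u̇ is needed, which is the practical complication mentioned in the text; here it is available analytically, while in practice it would come from a derivative-approximating filter.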
Velocity-based linearisation addresses many of the short-comings of ordinary linearisation theory around equilib-Q rium points in terms of analysis and control design: • it does not contain a constraint on the action around equilibrium points; 0 • it provides a direct link between the solution obtained in the form of a velocity-based linearisation Figure 5.16: Schematic of a pendulum and the local solution of the nonlinear system; • solutions obtained by velocity-based linearisation can ˙ w 0 1 w 0 be composed into a global solution of the nonlinear 1 = 1 + ˙ u ˙ w −Q cos(x w 1 system; 2 1) −Q 2 w 1 • no prior knowledge of equilibrium points is required. ˙ y = 1 0 . (5.7) w2 It should be noted that the method raises new problems, The linearisation, for example, at the point (x1 , x , u 0 20 0) such as the derivation of the input signal, but these are is obtained by ‘freezing’ the equation at this point. In our relatively easy to solve in both analysis and system design. case, it is sufficient to insert only the value x1 into the 0 equation. In this way, we obtain from the system of equations (5.7) a completely linear system at the chosen point Modelling example and not an affine system as we would obtain using the Taylor expansion on the system described with Equations (5.6). Let us look at the velocity-based linearisation for a damped pendulum [12] (Figure 5.16), which can be described by If we simulate the model in terms of a velocity-based lin-the equation earisation for analysis purposes, we need ˙ y integrated, and ¨ θ = Q ˙ θ − Q sin θ + bF, (5.5) ˙ u can be obtained via a filter that is the implementation where Q = 29.46 and b = 1.21, θ ∈ [0, π]. In a state of the derivative. How we get rid of the derivative, which space of the form (5.4), where linear and nonlinear parts mainly comes into play in control design, is discussed in are separated, the notation of the system is as follows: the next section. 
The pendulum model (5.5) can be written in state-space form as

ẋ = [0 1; 0 −Q_b] x + [0; 1] u + [0; −Q sin(x1)]
y = [1 0] x,   (5.6)

where the state x1 is θ, the state x2 is θ̇, and the input u is F.

The system of equations presented in the form of a velocity-based linearisation is as follows:

ẋ1 = w1
ẋ2 = w2
ẇ = [0 1; −Q cos(x1) −Q_b] w + [0; 1] u̇.

Figure 5.17 shows the correspondence between the two system notations. The small discrepancy is due to the approximation of the derivative and other numerical reasons.

Figure 5.17: Response of the model and the pendulum, which are very close, as shown in the bottom figure, which illustrates the difference between the two responses

Velocity-based linearisation can also be applied to discrete-time systems, although discretisation should not be used for the derivatives and integrals required to implement velocity-based linearisation, as the error accumulates when the output signal is integrated.

Let us imagine a nonlinear dynamic system represented by sampled data in the form of a discrete-time system:

x(k+1) = F(x(k), u(k))
y(k) = G(x(k), u(k)),   (5.8)

where the sampling time T is chosen small enough that the description (5.8) captures all the nonlinear dynamics of interest in the original system. We want to describe this system by a linear model with variable parameters.

In order to be able to use a velocity-based linearisation, the discrete-time system (5.8) is represented in the continuous-time domain in terms of a delayed continuous-time signal notation:

x(t+T) = F(x(t), u(t))
y(t) = G(x(t), u(t)),   (5.9)

where the values of the states x, outputs y and inputs u of the discrete-time and continuous-time systems at the sampling times t = kT, k = 1, 2, . . ., are equal.

System (5.9) can be expressed in terms of an extended local linear equivalence (ELLE) [10]:

x(t+T) = Ã x(t) + B̃ u(t) + f(ρ)
y(t) = C̃ x(t) + D̃ u(t) + g(ρ),   (5.10)

where x(t) ∈ R^n, u(t) ∈ R^m and Ã, B̃, C̃, D̃ are correspondingly large constant matrices. The nonlinear functions f(·) and g(·) depend on a vector of scheduling variables ρ = ρ(x(t), u(t)) ∈ R^q, q ≤ m + n, which expresses the nonlinear dependence of the dynamics of the system on its state and input, with constant ∇_x ρ, ∇_u ρ [10].

The first equation of (5.10), relative to the operating point (x, u) = (x0, u0), can be written using

δx(t) = x(t) − x0
δu(t) = u(t) − u0

as:

x(t+T) = Ã (x0 + δx(t)) + B̃ (u0 + δu(t)) + f(ρ).   (5.11)

Assuming local linearity near the operating point (x0, u0), it follows:

x(t+T) − x0 + x0 = Ã x0 + Ã δx(t) + B̃ u0 + B̃ δu(t) + f0 + f_x0 δx(t) + f_u0 δu(t),   (5.12)

δx(t+T) + x0 = Ã x0 + B̃ u0 + f0 + (Ã + f_x0) δx(t) + (B̃ + f_u0) δu(t),   (5.13)

δx(t+T) = (F0 − x0) + (Ã + f_x0) δx(t) + (B̃ + f_u0) δu(t),   (5.14)

where F0 = F(x0, u0), f0 = f(x0, u0), f_x0 = ∂f/∂x (x0, u0) and f_u0 = ∂f/∂u (x0, u0).

Differentiating Equation (5.14) over time, we obtain:

ẋ(t+T) = (Ã + f_x0) ẋ(t) + (B̃ + f_u0) u̇(t),   (5.15)

ẋ(t+T) = A(ρ) ẋ(t) + B(ρ) u̇(t),   (5.16)

where A(ρ) = (Ã + f_x0) and B(ρ) = (B̃ + f_u0). Under the same conditions, the second equation of (5.10) can be written as:

ẏ(t) = C(ρ) ẋ(t) + D(ρ) u̇(t),   (5.17)

where C(ρ) = (C̃ + g_x0) and D(ρ) = (D̃ + g_u0), with g_x0 = ∂g/∂x (x0, u0) and g_u0 = ∂g/∂u (x0, u0).

In the way described above, we can represent the operation of the system at any point defined by the vector of scheduling variables ρ, with a vector of states x(t) and input u(t). System (5.10) can thus be written:

ẋ(t+T) = A(ρ) ẋ(t) + B(ρ) u̇(t)
ẏ(t) = C(ρ) ẋ(t) + D(ρ) u̇(t),

where the matrices

A(ρ) = Ã + ∂f/∂x (x(t), u(t))
B(ρ) = B̃ + ∂f/∂u (x(t), u(t))
C(ρ) = C̃ + ∂g/∂x (x(t), u(t))
D(ρ) = D̃ + ∂g/∂u (x(t), u(t))

depend on the vector of scheduling variables ρ.

Let us consider an example of modelling a discrete-time system with velocity-based linearisation.

Example of modelling a discrete-time system

We are interested in a velocity-based linearisation for a nonlinear discrete-time system [17], described by the equation

y(k+1) = y(k)/(1 + y²(k)) + u(k)³   (5.18)

with sampling time T = 1 s. In the state space, the system (5.18) can be written as:

x(k+1) = x(k)/(1 + x²(k)) + u(k)³
y(k) = x(k).   (5.19)

The representation of the system (5.19) in the continuous-time domain is

x(t+T) = x(t)/(1 + x²(t)) + u(t)³
y(t) = x(t),   (5.20)

which is not an equivalent system but has the same response at the moments of sampling. If we were to discretise the system (5.20) with the sampling time T, we would obtain the system (5.18).

The system (5.20) in the form of a velocity-based linearisation is

ẋ(t+T) = ((1 − x²(t))/(1 + x²(t))²) ẋ(t) + 3u²(t) u̇(t)
ẏ(t) = ẋ(t).   (5.21)

Figure 5.18 shows the matching responses of the systems (5.18) and (5.21). The difference between the responses of the two systems can be seen if observed carefully. The difference is due to the larger approximations, but it is much smaller than if we tried to replace the derivatives and integrals of the velocity-based linearisation implementation with discrete-time equivalents or an incremental discrete-time model.

Figure 5.18: Response of the discrete-time system and the continuous-time model in the velocity-based form

5.2 Blended multiple-model systems

The velocity-based linearisation method as described is a useful tool for linearising nonlinear systems with an infinitely large family of linearised submodels. In practice, we would also like to have a finite number of parameterised models. A velocity-based linearisation for the case when a nonlinear system is modelled by blending a finite number of linear submodels is described in detail in [12].

We select a small number of representative family members and interpolate between them to replace the missing ones: the (approximate) finite parametrisation, which is essentially a new form of blended multiple-model representation. In contrast to the usual blended systems (local-model networks), velocity-based linearised systems have the following advantages:

• they use linear local models - true linear, not affine;

• they establish a direct link between the dynamics of the local models and the dynamics of the overall blended system;

 – the behaviour and properties of the whole system are approximated locally by the behaviour and properties of a weighted linear combination of local models;

 – better still, the behaviour of the whole system is approximated locally by the weighted combination of the behaviour and properties of the local models.

Example of damped pendulum modelling

We will approximate the nonlinear system (5.5) by blending a family of velocity-based linearised local models at the angles 0, π/2 and π. We will use Gaussian weighting functions denoted by µ, as in Figure 5.19. The three local models, frozen at the chosen angles, are

ẇ = [0 1; −Q cos(0) −Q_b] w + [0; 1] u̇
ẇ = [0 1; −Q cos(π/2) −Q_b] w + [0; 1] u̇
ẇ = [0 1; −Q cos(π) −Q_b] w + [0; 1] u̇.
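The claim that the velocity-based form (5.21) reproduces the response of the discrete-time system (5.18) at the sampling instants can be checked numerically. The following Python sketch (illustrative only; the smooth input signal is an assumption, not taken from the book) integrates the velocity-based form with the Euler method, using a zero initial history on [0, T], and compares x(kT) with the recursion (5.18):

```python
import numpy as np

T, dt = 1.0, 1e-3                  # sampling time of (5.18) and Euler step
N = int(T / dt)                    # Euler steps per delay interval
K = 30                             # number of samples to compare
n = K * N + N + 1                  # Euler grid points, incl. the initial history

t = np.arange(n) * dt
u = 0.3 * (1 - np.cos(0.2 * t))    # assumed smooth input with u(0) = 0
du = 0.06 * np.sin(0.2 * t)        # its analytic derivative

x = np.zeros(n)                    # x = 0 on the initial interval [0, T]
v = np.zeros(n)                    # v = dx/dt
for k in range(K * N):
    # velocity-based form (5.21): v(t+T) = dF/dx * v(t) + dF/du * du/dt
    v[k + N] = (1 - x[k]**2) / (1 + x[k]**2)**2 * v[k] + 3 * u[k]**2 * du[k]
    x[k + N + 1] = x[k + N] + dt * v[k + N]   # Euler integration of v

# reference: the discrete-time recursion (5.18) driven by u at the samples
y = np.zeros(K + 1)
for k in range(K):
    y[k + 1] = y[k] / (1 + y[k]**2) + u[k * N]**3
```

With a sufficiently small Euler step, x at the sampling instants stays close to the discrete-time response, as the text claims; the residual mismatch comes only from the numerical integration.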
Figure 5.19: Weighting functions

The model of the approximated system at an intermediate operating point is obtained by blending the velocity-based linearised models at the operating points corresponding to the angles 0, π/2 and π:

1. ẋ = w, ẇ = (A + ∇f(ρ1)∇_x ρ) w + (B + ∇f(ρ1)∇_u ρ) u̇,
2. ẋ = w, ẇ = (A + ∇f(ρ2)∇_x ρ) w + (B + ∇f(ρ2)∇_u ρ) u̇,
3. ẋ = w, ẇ = (A + ∇f(ρ3)∇_x ρ) w + (B + ∇f(ρ3)∇_u ρ) u̇.

The weighted combination of the solutions is

w̃ = Σ_{i=1}^{3} w_i µ_i(ρ).

The solution of the weighted combination equals the solution of the velocity-based linearisation and is approximately the solution for the nonlinear system (locally, close to the relevant operating point).

Note how few linearised systems (local models) need to be taken: only three local models are sufficient to cover the entire operating range [0, π]:

ẋ = w
ẇ = Σ_{i=1}^{3} [(A + ∇f(ρ_i)∇_x ρ) w + (B + ∇f(ρ_i)∇_u ρ) u̇] µ_i(ρ).

We can show that w̃ ≈ w: the solution of the weighted combination is approximately the solution for the nonlinear system (locally). This is a direct link between the dynamics of the local models and the dynamics of the nonlinear blended system; it is a consequence of the use of the velocity-based realisation and does not apply to the usual local-model networks.

Figure 5.20: Fitting the nonlinearity of the blended model (ẇ2) based on the velocity-based linearisation and the nonlinearity of the original system: full curve - nonlinear system, dashed curve - blended model

Figure 5.21: Response of a nonlinear system and the blended model to a given input signal - the working range is about π/4 rad, which is the most problematic range in terms of blending
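How well the Gaussian-weighted combination reproduces the nonlinearity of the original system (cf. Figure 5.20) can be illustrated on the state-dependent entry −Q cos(x1) of the pendulum's velocity-based form. The Python sketch below is illustrative: the centres 0, π/2 and π follow the example, while the Gaussian width and Q = 1 are assumed values. It blends the three frozen local values −Q cos(ρ_i) and compares the blend with −Q cos(θ) over the operating range:

```python
import numpy as np

centres = np.array([0.0, np.pi / 2, np.pi])   # operating points of the local models
sigma = np.pi / 4                              # assumed width of the Gaussian weights

def mu(theta):
    """Normalised Gaussian weighting functions evaluated at theta."""
    g = np.exp(-((theta - centres) ** 2) / (2 * sigma**2))
    return g / g.sum()

def blended(theta):
    """Blend of the frozen local values -cos(rho_i), with Q = 1."""
    return float(mu(theta) @ (-np.cos(centres)))

thetas = np.linspace(0, np.pi, 181)
errors = np.array([abs(blended(th) - (-np.cos(th))) for th in thetas])
```

The fit is exact at π/2 by symmetry and within a few tenths elsewhere, mirroring the kind of mismatch visible in Figure 5.20.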
The correspondence between the responses of the original system and the blended model is shown by Figures 5.20, 5.21 and 5.22.

Figure 5.22: The difference between the output of the nonlinear system and the blended model

Example of modelling a discrete-time system

The same procedure as in the previous example can be used to model a discrete-time system. The system (5.18) was approximated in its continuous form (5.20) with seven uniformly distributed local models, which were blended with Gaussian weighting functions. The correspondence between the responses of the original system and the blended model is shown in Figure 5.23.

Figure 5.23: Response of a nonlinear discrete-time system (5.18) and the response of the blended model (5.22)

Bibliography

[1] R. Babuška (1998): Fuzzy modeling for control, Norwell, MA: Kluwer.

[2] G. Gregorčič (2004): Data-based modelling of nonlinear systems for control, Ph.D. Thesis, University College Cork, National University of Ireland.

[3] G. Gregorčič, G. Lightbody (2003): From multiple model networks to the Gaussian processes prior model, Proceedings of the IFAC ICONS conference, Faro, 149-154.

[4] T. A. Johansen, B. A. Foss (1993): Constructing NARMAX models using ARMAX models, Int. J. Control, Vol. 58, 1125-1153.

[5] T. A. Johansen, B. A. Foss (1998): ORBIT - operating regime based modeling and identification toolkit, Control Engineering Practice, Vol. 6, 1277-1286.

[6] T. A. Johansen, K. Hunt, H. Fritz (1998): A software environment for gain scheduled local controller network design, IEEE Control Systems Magazine, 48-60.

[7] T. A. Johansen, B. A. Foss [Eds.] (1999): Int. J. Control, Special issue: Multiple model approaches to modelling and control, Vol. 72, No. 7/8.

[8] T. A. Johansen, R. Shorten, R. Murray-Smith (2000): On the interpretation and identification of dynamic Takagi-Sugeno fuzzy models, IEEE Trans. on Fuzzy Systems, Vol. 8, No. 3, 297-313.

[9] T. A. Johansen, R. Babuška (2003): Multiobjective identification of Takagi-Sugeno fuzzy models, IEEE Trans. Fuzzy Systems, Vol. 11, No. 6, 847-860.

[10] D. J. Leith, W. E. Leithead (1998): Gain-scheduled & nonlinear systems: dynamic analysis by velocity-based linearisation families, Int. J. Control, Vol. 70, 289-317.

[11] D. J. Leith, W. E. Leithead (1998): Gain-scheduled controller design: an analytic framework directly incorporating non-equilibrium plant dynamics, Int. J. Control, Vol. 70, 249-269.

[12] D. J. Leith, W. E. Leithead (1999): Analytic framework for blended multiple model systems using linear local models, Int. J. Control, Vol. 72, 605-619.

[13] D. J. Leith, W. E. Leithead (2007): Velocity-based gain-scheduling demo, https://www.hamilton.ie/doug/demof.htm

[14] R. Li, Z. Zhao, X.-B. Li (2005): General model-set design methods for multiple-model approach, IEEE Trans. Auto. Control, Vol. 50, No. 9, 1260-1276.

[15] R. Murray-Smith, T. A. Johansen [Eds.] (1997): Multiple model approaches to modelling and control, Taylor & Francis, London.

[16] R. Murray-Smith, T. A. Johansen, R. Shorten (1999): On transient dynamics, off-equilibrium behaviour and identification in blended multiple model structures, Proceedings of the European Control Conference, Karlsruhe, BA-14.

[17] K. S. Narendra, J. Balakrishnan, M. K. Ciliz (1995): Adaptation and learning using multiple models: switching and tuning, IEEE Control Systems Magazine, Vol. 15, No. 3, 37-51.

[18] O. Nelles (2001): Nonlinear system identification: from classical approaches to neural networks and fuzzy models, Springer Verlag, Berlin.

[19] T. Takagi, M. Sugeno (1985): Fuzzy identification of systems and its application to modeling and control, IEEE Trans. Syst., Man, Cybern., Vol. 15, 116-132.
Chapter 6

Design of gain-scheduling control

The dynamic systems that surround us predominantly exhibit nonlinear dynamics. Unfortunately, the methods for their analysis and design are still relatively undeveloped. In contrast, well-developed and well-known tools exist for linear dynamic systems. This makes the 'divide and conquer' approach, in which a nonlinear problem is divided into a series of linear problems, attractive.

Gain-scheduling control [11] is a typical 'divide and conquer' control method used in many areas of automatic control, from aeronautics to process engineering. The conventional control design by gain scheduling consists of the following steps:

1. the linearisation of a nonlinear system around a selected number of equilibrium points;

2. the design of linear controllers for each of the linear submodels;

3. the integration of the linear controllers into a global nonlinear controller by scheduling the parameters or signals of the controllers using scheduling variables.

A conventional gain-scheduling controller is limited to operating near the equilibrium points. The scheduling variables used to schedule the parameters or signals of the linear controllers must be slow in conventional gain-scheduling control. This means that their rate of change must be slower than the rate of change of the output signal of the closed-loop system. The theory supporting this very widely used approach is relatively poorly developed. For a more detailed consideration of this topic, see [9].

6.1 The design with velocity-based linearisation

Velocity-based linearisation, which we described in the previous chapter, solves a number of drawbacks of conventional gain scheduling. In particular, it generalises the control design by incorporating information about the dynamics of the system even at nonequilibrium points. As mentioned in the previous chapter, this form of a model is valid throughout the domain and is not limited to the vicinity of the equilibrium points. This means that we can also deal with fast and large-scale transitions between distant operating regions of the system. Gain-scheduling control design based on a velocity-based linearisation looks like this (Figure 6.1):

1. the development of the family of velocity-based linearised local models for the nonlinear system;

2. the design of linear controllers for each of the linear local models;

3. the integration of a nonlinear controller from the family of velocity-based controllers designed in the previous step.

Figure 6.1: The method of gain-scheduling control design with velocity-based linearisation

The advantages of velocity-based linearisation for gain-scheduling control are as follows.

• The gain-scheduling controller designed with velocity-based linearisation works well even with large changes in the setpoint and at large distances from the equilibrium state.

• Such a controller does not require any of the signals to change slowly. This means that it can also handle the global inversion of the dynamics, which could be described in a simplified way as the direct cancellation of the poles and zeros of the controlled system at the operating points.

• The advantage of the design procedure is that the knowledge of the design of linear systems can be used.

For more details on velocity-based gain-scheduling control, see [6], [7], with the demonstration software at [10]. Examples of use are [2], [3], [4].
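The three design steps can be sketched numerically. In the following illustrative Python fragment (not from the book: the plant ẏ = −sin(y) + u, the operating points and the pole-placement target are all assumed for demonstration), a family of local linear models is built, a local PI controller is designed for each one by pole placement, and each local closed loop is checked against the common target poles:

```python
import numpy as np

# Assumed demonstration plant: dy/dt = f(y) + u, with f(y) = -sin(y)
f = lambda y: -np.sin(y)
df = lambda y: -np.cos(y)            # local slope a_i = df/dy at an operating point

# Step 1: family of linearised local models dy/dt = a_i y + b u
y_ops = np.array([0.0, np.pi / 4, np.pi / 2])
a_loc = df(y_ops)
u_ops = -f(y_ops)                    # equilibrium inputs at the operating points
b = 1.0

# Step 2: local PI controllers u = Kp e + Ki int(e), placed so that every local
# closed loop has the characteristic polynomial s^2 + 2*zeta*wn*s + wn^2
zeta, wn = 0.9, 2.0
Kp = (a_loc + 2 * zeta * wn) / b
Ki = wn**2 / b * np.ones_like(a_loc)

# Step 3 (check): closed-loop matrix of plant + PI for each local design
target = np.roots([1, 2 * zeta * wn, wn**2])
poles = [np.linalg.eigvals(np.array([[a - b * kp, b * ki], [-1.0, 0.0]]))
         for a, kp, ki in zip(a_loc, Kp, Ki)]
```

Every local design places the poles at the same target, so the family of controllers gives a uniform local closed-loop behaviour; how the family is then integrated into one nonlinear controller is the subject of the rest of this chapter.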
Implementation of controllers based on velocity-based linearisation

The form of the system using velocity-based linearisation (Figure 6.2) contains an input-signal derivative instead of the input signal itself, and an output-signal derivative instead of the output signal itself. The input-signal derivative is not desirable when implementing control, but it can be avoided.

Figure 6.2: A system in the velocity-based linearisation form

We can exploit the fact that almost all controllers also contain an integral part. In this case, the system can be configured as shown in Figure 6.3.

Figure 6.3: Block diagrams showing the input-signal derivation bypass

Example of PI controller

Let us illustrate the implementation that bypasses the signal derivation with a simple proportional-integral (PI) controller:

K(s, ρ) = K0(ρ) + K1(ρ)/s,   (6.1)

where K(s, ρ) is the transfer function of the PI controller, s is a complex variable, ρ is a scheduling variable, and K0, K1 ∈ R are controller parameters. Figure 6.4 shows the basic and undesirable form of implementation; Figure 6.5 shows how to avoid the derivative.

Figure 6.4: PI controller in velocity-based linearisation form that cannot be implemented

Figure 6.5: The final implementation of the PI controller

The implementation of the controller in the form of a simulation schematic of the package Matlab/Simulink for the values K0 = 11 and K1 = 55 is shown in Figure 6.6.

Figure 6.6: An example of the implementation of the controller K(s, ρ1) = (11 + 55/s)·(1/(0.01s + 1)) in Simulink

Example of two differently implemented gain-scheduling controllers

Let us consider the design of a controller in the velocity-based form and in the conventional gain-scheduling form for the control of the nonlinear system with input u and output y:

ẏ = tanh(u − 10y) + 0.01(u − 10y).   (6.2)

For both controllers, the nonlinearity of the controller parameters is the same, but they are implemented differently. The simulation schematics are shown in Figure 6.7 and the closed-loop responses in Figure 6.8. From the responses, it can be seen that the conventional gain-scheduling control cannot keep up with large changes in the setpoint signal.

Figure 6.7: Simulink simulation schematic for the controller in the conventional form (top) and in the velocity-based form (bottom)

Figure 6.8: Responses in closed loop with different control implementations

* * *

Thus far, we have considered families of velocity-based linearised models with an infinite number of members representing the modelled nonlinear system. In practice, we often encounter the desire to approximate a nonlinear system by a model consisting of a finite number of linear submodels. Velocity-based linearisation also allows us to approximate a model by interpolating a small number of linearised local models.

On this basis, we can design the gain-scheduling control according to the following procedure:

• Design a linear controller for each of the linearisations of the nonlinear system, meaning for the local models that are represented by a small number of linear local models.

• Use the same type of blending functions as in the approximated system model for blending the family of controllers based on velocity-based linearisation.

• Provided that the blended system model used for the controller design is sufficiently accurate and the controller is sufficiently robust, the family of controllers is also suitable for the control of the system. This condition applies to all design procedures using the system model. It should be noted that for a successful controller design, the system must be approximated by a larger number of models than the minimum number of local models that satisfactorily describe the nonlinear dynamics. The reason for this is that in the case of gain scheduling we reproduce the inverse dynamics of the process to some extent.

• Integration of a global nonlinear controller from the designed family of local velocity-based controllers.

Summary of lessons on gain-scheduling design based on a velocity-based linearisation:

• When we design a controller with gain scheduling, we must make a nonlinear controller with a well-defined family of local models obtained with velocity-based linearisation, which is relatively straightforward.

• The derivation of noisy input signals is not required.

• We can obtain a nonlinear controller directly connected to the family of linear controllers without the need for additional parameter tuning.

• The family of velocity-based linearised system models has an infinite number of elements, but we often want to work with a finite and smaller set of elements. We can design controllers for a small number of linearised system models and then mix them with blending functions to create a global nonlinear controller for the entire nonlinear system.

• When implementing controllers that are based on velocity-based linearisation, care must be taken to preserve the advantages obtained.

• Blending (i.e., interpolation) depends on the properties of the nonlinear system.
• Conventionally implemented blended controllers have greater deviations from the desired behaviour with increasing distance from the equilibrium points.

• The direct relationship between the properties of velocity-based linearised models and the properties of the blended nonlinear model/controller greatly facilitates analysis and design. This is a peculiarity of velocity-based linearisation and does not apply to the usual forms of systems with multiple models (local-model networks, fuzzy models, etc.).

• It should be borne in mind that the successful implementation of control depends on the correct choice of the scheduling variable and the number of blending functions.

Furthermore, there is a direct relationship between the solutions of these 'local' control problems and the solution of the control problem for the nonlinear blended system.

6.2 Example of control design: a single-segment robot manipulator

The purpose of this example, described in more detail in [1], is to illustrate the process of designing a control with gain scheduling based on a finite number of linearised submodels and to show the importance of correct implementation.

We illustrate the control design process and evaluate it with a computer simulation using a single-segment robot manipulator as an example. A segment of the robot manipulator is approximated by a homogeneous rod, as shown in Figure 6.9:

M = J Φ̈ + mg (r/2) sin Φ + k_v Φ̇
y = r (1 − cos Φ),   (6.3)

where y is the vertical deviation of the rod tip, Φ is the angular deviation, M is the torque, J is the moment of inertia, m is the mass of the rod, g is the acceleration due to gravity, r is the length of the rod, and k_v is the viscous friction coefficient.

Figure 6.9: Schematic of a single-segment robot manipulator

This is a nonlinear system, which is unlikely to pose any major problems for the design of the controller, regardless of the design method used. The design process is described in the following steps:

• The system is represented by 3 local models at 3 operating points, y = 0, 0.2 and 0.7, corresponding to Φ = 0, 36 and 72 angular degrees.

• For the local models, we design local PI controllers with the requirement that the closed-loop response is free from overshoot and is the same over the entire operating range.

• The local controllers are suitably blended into a nonlinear controller.

• The three blending functions are set in the form of a Gaussian bell curve.

• The output variable y (i.e., the vertical deviation of the rod tip) is the scheduling variable.

• A pole of the transfer function is added to attenuate high-frequency noise and deviation and to facilitate the implementation of the controller.

We will compare two different designs of the blended controller.

Conventional implementation of the gain-scheduling controller

A conventional implementation of a blended controller with output u, input e, states x_c and blending functions µ_i looks like this:

ẋ_c = Σ_i F_i(x_c, e) µ_i(ρ)
u = Σ_i G_i(x_c, e) µ_i(ρ)
ρ(x_c, e) = ∇_x ρ x_c + ∇_e ρ e,   (6.4)

where

F_i(x_c, e) = α_i + A_i x_c + B_i e,   (6.5)
G_i(x_c, e) = β_i + C_i x_c + D_i e,   (6.6)

with the constants α_i, A_i, B_i, β_i, C_i, D_i, which represent the linearisation of the nonlinear system at an operating point.

An example of a conventional implementation of a nonlinear PI controller is shown as a block diagram in Figure 6.10. The disadvantages of such an implementation can be summarised in two observations:

• Predictable behaviour exists only near the equilibrium points.

• The scheduling variable must be changed slowly.
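The two PI realisations differ precisely when the scheduled gains vary in time. The sketch below is illustrative only: the error signal, the scheduling trajectory and the gain schedules K_p(ρ) = 1 + ρ², K_i(ρ) = 2 + ρ² are assumed, not taken from the book. It compares a conventionally scheduled law u = K_p(ρ)e + ∫K_i(ρ)e dt with the velocity-based realisation u̇ = K_p(ρ)ė + K_i(ρ)e. With a frozen scheduling variable the two outputs coincide; with a varying one they drift apart by ∫(dK_p/dt)e dt, which is why the conventional implementation needs a slowly changing scheduling variable:

```python
import numpy as np

dt, n = 1e-3, 10_000                      # 10 s of simulated time
t = np.arange(n) * dt
e, de = np.sin(t), np.cos(t)              # assumed error signal and its derivative
Kp = lambda r: 1.0 + r**2                 # assumed gain schedules
Ki = lambda r: 2.0 + r**2

def pi_outputs(rho):
    """Conventional vs. velocity-based realisation for a scheduling signal rho."""
    z = np.cumsum(Ki(rho) * e) * dt                      # Euler-integrated I part
    u_conv = Kp(rho) * e + z                             # u = Kp(rho) e + int Ki(rho) e dt
    u_vb = np.cumsum(Kp(rho) * de + Ki(rho) * e) * dt    # du/dt = Kp(rho) de/dt + Ki(rho) e
    return u_conv, u_vb

u1f, u2f = pi_outputs(np.full(n, 0.7))    # frozen scheduling variable
u1v, u2v = pi_outputs(np.cos(0.5 * t))    # fast-varying scheduling variable
```

The frozen-schedule outputs agree to within the integration error, while the fast schedule produces a clearly visible gap between the two realisations.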
Figure 6.10: Conventional design of a blended PI controller

The scheduling variable is the output of the process y, which means that the scheduling variable does not change slowly. Implementation by blending control signals instead of blending parameters has the following disadvantage in addition to those already mentioned: the parallel dynamic controllers can increase the order of the nonlinear controller proportionally to the number of active controllers, which affects the dynamics of the closed loop. The exception is when the local controllers have identical poles, in which case blending the controller output signals is equivalent to blending the controller parameters; this is the case in our example. Therefore, the closed-loop systems behave the same with conventionally implemented controllers regardless of whether the output signals of the controllers or the controller parameters are blended.

Implementation of a gain-scheduling controller with velocity-based linearisation

The implementation of blended controllers with velocity-based linearisation proceeds as follows:

ẋ_c = w_c
ẇ_c = (Σ_i A_i µ_i(ρ)) w_c + (Σ_i B_i µ_i(ρ)) ė
u̇ = (Σ_i C_i µ_i(ρ)) w_c + (Σ_i D_i µ_i(ρ)) ė.   (6.8)

An example of a nonlinear PI controller in velocity-based form is shown in the block diagram in Figure 6.11. The disadvantage of the velocity-based linearisation implementation is that we have to be careful to implement it correctly.

Figure 6.11: Implementation of a blended controller with velocity-based linearisation

The filter at the input of the controller, which we exploited to avoid the derivative of the input signal, contains an additional pole Ω to attenuate high frequencies, which also limits the amount of noise in the signal.

A comparison of the closed-loop responses with differently implemented controllers is shown in Figure 6.12. It is clear that a closed-loop system with a conventionally implemented controller does not behave as it should with large changes in the setpoint, where the system quickly transitions from one area of nonlinearity to another.

Figure 6.12: Simulation results

6.3 Example of the control design of a gas-liquid separator

The purpose of this example is to show a practically implemented gain-scheduling control in a way that is close to an engineer who normally designs process control. The task was to design a pressure control for a semi-industrial process plant for gas treatment in the process laboratory of the Systems and Control Department at the Jožef Stefan Institute. The schematic is shown in Figure 6.13. The problem is described in more detail in [5]. The control requirements are as follows:

• The behaviour of the closed-loop response should be as uniform as possible over the whole operating range.

• The control should be implemented with programmable logic controllers (PLCs) as typical representatives of industrial control hardware.

• The chosen control algorithm should be understandable to an engineer who normally works with PLCs.

• For simplicity, gain scheduling should be done by blending PI controllers with triangular weighting (membership) functions.

• The control of the tank pressure must function with different liquid levels in the tank in such a way that the response is the same over the entire operating range.

• We are solving a nonlinear problem because the dynamics of the system change when the liquid level changes.

Figure 6.14: Control system for the gas-liquid separator
Figure 6.13: The technological schematic of the gas-liquid separator

The control structure is shown in Figure 6.14, from the system-theory point of view by the block diagram in Figure 6.15, and from the implementation point of view by the block diagram in Figure 6.16.

Figure 6.15: Block diagram of the closed loop

Figure 6.16: Implementation diagram of a blended gain-scheduling controller for a PLC (blending functions for the local controllers, calculation of the P and I terms from K_p and T_i, signal derivatives of the P and I parts, anti-windup, constraints, and integration implemented as summation)

The design process was the same as described in the previous sections. Some features of this design can be described as follows:

• We defined a suitable number of equilibrium points in which we placed the centres of the blending functions, finding a compromise between good closed-loop behaviour (requiring as many functions as possible) and simple implementation (requiring as few functions as possible) due to the simplicity of the hardware.

• The tuning of the local controllers at the equilibrium points was carried out with engineering tuning rules based on the responses to very small steps of positive and negative amplitude around each equilibrium point.

• The whole design process is iterative and interactive, until the requirements are met. The design process would have been easier if we had used a mathematical model of the process, but that would have required additional effort and consequently cost in the theoretical modelling of the plant, which is avoided by engineers in practice whenever possible.

• Due to the physical background, the scheduling variable was the height of the liquid, which is slower than the closed-loop response of the gas pressure. This means that the performance is not critical in any case. In reality, we would have to work with a vector of scheduling variables, but the analysis in [5] shows that the vector of scheduling variables can be simplified to one dominant variable: the fluid level in the vessel.

The simulation results, shown in Figures 6.17 and 6.18, show that the closed-loop system behaves differently with upward and downward steps due to the nonlinearity of the valve. Otherwise, we can conclude that we have a satisfactory closed-loop response over the entire range of interest with only three blended controllers.

Figure 6.17: Simulation results

Figure 6.18: Simulation results - comparison of normalised signals (all signals in one image): closed-loop response with the blended controller (left image), closed-loop response with a single PI controller

The next step was to test the designed controller with a PLC connected to a computer that simulated the dynamics of the plant (Hardware-in-the-Loop (HIL) simulation), as shown in Figure 6.19. The simulation results can be seen in Figure 6.20.

Figure 6.19: Hardware-in-the-Loop simulation implementation

Figure 6.20: Comparison of the closed-loop responses at the operating edge. The solid line represents the HIL response and the dashed line represents the computer simulation response

Finally, the controller is evaluated at the plant. In Figure 6.21, we see that the controller controls the unit satisfactorily. The differences between the measured and simulated behaviour can be seen in Figure 6.22. The difference between the rise times of the measurements and the simulation is due to the fact that the mathematical model we used for the simulation does not represent the dynamics of the gas-liquid unit well enough. Regardless of the observed discrepancies, it can be concluded that we were
Regardless of the observed discrepancies, it can be concluded that we were 00 1000 2000 3000 4000 5000 Time [s] able to achieve similar dynamic behaviour over the entire operating range of the nonlinear system using a simple algorithm and industrial hardware. Figure 6.17: Simulation results 0.8 [bar] 0.8 0.6 0.8 0.75 Pressure [bar] 0.75 [bar] 0.7 0.7 0.40 1000 2000 3000 4000 5000 0.65 0.65 1.5 Pressure 0.6 0.6 Pressure 0.55 0.55 [m] 1 0.5 0.5 0.5 Level 0.45 0.45 0 20 40 60 80 100 0 20 40 60 80 100 Time [%] Time [%] 00 1000 2000 3000 4000 5000 1 0.5 1] Figure 6.18: Simulation results - comparison of normalised - 0 signals (all signals in one image): closed-loop response u [0 -0.5 with blended controller (left image), closed-loop response 0 1000 2000 3000 4000 5000 Time [s] with a single PI controller The next step was to test the designed controller with a Figure 6.21: Measurements of closed-loop response, PLC connected to a computer that simulated the dynam-scheduling variable and control signal ics of the plant (Hardware-in-the-Loop (HIL) simulation), as shown in Figure 6.19 The simulation results can be seen in Figure 6.20. 69 Design of gain-scheduling control 0.9 [bar] 0.8 0.7 Pressure 0.6 0.50 20 40 60 80 100 Time [%] 0.9 [bar] 0.8 0.7 Pressure 0.6 0.50 20 40 60 80 100 Time [%] Figure 6.22: Close-up: comparison of the normalised signals between the measurements at the plant (upper image) and the computer simulation (lower image) 70 Bibliography [1] B. Čokan, J. Kocijan (2001): Some realisation issue of [5] J. Kocijan, G. Žunič, S. Strmčnik, D. Vrančić (2002): fuzzy gain-scheduling controllers: a robotic manipula-Fuzzy gain-scheduling control of a gas-liquid separator case study, 6th On-line World Conference on Soft tion plant implemented on a PLC. Int. J. Control, Computing in Industrial Applications, September 10-Vol. 75, 1082-1091. 24, 2001. In: Roy, R. (edt.). Soft computing and industry: recent applications. New York: Springer, [6] D. J. Leith, W. E. 
Leithead (1998): Gain-scheduled 2002, 191-199. & nonlinear systems: dynamic analysis by velocity-based linearisation families. Int. J. Control, Vol. 70, [2] J. Kocijan, D. J. Murray-Smith (1999): Robust non-289-317. linear control for ship steering. In: Mastorakis, N. E. (edt.). Progress in simulation, modeling, analysis [7] D. J. Leith, W. E. Leithead (1998): Gain-scheduled and synthesis of modern electrical and electronic de-controller design: an analytic framework directly in-vices and systems, World Scientific and Engineering corporating non-equilibrium plant dynamics. Int. J. Society Press, 235-240. Control, Vol. 70, 249-269. [3] J. Kocijan, D. Vrančić (2000): Multiple blended con- [8] D. J. Leith, W. E. Leithead (1999): Analytic frame-troller design for bilinear systems. In: Rudas, I. J. work for blended multiple model systems using linear (edt.), Tar, J. K. (edt.). IFAC Symposium on Artifi-local models, Int. J. Control, Vol. 72, 605-619. cial Intelligence in Real Time Control, AIRTC-2000, [9] D. J. Leith, W. E. Leithead (2000): Survey of gain-October 2-4, 2000, Budapest, Hungary. Preprints. scheduling analysis and design, Int. J. Control, Vol. Budapest: Budapest Polytechnic, 187-192. 73, 1001-1025. [4] J. Kocijan, N. Hvala, S. Strmčnik (2000): Multi- [10] D. J. Leith, W. E. Leithead (2007): model control of wastewater treatment reactor. In: Velocity-based gain-scheduling demo, Mastorakis, N. (edt.). System and control: theory https://www.hamilton.ie/doug/demof.htm and applications, (Electrical and computer engineering series). World Scientific and Engineering Society, [11] W. J. Rough, J. S. Shamma (2000): Research on gain 2000, 49-54. scheduling, Automatica, Vol. 36, 1401-1425. 71 BIBLIOGRAPHY 72 Chapter 7 Identification of nonlinear systems with Gaussian processes 7.1 Gaussian Process where: mi(xi) = E[f (xi)] (7.2) The Gaussian process is a stochastic process, also known and the elements of the covariance matrix Kij are defined as a ‘random process’. 
A stochastic process is a generalisation of a random variable to a domain of independent variables. If the value of the random variable at each point in this domain is distributed according to a normal (i.e., Gaussian) distribution, such a stochastic process is called a Gaussian process (GP) [16]. Alternatively, if the input to a stochastic process is a vector of independent variables x, this process is Gaussian if the distribution of the values of the function f(x) is Gaussian for each input vector x.

The Gaussian-process model (abbreviated to 'GP model') is usually referred to as a nonparametric probabilistic model [23]. In this way, we allow a priori the description of an unknown system with an infinite set of functions instead of the usual restriction to a class of parameterised functions (i.e., basis functions). In doing so, we assign a higher probability to functions whose occurrence we consider more likely in the description of the system (e.g., smooth, stationary, periodic).

The basics of modelling with a GP model are described in [27] or in the opening chapters of [23]; a more detailed explanation can be found in [23, 22].

It should be noted that the input and output of the GP model are not signals. As with other data-driven modelling methods, the input to the model is a vector of sampled values of independent variables x, and the output of the GP model is a probability distribution of the output f(x) for the given input vector. This follows from Bayesian modelling [20], which introduces the probability aspect into the modelling. Let us look at the details of modelling with a GP.

For any set of N input vectors x_i, i = 1, ..., N, the GP is given by the vector of mean values m = [m_1(x_1) ... m_N(x_N)]^T and the covariance matrix K,

    K = [ K_11 ... K_1N
          ...
          K_N1 ... K_NN ],                                        (7.1)

where

    m_i(x_i) = E[f(x_i)]                                          (7.2)

and the elements of the covariance matrix K_ij are defined as

    K_ij = cov(f(x_i), f(x_j)) = E[(f(x_i) - m(x_i))(f(x_j) - m(x_j))],   (7.3)

usually obtained by a covariance function C(x_i, x_j):

    cov(f(x_i), f(x_j)) = C(x_i, x_j).                            (7.4)

If the distribution of a set of variables is Gaussian, the distribution of any subset of the elements of that set is also Gaussian; this is called the consistency condition. This property is important for the performance of a GP model when the elements of the covariance matrix of the GP are obtained by a covariance function [23].

Covariance function

The value of the covariance function C(x_i, x_j) expresses the correlation between the values of the outputs f(x_i) and f(x_j) based on the values of the input vectors x_i and x_j. The covariance function can be of arbitrary form as long as it forms a nonnegative definite covariance matrix K for any set of N input vectors x_i, i = 1, ..., N. Covariance functions can be stationary (depending only on the distance between the data), nonstationary, periodic, etc., and are described in more detail in [23]. The covariance function defining the form of the unknown function f(x) is usually not known in advance but can be obtained from knowledge of the general properties of the function f(x).

The most commonly used covariance function is the squared-exponential or Gaussian covariance function, which expresses two general properties:

- smoothness, which indicates that the output of the function being modelled changes relatively little when the change in the input value is small; the correlation is higher for two output values whose input values are close to each other;

- stationarity, where the covariance between two input vectors depends only on their distance from each other and not on their absolute position in the domain.

It is often used when there is no a priori knowledge about the structure of the function to be modelled. Here, the covariance between two outputs y_i = f(x_i) and y_j = f(x_j) is:

    C_SE(x_i, x_j) = v_1 exp[ -(1/2) Σ_{d=1}^{D} w_d (x_i^d - x_j^d)^2 ];   (7.5)

D is the length of the input vector x, i.e., the number of independent variables of the model. The parameters v_1 and w_d, d = 1, ..., D, of the covariance function are arbitrary. They are referred to as hyperparameters¹ [22, 19] to emphasise that these are parameters of an otherwise nonparametric model², which determine the shape of the unknown function f(x). The parameter v_1 is the scaling coefficient of the covariance, and the parameters w_d reflect the importance of each component of the input vector: the larger the parameter w_d, the more influential the change in the component x^d is on the value of the output. For a given covariance function to form a positive definite covariance matrix, all parameters of the squared-exponential covariance function must be greater than zero.

¹ Neal [19] has shown that a forward neural network with a hidden layer having an infinite number of neurons corresponds to a GP model. Hyperparameters determine the distribution of the neural network's parameter values.
² The model is nonparametric because, in addition to the hyperparameters and the covariance function, we need information about the behaviour of the function f(x) in the form of the input/output data used in modelling.

Modelling

The operation of the GP model is best illustrated by an example. Suppose we want to describe a system

    y = f(x) + v,                                                 (7.6)

where v is white Gaussian noise with mean 0 and variance v_0, v ~ N(0, v_0). Based on N input-output samples (i.e., pairs of vectors) (x_i, y_i) collected in the set D = {X, y}, we want to determine the unknown value of the output y∗ given the values of the input vector x∗. In the sequel, we refer to the matrix X of dimension N × D together with the vector y of dimension N × 1 as a training dataset, since we use them to train or learn the GP model. A particular input/output pair (x_i, y_i) from this dataset is also called a 'training data pair' or 'training point'. The pair (x∗, y∗) is referred to as the test dataset or test input/output pair.

The training outputs y_i, i = 1, ..., N, represent the values of the random variables resulting from the Gaussian process. We assume that the output of the system is smooth and that the system is stationary, and we use the Gaussian covariance function (7.5), with parameters unknown at the beginning, to form a covariance matrix K:

    y ~ N(0, K),

where the elements of the covariance matrix are K_ij = C(x_i, x_j) = C_SE(x_i, x_j) + v_0 δ_ij. C_SE(x_i, x_j) are the elements given by the covariance function (7.5), and v_0 δ_ij describes the effect of the noise at the output of the system, where δ_ij is the Kronecker delta. Since we have assumed white noise, its values are only correlated with themselves.

Since the (as yet unknown) output y∗ is a realisation of the same system as the training outputs y, we can write [3]:

    y_{N+1} = [y^T  y∗]^T ~ N(0, K_{N+1}).

The joint covariance matrix K_{N+1} of the vector y_{N+1} can be decomposed as:

    K_{N+1} = [ K         k(x∗)
                k(x∗)^T   κ(x∗) ].                                (7.7)

The matrix K is the covariance matrix of the training data, k(x∗) is the vector of covariances between the training data and the test data, and κ(x∗) is the autocovariance of the test data.

According to Bayes' principle [20], the probability distribution of the output y∗ can be divided into two parts: the part that determines the probability of the training outputs given the training inputs (i.e., the marginal part), p(y|X) ~ N(0, K), and the conditional part that, given the first part and the input x∗, predicts the probability distribution of the output y∗. In formal terms, the calculation of the probability distribution of the output y∗ is as follows [16]:

    p(y∗|x∗, y, X) = ∫ p(y∗|x∗, θ, y, X) p(θ|y, X) dθ.            (7.8)

Normally this integral is analytically intractable, but two approximations [16] are available. The first, more common, is to approximate the integral using the most likely values of the unknown hyperparameters θ:

    p(y∗|x∗, y, X) ≈ p(y∗|x∗, θ, y, X).                           (7.9)

We use those values of the hyperparameters θ for which the probability of the training outputs y is greatest given the values of the training inputs X and the covariance function C(., .). These are determined by searching for the maximum of the marginal likelihood. To avoid constrained optimisation, we maximise the logarithm of the marginal likelihood:

    L(θ) = log(p(y|X, θ))
         = -(1/2) log(|K|) - (1/2) y^T K^-1 y - (N/2) log(2π),    (7.10)

where θ is the vector of hyperparameters, θ = [w_1 ... w_D v_1 v_0]^T, and K is the covariance matrix for the training data D. If the optimisation is performed with the conjugate-gradient method or another gradient method, the calculation further requires the derivative of the log likelihood with respect to each of the hyperparameters:

    ∂L(θ)/∂θ_i = -(1/2) Tr( K^-1 ∂K/∂θ_i ) + (1/2) y^T K^-1 (∂K/∂θ_i) K^-1 y.   (7.11)

At each optimisation step, the inverse of the covariance matrix, K^-1, must be calculated, which is computationally demanding for large N.

The alternative for approximating the integral (7.8) is numerical integration over the entire hyperparameter distribution using the Markov chain Monte Carlo method [16].

Another method for determining the model hyperparameters is cross-validation [23]. The values of the hyperparameters are searched for as usual, except that we divide the training data into k parts for k-fold cross-validation. We use k - 1 parts for training and the remaining part for validation and repeat the process k times, each time with different data for validation. An extreme example is leave-one-out cross-validation (LOO cross-validation), with only one data point for validation. The biggest problem with this procedure is the computational complexity, as we have to learn k models [23].

Figure 7.1: An illustration of modelling with Gaussian processes

Prediction

The joint distribution p(y_{N+1}) is Gaussian. Therefore, the conditional distribution p(y∗|x∗, y, X) = p(y_{N+1})/p(y|X) is also Gaussian. By simplification [16, 27], the predicted output of the system (7.6) is a Gaussian distribution:

    p(y∗|x∗, X, y) = N(μ(x∗), σ²(x∗))                             (7.12)

with mean μ(x∗) and variance σ²(x∗):

    μ(x∗) = k(x∗)^T K^-1 y,                                       (7.13)
    σ²(x∗) = κ(x∗) - k(x∗)^T K^-1 k(x∗),                          (7.14)

where k(x∗) = [C(x_1, x∗) ... C(x_N, x∗)]^T is the N × 1 covariance vector between the test data and the training data mentioned above and κ(x∗) = C(x∗, x∗) is the autocovariance of the test data.

We can interpret the vector k(x∗)^T K^-1 in the expression for the mean of the predicted output (7.13) as a vector of weights that determines the weighting of each training output y_i in y according to the distance between the training vector and the test input vector. This linear predictor can be understood as smoothing the information about the unknown system (the training data) contained in the GP model. Alternatively, the prediction mean μ(x∗) can be thought of as a linear combination of N basis functions centred on the training points:

    y∗ = Σ_{i=1}^{N} α_i C(x∗, x_i).

The output of the system is a sample from the resulting normal distribution (7.12).

Since the GP model requires information about the unknown function in the form of the training inputs and outputs even after training, the model is not parametric. The hyperparameters only indicate, via the covariance function, how the training information is used for prediction; they do not themselves contain information about the function/system to be described. The model is thus fully specified by the training data D, the mean function, and the covariance function C(., .) with its known or optimised hyperparameters θ, which indicate how the data D are related.
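The marginal-likelihood expressions can be made concrete with a short sketch. The following Python fragment computes L(θ) as in (7.10) and its derivative with respect to the noise variance v_0 as in (7.11), for which ∂K/∂v_0 is the identity matrix; the data and hyperparameter values are illustrative assumptions, not taken from the text.

```python
import numpy as np

def covariance(X, w, v1, v0):
    # K_ij = v1 * exp(-0.5 * sum_d w_d (x_i^d - x_j^d)^2) + v0 * delta_ij, eqs. (7.5) and noise term
    sq = ((X[:, None, :] - X[None, :, :]) ** 2 * w).sum(axis=-1)
    return v1 * np.exp(-0.5 * sq) + v0 * np.eye(X.shape[0])

def log_marginal_likelihood(y, K):
    # L = -0.5 log|K| - 0.5 y^T K^-1 y - 0.5 N log(2 pi), eq. (7.10)
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y)
    return -0.5 * logdet - 0.5 * y @ alpha - 0.5 * len(y) * np.log(2 * np.pi)

def lml_gradient(y, K, dK):
    # dL/dtheta = -0.5 Tr(K^-1 dK) + 0.5 y^T K^-1 dK K^-1 y, eq. (7.11)
    Kinv = np.linalg.inv(K)
    alpha = Kinv @ y
    return -0.5 * np.trace(Kinv @ dK) + 0.5 * alpha @ (dK @ alpha)

# illustrative training data (assumed)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(10, 1))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.standard_normal(10)

w, v1, v0 = np.array([4.0]), 1.0, 0.01
K = covariance(X, w, v1, v0)
L = log_marginal_likelihood(y, K)
dL_dv0 = lml_gradient(y, K, np.eye(len(y)))  # dK/dv0 is the identity
```

The analytic gradient can be checked against a finite difference of (7.10), which is also a useful sanity check when implementing gradients for the remaining hyperparameters.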
An illustration of this modelling procedure is given in Figure 7.1.

Interpretation

The GP model consists of the pairs of input/output training data (points) D, which represent the behaviour of the unknown system, together with the mean function and the covariance function C(., .) with its known or optimised hyperparameters θ.

A low variance σ²(x∗) of the predicted distribution of the output means more confidence in the prediction, and vice versa. If we look at the expression for the variance (7.14), we see that it consists of two parts [23]. The term k(x∗)^T K^-1 k(x∗) is subtracted from the first part, κ(x∗), which is the a priori variance of the GP. It represents the reduction of the a priori variance of the GP at x∗ due to the training data, and it increases as the covariance between the training and test data increases. Simply put, the test input is 'closer' if it is already known from the training data, which means that the GP model has a higher confidence in the prediction. The variance thus also depends on the position of the test input data relative to the training input data, which is one of the main advantages of the GP model over other models.

Comparison with other types of models

General about GP model properties

The GP model differs from most models in that it is nonparametric. This is because the information about the unknown system is described both by the training data D itself, contained in the model, and by the (hyper)parameters of the covariance function and the covariance function itself.

The advantages of the GP model are:

- a measure of uncertainty in the output prediction, given by the variance and depending on the mutual covariance (position) of the training input vectors and the test input vectors;

- the possibility to include different types of prior knowledge, e.g., linear local models, hysteresis, prior knowledge about noise, static properties, etc.;

- relative ease of use;

- a small number of hyperparameters that simultaneously express the influence of the individual input components and the noise.

Comparison

Let us review the differences between the GP model and other models obtained by experimental modelling. A comparison with methods that first determine the structure (e.g., by theoretical modelling) and then optimise the parameters with some optimisation method is not quite appropriate, as it would be a comparison between grey-box and black-box modelling methods.

It is easier to compare the GP model with neural networks in the context of modelling. It turns out that neural networks have two major problems in addition to the problem of determining a suitable structure:

- a lack of transparency: the structure of the neural network does not reflect the structure of the unknown system; and

- the curse of dimensionality, which expresses itself in two ways. As the dimension of the input space increases:
  - the demand for data increases exponentially with the dimension, and
  - the number of building blocks (neurons) in the neural network also increases,
  resulting in higher computational complexity and a higher probability of the optimisation terminating at a local minimum.

The advantages of the GP model over neural networks are the indicator of confidence in the prediction of the GP model, the better performance of the GP model with a smaller number of data, and a reduction of the 'curse of dimensionality'. For more on the relationship between the GP model and neural networks, see [10, 17, 19].

Fuzzy logic and local-model networks reduce the problems described for neural networks, but in certain cases the GP model has potential advantages over them.

In addition to the advantages listed above, the GP model also has the following disadvantages:
- the nonparametric nature of the GP model, which limits the application methods;

- the high computational cost of training when the unknown system is described by a large number of training data.

The computational effort can, however, be reduced in several ways. One way is to cluster the training information, e.g., into local models. Another way is to speed up the training by using a smaller subset of the training set (e.g., [24]) or by approximating the inverse of the covariance matrix, which is the most computationally intensive operation. These and other methods are described in [21, 23] and the references therein.

Compared with fuzzy logic and local-model networks, the advantages of the GP model show up in particular when small or incomplete datasets are used for identification.

Models such as Support Vector Machines (SVMs) are closely related to the GP model, as these methods use kernel functions in the same way as the GP model, where the kernel functions are called covariance functions. The main advantage of the GP model over these models is the confidence measure of the prediction and the use of conditional probability to determine the parameters; the disadvantage is the complexity of the calculations. The Relevance Vector Machine (RVM) is a special form of the GP model [23].

The connection between GP models and some other models is described in more detail in [17, 23] and the references therein.

Example of identification of a static nonlinearity with GP

Let us illustrate the use of the GP model with an example. We want to identify a nonlinear function f(x) of the independent variable x:

    f(x) = 4x² + x - 6 sin(x) + 1 + v                             (7.15)

in the interval x ∈ [0, 1.2]. The variance of the Gaussian noise v at the output is σ² = 0.0025.
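To make the example concrete, the prediction equations (7.13) and (7.14) can be applied directly to this function. In the sketch below, the eight training input locations are assumptions (the text only states that they are unevenly spaced), while the hyperparameter values are those reported for this example.

```python
import numpy as np

def f(x):
    return 4.0 * x**2 + x - 6.0 * np.sin(x) + 1.0   # noise-free part of (7.15)

v1, w, v0 = 13.8, 4.2, 0.0065                       # hyperparameters from the text

def c_se(a, b):
    # squared-exponential part of the covariance function (7.16)
    return v1 * np.exp(-0.5 * w * (a - b) ** 2)

rng = np.random.default_rng(1)
x_train = np.array([0.02, 0.11, 0.25, 0.33, 0.48, 0.61, 0.70, 0.95])  # assumed locations
y_train = f(x_train) + np.sqrt(0.0025) * rng.standard_normal(8)

K = c_se(x_train[:, None], x_train[None, :]) + v0 * np.eye(8)   # training covariance matrix
K_inv = np.linalg.inv(K)

def predict(x_star):
    k = c_se(x_train, x_star)            # covariance vector k(x*)
    mu = k @ K_inv @ y_train             # mean, eq. (7.13)
    var = (v1 + v0) - k @ K_inv @ k      # variance, eq. (7.14), kappa(x*) = v1 + v0
    return mu, var

mu_in, var_in = predict(0.30)    # inside the region covered by the data
mu_out, var_out = predict(1.20)  # outside the data region (x > 1)
```

As in Figure 7.3, the predicted variance grows markedly outside the region covered by the training data, while the mean closely follows the function where the data is dense.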
The nonlinear function is represented by eight unevenly spaced pairs of training data representing the input/output relation x/f(x). The function and the training data can be seen in Figure 7.2.

Figure 7.2: Nonlinear function

For the identification, we choose the squared-exponential covariance function (7.5). Since there is only one input, the function simplifies to:

    C(x_i, x_j) = v_1 exp( -(1/2) w (x_i - x_j)² ) + v_0 δ_ij.    (7.16)

The optimisation leads to three hyperparameters, which take the following values: v_1 = 13.8, w = 4.2 and v_0 = 0.0065. The identification results are shown in Figure 7.3.

Figure 7.3: Output of the model (solid curve) with uncertainty (grey area) and training data (stars)

It can be observed that the model poorly describes the unknown function in the region not described by the training data (x > 1) and in the region with sparse data (i.e., described by only a few points), namely x > 0.7. A good feature of the GP model is that it alerts us to a less well-described region by an increased variance, visible in Figure 7.3 and particularly pronounced for x > 1.

Less noticeable, because of the low noise, is the second property of the GP model, namely the smoothing of the training information it contains: the model smooths the noise contained in the training data when predicting the new output.

7.2 Identification of dynamic systems with GP

The following list summarises the good features of the GP model when used for the identification of dynamic systems:

- the prediction is given in the form of a Gaussian distribution, and the variance can be seen as a measure of confidence in the prediction; this depends on the quality and location of the training data in the domain;³

- the number of hyperparameters that have to be optimised in the identification is relatively small;

- the model is quite robust, as it works relatively well even with a small amount of training data, e.g., in the nonequilibrium region;

- it is possible to incorporate different types of prior knowledge, such as knowledge about static features, local models, etc.;

- the 'curse of dimensionality' does not grow exponentially with the amount of data but with its third power [22];

- the model does not use scheduling variables, nor is it necessary to divide the operational domain into sub-domains, as in local-model networks.

³ The 'confidence measure' in the predicted output is not exclusively the domain of GP models. How it can be determined for fuzzy models is described, for example, in [14]. In general, it is a property of Bayesian modelling.

Of course, the GP model also has some disadvantages:

- it is a nonparametric model and, as such, useless when a parametric model is needed, for example, for control design;

- it is computationally expensive to use the model in the case of a large training dataset; this is especially true for the hyperparameter-optimisation phase.

So, when should one use the GP model for the identification of dynamic systems instead of basis-function identification methods? It should be done when:

- the information about the system is available in the form of input/output data;

- the available data is poor: it contains a lot of noise, measurement errors, and missing and unevenly distributed data;

- we need an indication of the uncertainty of the prediction;

- we have a relatively small, but not too small, number of data relative to the number of regressors.

Identification procedure

In this subsection, we first outline the identification procedure of dynamic systems with GP modelling [2]. In system identification, the model is obtained from measurements, but of course we can also use our prior knowledge of the system if we have it. Roughly speaking, identification with the GP model consists of the following steps:

1. defining the purpose of the model;

2. GP model set-up;

3. design of the experiment using a priori knowledge or a priori measurements;

4. experiment and signal preprocessing to obtain data for training and validation;

5. training of the GP model, i.e., optimisation of the hyperparameters; and

6. validation of the GP model.

At each of these steps, we can decide whether to proceed to the next identification step or return to one of the previous ones. The whole identification procedure is rarely completed in one pass; it is usually an iterative procedure. System identification is complete when we decide, based on validation, that the model is sufficient for its purpose.

In the following, we briefly outline the different steps of the identification process. Most of the steps are the same as in other system identification methods [15], which we described in Chapter 3. We will highlight the steps that differ significantly from the same steps in other types of models due to the characteristics of the GP model.

Definition of the purpose of the model

The purpose of identification is to obtain a model of a system. The model is not an end in itself; it is to be used for a specific task. This task defines the criteria that the model must fulfil in order to be accepted as a satisfactory model. We can use the model for simulation, response prediction, control design, system analysis, fault diagnosis, and other functions.

Model set-up

The next step in the identification process is to build the model. We set up the GP model based on a priori knowledge about the process, previous measurements, or validation results in the iterative system-identification process. The construction consists of:

- the selection of an appropriate covariance function,

- the selection of regressors, and

- the decision whether to include prior knowledge.

The choice of regressors when modelling dynamic systems simultaneously determines the order of the model.

The design of the experiment

The next step in the identification process is the design of the experiment, which is done on the basis of knowledge about the process gained from prior knowledge or previous measurements and according to the model set-up.

The design process is similar to other methods in the following aspects:

1. the selection of inputs and outputs, taking care to measure all influencing variables;

2. the selection of an appropriate sampling time;

3. the selection of an appropriate excitation signal, so that the process is described over the entire operating range of interest; it is a feature of the GP model that it interpolates well in the region where we have training data and extrapolates poorly everywhere else.

In designing the experiment, we need to consider constraints such as the following:

- disturbances of the measured signals;

- limitations of the input-signal amplitude for physical reasons, safety reasons and limitations of the actuators;

- the limited time available to perform the identification, e.g., when measuring in an industrial environment.

Experiment and data processing

The experiment is carried out as planned in the previous step. The result of the experiment is the measured input and output signals of the process, which represent its behaviour.

The input to the GP model is not the signals themselves but the sampled values that determine the behaviour of the system at certain values of the regressors. These samples are obtained by sampling the input and output signals of the system being modelled. The sampling time is determined in the design phase of the experiment.

An example of a possible selection of input/output training pairs (with index i) for a system with input signal u and output signal y, which we would like to describe, for example, with a second-order NARX model:

- training output i: the output of the process at time t = kT: y_i = y(k);

- corresponding training input: x_i = [y(k - 1) y(k - 2) u(k - 1) u(k - 2)],

where T is the sampling time of the input and output signals. The test data is constructed in the same way as the training data.

A good feature of the GP model is that the significance of each regressor is reflected in the value of the corresponding hyperparameter. This is called automatic relevance detection (ARD). For the squared-exponential covariance function (7.5), for example, the larger the value of the hyperparameter w_d, the more significant the corresponding regressor d. This property can only be used if the training samples are normalised before being used for training, so that the sample values of the individual regressors are of the same order of magnitude.

Model training

Training a GP model is the same whether the model is static or describes a dynamic system with the NARX model. During training, we optimise the hyperparameters θ. These are not known in advance but must be determined from the training data.

Usually, maximum-marginal-likelihood optimisation is used because it is simple and gives good results. To determine the most probable values of the hyperparameters, we can generally use any of the optimisation methods [5]. Since there is a possibility that the optimisation process becomes stuck in a local minimum, we usually repeat the training process several times with different initial values of the hyperparameters and validate the resulting models.

Model validation

The purpose of validation is to check the fit between the mathematical model and the system under consideration [18]. Although validation is a highly important step in the identification process that provides information about how good the resulting model is, it is often not given sufficient attention.

The quality of a model can be measured in several ways, the most important aspects being:

- model plausibility (model verification): here we are interested in the consistency of the model with prior knowledge;

- model falseness: here we are usually interested in the match between the model's and the system's responses; and

- model purposiveness: here we check whether the model is appropriate for the intended task.

An overview of validation methods can be found, for example, in [18, 8].

Very often, in addition to the one-step-ahead prediction of the model, we use the results of a simulation, which are compared to the behaviour of the system for validation. We validate the NARX model as an output-error model. In the next subsection, we describe the simulation of the GP model.
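Before moving on, the training step just described, i.e., maximising the log marginal likelihood (7.10) with restarts from several random initial hyperparameter values, can be sketched as follows. The data, the plain gradient-descent optimiser and all settings are illustrative assumptions; in practice any gradient method (e.g., conjugate gradients) can be substituted.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0.0, 1.0, 20))              # assumed 1-D training inputs
y = np.sin(4.0 * X) + 0.1 * rng.standard_normal(20)

def neg_lml(log_theta):
    # negative of (7.10); log-parameters keep w, v1, v0 positive
    w, v1, v0 = np.exp(log_theta)
    K = v1 * np.exp(-0.5 * w * (X[:, None] - X[None, :]) ** 2) + v0 * np.eye(len(X))
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y)
    return 0.5 * logdet + 0.5 * y @ alpha + 0.5 * len(X) * np.log(2.0 * np.pi)

def num_grad(fun, p, h=1e-5):
    # central-difference gradient, standing in for the analytic gradient (7.11)
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        g[i] = (fun(p + e) - fun(p - e)) / (2.0 * h)
    return g

best_val, best_p, init_vals = np.inf, None, []
for restart in range(5):                            # several random initial values
    p = rng.normal(0.0, 1.0, 3)                     # log(w), log(v1), log(v0)
    v = neg_lml(p)
    init_vals.append(v)
    if v < best_val:
        best_val, best_p = v, p.copy()
    for _ in range(150):                            # plain gradient descent
        g = num_grad(neg_lml, p)
        g /= max(1.0, np.linalg.norm(g) / 5.0)      # crude step-size safeguard
        p = p - 0.05 * g
    v = neg_lml(p)
    if v < best_val:
        best_val, best_p = v, p.copy()

w_opt, v1_opt, v0_opt = np.exp(best_p)              # best hyperparameters found
```

Keeping the best result over all restarts, rather than the last one, mirrors the advice above to repeat training from different initial values and validate the resulting models.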
Simulation of dynamic GP models In the case of missing regressor values in a given regression vector, we can either replace these values, for example, with the average of the previous and subsequent samples Because of its form, the GP model is used as an input/output or discard such a regression vector or replace it with an-discrete-time dynamic model, often as the NARX model other one if we have enough data available. [25]. At time instant k, the input vector xk contains the 79 Identification of nonlinear systems with Gaussian processes past values of the inputs to the system u and the outputs • simulation without uncertainty propagation or the of the system y: ‘naive’ method, with which, similar to neural networks, we only take the mean value of the the pre-xk = [y(k − 1) . . . y(k − L) y(k − 1) . . . u(k − L)]. (7.17) dicted value and • In general, there are two possibilities for the realisation of simulation with uncertainty propagation, in which all a multi-step prediction: or part of the information about the predicted distribution is used. The procedure is described in detail • in [5], and a shortened version is given in [13, 4]. direct method, by which we learn multiple models, Since we cannot calculate the exact distributions at for each prediction horizon separately, or the output of the model, we must again resort to ap- • iterative method, by which we learn one model for proximations. We have two possible approximations one-step-ahead prediction, which we repeat itera- [5, 17]: tively. – analytical approximation of the statistical moments of the output distribution, where the out-The problem with the direct method is that we have to put with non-Gaussian distribution is approxi-choose the prediction horizon in advance. If we change it, mated by a Gaussian distribution with the same we have to learn a new model. 
Another problem with this mean and variance, which is called a statistical-method is that for strongly nonlinear systems and large moments-matching method (Figure 7.9), and horizons, a large number of data must be available [5]. – numerical Monte-Carlo method. Since the simulation is a multi-step-ahead prediction for an infinite horizon, the direct method is limited. u k-1 ( ) In contrast, the model for a one-step-ahead prediction can be built relatively easily, and by iteratively feeding back -1 Z the output values, it is possible to obtain a model for any u k ( -2) ... Gaussian Process ^ desired prediction horizon [5]. This method is also used N ( ( m ), k ( v )) k - L Model in basis-function models, for example, in the simulation of Z u k L ( - ) systems with neural networks. ... The problem with the iterative simulation method is error accumulation as we move forward in time. One possible solution to the problem is to eliminate the systematic er-N ( ^ m( k-1), ( v k-1)) -1 ror that arises from successive one-step-ahead predictions Z [26, 9]. However, using the GP model, there is also the ^ N ( ( ), m k-2 ( )) v k-2 -2 possibility that we do not eliminate the error, but use the Z . variance of the prediction for the evaluation of response .. ^ N ( ( ), m k-L ( )) v k-L [5, 13]. -L Z Simulation procedure Figure 7.4: Simulation schema for the identified GP model of the dynamic process Suppose we know the response history of the dynamic system of order L up to step k. Then, at step k + 1, we know So, when should one use the first or the second method? the complete input vector of the GP model: The uncertainty propagation method provides a more ac-xk+1 = [y(k) . . . y(k − L + 1) u(k) . . . u(k − L + 1)] curate measure of confidence in the prediction, but it is (7.18) more computationally intensive for both the analytical In step k + 2, the new input vector for the GP model is: and numerical approximations. 
In contrast, the method x without uncertainty propagation is fast and simple and k+2 = [y(k + 1) y(k) . . . y(k − L + 2) u(k + 1) . . . still provides a meaningful measure of confidence in the u(k − L + 2)]. (7.19) prediction. Admittedly, this measure is too optimistic be-The problem here is that we do not know the value of cause the cumulative error from the previous steps is not the output at time k + 1. We do something similar to taken into account in the prediction variance. neural-network models; instead of the real value of the output of the system y(k + 1), we use its estimated value Variance propagation not only leads to more accurate pre- ˆ y(k + 1) = f (x diction output, but also affects the mean of the predicted k ), calculated with the GP model. This is how we proceed in all further steps. distribution. Depending on the nonlinearity of the system, these differences can be larger or smaller. Since both We have two simulation options: the ‘naive’ and uncertainty-propagation simulations are 80 7.2 Identification of dynamic systems with GP approximations, we can only guess by how much the re-Identification sults of the latter are more accurate. However, the model 2 95% tolerance band uncertainty needs to be validated. mean value 1.5 identification data The ‘naive’ method is therefore used when we are inter-1 ested in speed and simplicity, and the uncertainty propagation when we are interested in a more accurate level 0.5 of confidence in the prediction and the simulation time is 0 not too limited. -0.5 -1 Example of identification of a dynamic system -1.5 0 20 40 60 80 100 Time [s] Let us illustrate the method described by identifying a first-order nonlinear dynamical system (3.9), which we will Figure 7.5: Simulation response to identification signal model as a first order model: y(k + 1) = f (y(k), u(k)). (7.20) 4 4 2 2 The function f is a GP, and we have a regression vector 0 0 y(k+1) -2 -2 with two variables D = 2. 
This means that we use the squared-exponential covariance function, with the hyperparameters v0, v1, w1 and w2 to be identified.

Input and output signals

The main characteristics of the selected input and output signals are as follows:

• The input signal is defined in the range [−1.3, 1.3].
• The sampling time, selected according to the system dynamics, is 0.5 s.
• The input signal for identification (as in Chapter 3):
  – is generated by a random-number generator,
  – the amount of data determines the dimension of the covariance matrix,
  – the size of the covariance matrix determines the computational load.
• The input signal for validation (the same as in Chapter 3):
  – is generated by a random-number generator, but with different sampling and amplitude, in the range [−1.2, 1.2],
  – a second validation signal has an amplitude outside the identified region, namely in the range [−1.3, 1.5].

Figure 7.6: Nonlinearity of the system y(k+1) = f(u(k), y(k)) (left figure) and the GP model (right figure)

Figure 7.7: Comparison of prediction contours and gradients

The response of the simulation to the identification input signal is shown in Figure 7.5, and a comparison of the model prediction with that of the original system is shown in Figures 7.6 and 7.7. The analytical approximation with statistical moment matching was implemented for the simulation. The validation simulation is shown in Figures 7.8 to 7.11.

Responses can be quantitatively assessed using various statistical measures.
Particularly common are the mean absolute error and, especially, the mean squared error, which is suitable for all methods that use least squares for optimisation. An example of a statistical measure that takes the entire distribution into account is the logarithm of the predictive-density error (LD), which gives more weight to the errors of those predictions where the model, owing to a lower variance, has a higher confidence in the prediction. For all three measures, the lower the value, the better the model.

These statistical measures are summarised in the following equations, together with the results for our example:

AE = (1/N) Σ_i |ŷ_i − y_i| = 0.028,
SE = (1/N) Σ_i (ŷ_i − y_i)² = 0.0016,
LD = (1/2N) Σ_i [log(2π) + log(σ_i²) + (ŷ_i − y_i)²/σ_i²] = −1.3842,

where:
AE – mean value of the absolute error,
SE – mean value of the squared error,
LD – logarithm of the predictive-density error.

Figure 7.8: Simulation response to the first validation signal

Figure 7.9: Simulation response to the second validation signal

Figure 7.10: Standard deviation of the simulation response to the second validation signal

Figure 7.11: Uncertainty surface of the GP model y(k + 1) = f(u(k), y(k)) (left figure) and the position of the training data (right figure)

7.3 Example of the identification of the pH-neutralisation process

Let us consider another example: the identification of the pH-neutralisation process (2.20) [10], which we will model with a higher-order model.
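Translated into code, the three validation measures defined above read as follows; a minimal sketch, with illustrative array names and data that are not from the book:

```python
import numpy as np

def validation_measures(y_hat, y, var):
    """AE, SE and LD as defined above; lower values mean a better model."""
    e = y_hat - y
    AE = np.mean(np.abs(e))                                   # mean absolute error
    SE = np.mean(e ** 2)                                      # mean squared error
    LD = 0.5 * np.mean(np.log(2 * np.pi) + np.log(var)
                       + e ** 2 / var)                        # log predictive-density error
    return AE, SE, LD

# Illustrative call: a perfect prediction with unit variance gives
# AE = SE = 0 and LD = 0.5 * log(2*pi).
AE, SE, LD = validation_measures(np.array([1.0, 2.0]),
                                 np.array([1.0, 2.0]),
                                 np.array([1.0, 1.0]))
```

Note how LD rewards a model that pairs small errors with small predicted variances: the same error e contributes more to LD when the model claimed high confidence (small var).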
We have already used this process to illustrate modelling with neural networks in Chapter 3.

Identification

We iteratively selected the following model:

• Covariance function:

C(x_p, x_q) = v1 exp[−(1/2) Σ_{d=1}^{D} w_d (x_p^d − x_q^d)²] + v0 δ_pq.

• Regressors: y(k−1), …, y(k−4), u(k−1), …, u(k−4), implying ten hyperparameters to be optimised.
• This implies a higher computational cost. In our experience, it is about 110 times higher than for the first-order system, depending on the amount of data used, i.e. the data required to model such a higher-order system.
• Optimisation method: conjugate gradients.
• Simulation with the statistical-moments-matching method.

Model validation with simulation is shown in Figures 7.12 to 7.14. The values of the statistical measures are AE = 0.1494, SE = 0.0512 and LD = 27.87.

Figure 7.12: Simulation response of the system and mean values of the model

Figure 7.13: Simulation-error autocorrelation

Figure 7.14: Simulation response to the validation signal together with the tolerance band

Figure 7.15: Validation with the second input signal

In Figure 7.14, areas with higher variance can be seen, which indicate lower confidence in the prediction. This is due to the model being validated with a signal that is dynamically very different from the signal we used to identify it.
This means that the model is excited outside, or at the edge of, the region where it was identified. Therefore, we validate the system with a signal that also excites the system in the region where it was identified (Figures 7.15 and 7.16), which gives better validation results.

Figure 7.16: Simulation-error autocorrelation

7.4 Control design

As explained earlier, the GP model is a nonparametric probabilistic model. The control-design methods that can use this type of model are more or less the same as those described for use with neural networks. We will illustrate the case of predictive control using a GP model for the first-order system (3.9) [11] and for the pH-neutralisation process [12], in the same way as in Chapter 4.

We use predictive functional control with the cost function

J = min_{U(k)} [r(k + P) − ŷ(k + P)]².

The chosen prediction horizon is eight samples long and the control horizon is one sample. When optimising the control signal, we could consider amplitude and rate constraints on the input, the output and the states, but we only considered the constraint on the variance, var ŷ(k + P) ≤ k_v, which implicitly accounts for the other constraints. In the range where the system is constrained, we cannot identify a good model, or even any model at all, which is indicated by the large value of the variance.

The results of the setpoint tracking of the first-order system (3.9) without considering the constraint are shown in Figures 7.17 and 7.18, and with the constraint in Figures 7.19 and 7.20.
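A one-step sketch of this variance-constrained optimisation is given below. The grid search over candidate inputs and the toy P-step-ahead predictor are assumptions made for illustration; the book does not prescribe this particular optimiser.

```python
import numpy as np

def pfc_step(predict_P, r, u_grid, k_v):
    """One predictive-functional-control step: pick the input minimising
    [r - y_hat(k+P)]^2 among candidates whose predicted variance
    satisfies var y_hat(k+P) <= k_v."""
    best_u, best_cost = None, np.inf
    for u in u_grid:
        m, v = predict_P(u)
        if v > k_v:               # variance constraint: reject untrusted inputs
            continue
        cost = (r - m) ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

# Hypothetical P-step-ahead prediction whose variance grows away from
# the identified region (stand-in for a GP model rollout):
predict = lambda u: (2.0 * u, 0.01 + 0.2 * u ** 2)

u_opt = pfc_step(predict, r=1.0, u_grid=np.linspace(-1.0, 1.0, 201), k_v=0.05)
```

With the constraint active, the controller settles for the best input inside the trusted region (here u_opt ≈ 0.44) instead of the unconstrained optimum u = 0.5, which is exactly the safety mechanism described in the text.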
Figure 7.17: Closed-loop response – the unconstrained case

Figure 7.18: Standard deviation

Figure 7.19: Closed-loop response – the case with the variance constraint at 0.13²

Figure 7.20: Standard deviation

The same control principle was applied to the pH-neutralisation process. The results of the setpoint tracking without considering the constraint are shown in Figures 7.21 and 7.22, and with the constraint in Figures 7.23 and 7.24.

A simple conclusion that can be drawn from the results shown is that predictive control can make good use of the information about model uncertainty. Predictive control does not allow the process to enter a region where the prediction deviation is greater than a prescribed threshold, and in this way it maintains the safety of the control.
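For completeness, the fourth-order regressor vector used in the pH example above (y(k−1), …, y(k−4), u(k−1), …, u(k−4)) can be assembled from logged signals as follows; a minimal sketch with illustrative data:

```python
import numpy as np

def build_regressors(y, u, order):
    """Build regressor rows [y(k-1)..y(k-order), u(k-1)..u(k-order)]
    with targets y(k), as used for the fourth-order pH model."""
    rows, targets = [], []
    for k in range(order, len(y)):
        rows.append([y[k - i] for i in range(1, order + 1)]
                    + [u[k - i] for i in range(1, order + 1)])
        targets.append(y[k])
    return np.array(rows), np.array(targets)

y = np.arange(10.0)          # illustrative logged output samples
u = np.arange(10.0, 20.0)    # illustrative logged input samples
X, t = build_regressors(y, u, order=4)
# X has 2*order = 8 columns; together with v0 and v1 this gives the
# ten hyperparameters mentioned above (w1..w8, v0, v1).
```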
Figure 7.21: pH process: closed-loop response – the unconstrained case

Figure 7.22: Standard deviation

Figure 7.23: pH process: closed-loop response – the case with the variance constraint at 0.15²

Figure 7.24: Standard deviation

Bibliography

[1] K. Ažman, J. Kocijan (2005): An example of Gaussian process model identification, in: L. Budin, S. Ribarić (eds.), Proceedings of the 28th International Conference MIPRO, CIS – Intelligent Systems, Opatija, 79-84.

[2] K. Ažman, J. Kocijan (2006): Gaussian process model validation: biotechnological case studies, in: I. Troch, F. Breitenecker (eds.), Proceedings of the 5th Vienna Symposium on Mathematical Modelling – MathMod, Vienna.

[3] M. N. Gibbs (1997): Bayesian Gaussian processes for regression and classification, PhD dissertation, Cambridge University, Cambridge.

[4] A. Girard, C. E. Rasmussen, R. Murray-Smith (2002): Gaussian process priors with uncertain inputs: multiple-step-ahead prediction, Technical report DCS TR-2002-119, University of Glasgow, Glasgow.

[5] A. Girard (2004): Approximate methods for propagation of uncertainty with Gaussian process models, PhD dissertation, University of Glasgow, Glasgow.

[6] G. Gregorčič, G. Lightbody (2002): Gaussian processes for modelling of dynamic non-linear systems, Proceedings of the Irish Signals and Systems Conference, Cork.

[7] M. A. Henson, D. E. Seborg (1994): Adaptive nonlinear control of a pH neutralization process, IEEE Transactions on Control Systems Technology, Vol. 2, No. 3, 169-183.

[8] N. Hvala, S. Strmčnik, D. Šel, S. Milanić, B. Banko (2005): Influence of model validation on proper selection of process models – an industrial case study, Computers and Chemical Engineering, Vol. 29, 1507-1522.

[9] K. Judd, M. Small (2000): Towards long-term prediction, Physica D, Vol. 136, No. 1-2, 31-44.

[10] J. Kocijan, B. Banko, B. Likar, A. Girard, R. Murray-Smith, C. E. Rasmussen (2003): A case based comparison of identification with neural networks and Gaussian process models, Proceedings of the IFAC ICONS Conference, Faro, 137-142.

[11] J. Kocijan, R. Murray-Smith, C. E. Rasmussen, B. Likar (2003): Predictive control with Gaussian process models, in: B. Zajc, M. Tkalčič (eds.), Proceedings of the IEEE Region 8 EUROCON 2003: Computer as a Tool, Vol. A, Ljubljana, 352-356.

[12] J. Kocijan, R. Murray-Smith (2005): Nonlinear predictive control with a Gaussian process model, in: R. Murray-Smith, R. Shorten (eds.), Switching and Learning in Feedback Systems, Lecture Notes in Computer Science, Vol. 3355, Springer, Heidelberg, 185-200.

[13] J. Kocijan, A. Girard, B. Banko, R. Murray-Smith (2005): Dynamic systems identification with Gaussian processes, Mathematical and Computer Modelling of Dynamical Systems, Vol. 11, No. 4, 411-424.

[14] I. Škrjanc, S. Blažič, O. Agamennoni (2005): Identification of dynamical systems with a robust interval fuzzy model, Automatica, Vol. 41, 327-332.

[15] L. Ljung (1999): System Identification – Theory for the User, second edition, Prentice Hall, New Jersey.

[16] D. J. C. MacKay (1998): Introduction to Gaussian processes, in: C. M. Bishop (ed.), Neural Networks and Machine Learning, NATO ASI Series F, Vol. 168, Springer-Verlag, Berlin, 133-166.

[17] D. J. C. MacKay (2003): Information Theory, Inference and Learning Algorithms, chapter on Gaussian processes, Cambridge University Press, Cambridge, 535-548.

[18] D. J. Murray-Smith (1998): Methods for the external validation of continuous system simulation models: a review, Mathematical and Computer Modelling of Dynamical Systems, Vol. 4, 5-31.

[19] R. M. Neal (1996): Bayesian Learning for Neural Networks, Lecture Notes in Statistics, Vol. 118, Springer-Verlag, New York.

[20] V. Peterka (1981): Bayesian system identification, in: P. Eykhoff (ed.), Trends and Progress in System Identification, Pergamon Press, Oxford, 239-304.

[21] J. Quiñonero-Candela, C. E. Rasmussen (2005): Analysis of some methods for reduced rank Gaussian process regression, in: R. Murray-Smith, R. Shorten (eds.), Switching and Learning in Feedback Systems, Lecture Notes in Computer Science, Vol. 3355, Springer, Heidelberg, 98-127.

[22] C. E. Rasmussen (1996): Evaluation of Gaussian processes and other methods for nonlinear regression, PhD dissertation, University of Toronto.

[23] C. E. Rasmussen, C. K. I. Williams (2006): Gaussian Processes for Machine Learning, The MIT Press, Cambridge, MA.

[24] M. Seeger, C. K. I. Williams, N. D. Lawrence (2003): Fast forward selection to speed up sparse Gaussian process regression, in: C. M. Bishop, B. J. Frey (eds.), Proceedings of the Ninth International Workshop on AI and Statistics, Key West, FL.

[25] J. Sjöberg et al. (1995): Nonlinear black-box modeling in system identification: a unified overview, Automatica, Vol. 31, No. 12, 1691-1724.

[26] M. Small, K. Judd (1999): Variable prediction steps and long term prediction, Technical report, University of Western Australia, Department of Mathematics and Statistics.

[27] C. K. I. Williams (1998): Prediction with Gaussian processes: from linear regression and beyond, in: M. I. Jordan (ed.), Learning in Graphical Models, Vol. 89 of NATO Science Series D, Springer, Berlin, 599-621.

[28] C. K. I. Williams, M. Seeger (2001): Using the Nyström method to speed up kernel machines, in: T. K. Leen, T. G. Dietterich, V. Tresp (eds.), Proceedings of the Advances in Neural Information Processing Systems Conference, Vol. 13, The MIT Press, Cambridge, MA, 682-688.
Index

activation function, 2, 6
Adaline, 2
adaptive control, 52
autoregressive and moving average model with exogenous input (ARMAX), 15
autoregressive model with exogenous input (ARX), 15
backpropagation, 2, 3
blended multiple-model systems, 58
Boltzmann machines, 3
delta rule, 3
dynamic system, 14
Fourier analysis, 19
fuzzy models, 28
gain-scheduling control, 63, 64, 66
Gaussian process, 73
GP model, 73, 75–77
Hopfield neural network, 3
Kohonen network, 3
linear model, 14
local-model networks, 51
multilayer perceptron, 5, 6, 51
multiple-model systems, 51
neural networks, 1, 5, 41
nonlinear mappings, 28
nonlinear systems, 27
one-step-ahead prediction, 31, 80
output error (OE), 15
perceptron, 2
predictive control, 43, 83
predictive functional control, 47, 83
radial basis functions, 29
radial basis-function network, 7, 8, 28, 51
regressors, 14, 28
ridge basis functions, 29
scheduling vector, 54
self-organised neural networks, 3
simulated annealing, 3
simulation, 31, 79, 80
static system, 14
system identification, 13, 14
Takens theorem, 29
theoretical model, 13
universal approximator, 5
vector of scheduling variables, 54
velocity-based linearisation, 54, 56, 63