Strojniški vestnik - Journal of Mechanical Engineering 61(2015)1, 63-73
© 2015 Journal of Mechanical Engineering. All rights reserved.
DOI:10.5545/sv-jme.2014.1769
Original Scientific Paper
Received for review: 2014-02-19
Received revised form: 2014-05-14
Accepted for publication: 2014-05-21

Crack Fault Detection for a Gearbox Using Discrete Wavelet Transform and an Adaptive Resonance Theory Neural Network

Zhuang Li* - Zhiyong Ma - Yibing Liu - Wei Teng - Rui Jiang
North China Electric Power University, School of Energy Power and Mechanical Engineering, China

In this paper, a new approach using discrete wavelet transform and an adaptive resonance theory neural network for crack fault detection of a gearbox is proposed. With the use of a multi-resolution analytical property of the discrete wavelet transform, the signals are decomposed into a series of sub-bands. The changes of sub-band energy are thought to be caused by the crack fault. Therefore, the relative wavelet energy is proposed as a feature. An artificial neural network is introduced for the detection of crack faults. Due to differences in operating environments, it is difficult to acquire typical, known samples of such faults. An adaptive resonance theory neural network is proposed in order to recognize the changing trend of crack faults without known samples on the basis of extracting the relative wavelet energy as an input eigenvector. The proposed method is applied to the vibration signals collected from a gearbox to diagnose a gear crack fault. The results show that the relative wavelet energy can effectively extract the signal feature and that the adaptive resonance theory neural network can recognize the changing trend from the normal state to a crack fault before the occurrence of a broken tooth fault.

Keywords: relative wavelet energy, pattern recognition, gearbox, fault detection, adaptive resonance theory, neural network

Highlights
• Early fault diagnosis of a gearbox.
• Proposed relative wavelet energy for feature extraction.
• Proposed an adaptive resonance theory neural network for recognizing crack faults.
• Recognized the changing trend from the normal state to a crack fault without known samples.

0 INTRODUCTION

A gearbox is a core component in rotating machinery and is widely employed in industrial equipment. The meshing of gear teeth is a dynamic process that generates dynamic excitation forces, i.e. elastic variable forces and collision forces, as well as forces due to the sliding and rolling of tooth flanks [1]. The gears of an operating gearbox bear alternating friction and impact loads, which easily lead to various defects and faults. Detecting gearbox faults as early as possible is essential in order to avoid fatal machine breakdowns, loss of production, and casualties. Vibration signal analysis is the main technique for monitoring the condition of a gearbox and detecting its faults. By employing appropriate signal processing methods, changes in vibration signals caused by faults can be detected to aid in evaluating the gearbox's health status.

The development from the normal state to a fault in a gearbox is a slow process. Because of the limitations of the mechanical structure and its working environment, it is difficult to measure changes of state in a gearbox directly, e.g. gear wear and cracking. Generally, the changes of state are estimated by observing the changes of features extracted from vibration signals. Although a great variety of features provides information about different aspects of the working condition, it remains difficult to identify the condition by visual estimation alone. To solve this problem, pattern recognition is employed on the basis of feature extraction. With pattern recognition, the working condition of a gearbox can be classified, and faults can be detected automatically.
Therefore, gearbox fault detection consists of feature extraction and pattern recognition [2].

New types of signal-processing techniques for feature extraction have emerged with different theoretical bases. Due to different working environments, not every signal processing technique works well for a specific system. Because nonlinear factors (loads, friction, impact, etc.) influence gearbox vibration signals, signal processing techniques must be chosen so that the acquired features support accurate and reliable fault detection. The main feature extraction methods include time-domain methods, frequency-domain methods and time-frequency methods. Time-domain and frequency-domain methods are the basic methods of signal processing. Features extracted with time-domain methods include peak amplitude, root-mean-square amplitude, kurtosis, crest factor, etc. [3]. Frequency-domain methods, including power spectrum, cepstrum analysis, and the envelope spectrum technique, have been successfully applied to gear fault diagnosis [4] to [5]. As gearbox vibration signals possess non-stationary and non-linear characteristics, it is difficult to diagnose a fault using only traditional time-domain or frequency-domain methods. To solve this problem, time-frequency methods have been proposed, e.g. short-time Fourier transform (STFT), Wigner-Ville distribution (WVD) and wavelet transform, and have been widely used [6] to [8]. Among the above-mentioned methods, the wavelet transform offers good frequency resolution for low-frequency components and good time resolution for high-frequency components, which makes it an efficient method for non-stationary signals [9].

*Corr. Author's Address: North China Electric Power University, School of Energy Power and Mechanical Engineering, Beijing, China, 102206, lizhuang@ncepu.edu.cn
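The time-domain features named above (peak amplitude, root-mean-square amplitude, kurtosis and crest factor) can be computed directly from a sampled signal. The following is a minimal sketch rather than the authors' implementation; the function name `time_domain_features` and the sine-wave test input are illustrative assumptions, and kurtosis is taken here as the non-excess form (fourth central moment divided by the squared variance).

```python
import math


def time_domain_features(signal):
    """Return peak, RMS, kurtosis and crest factor of a sampled signal."""
    n = len(signal)
    if n == 0:
        raise ValueError("signal must not be empty")
    mean = sum(signal) / n
    peak = max(abs(v) for v in signal)
    rms = math.sqrt(sum(v * v for v in signal) / n)
    variance = sum((v - mean) ** 2 for v in signal) / n
    fourth_moment = sum((v - mean) ** 4 for v in signal) / n
    if variance == 0.0:
        raise ValueError("kurtosis and crest factor are undefined for a constant signal")
    return {
        "peak": peak,
        "rms": rms,
        # Non-excess kurtosis: a Gaussian signal gives 3, impulsive signals more.
        "kurtosis": fourth_moment / variance ** 2,
        "crest_factor": peak / rms,
    }


# Illustrative input only (not gearbox data): five full cycles of a unit sine.
sine = [math.sin(2 * math.pi * 5 * k / 1000) for k in range(1000)]
features = time_domain_features(sine)
```

For a pure sine wave the crest factor is the square root of 2 and the non-excess kurtosis is 1.5, which gives a simple sanity check before the function is applied to measured vibration data.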
Furthermore, the discrete wavelet transform (DWT) based on the Mallat algorithm has received widespread attention in recent years. The DWT can be represented as a filtering process in which the signal is separated into a series of sub-bands, with the wavelet coefficients distributed over different frequency bands so as to reflect the signal feature in each sub-band [10]. The DWT has been acknowledged to be a successful tool for fault detection [11].

In recent years, many studies on artificial neural networks have been carried out with the aim of achieving intelligent fault diagnosis and investigating potential applications in pattern recognition. It is common to train a neural network with samples so that it learns the required input-output characteristics and can classify unknown input patterns [12]. This type of neural network is based on supervised learning and includes back-propagation (BP) neural networks, fuzzy networks, probabilistic neural networks, etc.; such networks are commonly used in fault diagnosis [13] to [15]. However, only the patterns that occur in the training samples can be classified. If a new pattern is presented to the network, an incorrect result is given. To recognize a new pattern, the network must be retrained with the new pattern together with the original training samples. Therefore, a neural network based on supervised learning cannot function without training samples. To overcome this issue, unsupervised neural networks have been developed, including self-organizing competitive neural networks, self-organizing feature map neural networks, and adaptive resonance theory networks. They are all used for implementing pattern recognition without training samples [16] to [18]. Among these, an adaptive resonance theory (ART) neural network can not only recognize objects in a way similar to a brain learning autonomously, but can also solve the plasticity-stability dilemma [18].
Its algorithm can accept new input patterns adaptively without modifying what the trained network has already learned, and its memory capacity can grow with the number of sample categories. Learning, memory and training in an ART neural network proceed synchronously. ART was proposed by Grossberg in 1976, and an ART neural network was presented by Carpenter and Grossberg in 1987 [19]. The type of ART neural network presented then was ART-1 [19]. However, an ART-1 neural network accepts only binary input, which limits its practical application. To handle arbitrary types of input, the ART-2 neural network was presented [20], and it has been widely used in pattern recognition and fault detection. Lee et al. classified estimated parameters using an ART-2 neural network with uneven vigilance parameters for fault isolation, which showed the effectiveness of the ART-2 neural network-based fault diagnosis method [21]. Lee et al. combined DWT and an ART-2 neural network for fault diagnosis of a dynamic system [22]. Obikawa and Shinozuka developed a monitoring system that classifies the level of flank wear of coated tools into several categories using an ART-2 neural network [23].

In this study, a new method for crack fault detection is proposed. Considering the non-stationary and non-linear characteristics of the signals, DWT is applied for feature extraction. For gearbox fault detection, collecting known samples of every kind of fault is time-consuming and costly (or even impossible). Furthermore, an operating gearbox is influenced by its working environment, so the samples obtained from a specific gearbox may not be suitable for other gearboxes in different working environments. Known samples for the training of supervised neural networks are therefore lacking, and an ART-2 neural network is proposed for state recognition and classification.
Through the unsupervised classification of the samples via an ART-2 neural network, the changing trend from the normal state to a crack fault can be determined before a broken tooth fault occurs. Meanwhile, to verify the effectiveness of the ART-2 neural network, it is compared with a self-organizing competitive neural network and a self-organizing feature map neural network.

This paper is organized as follows: in Section 1, the relative wavelet energy is proposed as a feature and an ART-2 neural network is presented for pattern recognition. In Section 2, the gearbox experiment is introduced. The relative wavelet energy of the signal samples is extracted and compared with the analysis in the time and frequency domains, after which the ART-2 neural network is used for recognizing the changing trend of the crack fault before a broken tooth fault happens. The conclusion is given in Section 3.

1 THEORETICAL BACKGROUND OF DWT AND ART-2 NEURAL NETWORK

1.1 Fundamental of Wavelet Transform

1.1.1 Discrete Wavelet Transform (DWT)

The basic analysis wavelet $\psi(t)$ is a square-integrable function that satisfies the following relationship:

$$C_\psi = \int_{-\infty}^{+\infty} \frac{|\Psi(\omega)|^2}{|\omega|}\, d\omega < \infty, \tag{1}$$

where $\Psi(\omega)$ is the Fourier transform of $\psi(t)$. Through translation and dilation, a family of functions can be derived from $\psi(t)$:

$$\psi_{a,b}(t) = |a|^{-1/2}\, \psi\!\left(\frac{t-b}{a}\right), \tag{2}$$

where $\psi_{a,b}(t)$ is a member of the wavelet basis, and $a$ and $b$ represent the scale parameter and translation parameter, respectively. The continuous wavelet transform (CWT) of a finite-energy signal $x(t)$ is defined as follows:

$$W_\psi(a,b) = \int_{-\infty}^{+\infty} x(t)\, \psi_{a,b}^{*}(t)\, dt = |a|^{-1/2} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt, \tag{3}$$

where $*$ denotes complex conjugation and $W_\psi(a,b)$ are the wavelet coefficients. As seen in Eqs. (2) and (3), $\psi_{a,b}(t)$ can be regarded as a window function.
$a$ and $b$ are used to adjust the frequency and time location of the wavelet. A small $a$ gives good time resolution and is useful for extracting the high-frequency components of signals. As $a$ increases, the time resolution decreases while the frequency resolution improves, so low-frequency components are easily extracted.

The DWT is derived from the CWT through the discretization of the parameters $a$ and $b$. Generally, $a$ is replaced by $2^j$ and $b$ is replaced by $k2^j$ ($j, k \in Z$). $W_\psi(a,b)$ then becomes:

$$W_\psi(j,k) = \int_{-\infty}^{+\infty} x(t)\, \psi_{j,k}^{*}(t)\, dt, \tag{4}$$

where $\psi_{j,k}(t) = 2^{-j/2}\, \psi(2^{-j}t - k)$.

The Mallat algorithm is a breakthrough for the DWT: it provides a fast algorithm and achieves multi-resolution analysis of signals. Wavelet filters are used for decomposition and reconstruction, as shown in Eqs. (5) to (7):

$$A_0[x(t)] = x(t), \tag{5}$$

$$A_j[x(t)] = \sum_k H(2t-k)\, A_{j-1}[x(t)], \tag{6}$$

$$D_j[x(t)] = \sum_k G(2t-k)\, A_{j-1}[x(t)], \tag{7}$$

where $x(t)$ is the original signal and $j$ is the decomposition level ($j = 1, 2, \ldots, J$). $H$ and $G$ are the wavelet decomposition filters for low-pass filtering and high-pass filtering, respectively. $A_j$ and $D_j$ are the low-frequency wavelet coefficients (approximations) and the high-frequency wavelet coefficients (details) of the signal $x(t)$ at the $j$th level, respectively.

The decomposition procedure of a $J$-level DWT is shown in Fig. 1. It can be seen that $D_j$ and $A_j$ are obtained through high-pass filtering and low-pass filtering with down-sampling at each level. After the signal $x(t)$ is decomposed by the $J$-level DWT, $D_j$ at each level and $A_J$ at the $J$th level are obtained. Therefore, the DWT based on the Mallat algorithm can be represented as a filtering process in which the signal is decomposed into a series of sub-bands.

Fig. 1. Decomposition procedure of J-level DWT

$D_j$ and $A_J$ can each be used to reconstruct a separate signal branch, which represents the signal component in the corresponding sub-band, through up-sampling and the reconstruction filters $h$ and $g$.
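The decomposition and branch reconstruction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the orthonormal Haar wavelet is chosen only because its filters have two taps (the mother wavelet is not named in this section), the function names are hypothetical, and the signal length is assumed to be divisible by $2^J$.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(a):
    """One analysis level: low-pass and high-pass filtering, then down-sampling by 2."""
    approx = [(a[i] + a[i + 1]) / SQRT2 for i in range(0, len(a), 2)]
    detail = [(a[i] - a[i + 1]) / SQRT2 for i in range(0, len(a), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """One synthesis level: up-sampling and reconstruction filtering."""
    out = []
    for s, d in zip(approx, detail):
        out.extend([(s + d) / SQRT2, (s - d) / SQRT2])
    return out


def haar_wavedec(x, levels):
    """Return (A_J, [D_1, ..., D_J]) for a J-level Haar DWT of x."""
    if levels < 1 or len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be a positive multiple of 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_waverec(approx, details):
    """Invert haar_wavedec, starting from the coarsest level."""
    a = list(approx)
    for detail in reversed(details):
        a = haar_inverse_step(a, detail)
    return a


def haar_branch(approx, details, keep):
    """Reconstruct one sub-band alone: keep = j-1 selects D_j, keep = J selects A_J."""
    a = approx if keep == len(details) else [0.0] * len(approx)
    ds = [d if i == keep else [0.0] * len(d) for i, d in enumerate(details)]
    return haar_waverec(a, ds)


# Illustrative 8-point signal (not gearbox data), decomposed to J = 2 levels.
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_wavedec(x, levels=2)
branches = [haar_branch(approx, details, k) for k in range(len(details) + 1)]
```

Each entry of `branches` has the same length as `x`, and because the transform is linear, the branches add back up to the original signal.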
The reconstruction process is shown in Fig. 2. $D_j(t)$ and $A_J(t)$ denote the signal branches reconstructed from $D_j$ and $A_J$, respectively. The original signal $x(t)$ can be regarded as the sum of these components:

$$x(t) = A_J(t) + \sum_{j=1}^{J} D_j(t). \tag{8}$$

Fig. 2. Reconstruction for the component of the signal in each sub-band

1.1.2 Relative Wavelet Energy

The signal components derived through decomposition and reconstruction of the DWT are distributed into independent sub-bands, and the component energy in each sub-band contains information available for fault detection. When a gear fault occurs, non-stationary and non-linear vibration energy is generated, which leads to a change of signal energy in some sub-bands. As the multi-resolution property of the DWT makes it suitable for the analysis of non-stationary and non-linear signals, the DWT is used for feature extraction, and the relative wavelet energy is proposed and calculated as the feature. The procedure is as follows:

(1) Decompose the $N$-point signal $x(t)$ by a $J$-level DWT to obtain $D_j$ ($j = 1, 2, \ldots, J$) and $A_J$.

(2) Reconstruct $D_j$ and $A_J$ to obtain the signal components $D_j(t)$ and $A_J(t)$ in each sub-band. The lengths of $D_j(t)$ and $A_J(t)$ are the same as that of $x(t)$.

(3) Let $D_{J+1}(t) = A_J(t)$. The wavelet energy in each sub-band is calculated as:

$$E_j = \|D_j(t)\|^2 = \sum_{k=1}^{N} |d_j(k)|^2, \tag{9}$$

where $N$ is the number of data samples of $D_j(t)$, $k$ indexes the time series of data samples, and $d_j(k)$ is the $k$th data sample of $D_j(t)$ (i.e. $d_j(k) \in D_j(t)$, $j = 1, 2, \ldots, J+1$).

(4) The relative wavelet energy $\sigma_j$ in each sub-band is:

$$\sigma_j = E_j / E_{total}, \tag{10}$$

where $E_{total} = \sum_{j=1}^{J+1} \sum_{k=1}^{N} |d_j(k)|^2 = \sum_{j=1}^{J+1} E_j$.

According to the above analysis, the DWT has a multi-resolution analytical property, and the relative wavelet energy reflects the energy distribution of signals in different sub-bands. Thus, the relative wavelet energy is chosen as the feature of signals and used in the subsequent pattern recognition.

1.2 Fundamental of ART-2 Neural Network

1.2.1 Structure of ART-2 Neural Network

The structure of an ART-2 neural network is shown in Fig. 3. It consists of two subsystems: the attentional subsystem and the orienting subsystem. The attentional subsystem consists of two layers: the comparison layer (F1) and the recognition layer (F2). The orienting subsystem is the reset system, which is represented as the triangular part R. The F1 layer, which contains $n$ groups of neurons, accepts an $n$-dimensional input pattern $(x_1, x_2, \ldots, x_n)$. The F2 layer has $m$ neurons, each of which represents a type of pattern or category. The neurons in the two layers form the short-term memory of the neural network. The F1 layer and F2 layer are connected by weights that form the long-term memory. With the processing of the F1 layer and the weights, the input pattern is transferred to the F2 layer, and the output of the F2 layer $(y_1, y_2, \ldots, y_m)$ is obtained. The maximum output value is chosen, and the corresponding neuron is activated as the winning neuron. If the degree of match between the feedback of the F2 layer and the output of the F1 layer is less than the threshold value, the orienting subsystem resets the F2 layer, and the activated neuron is inhibited. Next, a winning neuron is again chosen from the F2 layer, and this is repeated until the degree of match meets the requirement; the weights connected to the activated neuron are then modified.

Fig. 3. Structure of ART-2 neural network

1.2.2 Algorithm of ART-2 Neural Network

A topological structure is shown in Fig. 4 that describes the connection between the $i$th group of neurons in the F1 layer and the $j$th neuron in the F2 layer. It can be seen that the F1 layer includes three levels. Two types of neurons exist in each level. The neurons represented by hollow circles calculate the modulus of the input vector and transfer inhibitory signals; the neurons represented by filled circles transfer excitatory signals.

Fig. 4. Topology of ART-2 neural network

The algorithm of an ART-2 neural network proceeds as follows:

(1) Calculation in the F1 layer

The lower level of the F1 layer receives the input $x_i$, and the upper level receives the feedback of the F2 layer. These two levels are each combined with the middle level, and positive feedback loops are formed. The calculations in each level are shown in Eqs. (11) to (16):

$$z_i = x_i + a u_i, \tag{11}$$

$$q_i = z_i / (e + \|Z\|), \tag{12}$$

$$v_i = f(q_i) + b f(s_i), \tag{13}$$

$$u_i = v_i / (e + \|V\|), \tag{14}$$

$$s_i = p_i / (e + \|P\|), \tag{15}$$

$$p_i = u_i + \sum_{j=1}^{m} g(y_j)\, t_{ji}, \tag{16}$$

where $a$ and $b$ are the coefficients of positive feedback ($a > 1$, $b > 1$), $e$ is far less than 1, $t_{ji}$ is the connection weight from the $j$th neuron in the F2 layer to the neuron $p_i$ in the F1 layer, and $g(y_j)$ is the feedback of the $j$th neuron in the F2 layer. $f(x)$ is a non-linear transformation function, which is shown in Eq. (17):

$$f(x) = \begin{cases} \dfrac{2\theta x^2}{x^2 + \theta^2}, & 0 \le x < \theta \\ x, & x \ge \theta \end{cases} \tag{17}$$

where $\theta$ is the anti-noise coefficient ($\theta$ = …).

(2) Calculation in the F2 layer

The $j$th neuron in the F2 layer receives the output of the neurons $p_i$, which can be described as:

$$T_j = \sum_{i=1}^{n} p_i w_{ij}, \quad j = 1, 2, \ldots, m, \tag{18}$$

where $w_{ij}$ is the connection weight from the neuron $p_i$ in the F1 layer to the $j$th neuron in the F2 layer. The activated neuron is determined by Eq. (19):

$$T_{j^*} = \max\{T_j\}, \quad j = 1, 2, \ldots, m, \tag{19}$$

where $j^*$ is the serial number of the activated neuron. The feedback of each neuron in the F2 layer is calculated as:

$$g(y_j) = \begin{cases} d, & j = j^* \\ 0, & j \ne j^* \end{cases} \tag{20}$$

where $d$ is the learning rate ($0 < d < 1$). According to Eq. (20), Eq. (16) can be written as:

$$p_i = \begin{cases} u_i + d\, t_{ji}, & j = j^* \\ u_i, & j \ne j^* \end{cases} \tag{21}$$

(3) Calculation of the orienting subsystem

According to Eqs. (11) to (16), the vector $U = (u_1, u_2, \ldots, u_i, \ldots, u_n)$ contains the features of the input vector $X$, and the feedback features of the F2 layer are included in the vector $P = (p_1, p_2, \ldots, p_i, \ldots, p_n)$. By comparing the degree of match between the vectors $U$ and $P$, the orienting subsystem determines whether the F2 layer should be reset. The degree of match $\|R\|$ is calculated as:

$$\|R\| = \frac{\|U + cP\|}{e + \|U\| + \|cP\|}, \tag{22}$$

where $c$ is the weighting coefficient ($c < 1/d - 1$). The greater $\|R\|$ is, the more similar the vectors $U$ and $P$ are. Define the parameter $\rho$ as the threshold value ($0 < \rho < 1$). If $\|R\| > \rho$, the F2 layer need not be reset, and the weights are modified directly. Otherwise, the orienting subsystem resets the F2 layer: the activated neuron is inhibited, and a winner is chosen again from the F2 layer. The degree of match is then recalculated, and this is repeated until it meets the requirement, i.e. $\|R\| > \rho$.

(4) Modification of the weights

After the winning neuron $j^*$ is determined, the weights connected to the activated neuron are modified according to Eqs. (23) and (24):

$$w_{ij^*}(k+1) = w_{ij^*}(k) + d(1-d)\left[\frac{u_i(k)}{1-d} - w_{ij^*}(k)\right], \tag{23}$$

$$t_{j^*i}(k+1) = t_{j^*i}(k) + d(1-d)\left[\frac{u_i(k)}{1-d} - t_{j^*i}(k)\right]. \tag{24}$$

2 FEATURE EXTRACTION AND PATTERN RECOGNITION OF GEARBOX VIBRATION SIGNAL

2.1 Method for Crack Fault Detection

A schematic representation of the proposed method is shown in Fig. 5. First, the sample series of the gearbox is acquired, and the relative wavelet energy features are extracted by DWT. Next, an ART-2 neural network is used for the recognition and classification of the sample series.
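The two stages just described, relative wavelet energy extraction followed by unsupervised classification, can be sketched as follows. This is a deliberately simplified illustration and not the authors' method: it assumes an orthonormal Haar wavelet (for which the energy of each reconstructed sub-band equals the sum of squared coefficients), and it replaces the full ART-2 dynamics of Section 1.2.2 with a plain vigilance test on cosine similarity. The names `haar_relative_energy`, `vigilance_cluster`, `rho` and `rate`, and the four test signals, are hypothetical.

```python
import math


def haar_relative_energy(x, levels):
    """Relative energy of D_1..D_J and A_J (last entry) from a Haar DWT of x."""
    if levels < 1 or len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be a positive multiple of 2**levels")
    approx, energies = list(x), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(p - q) / math.sqrt(2.0) for p, q in pairs]
        approx = [(p + q) / math.sqrt(2.0) for p, q in pairs]
        energies.append(sum(d * d for d in detail))
    energies.append(sum(s * s for s in approx))
    total = sum(energies)
    if total == 0.0:
        raise ValueError("signal has zero energy")
    return [e / total for e in energies]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return dot / (norm_u * norm_v)


def vigilance_cluster(features, rho=0.95, rate=0.5):
    """Assign each feature vector to the best-matching category, or open a new one.

    Simplified stand-in for ART-2: the winner is the most similar prototype, and
    it is accepted only if the match reaches the vigilance threshold rho.
    """
    prototypes, labels = [], []
    for f in features:
        best, best_match = None, -1.0
        for j, proto in enumerate(prototypes):
            match = cosine_similarity(f, proto)
            if match > best_match:
                best, best_match = j, match
        if best is not None and best_match >= rho:
            # Resonance: move the winning prototype towards the input.
            prototypes[best] = [p + rate * (v - p) for p, v in zip(prototypes[best], f)]
        else:
            # Mismatch with every stored category: create a new one.
            prototypes.append(list(f))
            best = len(prototypes) - 1
        labels.append(best)
    return labels, prototypes


# Illustrative signals only: two smooth ones, then two rapidly alternating ones.
signals = [
    [1, 1, 1, 1, 2, 2, 2, 2],
    [1, 1, 1, 1, 2, 2, 2, 3],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, -1, 1, -1, 1, -1, 1, -2],
]
features = [haar_relative_energy(s, levels=2) for s in signals]
labels, prototypes = vigilance_cluster(features)
```

No labelled samples are used: a new category appears only when an input fails the vigilance test against every stored prototype, which is the behaviour the method relies on to reveal a change of state.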
Through the unsupervised classification, the samples of the same state are classified into the same category, and those of different states are classified into separate categories. Finally, the recognition result is output, and the changing trend from the normal state to the crack fault can be recognized from the classification of the samples.

Fig. 5. Scheme of the proposed method

2.2 Experiment Specification

Fig. 6 shows a diagram of the experimental system used for analysing the changing trend of the crack fault. The gearbox is single-stage with helical cylindrical gears. Table 1 lists the parameters of the experimental system.

Fig. 6. Structure of experiment gearbox (diagram labels: Gearbox, Machine)

Table 1. Parameters of the experimental system
Component   Parameter                          Value
Motor       Rated speed                        1120 rpm
Gearbox     Number of teeth of driving gear    75
Gearbox     Number of teeth of driven gear     17
Gearbox     Mesh frequency                     1400 Hz

The vibration signal of the gearbox was collected with a piezoelectric accelerometer at a sampling frequency of 8000 Hz. The progression of the driven gear from the normal state to a broken tooth fault was recorded with a monitoring system. The entire measuring time was 8 minutes, of which approximately 90 seconds were needed for the motor to reach its rated speed. The rated speed was then maintained for 290 seconds with constant load, after which a broken tooth fault occurred on the driven gear. The collected vibration signal therefore contains information on the successive states of the driven gear: normal state, crack formation, crack expansion and broken tooth fault.

2.3 Analysis of Time and Frequency Domain

The time-domain vibration signal is shown in Fig. 7a. In the first 90 seconds, the vibration amplitude increases with the rotational speed. The amplitude becomes steady over the following 290 seconds, during which the motor runs at its rated speed and the system is in a normal working state. At 380 s, the amplitude increases noticeably, and a broken tooth fault occurs.
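The figures in Table 1 and the 8000 Hz sampling frequency can be cross-checked with a short calculation, which also shows where the mesh frequency would fall among the DWT sub-bands. This is a sketch under stated assumptions: the driving gear is taken to turn at the motor's rated speed (which reproduces the 1400 Hz in Table 1), the sub-band edges are the nominal ones for ideal half-band filters, and the decomposition depth `levels = 4` is a hypothetical value chosen only for illustration.

```python
def mesh_frequency_hz(shaft_rpm, teeth):
    """Gear mesh frequency: shaft rotations per second times number of teeth."""
    return shaft_rpm * teeth / 60.0


def dwt_band_edges_hz(fs_hz, levels):
    """Nominal frequency range (low, high) of each sub-band of a J-level DWT."""
    bands = {}
    for j in range(1, levels + 1):
        bands[f"D{j}"] = (fs_hz / 2 ** (j + 1), fs_hz / 2 ** j)
    bands[f"A{levels}"] = (0.0, fs_hz / 2 ** (levels + 1))
    return bands


def band_containing(freq_hz, bands):
    """Name of the sub-band whose nominal range contains freq_hz, or None."""
    for name, (low, high) in bands.items():
        if low <= freq_hz < high:
            return name
    return None


# Values from Table 1 and Section 2.2.
motor_rpm, z_driving, z_driven, fs_hz = 1120, 75, 17, 8000

f_mesh = mesh_frequency_hz(motor_rpm, z_driving)
driven_rpm = motor_rpm * z_driving / z_driven  # speed-increasing stage
f_driven_shaft = driven_rpm / 60.0

levels = 4  # hypothetical decomposition depth, for illustration only
bands = dwt_band_edges_hz(fs_hz, levels)
mesh_band = band_containing(f_mesh, bands)
```

Under these assumptions the mesh frequency lies in the second detail band, whose nominal range is 1000 Hz to 2000 Hz, and the driven gear rotates at roughly 82 Hz.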