Strojniški vestnik - Journal of Mechanical Engineering 64(2018)7-8, 443-452
© 2018 Journal of Mechanical Engineering. All rights reserved.
DOI:10.5545/sv-jme.2018.5249
Original Scientific Paper
Received for review: 2018-01-30
Received revised form: 2018-05-08
Accepted for publication: 2018-05-23

An Improved Bearing Fault Diagnosis Method using One-Dimensional CNN and LSTM

Honghu Pan - Xingxi He* - Sai Tang - Fanming Meng
Chongqing University, The State Key Laboratory of Mechanical Transmissions, China

Bearings are among the most critical components in rotating machinery, and bearing fault diagnosis has attracted the attention of many researchers. Traditional methods for bearing fault diagnosis normally require three steps, namely data pre-processing, feature extraction and pattern classification, which demand much expertise and experience. This paper takes advantage of deep learning algorithms and proposes an improved bearing fault diagnosis method based on a convolutional neural network (CNN) and a long short-term memory (LSTM) recurrent neural network, whose input is the raw sampled signal without any pre-processing or traditional feature extraction. The CNN is frequently used in image classification because it can extract features automatically from high-dimensional data, while the LSTM is mostly applied in speech recognition because it accounts for temporal coherence. This paper combines a one-dimensional CNN and an LSTM into one unified structure by using the CNN's output as the input to the LSTM to identify the bearing fault types. First, part of the raw bearing signal data is used as the training dataset, and training ends when the number of iterations reaches a specified value. Second, the rest of the signal data is input to the trained model as the testing dataset to verify the effectiveness of the proposed method.
The results show that the average accuracy rate of the proposed method on the testing dataset exceeds 99 %, which outperforms the other algorithms for bearing fault diagnosis compared in this paper.

Keywords: bearing fault diagnosis, CNN, LSTM

Highlights
• An improved bearing fault diagnosis method based on deep learning algorithms is proposed in this paper.
• To take advantage of both the CNN and the LSTM, the proposed model combines them into one structure by taking the CNN's output as the LSTM's input.
• The proposed model requires no traditional feature extraction, which is the most difficult step in traditional fault diagnosis methods.
• A comparison experiment with other deep learning-based models and traditional methods demonstrates the effectiveness of the proposed method.

0 INTRODUCTION

The bearing is one of the most important components in rotating machinery; its main function is to support the rotating body and reduce the friction coefficient during movement. However, continuous abrasion resulting from the relative motion between mating surfaces damages the components, and several studies have shown that bearing faults are the major source of faults in rotating machinery [1]. An effective fault diagnosis method can determine the health condition of bearings and identify the fault patterns, which are also the most challenging tasks in fault diagnosis. Traditional methods for bearing fault diagnosis using vibration signals mainly include three steps: data pre-processing, feature extraction, and pattern classification. The commonly extracted features are generated from the time domain [2], the frequency domain [3], or the time-frequency domain [4]. Next, the extracted features are fed into classifiers such as a support vector machine (SVM) [5] and [6], a decision tree [7], a back-propagation (BP) neural network [8], etc. The difficulty of traditional fault diagnosis methods lies in the selection of features.
Every feature has its own limitations [9]: time domain features cannot identify the faulty component, frequency domain features are unable to identify the location of the damage, envelope analysis requires prior knowledge and professional experience, and wavelet tree features require pre-selection of a suitable mother wavelet and an appropriate level of decomposition.

*Corr. Author's Address: Chongqing University, The State Key Laboratory of Mechanical Transmissions, Chongqing 400044, China, xingxi@cqu.edu.cn

In recent years, deep learning algorithms have attracted widespread attention from researchers because they can discover intricate structures in big data [10]. Compared with traditional machine learning algorithms, deep learning has made great progress in image recognition [11] and speech recognition [12]. Furthermore, a large number of studies have applied deep learning algorithms to fault diagnosis. Sun et al. [13] extracted wavelet features and selected a convolutional neural network (CNN) as the classifier, reaching an accuracy rate of 99.79 %; He et al. [14] proposed an unsupervised fault diagnosis method based on a deep belief network (DBN), which has been shown to outperform the back propagation neural network (BPNN) and the support vector machine (SVM); Yin et al. [15] proposed an effective health assessment model by integrating Isomap into a DBN, using time-domain, frequency-domain and wavelet packet features. As illustrated in [16], a CNN is good at reducing frequency variations, and a long short-term memory (LSTM) recurrent neural network is appropriate for temporal modelling. This paper takes advantage of both and proposes an improved fault diagnosis method by combining a one-dimensional CNN and an LSTM into one structure.
With this method, the limitations of traditional feature extraction are avoided, since the input of the model is the raw signal data and no traditional feature extraction is needed. The rest of this paper is organized as follows: Section 1 briefly introduces the one-dimensional CNN and the LSTM and describes how they are combined into one structure. Section 2 presents the bearing fault diagnosis experiment and the results obtained with the proposed model. Section 3 compares the results with those of other models. Section 4 draws the conclusions.

1 METHODS

The structure of the proposed method is shown in Fig. 1; it consists of five layers: the input layer, the convolutional layer, the pooling layer, the LSTM layer and the output layer. Among these layers, the convolutional layer and the pooling layer come from the CNN model, which has proven effective in image recognition [11]. The convolution operation transforms the input data into smaller feature maps through convolutional kernels. The convolutional kernels and feature maps are usually two-dimensional because the inputs of a CNN are usually two-dimensional images. To match the one-dimensional character of mechanical signals, this paper constructs a one-dimensional convolutional neural network, whose convolutional kernels and feature maps are all one-dimensional. Suppose the input of the one-dimensional convolutional neural network is $x \in \mathbb{R}^{n \times 1}$, where $n$ is the length of the input data. Then the output of the convolutional layer can be calculated as follows [17]:

$$y_{i,j,k} = f\left(x_{i,k} * w_j + b_i\right), \tag{1}$$

where $y_{i,j,k}$ is the output of the convolutional layer; $1 \le i \le m$, where $m$ is the number of samples; $1 \le j \le p$, where $p$ is the length of the convolutional kernels; $1 \le k \le n$; $f$ is the activation function, typically a hyperbolic tangent, ReLU, or sigmoid function; $x_{i,k}$ is the input data; $*$ is the convolution operation; $w_j$ is the weight and $b_i$ is the bias.
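As an illustration of Eq. (1), the following minimal NumPy sketch applies a small bank of one-dimensional kernels to a single signal. It is not the authors' implementation: the function names `relu` and `conv1d`, the toy signal and the kernel values are assumptions made for this example. The sketch also assumes one bias per kernel, no padding (so the feature map is shorter than the input), and a sliding dot product without kernel flipping (cross-correlation), which is how deep learning libraries commonly implement "convolution".

```python
import numpy as np


def relu(v):
    """Rectified linear unit, one of the activation functions named in the text."""
    return np.maximum(v, 0.0)


def conv1d(x, kernels, biases, activation=relu):
    """Apply each 1-D kernel to the signal x and return the stacked feature maps.

    x       : 1-D signal of length n
    kernels : array of shape (number of kernels, kernel length p)
    biases  : one bias per kernel (an assumption of this sketch)
    returns : array of shape (n - p + 1, number of kernels)
    """
    x = np.asarray(x, dtype=float)
    maps = [
        # np.correlate(..., mode="valid") slides the kernel over x without padding
        activation(np.correlate(x, w, mode="valid") + b)
        for w, b in zip(kernels, biases)
    ]
    return np.stack(maps, axis=-1)


# Toy input: a ramp signal and two difference kernels of opposite sign.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernels = np.array([[-1.0, 0.0, 1.0],
                    [1.0, 0.0, -1.0]])
biases = np.array([0.5, 0.5])

y = conv1d(x, kernels, biases)
print(y.shape)
print(y)
```

Each column of `y` is one one-dimensional feature map. The first kernel responds to the rising ramp, while the response of the second kernel is negative before activation and is therefore clipped to zero by the ReLU.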
The pooling layer is a sub-sampling layer that reduces the size of the feature maps and helps prevent overfitting. Max pooling is frequently used in the pooling layer; its output is the maximum of the previous feature maps, which can be expressed as follows [17]:

$$z_{l,j,k} = \max\left(y_{2l-1,j,k},\; y_{2l,j,k}\right), \tag{2}$$

where $z_{l,j,k}$ is the output of the pooling layer, and $1 \le l \le m/2$.

Fig. 1. Combined structure of one-dimensional CNN and LSTM

The output of the one-dimensional CNN is taken as the input of the LSTM to reduce variance in the time series. To solve the problems of vanishing and exploding gradients in standard recurrent neural networks (RNN), Hochreiter and Schmidhuber [18] proposed the LSTM. The main difference between the LSTM and a standard RNN is that the hidden units of the standard RNN are replaced by LSTM cells. As shown in Fig. 2, an LSTM cell consists of three gate structures, i.e., the forget gate, the input gate and the output gate, and a cell state.

Fig. 2. Structure of a cell of LSTM

The output of the CNN layer is divided into $m/2$ segments, which means the input of the LSTM layer has $m/2$ time steps. The forget gate determines how much previous information is passed on; its output is calculated as follows:

$$f_t = \sigma\left(w_{xf} z_t + w_{hf} h_{t-1} + b_f\right), \tag{3}$$

where $\sigma$ is the sigmoid function; $w$ denotes the weights; $z_t$ is the current input, $1 \le t \le m/2$; $h_{t-1}$ is the output of the previous cell; $b_f$ is the bias. The input gate determines the new information that is saved in the cell, which is calculated as:

$$i_t = \sigma\left(w_{xi} z_t + w_{hi} h_{t-1} + b_i\right), \tag{4}$$

$$\tilde{c}_t = \tanh\left(w_{xc} z_t + w_{hc} h_{t-1} + b_c\right). \tag{5}$$

The cell state is then updated, and the output gate determines what information to output from the cell state, which can be expressed as follows:

$$c_t = c_{t-1} f_t + i_t \tilde{c}_t, \tag{6}$$

$$o_t = \sigma\left(w_{xo} z_t + w_{ho} h_{t-1} + b_o\right), \tag{7}$$

$$h_t = o_t \times \tanh\left(c_t\right). \tag{8}$$

Behind the LSTM layer is the softmax layer for classification, which is calculated as follows:

$$\mathrm{softmax}(u_i) = \frac{e^{u_i}}{\sum_{j} e^{u_j}}, \tag{9}$$

where $u_i$ is the $i$th output of the former layer. The output class label is obtained after the softmax layer and compared with the true label of the experimental data. The backpropagation (BP) algorithm [19] is used to train the model; it adjusts the weights and biases by comparing the output class label with the true label and propagating the output layer's error back through the network to minimize the loss function $L$, which is calculated as follows:

$$L = -\frac{1}{m}\sum\left[u \ln u' + (1 - u)\ln\left(1 - u'\right)\right], \tag{10}$$

where $m$ is the number of samples; $u$ is the true label, and $u'$ is the output class. During back propagation, the weights and biases are adjusted continuously until the number of iterations reaches the specified value. The parameter adjustment can be expressed as follows:

$$w_t = w_{t-1} - \varepsilon \frac{\partial L}{\partial w}, \tag{11}$$

$$b_t = b_{t-1} - \varepsilon \frac{\partial L}{\partial b}, \tag{12}$$

where $\varepsilon$ is the learning rate, which determines the updating speed of the parameters; $w_t$, $b_t$ are the values of the weight and bias in the $t$th iteration; $w_{t-1}$, $b_{t-1}$ are the values of the weight and bias in the $(t-1)$th iteration.

The training process is presented in Fig. 3: the output class is compared with the true label of the sample, and the parameters are adjusted with the BP algorithm above. After several iterations, the value of the loss function becomes very small, which means the parameters fit the samples' characteristics. To accelerate the training process and avoid local optima, the mini-batch gradient descent algorithm [12] is applied, in which a batch of training samples of the chosen batch size is used in each iteration.

Fig. 3. The training processing

Fig. 4. The testing processing

After the training process has finished, the testing dataset is input into the trained model, as presented in Fig. 4. The parameters used for the testing dataset are those obtained in the last training iteration. To validate the algorithm's effectiveness, the accuracy rate $A_r$ is introduced, which is calculated with the following formula:

$$A_r = \frac{N_r}{N_t} \times 100\ [\%], \tag{13}$$

where $N_r$ is the number of correctly predicted samples, and $N_t$ is the total number of samples.

The flow chart of the whole proposed model is shown in Fig. 5, where $N$ is the maximum number of iteration epochs. First, the vibration signal is collected on the test stand and divided into a training dataset and a testing dataset; second, the training dataset is used to train the model, and the optimal parameters are acquired after several iterations; third, the testing dataset is input to the trained model to obtain its predicted classes, so that the accuracy rate can be calculated from Eq. (13).

2 BEARING FAULT DIAGNOSIS

2.1 Experimental Setup and Data Acquisition

To validate the proposed method, bearing vibration data from the Case Western Reserve University (CWRU) Bearing Data Centre are used [20]. As shown in Fig. 6, the test stand consists of a motor, a torque transducer, a dynamometer and the control electronics (not shown). SKF bearings were used in this experiment, with faults introduced by electro-discharge machining. The vibration data were collected for four health conditions, i.e., the normal condition, the ball fault, the inner race fault and the outer race fault, with fault diameters of 0.18 mm, 0.36 mm, and 0.53 mm. The data used in this paper were collected at the drive end of the test bench. The motor provides an output power of 2.2 kW, the sampling frequency is 48 kHz and the sampling time for each dataset is approximately 10 s.
The rotating speed of the shaft is 1725 r/min, which means about 1670 data points are collected per revolution (48,000 points per second divided by 28.75 revolutions per second). The first 484,300 points of each dataset are chosen and divided into 290 samples, so that each sample includes the 1670 points collected in one revolution. To reduce the impact of equipment fluctuations and to ensure the points in each sample are collected within the same revolution, the first 35 points and the last 35 points of each sample are discarded; therefore, every sample includes only 1600 points. Each dataset represents one bearing health condition and contains 290 samples, of which 240 are selected randomly as the training dataset, and the rest are used as the testing dataset. As shown in Table 1, the proposed model processes a total of ten datasets, covering one normal state and three fault types, i.e. the ball fault, the inner race fault and the outer race fault, each with three fault sizes. Fig. 7 shows the vibration signals of the ten health conditions listed in Table 1; it is hard to classify them by inspection alone.

Fig. 5. Flowchart of the proposed model

Table 1. Description of dataset

| Health condition | Fault size [mm] | Training dataset | Testing dataset | Class label |
|---|---|---|---|---|
| a) normal | - | 240 | 50 | 0 |
| b) ball fault | 0.18 | 240 | 50 | 1 |
| c) ball fault | 0.36 | 240 | 50 | 2 |
| d) ball fault | 0.53 | 240 | 50 | 3 |
| e) inner race fault | 0.18 | 240 | 50 | 4 |
| f) inner race fault | 0.36 | 240 | 50 | 5 |
| g) inner race fault | 0.53 | 240 | 50 | 6 |
| h) outer race fault | 0.18 | 240 | 50 | 7 |
| i) outer race fault | 0.36 | 240 | 50 | 8 |
| j) outer race fault | 0.53 | 240 | 50 | 9 |

2.2 Training Results

The model was described in detail in Section 1. The input data has three dimensions: the first dimension is the number of samples, the second dimension represents the time steps, and the third dimension holds the data points of each time step.
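The data preparation described in Section 2.1 can be sketched as follows. This is a hedged illustration rather than the authors' code: the constant names, the function `segment_record` and the synthetic record standing in for one CWRU recording are assumptions of this example. The sketch uses the figures quoted in the text (48 kHz, 1725 r/min, 290 samples, 35 points trimmed at each end, 240 training samples) and the 80 × 20 split of each 1600-point sample quoted in Section 2.2. Note that 290 samples of 1670 points each require 484,300 points.

```python
import numpy as np

FS_HZ = 48_000        # sampling frequency quoted in Section 2.1
SPEED_RPM = 1725      # shaft speed quoted in Section 2.1
N_SAMPLES = 290       # samples per health condition
TRIM = 35             # points discarded at each end of a sample
N_TRAIN = 240         # training samples per health condition
TIME_STEPS = 80       # time steps per sample (Section 2.2)

# Points recorded during one shaft revolution: 48000 / (1725 / 60) is about 1669.6.
points_per_rev = round(FS_HZ / (SPEED_RPM / 60.0))


def segment_record(record, rng):
    """Cut one vibration record into trimmed 3-D samples and split them randomly."""
    needed = N_SAMPLES * points_per_rev
    samples = np.asarray(record[:needed], dtype=float)
    samples = samples.reshape(N_SAMPLES, points_per_rev)    # one revolution per row
    samples = samples[:, TRIM:-TRIM]                        # drop first and last 35 points
    samples = samples.reshape(N_SAMPLES, TIME_STEPS, -1)    # (samples, time steps, points)
    order = rng.permutation(N_SAMPLES)                      # random train/test selection
    return samples[order[:N_TRAIN]], samples[order[N_TRAIN:]]


rng = np.random.default_rng(0)
record = rng.standard_normal(485_000)   # synthetic stand-in for one recording
train, test = segment_record(record, rng)
print(points_per_rev, train.shape, test.shape)
```

Repeating this for the ten health conditions of Table 1 and stacking the results gives 2400 training samples and 500 testing samples.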
The original training dataset with the dimension of 2400 × 1600 is reshaped into 2400 × 80 × 20, while the testing dataset is reshaped into 500 × 80 × 20. The structure of the proposed model is shown in Fig. 1. A convolutional layer follows the input layer, with the kernel length chosen to be 32 and the kernel channel to be 64; the dimension of this layer's output is then 2400 × 80 × 32 or 500 × 80 × 32. The convolutional layer is followed by the pooling layer, whose pooling length and stride are both chosen to be 2, so the dimension of its output is 2400 × 40 × 32 or 500 × 40 × 32. After the pooling layer, a 'relu' activation function is introduced to add nonlinearity, and a dropout of 0.2 is introduced to prevent overfitting. The fourth layer is the LSTM layer with 128 cells, whose output dimension is 2400 × 128 or 500 × 128. The last layer is the output layer with a softmax classifier, where the data becomes 2400 × 10 or 500 × 10. The settings of the model are shown in Table 2, and the Adam optimizer [21] is chosen to minimize the loss function. The proposed model is developed in Python and implemented with the open-source library Keras. All experiments are performed on a computer with a GTX 1050 Ti GPU with 4 GB of memory.

One important task for deep learning models is the tuning of hyperparameters; this paper takes the batch size and the learning rate as hyperparameters. As mentioned in Section 1, the mini-batch gradient descent algorithm is used to minimize the loss function, and the batch size defines the number of samples to be processed in one batch. The learning rate defines the updating speed of parameters such as the weights and biases. The accuracy rate on the testing dataset reflects the effectiveness of the model; it equals the number of correctly predicted samples divided by the total number of samples in the testing dataset. The results for different configurations are shown in Table 3; as can be seen, the best configuration, with a batch size of 80 and a learning rate of 0.004, gives a completely correct prediction. In addition, the average accuracy rate is over 99 %, which demonstrates the effectiveness of the proposed model.

[Fig. 7: vibration signals of the ten health conditions a) to j) of Table 1; vertical axis A [mm]]

Fig. 8. Running time for different configurations (horizontal axis: learning rate, ×10⁻³; one curve each for batch sizes 20, 40, 60, 80 and 100)

As shown in Table 3, the batch size has little effect on the prediction accuracy, while a large learning rate may reduce the prediction accuracy because the parameters tend to oscillate instead of converging. Since the accuracy rates of the different configurations are similar, the computing time is also used for evaluation. As shown in Fig. 8, the main factor influencing the computing time is the batch size. The computing time for a batch size of 20 is four times that for a batch size of 100. Taking both the accuracy rate and the computing time into consideration, the configuration with a batch size of 80 and a learning rate of 0.004 is recommended.

3 COMPARISONS

3.1 Comparison with Other Deep Learning Models

The proposed method is compared with a DNN model, a CNN model and an LSTM model in this section.
The DNN model includes three layers with 256, 256 and 10 neurons, respectively, and a dropout of 0.2 to prevent overfitting; in addition, the 'relu' activation function is used to introduce non-linearity. The configuration and parameters of the CNN model and the LSTM model in this section use the same values as in Table 2.

Fig. 9. Accuracy rate in training dataset of models using deep learning algorithms (horizontal axis: iterations)

The accuracy rate on the training dataset is used to evaluate the models. As shown in Fig. 9, the LSTM model converges most slowly, requiring nearly 50 iterations to achieve a 90 % accuracy rate. The CNN model performs as well as the proposed method on the training dataset and better than the DNN. Table 4 presents the accuracy rates on the testing dataset for the different models; the three models used for comparison, i.e. the CNN, LSTM and DNN, achieve accuracy rates of 98.8 %, 93.8 % and 80.8 %, respectively. Taking Fig. 9 and Table 4 together, the accuracy rate of the DNN model on the testing dataset is much lower than that on the training dataset, which is due to overfitting.

Table 4. Accuracy rate in testing dataset of different models

| Methods | Proposed method | CNN | LSTM | DNN |
|---|---|---|---|---|
| Accuracy rate [%] | 100 | 98.8 | 93.8 | 80.8 |

It can be concluded that the proposed model has a higher accuracy rate on the testing dataset than the CNN model, iterates faster than the LSTM model, and is more effective in reducing overfitting than the DNN model.

3.2 Comparison with Traditional Fault Diagnosis Methods

Traditional fault diagnosis methods require feature extraction before the data are fed to the classifier. This paper extracts time domain features, wavelet packet features and empirical mode decomposition (EMD) features and feeds them into a commonly used classifier, the decision tree. Table 5 presents the extracted time domain features, while the wavelet packet features and the EMD features are given in [22] and [23], respectively.
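For readers who want to reproduce the time domain features of Table 5, a minimal NumPy sketch is given here. It is an illustration under stated assumptions, not the authors' code: the function name `time_domain_features` and the dictionary keys are made up for this example, and the formulas are the common textbook definitions with 1/N normalization, the peak value taken as the maximum absolute amplitude and the impulse factor taken as the peak divided by the mean absolute amplitude. The paper's exact definitions may differ in these details.

```python
import numpy as np


def time_domain_features(x):
    """Return common time domain features of a 1-D vibration sample."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    centered = x - mean
    variance = np.mean(centered ** 2)          # population variance (1/N)
    std = np.sqrt(variance)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))                   # assumed: maximum absolute amplitude
    return {
        "mean": mean,
        "rms": rms,
        "variance": variance,
        "peak": peak,
        "peak_to_peak": x.max() - x.min(),
        "kurtosis": np.mean(centered ** 4) / variance ** 2,
        "skewness": np.mean((centered / std) ** 3),
        "crest_factor": peak / rms,
        "impulse_factor": peak / np.mean(np.abs(x)),   # assumed definition
    }


# Toy sample: a unit square wave, for which every feature is easy to check by hand.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

In a traditional pipeline, one such feature vector would be computed for each 1600-point sample and passed to the classifier, here a decision tree.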
The training and testing datasets are those described in Section 2.1 (240 samples for training and 50 samples for testing per health condition).

Table 5. Time domain features

| Features | Formula |
|---|---|
| Mean | $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ |
| Root mean square | $x_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$ |
| Variance | $D_x = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$ |
| Peak value | $x_p = \max\left(\lvert x_i \rvert\right)$ |
| Peak to peak value | $x_{p-p} = \max\left(x_i\right) - \min\left(x_i\right)$ |
| Kurtosis | $K = \dfrac{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^4}{\left[\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2\right]^2}$ |
| Skewness | $S = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i - \bar{x}}{\sigma_x}\right)^3$ |
| Crest factor | $C_f = \frac{x_p}{x_{rms}}$ |
| Impulse factor | $I = \frac{x_p}{\overline{\lvert x \rvert}}$ |

The performance of the models using different features is given in Table 6; with the traditional methods, the accuracy rate lies between 75 % and 90 %. The accuracy rates obtained with different features vary greatly, by roughly 10 percentage points, which shows that the choice of feature is a difficult step in traditional methods. The same feature extracted in different ways may also give different results: the wavelet packet feature with the db4 mother wavelet outperforms those with the db3 and haar mother wavelets. Overall, the traditional methods perform worse than the deep learning-based models.

Compared with traditional or previously used techniques of bearing fault diagnosis, the advantages of the deep learning networks discussed in this paper are as follows:
1) Comparatively few iterations are required, and the computing time is acceptable. According to Fig. 8, with a batch size of 80, the simulation takes only about 100 seconds.
2) Even though the accuracy rate is not always exactly 100 %, it is remarkably high and quite close to 100 % (with a batch size of 80, all the learning rates give accuracy rates above 99.4 %).
3) No feature extraction based on prior knowledge, diagnostic experience or professional expertise is required, since the constructed deep learning networks extract features automatically. Traditional feature extraction can be quite complicated: different features suit different signals, for instance, the Fourier transform suits stationary signals while the wavelet transform suits non-stationary signals, and such features may still fail to fully reflect the fault characteristics. Automatic feature extraction avoids these complexities and uncertainties.

Table 6. Performance of different methods

| Methods | Accuracy rate in testing dataset [%] |
|---|---|
| Time domain + decision tree | 80.6 |
| EMD + decision tree | 77 |
| Wavelet packet (mother wavelet: haar) + decision tree | 79.8 |
| Wavelet packet (mother wavelet: db3) + decision tree | 86.6 |
| Wavelet packet (mother wavelet: db4) + decision tree | 87.8 |

4 CONCLUSIONS

In traditional methods for bearing fault diagnosis and detection, it is necessary to extract features to describe the signal. However, different features suit different conditions, so selecting them requires much expertise and a priori knowledge. Deep learning algorithms help solve this problem, as algorithms such as the DNN, CNN, and LSTM have proved capable of discovering intricate structures in big data.

This paper proposed an improved bearing fault diagnosis method that combines a one-dimensional CNN and an LSTM into one structure. Considering the CNN's advantage in reducing frequency variance and the LSTM's advantage in temporal modelling, the output of the CNN is taken as the input of the LSTM. The raw signal data collected by sensors are divided into a training dataset and a testing dataset. The training dataset is used to determine the internal parameters of the model. After that, the testing dataset is fed into the trained model to verify its effectiveness.
The results show that the average accuracy rate on the testing dataset is over 99 %; moreover, the model in its best configuration gives a completely correct prediction. Compared with the other deep learning-based models, the proposed model has the following advantages: first, it achieves the highest prediction accuracy on the testing dataset; second, it iterates faster than the LSTM model; third, it is more effective at preventing overfitting than the DNN model. Compared with the traditional fault diagnosis methods, the proposed method has the following advantages: first, traditional feature extraction is not required, which eliminates the interference of inappropriate features; second, its prediction accuracy is much higher than that of the traditional methods. However, the proposed model has its own limitation: its main disadvantage is the large amount of computation required. As computing power improves, the proposed method can be extended to more complicated mechanical systems, such as gearboxes. Future research will focus on further improvement of the algorithm and its application in other fields.

5 ACKNOWLEDGEMENTS

This work was supported by the Fundamental Research Funds for the Central Universities of P.R. China (No. 106112016CDJZR288805).

6 REFERENCES

[1] Vekteris, V., Trumpa, A., Turla, V., Šešok, N., Iljin, I., Moškin, V., Kilikevičius, A., Jakštas, A., Kleiza, J. (2017). An investigation into fault diagnosis in a rotor-bearing system with dampers used in centrifugal milk separators. Transactions of FAMENA, vol. 41, no. 2, p. 77-86, DOI:10.21278/TOF.41207.
[2] Liu, F., Qian, Q., Liu, F., Lu, S., He, Q., Zhao, J. (2017). Wayside bearing fault diagnosis based on envelope analysis paved with time-domain interpolation resampling and weighted-correlation-coefficient-guided stochastic resonance. Shock and Vibration, vol. 2017, art. ID 3189135, DOI:10.1155/2017/3189135.
[3] Mckee, K.K., Forbes, G.L., Mazhar, I., Entwistle, R., Hodkiewicz, M., Howard, I. (2015). A vibration cavitation sensitivity parameter based on spectral and statistical methods. Expert Systems with Applications, vol. 42, no. 1, p. 67-78, DOI:10.1016/j.eswa.2014.07.029.
[4] Do, V.T., Nguyen, L.C. (2016). Adaptive empirical mode decomposition for bearing fault detection. Strojniški vestnik - Journal of Mechanical Engineering, vol. 62, no. 5, p. 281-290, DOI:10.5545/sv-jme.2015.3079.
[5] Fernández-Francos, D., Martínez-Rego, D., Fontenla-Romero, O., Alonso-Betanzos, A. (2013). Automatic bearing fault diagnosis based on one-class v-SVM. Computers & Industrial Engineering, vol. 64, no. 1, p. 357-365, DOI:10.1016/j.cie.2012.10.013.
[6] Zhang, X., Liang, Y., Zhou, J., Zang, Y. (2015). A novel bearing fault diagnosis model integrated permutation entropy, ensemble empirical mode decomposition and optimized SVM. Measurement, vol. 69, p. 164-179, DOI:10.1016/j.measurement.2015.03.017.
[7] Aydin, I., Karakose, M., Akin, E. (2014). An approach for automated fault diagnosis based on a fuzzy decision tree and boundary analysis of a reconstructed phase space. ISA Transactions, vol. 53, no. 2, p. 220-229, DOI:10.1016/j.isatra.2013.11.004.
[8] Zhao, Z., Xu, Q., Jia, M. (2016). Improved shuffled frog leaping algorithm-based BP neural network and its application in bearing early fault diagnosis. Neural Computing and Applications, vol. 27, no. 2, p. 375-385, DOI:10.1007/s00521-015-1850-y.
[9] Boudiaf, A., Moussaoui, A., Dahane, A., Atoui, I. (2016). A comparative study of various methods of bearing faults diagnosis using the Case Western Reserve University data. Journal of Failure Analysis and Prevention, vol. 16, no. 2, p. 271-284, DOI:10.1007/s11668-016-0080-7.
[10] Lecun, Y., Bengio, Y., Hinton, G. (2015). Deep learning. Nature, vol. 521, no. 7553, p. 436-444, DOI:10.1038/nature14539.
[11] He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, DOI:10.1109/cvpr.2016.90.
[12] Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V. (2012). Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine, vol. 29, no. 6, p. 82-97, DOI:10.1109/msp.2012.2205597.
[13] Sun, W., Yao, B., Zeng, N., Chen, B., He, Y., Cao, X., He, W. (2017). An intelligent gear fault diagnosis methodology using a complex wavelet enhanced convolutional neural network. Materials, vol. 10, no. 7, p. 790, DOI:10.3390/ma10070790.
[14] He, J., Yang, S., Gan, C. (2017). Unsupervised fault diagnosis of a gear transmission chain using a deep belief network. Sensors, vol. 17, no. 7, p. 1564, DOI:10.3390/s17071564.
[15] Yin, A., Lu, J., Dai, Z., Li, J., Ouyang, Q. (2016). Isomap and deep belief network-based machine health combined assessment model. Strojniški vestnik - Journal of Mechanical Engineering, vol. 62, no. 12, p. 740-750, DOI:10.5545/sv-jme.2016.3694.
[16] Sainath, T.N., Vinyals, O., Senior, A., Sak, H. (2015). Convolutional, long short-term memory, fully connected deep neural networks. IEEE International Conference on Acoustics, Speech and Signal Processing, p. 4580-4584, DOI:10.1109/icassp.2015.7178838.
[17] Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G. (2015). A convolutional neural network cascade for face detection. IEEE Conference on Computer Vision and Pattern Recognition, DOI:10.1109/cvpr.2015.7299170.
[18] Hochreiter, S., Schmidhuber, J. (1997). Long short-term memory. Neural Computation, vol. 9, no. 8, p. 1735-1780, DOI:10.1162/neco.1997.9.8.1735.
[19] Horikawa, S., Furuhashi, T., Uchikawa, Y. (1992). On fuzzy modeling using fuzzy neural networks with the back-propagation algorithm. IEEE Transactions on Neural Networks, vol. 3, no. 5, p. 801-806, DOI:10.1109/72.159069.
[20] Loparo, K. (1998). Bearings Vibration Data Set, Case Western Reserve University, from: https://csegroups.case.edu/bearingdatacenter/home, accessed on 1998-July-25.
[21] Kingma, D., Ba, J. (2015). Adam: A method for stochastic optimization. International Conference on Learning Representations, p. 1-15.
[22] Wang, Z., Zhang, Q., Xiong, J., Xiao, M., Sun, G., He, J. (2017). Fault diagnosis of a rolling bearing using wavelet packet denoising and random forests. IEEE Sensors Journal, vol. 17, no. 17, p. 5581-5588, DOI:10.1109/jsen.2017.2726011.
[23] Ali, J.B., Fnaiech, N., Saidi, L., Chebel-Morello, B., Fnaiech, F. (2015). Application of empirical mode decomposition and artificial neural network for automatic bearing fault diagnosis based on vibration signals. Applied Acoustics, vol. 89, p. 16-27, DOI:10.1016/j.apacoust.2014.08.016.