Strojniški vestnik - Journal of Mechanical Engineering 69(2023)5-6, 261-274 Received for review: 2022-11-25
© 2023 The Authors. CC BY 4.0 Int. Licensee: SV-JME Received revised form: 2023-02-03
DOI:10.5545/sv-jme.2022.459 Original Scientific Paper Accepted for publication: 2023-03-06
*Corr. Author’s Address: Northeast Petroleum University, Mechanical Science and Engineering Institute, China, wangteng_1009@163.com
An Improved MSCNN and GRU Model  
for Rolling Bearing Fault Diagnosis
Wang, T. – Tang, Y. – Wang, T. – Lei, N.
Teng Wang* – Youfu Tang – Tao Wang – Na Lei
Northeast Petroleum University, Mechanical Science and Engineering Institute, China
In this paper, a novel fault diagnosis method based on the fusion of squeeze-and-excitation multiscale convolutional neural networks
(SENet-MSCNN) and a gated recurrent unit (GRU) is proposed to address the low diagnosis rate that arises because normal samples
far outnumber fault samples in vibration big data. The method takes the time-domain vibration signal as input and fuses the
spatial features extracted by SENet-MSCNN with the temporal features extracted by GRU; the fused features are passed to the fully
connected layer for identification, realizing intelligent rolling bearing diagnosis with adaptive feature extraction. Finally, the method
is tested on simulated signals and experimental data. The results reveal that the model reaches 98.98 % and 76.44 % migration
diagnostic accuracy on the bearing and gearbox datasets, respectively. It also shows strong noise immunity, adaptivity, and robustness,
providing an effective way for intelligent diagnosis of rolling bearing vibration big data.
Keywords: SENet, multiscale convolutional neural networks, gated recurrent unit, rolling bearing, fault diagnosis
Highlights
• 	 A novel integration of SENet-MSCNN and GRU is proposed, which extracts rolling bearing fault features more effectively
and adaptively for fault diagnosis.
• 	 To address the limitations of existing convolutional neural networks (CNN), we developed MSCNN. The multiscale convolution
kernels in MSCNN capture both the global basic features of the signal and its local detail features.
• 	 SENet is added to MSCNN to recalibrate multiscale features, which suppresses irrelevant information and emphasizes
important information.
• 	 SENet-MSCNN is good at reducing frequency variance and extracting spatial features, and GRU is good at extracting long 
sequence time-series features. The integration model has better robustness in real working conditions and can reach 98.98 % 
accuracy under variable load conditions.
0  INTRODUCTION
As a key component in rotating machinery, rolling 
bearings often work in severe environments of high 
load and high speed, so they are highly prone to failure. 
Research shows that bearing faults account for the 
majority of the total number of faults [1]. Therefore, 
it is of great theoretical significance and engineering 
application value to research rolling bearing fault 
diagnosis methods to ensure the continuous and safe 
operation of equipment and reduce the economic loss 
of downtime [2].
However, the rolling bearing vibration big data
produced under variable working conditions and shock
excitation have typical nonlinear, non-stationary,
complex characteristics [3] and [4]. As a result,
existing signal processing techniques, such as
time-domain statistical analysis [5] and [6],
frequency-domain spectral analysis [7] and [8], short-time
Fourier transform [9], wavelet analysis [10] and [11],
Hilbert-Huang transform (HHT) [12], and variational
mode decomposition (VMD) [13], have difficulty extracting
fault features adaptively.
In addition, normal samples far outnumber
fault samples in the massive rolling bearing
vibration data collected in the field. This imbalance
lowers the diagnostic efficiency and recognition rate of
existing artificial intelligence diagnosis methods,
such as support vector machine (SVM) [14], decision
tree (DT) [15], random forest (RF) [16], artificial
neural network (ANN) [17], CNN [18] and [19],
deep autoencoder (DAE) [20], deep belief network
(DBN) [21], recurrent neural network (RNN) [22], and
artificial immune algorithm (AIN) [23], and makes them
difficult to apply in industrial contexts.
At present, researchers at home and abroad have 
fused signal-processing techniques with artificial 
intelligence diagnosis methods, such as wavelet packet 
decomposition and empirical mode decomposition 
(EMD) with back propagation (BP) network fusion 
[24], VMD and probabilistic neural network (PNN) 
[25], wavelet and CNN network fusion [26], and HHT and
CNN network fusion [27]. These methods have been
effective in improving the diagnostic performance 
of the large sample and various fault vibration data. 
However, vibration data is affected by different 
working conditions, structural parameters, fault types, 
fault degrees, and the number of faults. The above 
fusion methods have their applicability conditions 
and need to be artificially selected based on experts’ 
empirical knowledge, which has greater limitations.
In addition, some researchers improve fault 
diagnosis accuracy and robustness by fusing 
different artificial intelligence methods. These 
diagnosis methods mainly contain two parts: feature 
extraction and pattern recognition. A convolutional 
discriminative feature learning approach and support 
vector machine fusion method was proposed by Sun 
et al. [28] for fault diagnosis of induction motors; the 
network performance is improved. Chen et al. [29] 
proposed a mechanical fault diagnosis method based 
on CNN and extreme learning machine (ELM) by 
using ELM as a classifier of CNN with the advantages 
of fast learning speed and high generalization ability. 
The network performance is further improved, and 
the model generalization ability and convergence 
speed are also enhanced. Wang et al. [30] proposed 
a CNN-based hidden Markov model for rolling 
bearing fault identification by fusing the strong 
feature extraction capability of CNN and the excellent 
pattern recognition performance of the hidden Markov 
model. Compared with the CNN model alone, it has 
higher classification accuracy and robustness. Based 
on the excellent network performance of CNN, the 
above methods have achieved better performance by 
combining CNN with various artificial intelligence 
methods. However, the CNN, as a feature extraction 
layer, extracts high-dimensional features and contains 
a large amount of spatial information and sequence 
information. Another approach, as a classifier, 
classifies the spatial features extracted by the CNN 
without considering the connection between the 
features. Based on this idea, taking advantage of 
RNN for extracting temporal features and CNN for 
extracting spatial features, some researchers have 
combined CNN and RNN [31] for fault diagnosis and 
achieved better accuracy. However, RNNs are prone
to exploding and vanishing gradients.
The gated recurrent unit (GRU) [32] and long short-term
memory (LSTM) [33], both variants of the RNN, mitigate
the vanishing and exploding gradient problems.
A planetary gearbox diagnosis method based on CNN
and LSTM was proposed by Shi et al. [34]; the network
is able to detect the type, location, and direction of
gearbox faults with greater accuracy and a higher
recognition rate than a traditional single CNN. Chen
et al. [35] found that the features extracted by a
single-size convolutional kernel lack diversity and
proposed a multi-scale convolutional neural network and
long short-term memory (MSCNN-LSTM) fault diagnosis
model. Its average accuracy on the experimental
data reached 98.46 %, and it has strong noise immunity.
Li et al. [36] proposed a method that combines CNN 
and GRU models with vibration and acoustic emission 
signals for gear-pitting fault diagnosis. The method 
can achieve a diagnosis rate of more than 98 % and 
exhibits stronger robustness compared with a single 
CNN and GRU for different loads and learning rates. 
The fusion of two different deep learning methods
takes advantage of both models and makes the
model representation more powerful. However, such
fusion tends to deepen the network, which can lead to
model overfitting.
To address these problems, a fault diagnosis 
method based on the fusion of SENet-MSCNN and 
GRU is proposed. The width of the convolutional 
layers of the network is increased by adding 
convolutional kernels of different scales to form 
MSCNN layers, without increasing the depth of the 
network structure. Convolutional kernels of different
sizes capture features over different receptive fields to
obtain global and local information. In addition, not
all features extracted by MSCNN are important,
so redundant and irrelevant information can easily
influence the classification
results. Thus, the SE-Net block [37] was introduced
into MSCNN to recalibrate multiscale features,
suppressing irrelevant information and
emphasizing important information.
Then, the spatial features extracted by SENet-MSCNN 
were input to GRU to extract time-series features. 
Compared with other fault diagnosis methods, the 
method has broad application prospects in improving 
the accuracy of rolling bearing fault diagnosis. In 
addition, the method is attractive in reducing failure 
rates, reducing maintenance and repair costs of 
machinery and equipment, and preventing accidents.
1  FAULT DIAGNOSIS MODEL BASED ON  
SENet-MSCNN AND GRU METHOD
The MSCNN extracts the fault features through
several convolutional kernels of different sizes and
fuses the multiscale features. The fused features
are then fed into the GRU network to extract the
time-series features and classify them. In addition, the
SE-Net is added to the MSCNN so that the network model
can recalibrate multiscale features, which further
improves the diagnosis rate and robustness of the fault
model.
1.1  Architecture of MSCNN
Since the rolling bearing vibration signal presents 
nonlinear and nonstationary characteristics, the high-
frequency features cannot be extracted by larger 
convolutional kernels and the low-frequency features 
cannot be extracted by smaller convolutional kernels. 
Therefore, to solve this problem, this paper uses 
the MSCNN structure, which includes a multiscale 
convolutional layer by connecting convolutional 
kernels of different sizes [1×1, 3×1, 5×1], whose 
structure is shown in Fig. 1. Convolutional kernels 
[3×1] extract high-frequency fault features; 
convolutional kernels [5×1] extract low-frequency 
fault features. The features extracted from several 
different receptive fields possess both global and 
local information [38] and [39]. A convolutional 
kernel of size [1×1] is added to each of the four 
branches of MSCNN, which has two advantages: 
first, although a [1×1] convolutional kernel cannot 
extract spatial features, it can extract features along 
the depth dimension to achieve a nonlinear feature 
map. Second, several [1×1] convolutional kernels are 
embedded in the front of [3×1, 5×1] convolutional 
kernels and can reduce dimensionality to reduce the 
computational cost. It can accelerate training and 
improve generalization. The calculation process of the 
convolution is as follows:
$$ x(n) * g(n) = \sum_{m} x(m)\, g(n - m), \tag{1} $$
where x denotes the amplitude, and g denotes the 
multiscale convolution kernel.
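The multiscale idea behind Eq. (1) can be sketched in plain Python: the same input is convolved with kernels of widths 1, 3, and 5, and the outputs are stacked as channels. This is only an illustration under stated assumptions: the kernel values in `KERNELS` are hypothetical fixed averaging filters (a trained MSCNN learns its kernels), zero padding is assumed so that every branch keeps the input length, and the function names are not taken from the paper.

```python
def conv1d_same(x, g):
    """Discrete convolution y(n) = sum_m x(m) g(n - m), centred on the kernel
    and zero-padded so that the output has the same length as x."""
    half = len(g) // 2
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, g_k in enumerate(g):
            m = n + half - k  # sample of x paired with kernel tap k
            if 0 <= m < len(x):
                acc += x[m] * g_k
        y.append(acc)
    return y


def multiscale_features(x, kernels):
    """Apply every kernel to the same input and stack the results as channels."""
    return [conv1d_same(x, g) for g in kernels]


# Hypothetical fixed kernels of widths 1, 3 and 5 (assumption for illustration).
KERNELS = [[1.0], [1 / 3] * 3, [0.2] * 5]
```

Applied to a unit impulse, the width-1 branch returns the impulse unchanged while the width-5 branch spreads it over five samples, which illustrates how larger kernels respond to slower, lower-frequency structure.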
Fig. 1.  The structure diagram of MSCNN, which includes  
a multiscale convolutional layer by connecting convolutional 
kernels of different sizes [1×1, 3×1, 5×1]
By using the scaled exponential linear units 
(SELU) activation function [40], the data distribution 
is self-normalized to satisfy a normal distribution with 
mean 0 and variance 1. Moreover, the SELU activation 
function is a non-saturated function, which can 
solve the vanishing gradient and exploding gradient 
problem. Its function expression is as follows:
$$ \mathrm{selu}(x) = \lambda \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases} \tag{2} $$
where α and λ denote constants.
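As a worked illustration of Eq. (2), the sketch below evaluates SELU for a scalar input. The paper does not list the values of α and λ; the constants used here are the standard self-normalizing values from the original SELU formulation and are an assumption about what the authors used.

```python
import math

# Standard SELU constants (assumed; the paper only says "constants").
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit, Eq. (2)."""
    if x > 0:
        return LAMBDA * x
    return LAMBDA * ALPHA * (math.exp(x) - 1.0)
```

For large negative inputs the output approaches the finite value -λα instead of growing without bound, while positive inputs are scaled linearly by λ.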
1.2  Architecture of SENet
A new module is introduced in the MSCNN model: 
SENet, whose detailed structure is shown in Fig. 2. 
The biggest advantage of the SENet block is that it 
can construct interdependencies between channels 
[41]. SENet adopts a feature recalibration mechanism, 
which can obtain the dependency degree of each 
channel feature through global information. Then, by 
the dependency degree, the important information is 
selectively enhanced and the irrelevant information 
is squeezed to recalibrate the relationship between 
channel-wise features. Thus, it strengthens
the convolutional kernel learning capability and
improves the feature representation capability of
MSCNN. The formulas are as follows:
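$$ z_c = \mathrm{GAP}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j), \tag{3} $$
$$ s_c = \sigma\left( W_2\, \delta\left( W_1 z_c \right) \right), \tag{4} $$
$$ M = [m_1, m_2, m_3, \ldots, m_c] = F_{scale}(s_c, u_c), \tag{5} $$
where $u_c \in R^{H \times W}$ is the input feature, $z_c \in R^{c \times 1}$ the
channel-wise feature vector, $s_c \in R^{H \times W}$ the recalibration
vectors, and $M \in R^{H \times W}$ the reconstructed feature vector.
$W_1 \in R^{\frac{D}{r} \times D}$ and $W_2 \in R^{D \times \frac{D}{r}}$ are weights, $*$ is the
convolution operator, $\sigma$ the Sigmoid function, $\delta$ the ReLU
function, $r$ the reduction ratio, and $F_{scale}$ is scalar
multiplication.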
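The squeeze, excitation, and scaling steps of Eqs. (3) to (5) can be sketched for one-dimensional feature maps as follows. This is a minimal sketch, not the authors' implementation: the features are stored as a list of channels, `W1` and `W2` are hypothetical weight matrices supplied by the caller (a trained network learns them), and bias terms are omitted because the equations do not include them.

```python
import math


def squeeze(u):
    """Global average pooling per channel, Eq. (3)."""
    return [sum(channel) / len(channel) for channel in u]


def mat_vec(W, v):
    """Plain matrix-vector product W v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def excite(z, W1, W2):
    """Channel weights s = sigmoid(W2 relu(W1 z)), Eq. (4)."""
    hidden = [max(0.0, a) for a in mat_vec(W1, z)]
    return [1.0 / (1.0 + math.exp(-a)) for a in mat_vec(W2, hidden)]


def se_block(u, W1, W2):
    """Rescale every channel by its learned weight, Eq. (5)."""
    s = excite(squeeze(u), W1, W2)
    return [[s_c * value for value in channel] for s_c, channel in zip(s, u)]
```

With C channels and reduction ratio r, `W1` has C/r rows and C columns and `W2` has C rows and C/r columns, so the bottleneck keeps the number of extra parameters small.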
Fig. 2.  The structure diagram of SENet; a) SENet module, and b) SENet block
1.3  Architecture of GRU
The GRU network is a simplified version of the 
LSTM, which has a simpler structure, with lower 
computational cost, faster iterations, and no reduction 
in network performance compared to the LSTM. The 
GRU network has only two gated units: the update 
gate and the reset gate. With these two gated units, it 
can learn, discard, and retain information in a long-
term sequence and influence the output of the next 
iteration. As shown in Fig. 3, the input vector $x_{(t)}$ and
the previous state vector $h_{(t-1)}$ are connected to two
fully connected layers, and the Sigmoid function
maps the results $z_{(t)}$ and $r_{(t)}$ to values between 0 and 1.
Fig. 3.  The structure diagram of GRU; the GRU network has only 
two gated units: the update gate and the reset gate
The formulas can be given as follows:
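$$ z_{(t)} = \sigma\left( W_{xz}^{T} x_{(t)} + W_{hz}^{T} h_{(t-1)} + b_z \right), \tag{6} $$
$$ r_{(t)} = \sigma\left( W_{xr}^{T} x_{(t)} + W_{hr}^{T} h_{(t-1)} + b_r \right), \tag{7} $$
$$ g_{(t)} = \tanh\left( W_{xg}^{T} x_{(t)} + W_{hg}^{T} \left( r_{(t)} \otimes h_{(t-1)} \right) + b_g \right), \tag{8} $$
$$ h_{(t)} = z_{(t)} \otimes h_{(t-1)} + \left( 1 - z_{(t)} \right) \otimes g_{(t)}, \tag{9} $$
where $\sigma$ represents the Sigmoid function. $W_{xz}$, $W_{xr}$, and
$W_{xg}$ represent the weight matrices for the
connection to the input vector $x_{(t)}$. $W_{hz}$, $W_{hr}$, and $W_{hg}$
represent the weight matrices for the connection
to the vector $h_{(t-1)}$. $b_z$, $b_r$, and $b_g$ are the biases. $\otimes$ is
element-wise multiplication.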
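One time step of Eqs. (6) to (9) can be written out directly, which makes the role of the two gates easier to follow. The sketch below is illustrative only: the parameter dictionary `p` and its key names are hypothetical, each weight matrix is stored with one row per input element so that `mat_t_vec` returns the transposed product used in the equations, and the product between gate and state vectors is taken element by element.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def mat_t_vec(W, v):
    """Return W^T v, with W stored as len(v) rows of n_hidden columns."""
    return [sum(W[i][j] * v[i] for i in range(len(v))) for j in range(len(W[0]))]


def gate(Wx, Wh, b, x, h, act):
    """act(Wx^T x + Wh^T h + b), the pattern shared by Eqs. (6) to (8)."""
    return [act(p + q + c) for p, q, c in zip(mat_t_vec(Wx, x), mat_t_vec(Wh, h), b)]


def gru_step(x, h_prev, p):
    """Compute h_(t) from x_(t) and h_(t-1) for one GRU cell."""
    z = gate(p["Wxz"], p["Whz"], p["bz"], x, h_prev, sigmoid)   # update gate, Eq. (6)
    r = gate(p["Wxr"], p["Whr"], p["br"], x, h_prev, sigmoid)   # reset gate, Eq. (7)
    reset_h = [r_i * h_i for r_i, h_i in zip(r, h_prev)]
    g = gate(p["Wxg"], p["Whg"], p["bg"], x, reset_h, math.tanh)  # candidate, Eq. (8)
    # Eq. (9): blend the previous state and the candidate state.
    return [z_i * h_i + (1.0 - z_i) * g_i for z_i, h_i, g_i in zip(z, h_prev, g)]
```

When the update gate saturates near 1, the cell copies the previous state almost unchanged, which is how information is retained over a long sequence.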
1.4  Intelligent Fault Diagnosis Methods Based on SENet-
MSCNN and GRU Model
As shown in Fig. 4, the rolling bearing fault diagnosis
model consists of three major parts: the SENet-MSCNN
layer, the GRU layer, and the Dense layer. The method
builds on the integration of CNN and LSTM:
the CNN is good at reducing the vibration frequency
variance, and the GRU is good at extracting
time-series features. By combining
and improving the CNN and GRU fusion model,
the SENet-MSCNN and GRU fault diagnosis model
is proposed. First, the time-domain signal of bearing
fault vibration serves directly as the input of
SENet-MSCNN, which extracts multiscale features through
multiscale convolution kernels. The multiscale
features are input to the SENet for feature recalibration.
Then, global average pooling is applied to the
high-dimensional multiscale features to reduce their
dimensions. The low-dimensional features are input to
the GRU layer to extract time-series features. Finally,
the features are input to the fully connected layer for
classification by the Softmax function.
Fig. 4.  The framework of the SENet-MSCNN and GRU model, which consists of three major parts:
SENet-MSCNN layer, GRU layer, and Dense layer
1.5  Application of SENet-MSCNN and GRU
Methods in Simulated Signals
1.5.1  Construction of Simulated Signal Data Sets
The raw signal of the rolling bearing is simulated
using three simplified models from the literature [42],
$x_1(t)$, $x_2(t)$, and $x_3(t)$, which are:
$$ x_1(t) = 0.4 \cos(2\pi f_1 t + 10), \tag{10} $$
$$ x_2(t) = 0.6 \cos(2\pi f_2 t + 15), \tag{11} $$
$$ x_3(t) = \sin(2\pi f_b t)\left[ 1 + \sin(2\pi f_r t) \right], \tag{12} $$
where $f_1$ = 20 Hz, $f_2$ = 45 Hz, $f_3$ = 100 Hz, $f_r$ = 10 Hz,
N = 2048, and the sampling frequency $f_s$ = 10 Hz.
A random matrix A is used to mix the source
signals, and a white noise signal is added
to form the simulated signals $x_4(t)$, $x_5(t)$, and $x_6(t)$. As
shown in Fig. 5, Gaussian white noise is added to
$x_4(t)$, $x_5(t)$, and $x_6(t)$ to obtain the simulated signal
plot with a signal-noise ratio of 2 dB; the signal-noise
ratio equation is as follows:
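$$ \mathrm{SNR} = 10 \log_{10} \frac{p_s}{p_n}, \tag{13} $$
$$ A = \begin{bmatrix} 0.5214 & 0.4555 & 0.2013 \\ 0.4278 & 0.2002 & 0.4416 \\ 0.3122 & 0.3818 & 0.6058 \end{bmatrix}, \tag{14} $$
$$ A \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} + \begin{bmatrix} S_{noise} \\ S_{noise} \\ S_{noise} \end{bmatrix} = \begin{bmatrix} x_4(t) \\ x_5(t) \\ x_6(t) \end{bmatrix}, \tag{15} $$
where $p_s$ is the input signal energy, $p_n$ the noise energy, and
$S_{noise}$ the Gaussian white noise.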
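The mixing and noise steps of Eqs. (13) to (15) can be sketched as follows. The matrix `A` is copied from Eq. (14); everything else is an assumption made for illustration: the helper names are hypothetical, mean power is used in place of total energy (the ratio in Eq. (13) is the same), and the noise level is scaled separately for each mixed signal so that it meets the requested signal-noise ratio.

```python
import math
import random

# Mixing matrix from Eq. (14).
A = [[0.5214, 0.4555, 0.2013],
     [0.4278, 0.2002, 0.4416],
     [0.3122, 0.3818, 0.6058]]


def mean_power(sig):
    return sum(v * v for v in sig) / len(sig)


def mix(sources):
    """Left-multiply the stacked source signals by A (first term of Eq. (15))."""
    n = len(sources[0])
    return [[sum(a * src[k] for a, src in zip(row, sources)) for k in range(n)]
            for row in A]


def add_white_noise(sig, target_snr_db, rng):
    """Add Gaussian white noise whose power follows from Eq. (13)."""
    p_n = mean_power(sig) / 10 ** (target_snr_db / 10)
    sigma = math.sqrt(p_n)
    return [v + rng.gauss(0.0, sigma) for v in sig]


def measured_snr_db(clean, noisy):
    """Evaluate Eq. (13) on a clean signal and its noisy version."""
    noise = [b - a for a, b in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(noise))
```

Because the noise is random, the measured signal-noise ratio of a finite record only approximates the target; with 2048 points the deviation is typically a small fraction of a decibel.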
Fig. 5.  Time-domain plots of the simulated signals
a) $x_4(t)$, b) $x_5(t)$, and c) $x_6(t)$ with added Gaussian
white noise at a signal-noise ratio of –2 dB
Gaussian white noise with different signal-noise ratios in the
range [-4, 8] dB is added to $x_4(t)$, $x_5(t)$, and $x_6(t)$.
For each signal and each signal-noise ratio, there are
30 training samples and 10 test samples, as given in Table 1.
Table 1.  The information of the simulated signal dataset
Signals   Set     SNR [dB]
                  -4    -2    0     2     4     6     8
x_4(t)    Train   30    30    30    30    30    30    30
          Test    10    10    10    10    10    10    10
x_5(t)    Train   30    30    30    30    30    30    30
          Test    10    10    10    10    10    10    10
x_6(t)    Train   30    30    30    30    30    30    30
          Test    10    10    10    10    10    10    10
1.5.2  Network Structure and Parameters
The detailed structure and parameter settings of the
SENet-MSCNN and GRU network model are
shown in Table 2. First, the simulated signal is
input to the SENet-MSCNN network. Each
convolutional layer in the SENet-MSCNN network
has 128 kernels, and the activation function is SELU, which
self-normalizes the data. The output features
of SENet-MSCNN are then passed to global average pooling,
which discards the remaining spatial features to
reduce feature dimensionality. Each GRU layer has
20 units and uses the ReLU function.
The last layer is the fully connected layer, where
the Softmax function classifies the output
into three classes. The hyperparameters
are set as follows: the learning rate is 0.001, the batch size is 32,
the number of iterations is 196, and the loss function
is the cross-entropy loss.
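The layer sizes in Table 2 can be checked with a short shape trace. The sketch below assumes "same" padding, so that a layer with stride s maps a length L to ceil(L / s); the paper does not state the padding mode, but this assumption reproduces the lengths 1024, 512, and 256 listed in the table. The function and key names are hypothetical, and the batch dimension of 32 is left out.

```python
import math


def same_len(length, stride):
    """Output length of a 1D layer with 'same' padding (assumed) and a stride."""
    return math.ceil(length / stride)


def trace_shapes(n_points=2048, n_kernels=128, n_branches=4, gru_units=20):
    """Follow one sample through the layers of Table 2 (batch axis omitted)."""
    shapes = {}
    len_1 = same_len(n_points, 2)                   # Conv_1, stride 2
    shapes["Conv_1"] = (len_1, n_kernels)
    len_2 = same_len(len_1, 2)                      # every branch ends with stride 2
    channels = n_kernels * n_branches               # four branches concatenated
    shapes["Concatenate"] = (len_2, channels)
    shapes["GlobalAveragePooling"] = (channels,)    # average over the length axis
    shapes["ExpandDim"] = (channels, 1)
    shapes["Gru_2"] = (channels, gru_units)         # GRU output at every step
    len_3 = same_len(channels, 2)                   # MaxPooling, stride 2
    shapes["MaxPooling"] = (len_3, gru_units)
    shapes["Flatten"] = (len_3 * gru_units,)
    return shapes
```

Under these assumptions the flattened feature vector has 256 × 20 = 5120 elements per sample.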
1.5.3  Simulation Analysis
Fig. 6 compares the recognition rates on
the test set of the GRU, 1D-CNN, MSCNN, and BP
networks and the proposed method. The
recognition rate of the SENet-MSCNN and GRU
network stabilizes at 100 % from the second
iteration. The recognition rate of the GRU network
stabilizes near 99.3 %. The recognition rates of the BP
and MSCNN networks stabilize below
99.8 %, and that of the 1D-CNN network fluctuates more.
Therefore, the SENet-MSCNN and GRU network performs
better in recognition rate, convergence speed,
and anti-noise performance on the simulation
data.
Fig. 6.  The diagnosis rate of different methods in simulation data, 
several methods including SENet-MSCNN and GRU, 1DCNN, GRU, 
MSCNN, and BP are compared and analysed
2 APPLICATION OF SENet-MSCNN AND GRU METHODS IN 
ROLLING BEARING FAULT DIAGNOSIS
In this section, we first describe the Case Western
Reserve University (Cleveland, USA) bearing datasets
[43], the gearbox datasets, and our implementation
details. Subsequently, the proposed method is
compared with several
typical methods on the two datasets. We also
perform ablation experiments to examine the effect
of each model component. Finally, we design a
variable working conditions experiment to analyse the
migration performance of the diagnosis.
2.1  Data Preprocessing
This paper enlarges the data sample set through data
augmentation (overlapping sliding-window
cuts) to create three comparison test
datasets of 500, 1500, and 4500 samples. As shown in
Fig. 7, each sample is a sliding window of 2048 points,
and the sliding step is 500 points.
Fig. 7.  Data augmentation with overlap, where the sample length
is 2048 and the sliding step is 500
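The overlapping cut described above can be sketched in a few lines. The function name is hypothetical; the window length of 2048 points and the step of 500 points are the values stated in the text, and any incomplete window at the end of the record is dropped, which is an assumption.

```python
def sliding_windows(signal, window=2048, step=500):
    """Cut a long record into overlapping samples of `window` points,
    moving the start index by `step` points each time."""
    last_start = len(signal) - window
    return [signal[start:start + window] for start in range(0, last_start + 1, step)]
```

A record of N points yields (N - 2048) // 500 + 1 samples when N is at least 2048, so a 12000-point record gives 20 samples instead of the 5 obtained without overlap.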
2.2  Bearing Dataset
As shown in Fig. 8, the experimental platform consists
of four parts: motor, torque transducer/encoder,
dynamometer, and electronic control. The sampling
frequency is 12 kHz, and the vibration data are collected
at the drive end (DE). There
are three fault types (inner fault, ball fault,
and outer fault), each with three fault diameters (0.1778
mm, 0.3556 mm, and 0.5334 mm), plus a normal state.
The three fault types are shown in Fig. 9. Therefore,
Table 2.  The parameters of the SENet-MSCNN and GRU models used in the simulated signal dataset
Layer                  Kernel size/step  Kernel num  Unit  Input size       Output size   Activation
Conv_1                 3×1/2             128         -     32×2048×1        32×1024×128   Selu
Conv_a                 1×1/2             128         -     32×1024×128      32×512×128    Selu
Conv_b                 1×1/1             128         -     32×1024×128      32×1024×128   Selu
Conv_b                 3×1/2             128         -     32×1024×128      32×512×128    Selu
AveragePooling_c       3×1/2             -           -     32×1024×128      32×512×128    Selu
Conv_c                 5×1/1             128         -     32×512×128       32×512×128    Selu
Conv_d                 1×1/1             128         -     32×1024×128      32×1024×128   Selu
Conv_d                 3×1/1             128         -     32×1024×128      32×1024×128   Selu
Conv_d                 3×1/2             128         -     32×1024×128      32×512×128    Selu
Concatenate            -                 -           -     (32×512×128)×4   32×512×512    -
SENet                  -                 -           -     32×512×512       32×512×512    -
GlobalAveragePooling   -                 -           -     32×1024×512      32×512        -
ExpandDim              -                 -           -     32×512           32×512×1      -
Gru_1                  -                 -           20    32×512×1         32×512×20     Relu
Gru_2                  -                 -           20    32×512×20        32×512×20     Relu
MaxPooling             3×1/2             -           -     32×512×20        32×256×20     -
Flatten                -                 -           -     32×256×20        32×5120       -
Dense                  -                 -           3     32×5120          32×3          Softmax
there are 10 states in total and their fault time-domain
waveforms are shown in Fig. 10. The rolling bearings
worked at three motor loads (746 W, 1492 W, and 2238 W)
with three motor speeds (1772 r/min, 1750 r/min, and
1730 r/min). The 10 states under the three working
conditions (1 hp, 2 hp, 3 hp) are denoted
as A, B, and C, respectively. Dataset D combines the three
working conditions. Details of the datasets are shown in
Table 3; for each of the 10 states under each working
condition, 80 % of the samples form
the training set and 20 % form the test set.
2.3  Gearbox Dataset
The experiment in this paper used the HFXZ-
1 planetary gearbox fault diagnosis experimental 
platform, as shown in Fig. 11. The experimental 
platform consists of seven parts: motor, gearbox, 
flexible coupling, planetary gearbox, helical gearbox, 
torque sensor, and magnetic powder brake. Three fault 
states were in the experiment (gear tooth breakage, 
Fig. 9.  Different faults of the rolling bearings: a) inner fault, b) outer fault, and c) ball fault
Fig. 10.  Raw vibration signals for 10 states; rolling bearings include 10 states: Normal, Inner 0.1778 mm, Outer 0.1778 mm, Ball 0.1778 mm,
Inner 0.3556 mm, Outer 0.3556 mm, Ball 0.3556 mm, Inner 0.5334 mm, Outer 0.5334 mm, and Ball 0.5334 mm
Fig. 8.  Experiment platform for rolling bearing fault used by CWRU,
which consists of four parts: motor, torque transducer/encoder,
dynamometer, and electronic control
2.4  Analysis of Experimental Results
The detailed structure and parameter settings of the
SENet-MSCNN and GRU network models in the
experimental data are shown in Table 2 and are the
same as those used for the simulated
signals, except that the dense layer has 10 units
and classifies the input into 10 classes. The data under the A,
B, C, and D working loads are used as the model input.
To evaluate the superiority of the proposed method, it
was compared with 1D-CNN, CNN-GRU, MSCNN,
gear wear, and gear crack), as shown in Fig. 12, and 
one normal state.
The motor speed was set to 600 rad/min and 
the sampling frequency was 5120 Hz. Three loads were
set: 0.1 A, 0.05 A, and 0 A corresponding to datasets 
A, B, and C. The acceleration sensor was installed 
outside the planetary gearbox to detect the vibration 
signal, and the time domain waveform of the original 
vibration signal is shown in Fig. 13.
Table 3.  The information of the rolling bearing datasets
                             Normal  Inner                  Ball                   Outer
Label                        0       1      2      3      4      5      6      7      8      9
Fault diameter [inches]      0       0.007  0.014  0.021  0.007  0.014  0.021  0.007  0.014  0.021
A (1 hp)             Train   160     160    160    160    160    160    160    160    160    160
                     Test    40      40     40     40     40     40     40     40     40     40
B (2 hp)             Train   160     160    160    160    160    160    160    160    160    160
                     Test    40      40     40     40     40     40     40     40     40     40
C (3 hp)             Train   160     160    160    160    160    160    160    160    160    160
                     Test    40      40     40     40     40     40     40     40     40     40
D (1 hp 2 hp 3 hp)   Train   480     480    480    480    480    480    480    480    480    480
                     Test    120     120    120    120    120    120    120    120    120    120
Fig. 11.  Planetary gearbox fault diagnosis experiment platform: 1 motor, 2 gearbox, 3 flexible coupling, 4 planetary gearbox, 5 helical 
gearbox, 6 torque sensor, and 7 magnetic powder brake
Fig. 12.  Different faults of the gearbox: a) gear tooth breakage, b) gear wear, and c) gear crack
GRU, BP, SVM, RF, and DT. The results are shown
in Table 4. The proposed method outperforms the
other models under all working loads. Its average
accuracy is 1.75, 1.42, and 3.58 percentage points
higher than that of CNN-GRU, MSCNN, and 1DCNN,
respectively. The traditional machine learning
methods (SVM, RF, and DT) perform poorly, with
average accuracy below 80 %, which demonstrates
that deep learning methods perform better on
massive vibration data. Hence, the recognition rate of
the SENet-MSCNN and GRU network is markedly
better than that of the other methods.
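The quoted gains follow from the per-load accuracies in Table 4. The short check below averages the four working loads for each method and subtracts the result from the average of the proposed method; the dictionary layout and function names are hypothetical, while the numbers are copied from Table 4.

```python
# Accuracy [%] under working loads A, B, C and D, copied from Table 4.
TABLE_4 = {
    "Proposed method": [100.00, 100.00, 100.00, 100.00],
    "1DCNN": [98.00, 94.67, 93.00, 100.00],
    "CNN-GRU": [97.67, 99.00, 97.00, 99.33],
    "MSCNN": [97.33, 99.00, 99.00, 99.00],
}


def average(values):
    return sum(values) / len(values)


def gain_over(baseline, table=TABLE_4, proposed="Proposed method"):
    """Difference in average accuracy, in percentage points, rounded to 2 decimals."""
    return round(average(table[proposed]) - average(table[baseline]), 2)
```

For example, `gain_over("CNN-GRU")` evaluates to 1.75, which matches the difference between the averages of 100.00 % and 98.25 % listed in the table.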
Fig. 13.  Time-domain plots of vibration signals  
under four state modes
Table 4.  The diagnosis rate of different methods in experimental 
data
Methods           Different working loads              Average [%]
                  A        B        C        D
Proposed method   100.00   100.00   100.00   100.00   100.00
1DCNN             98.00    94.67    93.00    100.00   96.42
CNN-GRU           97.67    99.00    97.00    99.33    98.25
MSCNN             97.33    99.00    99.00    99.00    98.58
GRU               96.33    84.67    95.66    94.00    92.64
BP                71.33    58.00    66.00    61.00    64.08
SVM               64.67    75.00    81.67    80.78    75.53
RF                47.67    60.33    49.33    47.33    51.16
DT                36.33    40.33    40.33    40.22    39.30
Considering that network models perform
differently on datasets of different sizes,
comparison experiments on three datasets (500,
1500, and 4500 samples) are
established. The datasets are divided into training
sets and test sets in the ratio of 4:1. The batch
sizes for the three datasets are set to 16, 32, and
64, respectively. Compared with the other methods,
the SENet-MSCNN and GRU model has the
highest accuracy on all three datasets, with 100 %,
99.67 %, and 100 %, respectively, as can be
seen in Fig. 14. This also shows that the method
performs well on small datasets. Therefore, the
accuracy and robustness of the proposed method
are significantly better than those of the others. Fig.
15 shows the time consumed per sample. The
GRU network takes the longest and the MSCNN
the shortest; the time consumed
by the SENet-MSCNN and GRU network ranks
second. The MSCNN reduces the complexity of
GRU model parameter computation and thus
the model training time. Furthermore, as the
batch size increases, the time consumed per
sample is further reduced. The time consumed by
the proposed method drops to 19 ms/sample
at 4500 samples, which further speeds up the
model iteration. The SENet-MSCNN and GRU
fault diagnosis model is significantly improved
in terms of diagnostic accuracy, robustness, and
diagnostic speed.
Method            500 samples   1500 samples   4500 samples
1DCNN             88.00 %       98.33 %        99.56 %
BP                46.00 %       56.67 %        69.56 %
CNN-GRU           95.00 %       98.67 %        99.78 %
GRU               75.00 %       28.33 %        96.56 %
MSCNN             89.00 %       99.33 %        99.40 %
Proposed method   100.00 %      99.67 %        100 %
SVM               48.00 %       69.33 %        80.78 %
RF                33.00 %       46.67 %        47.11 %
DT                26.00 %       35.67 %        41.33 %
Fig. 14.  Diagnosis accuracy of different methods
in three fault sample sets
2.5  Analysis of Ablation Experiments
In this section, we perform ablation experiments to
compare the performance of the proposed method
with several baseline methods (MSCNN, GRU,
MSCNN-GRU).
Fig. 15.  Iteration speed of different methods in three datasets
Method            A          B          C          D          Average
GRU               94.33 %    86.33 %    98.00 %    96.67 %    93.83 %
MSCNN             95.67 %    99.00 %    99.00 %    99.67 %    98.33 %
MSCNN-GRU         99.97 %    100.00 %   100.00 %   100.00 %   99.99 %
Proposed method   100.00 %   100.00 %   100.00 %   100.00 %   100.00 %
Fig. 16.  Ablation study results
As Fig. 16 shows, the average accuracy
of GRU and MSCNN is 93.83 % and 98.33 %,
respectively. After the fusion of these two methods,
MSCNN-GRU achieves 100 % accuracy in the B, C,
and D working conditions, with an average accuracy
of 99.99 %. The proposed method adds the SENet to
MSCNN-GRU, and the accuracy further improves
to 100 % in all four conditions. Thus, the
fusion of GRU, MSCNN, and SENet
improves the recognition rate.
2.6  Variable Working Conditions Experiment
To further examine the migration characteristics of 
the SENet-MSCNN and GRU model across different 
working conditions, the experimental datasets of the A, B, 
C, and D working conditions are used as the input of 
the models. One of the A, B, C, and D working conditions 
is used as the source domain and another working 
condition dataset is used as the target domain, which 
constitutes 12 source-target combinations. Table 5 presents 
the accuracy of different methods in variable working conditions. 
The average accuracy of the BP, DT, RF, and SVM models 
with shallow structures is below 70 %, and 
the migration characteristics of these models are poor. 
The average accuracy of the single deep networks 1DCNN, 
GRU, and MSCNN is 91.12 %, 84.75 %, and 90.12 %, 
respectively, so the network performance is improved. 
The average accuracy of the fused models CNN-GRU, 
MSCNN-GRU, and SENet-MSCNN and GRU is 
at least 94.15 %, so the network performance is further 
improved. The proposed method achieves the highest 
average accuracy of 98.98 %, which is 4.83 and 
2.81 percentage points higher than CNN-GRU and 
MSCNN-GRU, respectively. Only in A-B does the 
MSCNN-GRU model perform better, by 0.6 percentage 
points. In the rest of the variable conditions, the 
proposed method matches or exceeds MSCNN-GRU, 
most notably in C-A, with an improvement of 
11.2 percentage points. The accuracy of most methods 
tends to drop when C is used as the source domain or 
as the target domain. This is explained by 
the fact that C enhances the cyclic shock response of 
rolling bearings, making the data more regular and the 
features learned by the network model simpler. This 
significantly reduces accuracy when testing low-load 
data (especially the 1 hp working condition), which 
Table 5.  Experimental diagnosis results of various methods
Variable working conditions test, accuracy [%]
Methods              A-B     A-C     A-D     B-A     B-C     B-D     C-A     C-B     C-D     D-A     D-B     D-C Average
1DCNN              99.60   91.40   96.60   90.00   74.80   89.00   79.40   86.40   87.00   99.20  100.00  100.00   91.12
BP                 54.20   54.40   64.80   50.60   52.20   58.00   49.60   55.00   59.80   78.80   73.20   79.60   60.85
CNN-GRU            96.40   94.80   95.80   90.40   85.00   90.20   90.40   94.80   92.60   99.40  100.00  100.00   94.15
GRU                73.60   81.00   81.60   87.60   88.80   88.60   72.20   73.60   77.40   97.60   97.00   98.00   84.75
MSCNN              80.80   86.00   86.00   90.00   93.80   95.60   77.80   86.00   86.20   99.80   99.40  100.00   90.12
MSCNN-GRU          99.60   96.20   99.00   98.20   94.80   98.40   85.40   91.40   91.00  100.00  100.00  100.00   96.17
Proposed method    99.00   98.80   99.80   98.20   99.00   99.20   96.60   98.80   98.40  100.00  100.00  100.00   98.98
DT                 33.60   35.80   49.60   32.00   34.20   45.20   35.20   36.80   44.80   63.80   64.20   62.80   44.83
RF                 42.60   48.00   46.80   42.80   46.80   46.80   41.60   40.80   46.40   51.00   52.60   54.80   46.75
SVM                54.80   66.40   66.00   64.00   65.00   67.60   63.80   56.40   67.80   87.20   79.40   89.00   68.95
influences the migration characteristics of the model. 
Even so, the proposed method maintains a high accuracy 
under conditions with large differences, which indicates 
that the features extracted by the proposed method 
have stronger transfer characteristics.
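The construction of the transfer tasks and the per-task comparison with MSCNN-GRU can be reproduced from Table 5. The sketch below assumes that each label such as A-B denotes source condition A and target condition B; the accuracy lists are copied from the Table 5 rows, and all variable names are illustrative.

```python
from itertools import permutations

conditions = ["A", "B", "C", "D"]
# Each ordered (source, target) pair is one transfer task: 4 * 3 = 12.
tasks = [f"{src}-{tgt}" for src, tgt in permutations(conditions, 2)]

# Accuracies [%] copied from Table 5, in the task order generated above.
proposed = [99.00, 98.80, 99.80, 98.20, 99.00, 99.20,
            96.60, 98.80, 98.40, 100.00, 100.00, 100.00]
mscnn_gru = [99.60, 96.20, 99.00, 98.20, 94.80, 98.40,
             85.40, 91.40, 91.00, 100.00, 100.00, 100.00]

# Margin of the proposed method over MSCNN-GRU, in percentage points.
margin = {t: round(p - m, 2) for t, p, m in zip(tasks, proposed, mscnn_gru)}

# Tasks where MSCNN-GRU is ahead, and the task with the largest gain.
worse = [t for t, d in margin.items() if d < 0]
largest_gain = max(margin, key=margin.get)

print(len(tasks), "tasks:", tasks)
print("MSCNN-GRU ahead in:", worse)
print("Largest gain:", largest_gain, margin[largest_gain])
```

Differences between accuracies are expressed in percentage points, which avoids confusing them with relative changes.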
Although the SENet-MSCNN and GRU model 
exhibits strong migration properties on the Case Western 
Reserve University bearing data, the bearing dataset 
is itself highly transferable. Therefore, to further validate 
the migration properties of the method, the gearbox 
dataset was also tested. Table 6 presents the accuracy 
of different methods in variable working conditions. 
The SENet-MSCNN and GRU fusion model still 
exhibits the best average performance on the gearbox 
dataset with an average accuracy of 76.44 %, an improvement 
of 4.61, 11.36, 9.77, and 2.69 percentage points compared to 
MSCNN, GRU, MSCNN-GRU, and CNN-GRU, 
respectively. The average accuracy of the traditional 
machine learning methods does not exceed 56 %, with poor 
migration characteristics.
2.7  Visualization Results
(1) Visualization of mid-layer activation. A sample 
of the fourth fault type of rolling bearing (outer-race 
fault, fault diameter 0.1778 mm) is used as input to 
the model; the feature map output by each hidden layer 
is visualized as a 2D image, which is shown in Fig. 17. 
The yellow parts represent the activated part and the 
blue parts represent the inactivated part. The features 
extracted by the first convolutional layer Conv 1, shown 
as the yellow activated part, correspond to the shock 
component of the vibration signal. The global features 
extracted at different scales by the MSCNN layer are 
the same, while the local features are different. 
Convolutional kernels of sizes 3×1 and 5×1 are used 
in MSCNN_b and MSCNN_c. These branches are more 
sensitive to the shock signal, and the extracted feature 
information has a higher resolution. As the number 
of network layers increases, irrelevant information 
is filtered out and useful information is refined and 
Table 6.  Experimental diagnosis results of various methods
Variable working conditions test, accuracy [%]
Methods              A-B     A-C     B-A     B-C     C-A     C-B Average
1DCNN              69.00   53.00   61.50  100.00   48.00   89.50   70.17
BP                 27.50   27.50   50.00   55.00   44.00   54.50   43.08
CNN-GRU            73.50   56.50   53.00   99.00   61.00   99.50   73.75
GRU                35.00   34.00   57.00   99.50   65.00  100.00   65.08
MSCNN              59.50   65.00   50.50   99.00   57.00  100.00   71.83
MSCNN-GRU          41.50   40.50   50.50   99.50   68.00  100.00   66.67
Proposed method    74.67   58.00   54.00  100.00   72.67   99.33   76.44
DT                 28.50   26.00   33.00   42.00   34.50   28.00   32.00
RF                 27.50   27.50   50.00   43.50   48.00   56.00   42.08
SVM                34.00   36.50   50.50   83.50   50.50   81.00   56.00
Fig. 17.  Visualization of the hidden layer activations of SENet-MSCNN and GRU; label 3 represents the fourth fault type of rolling bearing
scaled up. The feature map becomes clearer, the 
extracted features become more abstract, and the 
source domain information becomes less relevant while 
the target domain information gradually becomes more relevant.
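The response of convolutional branches with different kernel sizes to a shock component can be illustrated with a small example. The sketch below is not the trained network: the input signal and the kernel weights are hypothetical, and only the kernel sizes 3 and 5 are taken from the description of MSCNN_b and MSCNN_c above.

```python
def conv1d_relu(signal, kernel):
    """Valid 1D cross-correlation followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, s))
    return out


# Hypothetical input: low-level alternating noise with one shock at index 10.
signal = [0.1 * (-1) ** n for n in range(20)]
signal[10] = 5.0

# Hypothetical, untrained kernels; only the sizes 3 and 5 come from the text.
kernel_3 = [-0.5, 1.0, -0.5]
kernel_5 = [-0.25, -0.25, 1.0, -0.25, -0.25]

branch_b = conv1d_relu(signal, kernel_3)  # stands in for MSCNN_b
branch_c = conv1d_relu(signal, kernel_5)  # stands in for MSCNN_c

# Position of the strongest activation, mapped back to the input index.
peak_b = branch_b.index(max(branch_b)) + len(kernel_3) // 2
peak_c = branch_c.index(max(branch_c)) + len(kernel_5) // 2
print(peak_b, peak_c)
```

In this toy setting both branches are most strongly activated at the position of the shock, which is the kind of behaviour the yellow regions in Fig. 17 are described as showing.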
Fig. 18.  Visualization of time-domain waveforms for class 
activation; the red parts represent the activated part and the blue 
parts represent the inactivated part
(2) Visualization of the time-domain plot for 
class activation. The time-domain waveform plot 
of the activation intensity of the vibration signal for 
the fourth fault state (outer-race fault, fault 
diameter 0.1778 mm) is obtained by the 
method of class activation visualization [44]. As 
shown in Fig. 18, the medium- and low-frequency 
shock signals in the red parts have a strong influence 
on the classification results of the fault diagnosis 
model, and the high-frequency signals in the blue 
parts have less influence on the classification results. 
The part of the signal to which the SENet-MSCNN and 
GRU model is most sensitive is similar to the 
characteristic frequency of the vibration signal of 
the fourth fault state, which further explains why 
the fault diagnosis model diagnoses 
the input signal as the fourth fault state.
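Reference [44] uses the Grad-CAM technique. Assuming the class activation intensity in Fig. 18 is computed in the same way for a 1D feature map, each channel is weighted by its time-averaged gradient, and the weighted sum over channels is passed through a ReLU. The sketch below uses hypothetical activations and gradients; in practice both would be taken from the trained network by automatic differentiation.

```python
def grad_cam_1d(activations, gradients):
    """Grad-CAM for 1D feature maps.

    activations[k][t]: output of channel k at time step t.
    gradients[k][t]:   d(class score) / d(activations[k][t]).
    Returns one non-negative intensity per time step, scaled to [0, 1].
    """
    n_steps = len(activations[0])
    # Channel weight: gradient averaged over time (global average pooling).
    weights = [sum(g) / len(g) for g in gradients]
    # Weighted sum over channels, then ReLU keeps positive evidence only.
    cam = [
        max(0.0, sum(w * a[t] for w, a in zip(weights, activations)))
        for t in range(n_steps)
    ]
    peak = max(cam)
    return [c / peak for c in cam] if peak > 0 else cam


# Hypothetical values for two channels and five time steps.
activations = [[0.0, 2.0, 0.5, 0.0, 1.0],
               [1.0, 0.0, 1.0, 1.0, 0.0]]
gradients = [[0.4, 0.6, 0.5, 0.5, 0.5],        # mean 0.5: supports the class
             [-0.2, -0.2, -0.2, -0.2, -0.2]]   # mean -0.2: opposes the class

cam = grad_cam_1d(activations, gradients)
print(cam)
```

Time steps with intensity close to 1 would be drawn in red and those close to 0 in blue when the map is overlaid on the waveform.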
(3) t-SNE visualization. t-SNE (t-distributed 
stochastic neighbor embedding) is a common 
method used for data dimensionality reduction and 
visualization. In this paper, high-dimensional data are 
represented by a low-dimensional distribution using the 
t-SNE method. Fig. 19 presents the t-SNE visualization 
of 1000 validation samples classified by the SENet-MSCNN 
and GRU model, together with the intermediate 
stages. At the Input Layer, the raw signals are 
mixed together in the two-dimensional space. 
After the SENet-MSCNN Layer, initial 
classification characteristics appear. After the GRU Layer, 
the classification features are already obvious, and the 10 
states are clearly separated from each other. The 
Dense Layer shows even more obvious classification 
features, with samples of the same state clustered at the 
same location and a larger distance between different 
states.
3  CONCLUSIONS AND FUTURE WORK
In this paper, a rolling bearing fault diagnosis 
method based on SENet-MSCNN and GRU model is 
proposed. The method was applied to the comparative 
Fig. 19.  Feature distribution maps visualized using t-SNE; feature visualization of the Input Layer, SENet-MSCNN Layer, GRU Layer, and 
Dense Layer structures of SENet-MSCNN and GRU
analysis of the bearing dataset and achieved a 
recognition rate of at least 99.67 %. Compared with other 
representative fault diagnosis methods, the proposed 
method has significant advantages in terms of fault 
identification rate and robustness. In 
addition, we tested the migration capability of the 
model under variable working conditions on both the 
bearing dataset and the gearbox dataset. The method 
achieved average recognition rates of 98.98 % and 76.44 %, 
respectively, in the cross-condition tests. The results show 
that the method exhibits better migration properties 
than the compared methods.
Therefore, the method is expected to provide 
a new approach for rolling bearing fault diagnosis. In 
future research, we will apply the proposed fault 
diagnosis method to other mechanical fault types 
to determine its effectiveness in diagnosing a wider 
range of mechanical faults.
4  ACKNOWLEDGEMENTS
This work was supported by the Youth Science 
Foundation of Northeast Petroleum University [Grant 
number 2018QNL-28].
5  REFERENCES
[1] Zhao, C., Sun, H. (2019). Dynamic distributed monitoring 
strategy for large-scale nonstationary processes subject to 
frequently varying conditions under closed-loop control. IEEE 
Transactions on Industrial Electronics, vol. 66, no. 6, p. 4749-
4758, DOI:10.1109/tie.2018.2864703.
[2] AlShorman, O., Alkahatni, F., Masadeh, M., Irfan, M., Glowacz, 
A., Althobiani, F., Kozik, J., Glowacz, W. (2021). Sounds and 
acoustic emission-based early fault diagnosis of induction 
motor: A review study. Advances in Mechanical Engineering, 
vol. 13, no. 2, DOI:10.1177/1687814021996915.
[3] Gong, T., Yang, J., Liu, S., Liu, H. (2022). Non-stationary feature 
extraction by the stochastic response of coupled oscillators 
and its application in bearing fault diagnosis under variable 
speed condition. Nonlinear Dynamics, vol. 108, no. 4, p. 
3839-3857, DOI:10.1007/s11071-022-07373-y.
[4] Yun, K., Chong, Y., Enzhe, S., Liping, Y., Quan, D. (2021). 
Fault diagnosis method of diesel engine injector based on 
hierarchical weighted permutation entropy. IEEE International 
Instrumentation and Measurement Technology Conference, p. 
1-6, DOI:10.1109/I2MTC50364.2021.9460083.
[5] Li, Y., Dai, W., Zhang, W. (2020). Bearing fault feature 
selection method based on weighted multidimensional feature 
fusion. IEEE Access, vol. 8, p. 19008-19025, DOI:10.1109/
access.2020.2967537.
[6] Wrzochal, M., Adamczak, S., Piotrowicz, G., Wnuk, S. (2022). 
Industrial experimental research as a contribution to the 
development of an experimental model of rolling bearing 
vibrations. Strojniški vestnik - Journal of Mechanical 
Engineering, vol. 68, no. 9, p. 552-559, DOI:10.5545/sv-
jme.2022.184.
[7] Yi, C., Wang, H., Ran, L., Zhou, L., Lin, J. (2022). Power spectral 
density-guided variational mode decomposition for the 
compound fault diagnosis of rolling bearings. Measurement, 
vol. 199, DOI:10.1016/j.measurement.2022.111494.
[8] Glowacz, A., Tadeusiewicz, R., Legutko, S., Caesarendra, 
W., Irfan, M., Liu, H., Brumercik, F., Gutten, M., Sulowicz, M., 
Antonino Daviu, J.A., Sarkodie-Gyan, T., Fracz, P., Kumar, A., 
Xiang, J. (2021). Fault diagnosis of angle grinders and electric 
impact drills using acoustic signals. Applied Acoustics, vol. 
179, DOI:10.1016/j.apacoust.2021.108070.
[9] Ribeiro Junior, R.F., dos Santos Areias, I.A., Campos, M.M., 
Teixeira, C.E., da Silva, L.E.B., Gomes, G.F. (2022). Fault 
detection and diagnosis in electric motors using convolution 
neural network and short-time Fourier transform. Journal of 
Vibration Engineering & Technologies, vol. 10, no. 7, p. 2531-
2542, DOI:10.1007/s42417-022-00501-3.
[10] Li, Y., Cheng, G., Liu, C. (2021). Research on bearing fault 
diagnosis based on spectrum characteristics under strong 
noise interference. Measurement, vol. 169, DOI:10.1016/j.
measurement.2020.108509.
[11] Peng, H., Zhang, H., Fan, Y., Shangguan, L., Yang, Y. (2022). A 
review of research on wind turbine bearings’ failure analysis 
and fault diagnosis. Lubricants, vol. 11, no. 1, DOI:10.3390/
lubricants11010014.
[12] Peng, Z., Zhike, P., Shiqian, C. (2020). Review of signal 
decomposition theory and its applications in machine fault 
diagnosis. Journal of Mechanical Engineering, vol. 56, no. 17, 
DOI:10.3901/jme.2020.17.091.
[13] Kumar, A., Gandhi, C.P., Vashishtha, G., Kundu, P., Tang, 
H., Glowacz, A., Shukla, R.K., Xiang, J. (2021). VMD based 
trigonometric entropy measure: A simple and effective tool for 
dynamic degradation monitoring of rolling element bearing. 
Measurement Science and Technology, vol. 33, no. 1, 
DOI:10.1088/1361-6501/ac2fe8.
[14] Zhou, J., Xiao, M., Niu, Y., Ji, G. (2022). Rolling bearing fault 
diagnosis based on WGWOA-VMD-SVM. Sensors, vol. 22, no. 
16, DOI:10.3390/s22166281.
[15] Jiang, J., Liu, Y., Xu, C., Shen, H., Soni, M. (2022). Research on 
motor bearing fault diagnosis based on the AdaBoost algorithm 
and the ensemble learning with Bayesian optimization in the 
industrial internet of things. Security and Communication 
Networks, vol. 2022, p. 1-12, DOI:10.1155/2022/4569954.
[16] Hosseinpour-Zarnaq, M., Omid, M., Biabani-Aghdam, 
E. (2022). Fault diagnosis of tractor auxiliary gearbox 
using vibration analysis and random forest classifier. 
Information Processing in Agriculture, vol. 9, no. 1, p. 60-67, 
DOI:10.1016/j.inpa.2021.01.002.
[17] Gunerkar, R.S., Jalan, A.K., Belgamwar, S.U. (2019). Fault 
diagnosis of rolling element bearing based on artificial neural 
network. Journal of Mechanical Science and Technology, vol. 
33, no. 2, p. 505-511, DOI:10.1007/s12206-019-0103-x.
[18] Zhang, Z., Li, H., Chen, L., Han, P., Shi, H. (2021). Rolling 
bearing fault diagnosis using improved deep residual 
shrinkage networks. Shock and Vibration, vol. 2021, p. 1-11, 
DOI:10.1155/2021/9942249.
[19] Wang, H., Liu, Z., Peng, D., Cheng, Z. (2022). Attention-guided 
joint learning CNN with noise robustness for bearing fault 
diagnosis and vibration signal denoising. ISA Transactions, 
vol. 128, p. 470-484, DOI:10.1016/j.isatra.2021.11.028.
[20] Mao, W., Feng, W., Liu, Y., Zhang, D., Liang, X. (2021). A 
new deep auto-encoder method with fusing discriminant 
information for bearing fault diagnosis. Mechanical 
Systems and Signal Processing, vol. 150, DOI:10.1016/j.
ymssp.2020.107233.
[21] Lv, D., Wang, H., Che, C. (2021). Fault diagnosis of 
rolling bearing based on multimodal data fusion and 
deep belief network. Proceedings of the Institution of 
Mechanical Engineers, Part C: Journal of Mechanical 
Engineering Science, vol. 235, no. 22, p. 6577-6585, 
DOI:10.1177/09544062211008464.
[22] Liu, H., Zhou, J., Zheng, Y., Jiang, W., Zhang, Y. (2018). 
Fault diagnosis of rolling bearings with recurrent neural 
network-based autoencoders. ISA Transactions, vol. 77, p. 167-178, 
DOI:10.1016/j.isatra.2018.04.005.
[23] Li, D., Liu, S., Gao, F., Sun, X. (2021). Continual learning 
classification method for time-varying data space based 
on artificial immune system. Journal of Intelligent & Fuzzy 
Systems, vol. 40, no. 5, p. 8741-8754, DOI:10.3233/jifs-
200044.
[24] Bin, G.F., Gao, J.J., Li, X.J., Dhillon, B.S. (2012). Early fault 
diagnosis of rotating machinery based on wavelet packets-
empirical mode decomposition feature extraction and neural 
network. Mechanical Systems and Signal Processing, vol. 27, 
p. 696-711, DOI:10.1016/j.ymssp.2011.08.002.
[25] Lin, Y., Xiao, M., Liu, H., Li, Z., Zhou, S., Xu, X., Wang, 
D. (2022). Gear fault diagnosis based on cs-improved 
variational mode decomposition and probabilistic 
neural network. Measurement, vol. 192, DOI:10.1016/j.
measurement.2022.110913.
[26] Guo, S., Yang, T., Gao, W., Zhang, C. (2018). A novel fault 
diagnosis method for rotating machinery based on a 
convolutional neural network. Sensors, vol. 18, no. 5, 
DOI:10.3390/s18051429.
[27] Guo, M.-F., Yang, N.-C., Chen, W.-F. (2019). Deep-learning-
based fault classification using Hilbert-Huang transform and 
convolutional neural network in power distribution systems. 
IEEE Sensors Journal, vol. 19, no. 16, p. 6905-6913, 
DOI:10.1109/jsen.2019.2913006.
[28] Sun, W., Zhao, R., Yan, R., Shao, S., Chen, X. (2017). 
Convolutional discriminative feature learning for induction 
motor fault diagnosis. IEEE Transactions on Industrial 
Informatics, vol. 13, no. 3, p. 1350-1359, DOI:10.1109/
tii.2017.2672988.
[29] Chen, Z., Gryllias, K., Li, W. (2019). Mechanical fault diagnosis 
using convolutional neural networks and extreme learning 
machine. Mechanical Systems and Signal Processing, vol. 
133, DOI:10.1016/j.ymssp.2019.106272.
[30] Wang, S., Xiang, J., Zhong, Y., Zhou, Y. (2018). Convolutional 
neural network-based hidden Markov models for 
rolling element bearing fault identification. Knowledge-
Based Systems, vol. 144, p. 65-76, DOI:10.1016/j.
knosys.2017.12.027.
[31] Hao, S., Ge, F.-X., Li, Y., Jiang, J. (2020). Multisensor bearing 
fault diagnosis based on one-dimensional convolutional 
long short-term memory networks. Measurement, vol. 159, 
DOI:10.1016/j.measurement.2020.107802.
[32] Zhang, P., Chen, C. (2022). Wind turbine planetary gearbox 
fault diagnosis using circular pitch cyclic vector and a 
bidirectional gated recurrent unit. Measurement Science and 
Technology, vol. 34, no. 1, DOI:10.1088/1361-6501/ac95b2.
[33] Zhu, Y., Zhu, C., Tan, J., Tan, Y., Rao, L. (2022). Anomaly 
detection and condition monitoring of wind turbine gearbox 
based on LSTM-FS and transfer learning. Renewable Energy, 
vol. 189, p. 90-103, DOI:10.1016/j.renene.2022.02.061.
[34] Shi, J., Peng, D., Peng, Z., Zhang, Z., Goebel, K., Wu, D. 
(2022). Planetary gearbox fault diagnosis using bidirectional-
convolutional LSTM networks. Mechanical Systems and Signal 
Processing, vol. 162, DOI:10.1016/j.ymssp.2021.107996.
[35] Chen, X., Zhang, B., Gao, D. (2020). Bearing fault diagnosis 
base on multi-scale CNN and LSTM model. Journal of Intelligent 
Manufacturing, vol. 32, no. 4, p. 971-987, DOI:10.1007/
s10845-020-01600-2.
[36] Li, X., Li, J., Qu, Y., He, D. (2019). Gear pitting fault diagnosis 
using integrated CNN and GRU network with both vibration and 
acoustic emission signals. Applied Sciences, vol. 9, no. 4, 
DOI:10.3390/app9040768.
[37] Hu, J., Shen, L., Albanie, S., Sun, G., Wu, E. (2020). 
Squeeze-and-excitation networks. IEEE Transactions on Pattern 
Analysis and Machine Intelligence, vol. 42, no. 8, p. 2011-2023, 
DOI:10.1109/TPAMI.2019.2913372.
[38] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., 
Erhan, D., Vanhoucke, V., Rabinovich, A. (2015). Going deeper 
with convolutions. Proceedings of the IEEE Computer Society 
Conference on Computer Vision and Pattern Recognition, vol. 
07-12-June, p. 1-9, DOI:10.1109/CVPR.2015.7298594.
[39] Peng, D., Wang, H., Liu, Z., Zhang, W., Zuo, M.J., Chen, J. 
(2020). Multibranch and multiscale CNN for fault diagnosis 
of wheelset bearings under strong noise and variable load 
condition. IEEE Transactions on Industrial Informatics, vol. 16, 
no. 7, p. 4949-4960, DOI:10.1109/tii.2020.2967557.
[40] Gao, T., Yang, J., Jiang, S., Yan, G. (2020). A novel fault 
diagnosis method for analog circuits based on conditional 
variational neural networks. Circuits, Systems, and Signal 
Processing, vol. 40, no. 6, p. 2609-2633, DOI:10.1007/
s00034-020-01595-4.
[41] Lv, H., Chen, J., Pan, T., Zhang, T., Feng, Y., Liu, S. (2022). 
Attention mechanism in intelligent fault diagnosis of machinery: 
A review of technique and application. Measurement, vol. 199, 
DOI:10.1016/j.measurement.2022.111594.
[42] Lv, Y., Yuan, R., Song, G. (2016). Multivariate empirical mode 
decomposition and its application to fault diagnosis of rolling 
bearing. Mechanical Systems and Signal Processing, vol. 81, 
p. 219-234, DOI:10.1016/j.ymssp.2016.03.010.
[43] He, X., Zhou, X., Yu, W., Hou, Y., Mechefske, C.K. (2021). 
Bearing fault detection and diagnosis using Case Western 
Reserve University dataset with deep learning approaches: 
A review. ISA Transactions, vol. 111, p. 360-375, DOI:10.1016/j.
isatra.2020.10.060.
[44] Chao, Q., Wei, X., Tao, J., Liu, C., Wang, Y. (2022). Cavitation 
recognition of axial piston pumps in noisy environment based 
on Grad-CAM visualization technique. CAAI Transactions on 
Intelligence Technology, DOI:10.1049/cit2.12101.