Strojniški vestnik - Journal of Mechanical Engineering 66(2020)4, 227-234
© 2020 Journal of Mechanical Engineering. All rights reserved.
DOI:10.5545/sv-jme.2019.6285
Original Scientific Paper
Received for review: 2019-08-11
Received revised form: 2020-02-15
Accepted for publication: 2020-03-13

Deep Stacked Auto-Encoder Network Based Tool Wear Monitoring in the Face Milling Process

VanThien Nguyen - VietHung Nguyen* - VanTrinh Pham
Hanoi University of Industry, VietNam

Tool wear identification plays an important role in improving product quality and productivity in the manufacturing industry. The actual tool wear status, together with the input cutting parameters, may cause different levels of spindle vibration during the machining process. This research proposes an architecture comprising a deep learning network (DLN) to identify the actual wear state of a machining tool. Firstly, data on spindle vibration signals are obtained from an acceleration sensor. The data are then pre-processed using the fast Fourier transform (FFT) method to reveal the relevant outstanding features in the frequency domain. Finally, the DLN is constructed based on stacked auto-encoders (SAE) and softmax, and is trained with the input data on the vibration features of the respective tool wear state. This DLN architecture is then used to identify the actual wear status of the machining tool. The experimental results from the collected data show that the proposed DLN architecture is capable of identifying actual tool wear with high accuracy.

Keywords: Face milling; Tool wear; Stacked auto-encoder (SAE); Deep learning network (DLN); Cast iron

Highlights
• An expert technique for tool wear monitoring based on an experimental dataset is explored.
• The feature values with respect to the wear status of cutting tools are extracted and analyzed.
• The effects of a proposed deep learning network architecture for identifying the different tool wear statuses are considered.
• A pattern prediction method is developed and compared.

0 INTRODUCTION

In the machining process, cutting tools play an important role and can affect the stability of machine components and systems. Cutting tool status is closely related to the vibration, cutting heat and cutting force of a machining system, which normally includes the machine tool, cutting tool, workpiece, and clamping device. The working status of a cutting tool significantly influences machining efficiency and thus the accuracy of a product [1] to [3]. Therefore, monitoring cutting tool status is important. Normally, tool wear can cause major failure of the cutting tool, which may result in wastage of a product, slow operation, and a lack of productivity. Identifying the wear status of the cutting tool is especially important in evaluating product accuracy [4] and [5].

The features of the original vibration signal can provide useful information for assessing the wear condition of the cutting tool and can be extracted by principal component analysis (PCA) [6] and linear discriminant analysis (LDA) [7] and [6]. High-dimensional feature data, in particular, can cause a computational burden, slow down the classification phase, and reduce the diagnosis accuracy. These methods are effective with linear data but cannot be effectively applied to complex nonlinear and nonstationary vibration data.

Machine learning techniques have been researched and applied in many fields of science and technology, such as computer vision, automatic diagnostics systems and pattern recognition. These techniques are often based on artificial intelligence methods such as k-nearest neighbour (k-NN) [8], support vector machines (SVM) [9] and [10], and the artificial neural network (ANN) [11] to [13]. These methods have been effectively applied to identify tool wear status. However, they have some weaknesses, including ineffective feature extraction.
In particular, the objectivity of unsupervised feature learning has been ignored, and automation is not employed in these methods [14] and [15]. Therefore, using these techniques to identify tool wear in the machining process has been unsatisfactory.

Recently, deep learning network (DLN) architecture has been widely used in research [16] and some engineering science applications such as medical informatics [17], pattern recognition [18] and [19], and time-series prediction [20] to [22]. DLNs are constructed hierarchically from many hidden layers, with the aim of making the output layer more effective [23].

In this study, the authors propose a DLN architecture based on the SAE and a softmax classifier, which are closely stacked together to implement tool wear diagnosis in the face milling process. In this architecture, each auto-encoder (AE) implements vibration data reconstruction to generate higher-level features with an unsupervised algorithm that is optimized to minimize errors between the input and output data. The relevant vibration features are precisely extracted at the hidden layer of the last AE, which can significantly improve the learning effect of the classification phase. In addition, softmax is derived from the multinomial logistic model, which is based on the supervised learning algorithm and is suitable for multiclass classification [24]. Finally, the parameters of the proposed DLN are fine-tuned under supervision in its complete architecture, with the goal of improving the classification accuracy.

*Corr. Author's Address: Hanoi University of Industry, No. 298, Cau Dien Street, Bac Tu Liem District, Ha Noi City, Vietnam, hung2009haui@gmail.com

The rest of the paper is structured as follows: Section 1 presents the materials. Section 2 proposes a DLN architecture, which is then used to identify the tool wear status of the face milling process.
Section 3 presents the experimental results and discusses the diagnosis results, which are analyzed and compared. Section 4 is the conclusion. Acknowledgments and a list of references then follow.

1 MATERIALS

This section describes the single AE architecture and the softmax classifier model that are the basis for constructing the DLN for the diagnosis technique. The AE is used to exploit the features of the original vibration signal related to tool wear status. These features are then used to train the softmax classifier and construct the proposed DLN.

1.1 Auto-Encoder Network Architecture

As described in [23], an AE is a special type of neural network architecture that is trained with an unsupervised algorithm. The AE architecture consists of three layers: the input layer, the hidden layer and the output layer, which are organized into two phases, encoding and decoding, as shown in Fig. 1. The input layer $x = \{x_1, x_2, \ldots, x_n\}$, hidden layer $f = \{f_1, f_2, \ldots, f_m\}$, $m \le n$, and output layer $\hat{x} = \{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n\}$ are seamlessly connected. The AE is trained to reconstruct the data of the input layer.

The encoder phase encodes the characteristics of the high-dimensional input data $x$ into the low-dimensional data $f$ in the hidden layer. The input and hidden layers are connected by the activation function $f = \mathrm{Sigmoid}(W^{(1)} x + b^{(1)})$ in the mapping process, where $W^{(1)}$ is the weight matrix and $b^{(1)}$ is the bias vector. More specifically, each input vector $x_i$ is mapped onto the hidden layer, where it is expressed by a reduced set of significant features.

In contrast, the decoder phase reconstructs the input layer $x$. The hidden-layer data $f$ are mapped back onto the output layer $\hat{x}$ as high-dimensional reconstructed data. The activation function $\hat{x} = \mathrm{Sigmoid}(W^{(2)} f + b^{(2)})$ is used to connect the hidden layer to the output layer, where the weight matrix $W^{(2)} = (W^{(1)})^{T}$ is interpreted as tied weights and $b^{(2)}$ is the bias vector of the decoder phase.
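To make the encoder and decoder mappings concrete, the following minimal Python sketch evaluates $f = \mathrm{Sigmoid}(W^{(1)} x + b^{(1)})$ and $\hat{x} = \mathrm{Sigmoid}((W^{(1)})^{T} f + b^{(2)})$ for a single input vector. The layer sizes, the random initial weights, and the function names `encode` and `decode` are assumptions made for illustration only; they are not taken from the paper, and no training is performed here.

```python
import math
import random


def sigmoid(z):
    """Logistic activation used by both the encoder and the decoder."""
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W1, b1):
    """f = sigmoid(W1 x + b1); W1 has m rows (hidden units) and n columns (inputs)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]


def decode(f, W1, b2):
    """x_hat = sigmoid(W1^T f + b2), i.e. tied weights W2 = W1^T."""
    return [sigmoid(sum(W1[j][i] * f[j] for j in range(len(f))) + b2[i])
            for i in range(len(b2))]


# Toy sizes (assumed for illustration): n = 6 inputs, m = 2 hidden units.
random.seed(0)
n, m = 6, 2
W1 = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(m)]
b1 = [0.0] * m
b2 = [0.0] * n

x = [0.1, 0.9, 0.3, 0.7, 0.5, 0.2]
f = encode(x, W1, b1)        # low-dimensional features (length m)
x_hat = decode(f, W1, b2)    # reconstruction of the input (length n)
```

Because the decoder reuses the transpose of the encoder matrix, only one weight matrix and two bias vectors need to be stored for each AE in this sketch.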
The AE architecture is optimized over the parameter set $(W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)})$ to minimize the reconstruction error in the output layer. The following cost function is used:

$$C = \frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{I}\left(x_{in} - \hat{x}_{in}\right)^{2} + \lambda\,\Omega_W + \beta\,\Omega_S, \qquad (1)$$

where $I$ is the number of variables in the input data; $N$ is the number of samples; $\lambda$ is the coefficient for $\Omega_W$; $\beta$ is the coefficient for $\Omega_S$; $\Omega_W$ is an L2 regularization term defined by Eq. (2); and $\Omega_S$ is a sparsity regularization term defined by Eq. (3).

$$\Omega_W = \frac{1}{2}\sum_{l}\sum_{i}\sum_{j}\left(W_{ij}^{(l)}\right)^{2}, \qquad (2)$$

$$\Omega_S = \sum_{k}\sum_{j} KL\left(\rho \,\|\, \hat{\rho}_j^{k}\right), \qquad (3)$$

where $\hat{\rho}_j^{k}$ is the mean activation for unit $j$ in layer $k$; $\rho$ is the desired mean activation; and $KL$ is the Kullback-Leibler divergence, which is defined by Eq. (4).

$$KL\left(\rho \,\|\, \hat{\rho}_j^{k}\right) = \rho \log\frac{\rho}{\hat{\rho}_j^{k}} + (1 - \rho)\log\frac{1 - \rho}{1 - \hat{\rho}_j^{k}}. \qquad (4)$$

It can be seen that each AE is independently trained to represent the features. The features are extracted from the hidden layer nodes, which contain the most important information of the input layer. The extracted features can be the input data for the next AE to produce higher-level features.

Fig. 1. Architecture of single AE network

Fig. 2. Diagnosis technique based on DLN architecture

Fig. 3. Schematic drawing of face milling process; a) cutting zone at tool/workpiece interface, and b) observing of Vb tool wear on the insert

1.2 Softmax Classifier Model

As a follow-up, training the softmax classifier model to identify patterns is a necessary step for the whole model of the diagnosis technique. The model uses the encoded feature data in the hidden layer of the last AE. Softmax uses a loss function based on cross entropy [16] and [23].
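As a rough illustration of a softmax output combined with a cross-entropy loss, the Python sketch below turns one linear score per class into class probabilities and evaluates the loss for a single labelled sample. The parameter matrix `theta`, the feature vector `x`, and the helper names are hypothetical values chosen for the example, not quantities reported in the paper.

```python
import math


def softmax(scores):
    """Turn k real-valued class scores into probabilities that sum to 1."""
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(theta, x):
    """One linear score theta_j . x per class (theta: k rows of length n)."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in theta]


def cross_entropy(probs, label):
    """Loss for one sample; label is the 0-based index of the true class."""
    return -math.log(probs[label])


# Hypothetical parameters: k = 4 classes, n = 3 input features.
theta = [[2.0, -1.0, 0.0],
         [-1.0, 2.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, -1.0]]
x = [0.56, 0.01, 0.01]

probs = softmax(class_scores(theta, x))
predicted = max(range(len(probs)), key=probs.__getitem__)   # 0-based class index
loss = cross_entropy(probs, label=0)
```

The loss shrinks toward zero as the probability assigned to the true class approaches one, which is the property that supervised training of the classifier exploits.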
Softmax, being based on a supervised learning algorithm, requires input samples $x = \{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$ and class labels $t = \{t^{(1)}, t^{(2)}, \ldots, t^{(k)}\}$ for the classifier model. The training process for an input sample evaluates the probability $P(t = j \mid x)$ for each value of $j = 1, 2, \ldots, k$. This means that the probability of the class label is estimated for each of the $k$ different possible values. Therefore, the hypothesis function $h_\theta(x^{(i)})$ is constituted as follows:

$$
h_\theta\left(x^{(i)}\right) =
\begin{bmatrix}
P\left(t^{(i)} = 1 \mid x^{(i)}; \theta\right) \\
P\left(t^{(i)} = 2 \mid x^{(i)}; \theta\right) \\
\vdots \\
P\left(t^{(i)} = k \mid x^{(i)}; \theta\right)
\end{bmatrix}
= \frac{1}{\sum_{j=1}^{k} \exp\left(\theta_j^{T} x^{(i)}\right)}
\begin{bmatrix}
\exp\left(\theta_1^{T} x^{(i)}\right) \\
\exp\left(\theta_2^{T} x^{(i)}\right) \\
\vdots \\
\exp\left(\theta_k^{T} x^{(i)}\right)
\end{bmatrix}, \qquad (5)
$$

where $\theta_1, \theta_2, \ldots, \theta_k \in \mathbb{R}^{n}$ are the parameters of the softmax classifier, and the term $1 / \sum_{j=1}^{k} \exp(\theta_j^{T} x^{(i)})$ represents the normalization of the distribution.

The AEs and the softmax classifier are hierarchically stacked together to construct the diagnosis technique. This technique is then fine-tuned over all the parameters to optimize the whole model, which is evaluated using a test dataset.

2 PROPOSED DIAGNOSIS TECHNIQUE

This section describes the DLN in the proposed diagnosis technique. The significant feature data are extracted from the vibration data related to tool wear status by AEs based on an unsupervised algorithm, which provide the softmax classifier with its inputs. The combination of AEs with softmax in the proposed DLN architecture achieves the diagnosis results reported in Section 3. Vibration data in the frequency domain corresponding to tool wear status are the inputs to the DLN. Fig. 2 shows the diagnosis implementation, which comprises the following seven steps:

Step 1: Acquire vibration data based on cutting tool status.
Step 2: Pre-process the collected time-domain data with the FFT to express them clearly in the frequency domain.
Step 3: Split the data into training and testing sets for the diagnostic phase. The training data are then used to train the model. The testing data are used to evaluate the trained model.
Step 4: Exploit the important features in the hidden layer of each AE, which is based on an unsupervised algorithm.
Step 5: Train the softmax classifier model using the extracted features of the last AE. Arrange the AEs and the softmax classifier in the DLN architecture.
Step 6: Train the resulting DLN by fine-tuning all the parameters using the class labels.
Step 7: Identify the actual status of tool wear with the trained DLN.

3 EXPERIMENTAL RESULTS

This section presents the experimental results, demonstrating the quality of the proposed method. The wear status of a face milling tool is identified in the face milling process. Vibration data on the state of the tool are collected and pre-processed before serving as input to the proposed method.

3.1 Spindle Vibration Based Tool Wear Data Acquisition

The experimental dataset of BEST Lab at UC Berkeley is used to identify tool wear status [25]. The dataset was recorded on a Matsuura machining center, model MC-510V, with a 70 mm face mill carrying six inserts of the KC710 (Kennametal) type and a cast-iron workpiece. The face milling operations in the dataset were run under different conditions, and the corresponding flank wear Vb values were measured. Fig. 3 is a schematic drawing of the face milling process. Vibration data were monitored by an accelerometer with a maximum sampling rate of 100 kHz.

Table 1 shows a dataset of 40 samples. Samples were collected on machine spindle vibration corresponding to four tool wear states: Vb0 = 0 mm, Vb1 = 0.11 mm, Vb2 = 0.29 mm, and Vb3 = 0.50 mm, which correspond to no tool wear, slightly worn tool, half worn tool, and severely worn tool, respectively. The acquired vibration data were recorded at the end of the 2nd, 7th, 29th, and 44th minutes. Tool status was tested with the same cutting parameters at a spindle speed of 826 rpm, feed of 0.5 mm, and cutting depth of 1.5 mm.
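The FFT pre-processing mentioned at the start of Section 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record length of 1024 points, the 1024 Hz sampling rate, the 2/n amplitude scaling, and the synthetic 50 Hz sine standing in for a vibration record are all hypothetical choices.

```python
import cmath
import math


def fft(signal):
    """Recursive radix-2 Cooley-Tukey FFT; len(signal) must be a power of two."""
    n = len(signal)
    if n == 1:
        return [complex(signal[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(signal[0::2])
    odd = fft(signal[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(signal):
    """First n/2 bins of |FFT|, each scaled by 2/n (DC bin included, for simplicity)."""
    n = len(signal)
    return [2.0 * abs(c) / n for c in fft(signal)[: n // 2]]


# Synthetic stand-in for a vibration record (not the measured data):
# a 50 Hz sine sampled at 1024 Hz for one second.
fs, n = 1024, 1024
signal = [math.sin(2 * math.pi * 50 * t / fs) for t in range(n)]

spectrum = magnitude_spectrum(signal)          # 512 frequency-domain values
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * fs / n                    # bin width is fs / n = 1 Hz here
```

A 1024-point record yields 512 spectral values in this sketch; whether the authors obtained their frequency-domain inputs from records of this length is an assumption.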
Fig. 4 shows these vibration signals in the time domain with four examples of corresponding wear statuses. The figure suggests that the relationship between vibration intensity and cutting tool status is unclear in the time domain, so the tool wear status cannot be determined directly from these signals even though the cutting parameters are unchanged.

Table 1. Vibration data by tool wear status

| Cutting tool status | Training samples | Testing samples | Flank wear bandwidth [mm] | Class |
|---|---|---|---|---|
| Not worn Vb0 | (x1 - x5) | (x6 - x10) | - | 1 |
| First worn Vb1 | (x11 - x15) | (x16 - x20) | 0.11 | 2 |
| Second worn Vb2 | (x21 - x25) | (x26 - x30) | 0.29 | 3 |
| Third worn Vb3 | (x31 - x35) | (x36 - x40) | 0.50 | 4 |

Fig. 4. Example of vibration signals used for cutting tool status determination (horizontal axis: sample number [-]); a) normal state; b) first-worn status; c) second-worn status, and d) third-worn status

3.2 Results and Discussion

To highlight the specific vibration frequencies of the tool wear statuses, the original time-domain vibration data were pre-processed and transformed into the frequency domain using the FFT method. These data were then used to extract the features of cutting tool status. The feature dataset of the vibration signal was extracted with two AEs, whose parameters are shown in Table 2.

Table 2. Parameters of AEs

| AE | Structure | λ | β |
|---|---|---|---|
| AE 1 | 512-20-512 | 0.05 | 6 |
| AE 2 | 20-3-20 | 0.05 | 4 |

Table 3 shows the experimental results for identifying tool status. The results indicate each state of the cutting tool. The original vibration data were accurately identified by the class to which they belong. The diagnostic result for each tool wear status is 100 % accurate, which means that the unsupervised feature representation captured the efficient and important features of the original vibration signal. In this case, the combination of the AE, optimized according to Eq. (1) for data reconstruction, with the softmax classifier model, based on the probabilities of Eq. (5), is very effective. Fig. 5 shows the confusion matrix for the four wear statuses of the cutting tool (i.e., Vb0, Vb1, Vb2, and Vb3): every test sample is assigned to its true class by the model trained on the training data.

Table 3. Results of tool wear identification based on the proposed method

| Tool wear status | Testing data | Feature 1 | Feature 2 | Feature 3 | Results |
|---|---|---|---|---|---|
| 1 | (x6 - x10) | 0.559913 | 0.010705266 | 0.007923 | 5(1) |
| 2 | (x16 - x20) | 0.055347 | 0.336805625 | 0.012599 | 5(2) |
| 3 | (x26 - x30) | 0.000139 | 0.000127181 | 0.000102 | 5(3) |
| 4 | (x36 - x40) | 0.000108 | 9.24E-05 | 8.32E-05 | 5(4) |

Training time [s]: 0.3072
Accuracy [%]: 100

To compare the identification results with other classifiers, the authors constructed two shallow classifiers, a feed-forward neural network (FNN) classifier and a k-nearest neighbour (k-NN) classifier, to identify tool wear status. The extracted three-dimensional feature data of the last AE were used to train and test the classifiers. An FNN classifier with 10 hidden-layer nodes, four output-layer nodes, and a training error goal of 0.01, and a k-NN classifier with four nearest neighbours, were set up to conduct the identification procedure. Table 4 shows the evaluation results of the FNN classifier and the k-NN classifier. The evaluation showed that the identification accuracy of both these classifiers was lower than that of the proposed DLN. This indicates that the tool wear diagnosis technique based on the proposed DLN benefits from the high-level feature representation of deep learning to reach high classification accuracy. Nevertheless, the perfect classification accuracy of the proposed DLN comes at a higher cost in training time than the k-NN classifier alone, as Tables 3 and 4 show. The extra time arises because each phase of the DLN construction has to be optimized.
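For readers unfamiliar with the k-NN baseline, the Python sketch below shows a majority vote among the four nearest training points in a three-dimensional feature space. It is only an illustration of the general technique: the cluster centres, the synthetic training points, and the function name `knn_predict` are hypothetical, and the features are not the ones extracted in this study.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=4):
    """Majority vote among the k training points closest to query (Euclidean)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, well-separated 3-D feature clusters (not the paper's data).
centres = {1: (0.9, 0.1, 0.1), 2: (0.1, 0.9, 0.1),
           3: (0.1, 0.1, 0.9), 4: (0.5, 0.5, 0.5)}
train_x, train_y = [], []
for label, centre in centres.items():
    for s in range(5):                       # five training samples per class
        train_x.append(tuple(c + 0.01 * s for c in centre))
        train_y.append(label)

pred_a = knn_predict(train_x, train_y, (0.88, 0.12, 0.10))
pred_b = knn_predict(train_x, train_y, (0.50, 0.52, 0.50))
```

Such a classifier needs no training phase beyond storing the samples, which is consistent with its low time cost, but its accuracy depends on how well the classes are separated in the feature space.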
Finally, the proposed DLN-based diagnosis technique is confirmed as efficient for feature representation and classification, as illustrated in Fig. 6.

[Confusion matrix figure: each of the four classes has 5 of 5 test samples on the diagonal (25.0 % of the test samples each) and no off-diagonal entries; per-class and overall accuracy are 100 %.]