https://doi.org/10.31449/inf.v46i3.4016 Informatica 46 (2022) 393-402
Intelligent Analysis and Processing Technology of Big Data Based on Clustering Algorithm
Zheng Zheng 1, Fukai Cao 1*, Song Gao 2, Amit Sharma 3
1 Jitang College, North China University of Science and Technology, Tangshan, 063210, China
2 Tangshan Power Supply Company, State Grid Jibei Electric Power Co., Ltd, Tangshan, 063000, China
3 Southern Federal University, Russia
Emails: zhengzheng873@163.com, fukaicao@126.com, songgao56@163.com, amit.amitsharma90@gmail.com
Keywords: Clustering algorithm; Big data intelligence; Smart meter; Project cost; Genetic algorithm
Received: February 15, 2022

An attribute category clustering method based on hierarchical clustering is proposed in order to study big data intelligent analysis and processing technology. The proposed model merges attribute categories with similar fault type distributions, reduces the data dimensionality, and binarizes the data. To address the problem of numerous missing values in continuous data, a data completion method based on the attribute distribution function is adopted. From the perspective of the selection and estimation of project unit prices in construction enterprises, this paper summarizes the data mining process oriented to the characteristics of project cost data and puts forward a method for analyzing and processing project cost data based on a clustering algorithm. Finally, the processed data sets are subjected to bottom-up hierarchical clustering analysis, from which satisfactory analysis results are obtained. The experimental results show that the proposed preprocessing method based on attribute clustering can effectively merge attributes, reduce the dimensionality after binary transformation, and effectively reduce the amount of data while preserving the information in the data.

Povzetek: S hierarhičnim gručenjem je narejena inteligentna analiza velikih podatkov.

1 Introduction
The hidden value of big data promotes the derivation of big data mining technologies and methods. Big data mining extracts valuable knowledge from massive, multi-source data for data processing. Therefore, how to quickly and accurately mine valuable knowledge from big data has attracted much attention. In fact, data mining is also a decision support process. Its common methods mainly include classification, clustering, prediction, regression analysis, association rules and so on, with clustering being one of the key technologies. Big data is largely unstructured, which makes processing and analysis difficult and voluminous, renders the structural analysis mode overly complicated, and prevents traditional data analysis from effectively processing, mining and analyzing it, as shown in Figure 1 [1]. The classical methods of cluster analysis can be summarized as: partition methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, neural network methods based on computational intelligence, evolutionary computing methods, fuzzy methods and so on, as well as the semi-supervised clustering methods that have attracted much attention at present. Recently, new cluster ensemble methods have rapidly become a research hotspot in cluster analysis. The purpose of clustering integration is to fuse the results from multiple clustering algorithms to obtain higher-quality and more robust clustering results. The method based on graph theory is one of the fastest-developing methods in recent years; it realizes clustering by using the principles of graph theory and graphics.
Compared with traditional algorithms, this kind of algorithm can deal with more complex cluster structures, such as nonconvex structures, and can converge to the global optimal solution [2]. In recent years, with the rapid development of network information technology, the era of big data has arrived and penetrated many fields. There is more and more big data application research for specific professional fields. However, in the field of project cost, this aspect has remained largely blank. Every day, with the help of the Internet and various project cost systems, large amounts of project cost data are generated, but there is no scientific and accurate method to process them, so they are lost in vain. The acquisition and transmission of project cost information still rely on traditional means, and their timeliness and accuracy cannot meet the needs of today's project management field [3]. To process and mine these huge volumes of project cost data and provide a basis and reference for decision-making in the project management process, it is not enough to rely on manual processing. We should innovate and apply data mining technology to make full use of the value of massive project cost data, so as to promote the rapid and healthy development of the industry.

Figure 1: Big data intelligent analysis and processing technology

The rest of the manuscript is organized as follows. The most recent related work is discussed in Section 2. The research methodology, the optimization of the clustering algorithm, its complexity, and project cost data acquisition are presented in Section 3. The results and analysis of the proposed model are discussed in Section 4, followed by the conclusion in Section 5.

2 Related work
In this section, various state-of-the-art works in the field of big data processing based on clustering algorithms are presented. Zhu et al. proposed an initial clustering center selection method based on point density and gave special treatment to outliers [4]. Del Ser et al. proposed an improved algorithm that determines the optimal cluster number k by calculating the silhouette coefficient of each object in the cluster under different k values, and determines the initial cluster centers by hierarchical aggregation [5]. Wu proposed a clustering method based on a patent technology efficacy matrix; it uses K-means to cluster by calculating the similarity of technologies and achieves good results. K-medoids and PAM algorithms are very effective for small data sets, but they do not scale well to large data sets [6]. Duan and Wang proposed CLARANS, a new heuristic search algorithm based on PAM [7]. The algorithm finds the representative center point of a cluster by a random search of the graph. CLARANS was the first clustering algorithm successfully applied in the field of spatial data mining. It overcomes the shortcoming that other classical clustering algorithms cannot deal with large-scale data sets, but it still suffers from low execution efficiency; its time complexity is O(KN²). To speed up execution, Xing and Li proposed a parallel CLARANS algorithm based on the PVM mechanism, which effectively improves the speed of the algorithm [8]. In the area of artificial neural networks, Cai applied the classical hierarchical clustering algorithm and a partitioning algorithm to cluster SOM outputs, aiming to reduce the computational complexity of the classical clustering method [9].
In addition, in terms of network applications, Xu et al. proposed a network-based three-dimensional facial expression clustering method, which overcomes the limited information contained in the data and the sharp decline in recognition performance that occur with two-dimensional facial expression recognition [10]. In terms of project cost, Li et al. established a power grid cost management method system and the construction framework of a cost analysis information platform under the big data environment [11]. Shi and Zhu designed a cost management system for mine engineering construction projects based on cost data [12]. Wendong et al. put forward a statistics and analysis method for project cost information data in the context of big data and constructed a statistical calculation model of project cost information data [13]. The evolution of artificial intelligence and the Internet of Things has been considered for several industrial applications and contributes to social life [14-17].

3 Research methods
This section presents the cluster analysis model, the optimization of the clustering algorithm, its complexity, and the acquisition and processing of project cost data. As unstructured data, big data is difficult to characterize with the two-dimensional logical tables of a database. The multi-dimensional clustering analysis algorithm reveals the hidden structure of the observation variables through the Bayesian network model structure and constructs the logical correlations between leaf nodes and other nodes. In this model, multiple hidden variables are allowed to exist, each corresponding to a data clustering method. Based on the probabilistic dependence between random variables, the multi-dimensional clustering analysis algorithm analyzes unstructured data and quantitatively describes a reasonable distribution with the conditional concept as the carrier. The specific flow of data processing is as follows. Data preprocessing, that is, data cleaning, avoids noise and solves the problem of missing data. During data processing, the continuous values of each attribute are discretized and the data are converted. The data set is then divided into two parts: a result (training) set and a test set. The classifier is constructed by a classification algorithm. Using the test set, an accuracy evaluation mode is selected to evaluate the classifier; a classifier that meets the accuracy standard is applied in practice, otherwise it is modified.

Figure 2: Data processing flow of cluster analysis model

Word segmentation and document vectorization: the continuous character sequence is reorganized according to established norms to form a word sequence. In order to transform the segmented document into a pattern that can be recognized and processed by a computer, it is necessary to quantify the word features as a feature vector, which is currently done with the vector space model. Feature selection and multi-dimensional cluster analysis: word features lead to a certain sparsity and high dimensionality in the document vector feature space, so an effective feature selection method is chosen to reduce the dimensionality of the feature space and further improve classification efficiency and accuracy [18]. The detailed data processing steps of the analysis model are shown in Figure 2.
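As an illustration of this vectorization and feature-reduction flow (word segmentation, vector space model, dimensionality reduction, then clustering), the following minimal Python sketch uses scikit-learn. The toy documents, the TF-IDF weighting, truncated SVD as the feature-reduction step and K-means as the final clustering step are illustrative assumptions, not the exact pipeline used in this paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

# Toy corpus standing in for documents that have already been word-segmented
# (tokens joined by spaces); real project cost documents are assumed here.
docs = [
    "cement unit price tangshan supplier quotation",
    "labor unit price jiangsu monthly survey",
    "cement bulk price supplier quotation",
    "labor cost shanghai monthly survey",
]

# Vector space model: quantify word features as TF-IDF feature vectors.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# Feature reduction to tame the sparsity and high dimensionality of the
# document-term space (truncated SVD used here as one possible choice).
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)

# Multi-dimensional cluster analysis on the reduced feature vectors.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)
print(labels)
```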
A functional model is built for the classification of big data and unstructured data. The problem can be described by a given data set and category set, as given in Equations 1 and 2.

$F = \{F_1, F_2, F_3, F_4, \ldots, F_m\}$   (1)

$G = \{G_1, G_2, G_3, \ldots, G_m\}$   (2)

The classification problem is to determine the function mapping that maps the data items of the data set to the corresponding categories. Given the big data variable set, each variable takes its parent node set as the carrier, and the correlations between nodes can be characterized by a directed graph: each variable is represented as a node, and a directed edge is drawn from each node of the parent set into the variable. Suppose a and b are variables of the Bayesian network and Z is the node set containing neither a nor b; once Z separates a and b, they remain conditionally independent given Z. This separation and conditional independence show the close relationship between the graph-theoretic side and the probability-theoretic side of a Bayesian network. Suppose objects are classified based on the evidence provided by the feature vector x; then:

$P(v_j \mid x) > P(v_i \mid x), \quad \forall i \neq j$   (3)

$P(x \mid v_j)\,P(v_j) > P(x \mid v_i)\,P(v_i)$   (4)

The decision rules are likelihood-ratio test rules, as given in Equations 3 and 4. Bayesian network reasoning, through probability decomposition, reduces the reasoning complexity so that operations become local. Through marginal processing and analysis of the elimination process, the decision rules can be tested by likelihood ratio over all given large data sets to obtain the samples with minimum error probability [19].

3.1 Optimization of clustering algorithm
Based on the functional model, an optimized clustering algorithm is constructed to divide the overall big data into multiple data intervals, which are stored in multiple files, each file representing the corresponding interval. After scanning and comparing all the data, they are divided into multiple sections, and the files are sorted and deduplicated. The data quantities of the files are $M_1$ and $M_2$, respectively. After the data are deduplicated, cluster analysis is carried out and the Bayesian formula in Equation 5 is used for the calculation.

$\mathrm{BIC}(N, F) = \max_{\alpha} \log_e(N \mid F, \alpha) - \frac{f(n)}{2}\log M$   (5)

Here $\max_{\alpha} \log_e(N \mid F, \alpha)$ represents the effect of integrating the data and the model; the term $\frac{f(n)}{2}\log M$ is taken as a negative difference when the data are closely integrated with the model, and as a compensation amount when the combination is sparse. Based on this specification of the Bayesian formula and the organic combination of model and data, and on the basis of satisfying the clustering characteristics, the model is calculated and analyzed through the multi-dimensional clustering algorithm. The input of this algorithm contains m objects; objects in the same cluster have high similarity, while objects in different clusters have low similarity. The algorithm description process is shown in Figure 3.

Figure 3: Algorithm description process
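To illustrate the idea behind Equation 5, a penalized log-likelihood that balances how well a model fits the data against model complexity, the sketch below selects a cluster count with a BIC criterion. The use of scikit-learn's GaussianMixture and its built-in bic() score (whose sign convention is reversed, so lower is better), as well as the synthetic data, are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic one-dimensional "deduplicated" data standing in for a data interval.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(320, 15, 50),
                       rng.normal(410, 20, 50)]).reshape(-1, 1)

# Evaluate a BIC-style penalized likelihood for several candidate cluster counts:
# a better fit lowers the score, while extra parameters add a log-sized penalty.
scores = {}
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(data)
    scores[k] = gmm.bic(data)  # lower BIC indicates a better model/data balance

best_k = min(scores, key=scores.get)
print(scores, "-> chosen number of clusters:", best_k)
```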
3.2 Complexity
The space cost generated by the new algorithm needs to fully consider the characteristic samples of big data. If hierarchical clustering is used to optimize the clustering algorithm, all clusters to be clustered need to be reasonably arranged in serial mode, given the total clustering time $R$ and the cost $n$. The space complexity $W$ is then expressed by Equation 6.

$W = R(n) \times m \log_2 m$   (6)

In terms of optimization rules, when the fusion of model and data is sparse, let x and y be the dimensions of the data set. When dividing attributes, the data set is scanned only once, with $z$ identifying the clustering data, and the results are not affected by factors such as multidimensional space and input order [20, 21]. The multi-dimensional spatial clusters can then be found in time through the evaluation of weights and thresholds, and the amount of calculation can be simplified. The total clustering time $u \times n$ consists of the consumption time ($n$) and the deduplication time ($m$) arranged linearly; the total deduplication time is $u \times n$; and the sorting time complexity is $u \times m^2 \log m$. The total time complexity of the algorithm, $R$, is then calculated by Equation 7.

$R = u \times n + u \times n + u \times m^2 \log_2 m$   (7)

3.3 Acquisition of project cost data
There are two ways to obtain project cost data based on big data.
i. Internal collection within the platform, of which there are generally two methods. The first is to build a unified project cost information data collection template and collect and import the relevant data in the platform according to user-defined unified specifications, so as to directly convert the target cost data and store it in the local database for backup. The second is to set up fields conforming to certain specifications on the relevant cost information platform, collect the information of the same field, and store it in the local database [22].
ii. Exchange through the platform interface. The specific method and principle are as follows: create a unified data exchange format through the corresponding platform interface and realize the information exchange of relevant businesses inside and outside the platform.
According to the collection method and the form of the price change trend, the box method is generally used to process the project cost data studied in this paper. Before processing, the detection problem must first be solved. For the detection of noisy data, the change of cost data is mainly driven by the overall change of the market economy [23]. From the perspective of a time series, it changes continuously and is largely affected by overall economic development, so generally there are no major fluctuations or changes. We set the annual change threshold of the cost data to 19%. Within the sampling range, data points deviating from the average value by more than 20% are regarded as noise; the regression curve is then calculated, and their values are re-solved and corrected [24-26].
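The following sketch illustrates this kind of threshold-based noise handling: flag points that deviate from the mean by more than 20%, fit a regression curve to the remaining points, and replace the flagged values with fitted ones. The NumPy polynomial fit and the sample price series are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

# Monthly unit prices (illustrative values); the index acts as the time axis.
prices = np.array([402., 398., 405., 410., 640., 415., 408., 150., 412., 418.])
t = np.arange(len(prices))

# Flag points deviating from the sample mean by more than 20% as noise.
mean_price = prices.mean()
is_noise = np.abs(prices - mean_price) > 0.20 * mean_price

# Fit a simple regression curve (quadratic here) to the clean points only,
# then re-solve and correct the noisy values from the fitted curve.
coeffs = np.polyfit(t[~is_noise], prices[~is_noise], deg=2)
corrected = prices.copy()
corrected[is_noise] = np.polyval(coeffs, t[is_noise])

print("noise at indices:", np.where(is_noise)[0])
print("corrected series:", corrected.round(1))
```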
Handling inconsistent data formats: the common method is to establish a general data acquisition template and collect data according to it, ensuring a consistent acquisition format. According to the requirements and characteristics of the data analysis in this paper, the data acquisition templates are established as shown in Table 1 and Table 2.

Number | Region   | Unit | Time     | Source
1      | Jiangsu  | yuan | January  | Data survey
2      | Shanghai | yuan | February | Data survey
3      | Beijing  | yuan | March    | Data survey
Table 1: Data collection template - labor unit price expense template

Listing     | Type   | Accuracy | Format | Explain
Region      | text   | --       | --     | Area code
Number      | double | 1        | XXX    | Sample number
Company     | text   | --       | --     | Collection unit
Unit Price  | double | 0.02     | XX.XX  | Unit price
Single time | Date   | s        | --     | Acquisition time
Source      | Date   | …        | --     | Data sources
Table 2: Template description - labor unit price expense template

As the material cost accounts for a large proportion of the project cost, usually up to about 70%, the material price has a great impact on the final settlement results and decisions [27]. Therefore, this paper selects the material price as the research object and focuses on the specific application of material price data in the fields of project cost index prediction, project price information analysis and investment estimation. Due to the dynamic, massive, multi-source and heterogeneous characteristics of project cost big data, the K-means clustering algorithm is chosen for the specific solution [28].

4 Results and Analysis
This section presents the analysis of the results obtained with the proposed clustering model and finally presents their discussion and summary. In the proposed model, cluster analysis is applied to the quotations of 20 local suppliers for composite Portland cement.

Number | Region | Specifications | Unit Price | Source
1      | SSX    | PC32.1         | 452        | Merchant A
2      | SSX    | PC32.1         | 326        | Merchant A
3      | SSX    | PC32.1         | 419        | Merchant A
4      | SSX    | PC32.1         | 385        | Merchant A
5      | SSX    | PC32.1         | 453        | Merchant A
6      | SSX    | PC32.1         | 376        | Merchant A
7      | SSX    | PC32.1         | 413        | Merchant A
8      | SSX    | PC32.1         | 306        | Merchant A
9      | SSX    | PC32.1         | 378        | Merchant A
10     | SSX    | PC32.1         | 403        | Merchant A
11     | SSX    | PC32.1         | 487        | Merchant A
…      | SSX    | PC32.1         | …          | Merchant A
20     | SSX    | PC32.1         | 346        | Merchant A
Table 3: Data acquisition results

The 20 data points listed in Table 3 are combined according to price and serial number to obtain the initial data set $A = \{x_1, x_2, x_3, \ldots, x_{20}\}$. Before the calculation it should be noted that the K-means algorithm must be given the value of K before solution, which directly determines the accuracy and efficiency of the algorithm. This paper determines the K value as follows: first, the distances between the samples in the data set are compared, the point furthest from the other points is selected as the initial center of the calculation according to the results, and the value of K is then determined from the newly generated classification [29, 30].
i. Select the two data points with the largest distance in the data sequence. In this example, the distance between the two points $x_9$ and $x_{12}$ is the largest. Take these two points as cluster centers for the clustering calculation to obtain two cluster sets: $S_{21} = \{x_9, x_2, x_4, x_8, x_{10}, x_{13}, x_{14}, x_{18}\}$ and $S_{22} = \{x_{12}, x_3, x_5, x_7, x_{11}, x_{12}, x_{16}, x_{17}\}$.
ii. Combining the above clustering results, for the two cluster sets, first compute the distances between the first class of data and cluster center $x_9$ (the farthest distance obtained is 83) and between the second class of data and cluster center $x_{12}$ (the maximum distance is 85), and then select the point $x_{11}$ with the maximum distance as the third cluster point.
iii. Recalculate: select $x_9$, $x_{12}$ and $x_{11}$ as the three cluster centers and calculate the three cluster sets as follows: $S_{31} = \{x_9, x_2, x_{10}, x_{20}\}$, $S_{32} = \{x_{12}, x_1, x_5\}$ and $S_{33} = \{x_{11}, x_3, x_4, x_6, x_7, x_8, x_{10}, x_{13}, x_{14}, x_{15}\}$.
iv. Calculate the distances between the data elements in the three sets and each cluster center, continue the cluster analysis, and then obtain four cluster sets [31].
v. Based on the above calculation results, the cluster counts of the different cluster centers are listed in Table 4.

Serial number | Center point | Numerical value | Number of clusters
1             | $x_9$        | 315             | 5
2             | $x_{11}$     | 406             | 4
3             | $x_{12}$     | 475             | 4
4             | $x_{18}$     | 413             | 9
Table 4: Cluster analysis results

According to the analysis of the clustering results in Table 4 and Figure 4, point $x_{18}$ is the center with the largest number of clustered samples among all cluster centers, so it reflects the real market price better than the other centers [32]. Taking this as an example, in the practical application of project cost budgeting and final accounts, the market price of materials can be analyzed with the data mining algorithm proposed in this paper. By analyzing the solution results, relevant personnel can be assisted in accurately grasping market price information, and auditors can be helped to judge the authenticity of price information in time.

Figure 4: Results of clustering algorithm

Figure 5(a): Result for different sizes of datasets for information loss

Figure 5(b): Result for different sizes of datasets for execution time

Different numbers of records are separated from the original dataset to assess the performance of the improved model on datasets of various sizes, as depicted in Figure 5(a and b). As shown in the figure, execution time and information loss increase with the size of the dataset. Execution time rises quickly, while the growth of information loss slows progressively. Clearly, the increasing size of the dataset strongly affects execution time, because the grouping procedure of finding equivalence classes is complex and time-consuming.

Figure 6: Performance comparison of time measured for dataset 1

Figure 7: Performance comparison of time measured for dataset 2

The performance of the proposed clustering scheme is measured on two different datasets: dataset 1, the BoW (Bag of Words) dataset, and dataset 2, the HOUSE (household electric power consumption) dataset. To analyze the clustering cost of the proposed algorithm, it is compared with existing baseline models. The value of $k$ is set to 40 and 80 for the BoW and HOUSE datasets. Figures 6 and 7 illustrate the experimental analysis on the HOUSE and BoW datasets, where the total running time of the proposed model is observed. The experiments show that the proposed model achieves higher performance than K-means++, K-means and K-means|| when implemented to execute in parallel.
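Returning to the center selection procedure of steps i-v above (pick the two points with the largest pairwise distance as initial centers, then repeatedly promote the point farthest from its nearest center before running K-means), the sketch below implements a farthest-first style initialization in plain NumPy. The price list is illustrative, since Table 3 only lists part of the 20 quotations, and the fixed number of centers and the Lloyd iterations are simplified assumptions rather than the authors' exact algorithm.

```python
import numpy as np

# Illustrative cement unit prices (Table 3 lists only part of the 20 quotations,
# so these values stand in for a complete sample).
prices = np.array([452., 326., 419., 385., 453., 376., 413., 306., 378., 403.,
                   487., 341., 362., 398., 421., 367., 332., 444., 391., 346.])

def farthest_first_centers(x, k):
    """Pick initial centers: start from the two most distant points, then
    repeatedly add the point farthest from its nearest chosen center."""
    d = np.abs(x[:, None] - x[None, :])          # pairwise distances (1-D data)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    centers = [x[i], x[j]]
    while len(centers) < k:
        nearest = np.min(np.abs(x[:, None] - np.array(centers)[None, :]), axis=1)
        centers.append(x[np.argmax(nearest)])
    return np.array(centers)

def kmeans_1d(x, centers, iters=20):
    """Standard Lloyd iterations starting from the supplied centers."""
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[labels == c].mean() if np.any(labels == c)
                            else centers[c] for c in range(len(centers))])
    return labels, centers

k = 4  # assumed here; the paper grows K until the clusters stabilize
init = farthest_first_centers(prices, k)
labels, centers = kmeans_1d(prices, init)
print("centers:", centers.round(1), "cluster sizes:", np.bincount(labels))
```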
5 Conclusions
Different data analysis and mining methods are required for different purposes of project cost data mining in the context of big data. From the perspective of the selection and estimation of engineering unit prices in construction enterprises, this paper summarizes the data mining process oriented to the characteristics of engineering cost data and puts forward a method for analyzing and processing engineering cost data based on a clustering algorithm. The proposed model provides a meaningful exploration for the research of massive engineering cost data mining. The experiments show that the proposed clustering model achieves better time measurements than the existing baseline models. Clustering models based on computational intelligence have been proposed; however, these intelligent technologies are not yet organically integrated. Machine learning and data mining technology have made great breakthroughs in today's academic and industrial circles. Therefore, how to integrate various intelligent technologies to give full play to the functional characteristics of such algorithms in cluster analysis is also one of the future research directions.

References
[1] Li, W., & Huang, Q. (2017). Research on intelligent avoidance method of shipwreck based on big data analysis. Polish Maritime Research. 10.1515/pomr-2017-0125
[2] Li, L., Wang, J., & Li, X. (2020). Efficiency analysis of machine learning intelligent investment based on K-means algorithm. IEEE Access, 8, 147463-147470. 10.1109/ACCESS.2020.3011366
[3] Dong-rui, L. (2017). Cluster analysis algorithm based on key data integration for cloud computing. International Journal of Reasoning-based Intelligent Systems, 9(3-4), 123-129. 10.1504/IJRIS.2017.090041
[4] Zhu, K., Joshi, S., Wang, Q. G., & Hsi, J. F. Y. (2019). Guest editorial special section on big data analytics in intelligent manufacturing. IEEE Transactions on Industrial Informatics, 15(4), 2382-2385. 10.1109/TII.2019.2900726
[5] Del Ser, J., Sanchez-Medina, J. J., & Vlahogianni, E. I. (2019). Introduction to the special issue on online learning for big-data driven transportation and mobility. IEEE Transactions on Intelligent Transportation Systems, 20(12), 4621-4623. 10.1109/TITS.2019.2955548
[6] Wu, C. (2019, June). Research on Clustering Algorithm Based on Big Data Background. In Journal of Physics: Conference Series (Vol. 1237, No. 2, p. 022131). IOP Publishing. 10.1088/1742-6596/1237/2/022131
[7] Duan, S., & Wang, Z. (2021). Research on the service mode of the university library based on data mining. Scientific Programming, 2021. https://doi.org/10.1155/2021/5564326
[8] Xing, Z., & Li, G. (2019). Intelligent classification method of remote sensing image based on big data in spark environment. International Journal of Wireless Information Networks, 26(3), 183-192. https://doi.org/10.1007/s10776-019-00440-z
[9] Cai, Z. M. (2020). Network community partition based on intelligent clustering algorithm. Computer Optics, 44(6), 985-989. 10.18287/2412-6179-CO-724
[10] Xu, Z., Shi, D., & Tu, Z. (2021). Research on diagnostic information of smart medical care based on big data. Journal of Healthcare Engineering, 2021. https://doi.org/10.1155/2021/9977358
[11] Li, W., Luo, Y., Tang, C., Zhang, K., & Ma, X. (2021). Boosted Fuzzy Granular Regression Trees. Mathematical Problems in Engineering, 2021. https://doi.org/10.1155/2021/9958427
[12] Shi, F., & Zhu, L. (2019). Analysis of trip generation rates in residential commuting based on mobile phone signaling data. Journal of Transport and Land Use, 12(1), 201-220. http://dx.doi.org/10.5198/jtlu.2019.1431
[13] Wendong, X., Yuanfeng, L., & Deli, C. (2017). Algorithm of key data ensemble clustering and approximate analysis in cloud computing. International Journal of Reasoning-based Intelligent Systems, 9(3-4), 177-184. 10.1504/IJRIS.2017.090038
[14] Singh, P. K., & Sharma, A. (2022). An intelligent WSN-UAV-based IoT framework for precision agriculture application. Computers and Electrical Engineering, 100, 107912. https://doi.org/10.1016/j.compeleceng.2022.107912
[15] Zeng, H., Dhiman, G., Sharma, A., Sharma, A., & Tselykh, A. (2021). An IoT and Blockchain-based approach for the smart water management system in agriculture. Expert Systems, e12892. https://doi.org/10.1111/exsy.12892
[16] Sharma, A., & Singh, P. K. (2021). UAV-based framework for effective data analysis of forest fire detection using 5G networks: An effective approach towards smart cities solutions. International Journal of Communication Systems, e4826. https://doi.org/10.1002/dac.4826
[17] Sharma, A., Singh, P. K., & Kumar, Y. (2020). An integrated fire detection system using IoT and image processing technique for smart cities. Sustainable Cities and Society, 61, 102332. https://doi.org/10.1016/j.scs.2020.102332
[18] Tseng, F. H., Cho, H. H., & Wu, H. T. (2019). Applying big data for intelligent agriculture-based crop selection analysis. IEEE Access, 7, 116965-116974. 10.1109/ACCESS.2019.2935564
[19] Zhao, Y., Ding, F., Li, J., Guo, L., & Qi, W. (2019). The intelligent obstacle sensing and recognizing method based on D–S evidence theory for UGV. Future Generation Computer Systems, 97, 21-29. https://doi.org/10.1016/j.future.2019.02.003
[20] Yuan, W., Deng, P., Taleb, T., Wan, J., & Bi, C. (2015). An unlicensed taxi identification model based on big data analysis. IEEE Transactions on Intelligent Transportation Systems, 17(6), 1703-1713. 10.1109/TITS.2015.2498180
[21] Wang, L. (2021, December). Intelligent analysis of accounting information processing under the background of big data. In 2021 2nd International Conference on Big Data Economy and Information Management (BDEIM) (pp. 461-464). IEEE. 10.1109/BDEIM55082.2021.00100
[22] Ma, X., Wang, Z., Zhou, S., Wen, H., & Zhang, Y. (2018, June). Intelligent healthcare systems assisted by data analytics and mobile computing. In 2018 14th International Wireless Communications & Mobile Computing Conference (IWCMC) (pp. 1317-1322). IEEE. 10.1109/IWCMC.2018.8450377
[23] Hu, H., Tang, B., Gong, X., Wei, W., & Wang, H. (2017). Intelligent fault diagnosis of the high-speed train with big data based on deep neural networks. IEEE Transactions on Industrial Informatics, 13(4), 2106-2116. 10.1109/TII.2017.2683528
[24] Vedavathi, N., Dharmaiah, G., Venkatadri, K., & Gaffar, S. A. (2021). Numerical study of radiative non-Darcy nanofluid flow over a stretching sheet with a convective Nield conditions and energy activation. Nonlinear Engineering, 10(1), 159-176. https://doi.org/10.1515/nleng-2021-0012
[25] Hayat, T., Ullah, I., Muhammad, K., & Alsaedi, A. (2021). Gyrotactic microorganism and bio-convection during flow of Prandtl-Eyring nanomaterial. Nonlinear Engineering, 10(1), 201-212. https://doi.org/10.1515/nleng-2021-0015
[26] Li, Z., Gao, D., Wu, C., Lv, G., Liu, X., Zhai, H., & Huang, Z. (2021). Mechanical performance of aerated concrete and its bonding performance with glass fiber grille. Nonlinear Engineering, 10(1), 240-244. https://doi.org/10.1515/nleng-2021-0018
[27] Liang, H., Yun, C., Kan, M. J., & Gao, J. (2019). Research and application of element logging intelligent identification model based on data mining. IEEE Access, 7, 94415-94423. 10.1109/ACCESS.2019.2928001
[28] He, Z., He, Y., Liu, F., & Zhao, Y. (2019). Big data-oriented product infant failure intelligent root cause identification using associated tree and fuzzy DEA. IEEE Access, 7, 34687-34698. 10.1109/ACCESS.2019.2904759
[29] He, X., Wang, K., Lu, H., Xu, W., & Guo, S. (2020). Edge QoE: Intelligent big data caching via deep reinforcement learning. IEEE Network, 34(4), 8-13. 10.1109/MNET.011.1900393
[30] Lei, Y., Jia, F., Lin, J., Xing, S., & Ding, S. X. (2016). An intelligent fault diagnosis method using unsupervised feature learning towards mechanical big data. IEEE Transactions on Industrial Electronics, 63(5), 3137-3147. 10.1109/TIE.2016.2519325
[31] Srivani, B., Sandhya, N., & Padmaja Rani, B. (2020). Literature review and analysis on big data stream classification techniques. International Journal of Knowledge-Based and Intelligent Engineering Systems, 24(3), 205-215. 10.3233/KES-200042
[32] Liu, X., Sun, Q., Lu, W., Wu, C., & Ding, H. (2020). Big-data-based intelligent spectrum sensing for heterogeneous spectrum communications in 5G. IEEE Wireless Communications, 27(5), 67-73. 10.1109/MWC.001.1900493