https://doi.org/10.31449/inf.v49i12.7292 Informatica 49 (2025) 1–18 1 Improved Generative Adversarial Network and Particle Swarm Optimization Support Vector Machine for Tennis Serving Behavior Analysis Haibo Cao School of Physical Education, Xinyang Normal University, Xinyang, 464000, China E-mail: caohb@xynu.edu.cn Keywords: video images, generate adversarial networks, particle swarm optimization algorithm, support vector machine, behavior analysis Received: October 9, 2024 This study proposes a behavior analysis model based on an improved generative adversarial network and particle swarm optimization support vector machine algorithm for deblurring and feature extraction in tennis video serving behavior. The model first improves the generative adduction network by introducing a multi-layer convolution structure and a variety of activation functions, including three-layer convolution. The activation function is selected to use ReLU and Leaky ReLU alternately to enhance the generator in capturing image details. During model training, the generator optimizes the output image by minimizing the Wasserstein distance, and the discriminator evaluates the difference between the generated image and the real image. Then, to further extract features, the particle swarm optimization algorithm was used to dynamically optimize the feature extraction of each frame in the feature space, and dynamically adjust the inertia weights. The initial value was 0.9 and the final value was 0.4. After feature extraction, the data were input into SVM for classification. The penalty parameter of SVM was set to 1 and the accuracy was set to 0.001. The results of the comparative experiments demonstrated that the proposed method exhibited superior performance in deblurring images, with an average subjective score of 81.16 points, a notable advantage over the comparison algorithm. In the objective evaluation, the average Peak Signal-to-Noise Ratio (PSNR) and structural similarity value of the image after defuzzing by the research method reached 35.12dB and 0.93, respectively. The PSNR and structural similarity of the image increased by 13.56% to 18.29% and 8.33% to 19.90%, respectively. In the feature extraction and classification experiments, the accuracy of the proposed algorithm reached 91.24%, which was significantly higher than the traditional algorithm. The convergence speed was faster than the particle swarm optimization algorithm, the ant colony optimization algorithm, and the simulated annealing algorithm, reducing the number of iterations by 35.33%, 40.52%, and 51.55%, respectively. The data validate that the designed method has good application prospects in improving video image quality and feature extraction. Povzetek: Raziskava se ukvarja z analizo teniškega servisa, ki z izboljšanimi algoritmi GAN in PSO-SVM poboljša ugotovitve iz slik. 1 Introduction tactical analysis [3-4]. Currently, many industry scholars have researched the image processing technology and In light of the accelerated advancement of digital action behavior analysis. Ding Q et al. proposed a media and video technology, the analysis and camera-based long-term trajectory tracking technique to processing of video content have become increasingly improve the effectiveness of multi-target tracking important in many fields. Especially in sports science technology in sports game feature recognition. This study and sports analysis, how to extract valuable first improved the Tracking-Learning-Detection (TLD) information from video images has become a research algorithm and then integrated machine learning methods focus. Tennis, as a dynamic and complex sport, often into the improved algorithm. This method has suffers from blurring in its video images due to various significantly improved performance and can be factors such as the athlete's rapid movement, different effectively applied to feature extraction in sports events shooting angles, and changes in lighting. These factors [5]. Mulimani D et al. developed a video preprocessing can lead to a decrease in image quality, affecting game technique under a new framework. This method first analysis and technical evaluation [1-2]. Therefore, calibrated players and classified occlusions to ensure improving the clarity and information extraction ability accurate identification of athletes in complex game of Tennis Video Images (TVI) has important practical environments. Subsequently, by utilizing the system to significance for athlete performance evaluation and 2 Informatica 49 (2025) 1–18 H. Cao track and label athletes on the court, the framework introduced a skeleton attention module in the action significantly improved the tracking accuracy of recognition data system, which shoots the skeleton basketball players and provided more reliable technical sequence onto a single-RGB frame to assist focusing on support for fairness in the game [6]. Zhang J et al. the limb motion area. Experiments on the NTU RGB+D developed a new spatial attention and temporal dilation and SYSU benchmarks have shown that compared to GCN that uses a self-attention mechanism to select SOTA methods, this model achieved competitive human joints that are beneficial for action recognition, performance while reducing network complexity [8]. The thereby reducing the impact of data redundancy and results of related work are summarized in Table 1. noise. Extensive experiments on NTU-RGB+D and While the aforementioned research has yielded promising Kinetics Skeleton have shown that this method outcomes in the domains of video image processing and completes State-of-the-art (SOTA) performance in action skeleton-based action recognition [7]. Zhu X et al. Table 1: Research status analysis Accuracy PSNR Method SSIM Major deficiency (%) (dB) Multi-target tracking accuracy is insufficient in Ding Q et al. [5] 82 28.5 0.76 complex background. The real-time and accuracy of rapidly changing Mulimani D et al. [6] 80.5 29 0.79 scenes are insufficient. Data redundancy and noise are likely to affect Zhang J et al. [7] 86 30.8 0.82 the processing of diverse actions. The recognition ability in complex motion Zhu X et al. [8] 84 31.1 0.81 scenes needs to be improved. behavior analysis, it has not yet fully addressed the intricacies of the fuzzy states that arise in motion scenes. blur and information redundancy caused by dynamic When faced with information redundancy and feature scenes. Assuming that the improved GAN structure will overlap, it is easy to encounter the problem of effectively enhance feature extraction capability misidentification of action behavior. This study aims to through multi-layer convolution and multiple activation propose a method based on the combination of functions. Therefore, the goal of improving Peak improved a Generative Adversarial Network (GAN) Signal-to-Noise Ratio (PSNR) and Structural Similarity and Particle Swarm Optimization - Support Vector (SSIM) indicators, and achieving more accurate Machine (PSO-SVM) for fuzzy removal and feature recognition of tennis serving actions can be achieved. extraction in TVI processing. The innovation of this The analysis of tennis serve movements is not only method lies in combining improved GAN and PSO important for improving athletes' technical level but algorithms to optimize the overall process of deblurring also has significant implications in sports training, and feature extraction, making it suitable for complex match judgment, and tactical development. Therefore, motion scenes. By dynamically adjusting the inertia this study is expected to provide effective support for weight, the convergence speed and accuracy of the PSO sports science and intelligent sports applications. algorithm have been improved, effectively enhancing Tennis is a popular competitive sport, divided into the performance of the model at the feature extraction singles and doubles forms. Participants throw the ball level. and hit it with a racket to make it land on the opponent's court. In tennis matches, serving is the way the game 2 Methods and materials begins and an important part that determines the pace and strategy of the game. Serving is a highly technical 2.1 Deblurring processing based on video action in tennis that involves multiple key elements, images including preparation, pitching, hitting, and swinging. Serving is not only a technical action but also a The goal of this research is to propose a method based strategic behavior. Athletes can choose different types on the combination of improved GAN and PSO-SVM of serve and make different choices based on their to improve the deblurring effect and feature extraction opponents' weaknesses and court conditions. Choosing ability of TVIs, thereby improving the accuracy of which side of the court to serve on can affect the tennis serve motion recognition. The research opponent's receiving angle and preparation. When specifically focuses on solving the problem of image Improved Generative Adversarial Network and Particle Swarm… Informatica 49 (2025) 1–18 3 serving, the athlete's psychological state can also affect represents the true state of the athlete at the moment of their performance. At the same time, the serving serving. By restoring the original image, the serving behavior of athletes is influenced by various factors, behavior can be analyzed more accurately. K is the including physical fitness, proficiency in serving skills, convolution kernel. In serve analysis, convolutional the athlete's judgment of the game progress, kernels can be used to simulate the trajectory of an understanding of the opponent, and scientific training athlete's swing and ball flight, aiming to understand the methods and feedback mechanisms. In terms of source of ambiguity. N is additive noise, which may physical fitness, technical proficiency, and tactical be interference caused by device noise or changes in awareness, athletes need to continuously improve ambient light. Analyzing this noise can improve the through their own efforts. In terms of scientific training accuracy and reliability of serving moments.  is a methods and feedback mechanisms, this study believes convolution operation. In the analysis of serving that using video image processing technology can behavior, this operation helps identify and simulate the provide a detailed analysis of athletes' serving cause of blurring, thereby restoring the original image movements. By capturing the athlete's serve process at a and analyzing the behavior. In the context of blurry high frame rate, analyzing the movements in great images, the Camera Response Function (CRF) is an detail, and using computer vision algorithms to process important tool. It helps to understand and process image the video, it is possible to extract keyframes, measured data by describing how the camera converts light into serve angles, speeds, and technical elements. By using image pixel values, as given by equation (2) [10]. machine learning to analyze serving data, patterns and g(I  performance characteristics of athletes during training S (i) ) = IS '(i) (2) can be identified. g In equation (2), is the CRF approximation function, The occurrence of blurry video images in TVI can be which describes the conversion of light rays into pixel attributed to a number of factors, including the camera's values. The significance of CRF lies in its capacity to shutter speed being inadequate to capture fast-moving rectify discrepancies in image brightness resulting from objects during rapid motion, such as the swing of a disparate camera models or shooting conditions, racket or the trajectory of a ball. The shaking or thereby enhancing image uniformity and comparability. vibration of the shooting equipment can cause the entire  is a constant value, usually set to 2.2 by default. image to blur. Shooting tennis matches in low light IS (i) is a potential clear image, representing the environments can also increase the risk of motion blur original unaffected image. It is an ideal image that can [9]. The fuzzy model of TVI can be represented by be generated through deblurring and restoration, equation (1). thereby helping to establish accurate feature I I = K  I +N representations in image retrieval. S '(i) is a clear B S (1) I image observed. The calculation of blurred image B I In equation (1), B is the blurred image, which is the is shown in equation (3). object that needs to be restored or studied during the I analysis process. S is the original image, which Random noise Generator Synthetic data Real sample or Discriminator Forgery of samples Real data Figure 1: GAN structure. 4 Informatica 49 (2025) 1–18 H. Cao 1 M IB = g(  IS (t ) ) (Z) M the expected clear image. represents possible t=1 (3) fuzzy kernels. Due to the excellent performance of In equation (3), t is the time in the video image. M GAN, this study applies it to denoising sports images. is the number of clear frames used to generate blurry Figure 1 shows the GAN structure. images. In image retrieval, the quantity of M affects In Figure 1, the task of the GAN generator is to receive the quality of blurry images. Collecting multiple clear blurry images as input and attempt to generate outputs images can help generate more stable and rich image corresponding to real clear images. The generator features. This study calculates the actual blurred image gradually adjusts its parameters by learning the using equation (4). mapping relationship between blurry and clear images 1 T IB = g(  I to achieve the goal of generating high-quality clear S (t )dt) T t=0 (4) images. At the same time, the discriminator receives the image and outputs a probability value representing the In equation (4), T is the exposure time period, which probability that the input image is "real". The indicates the reception time of the light line in the discriminator strives to improve its judgment ability to captured image. In image retrieval, the corresponding accurately identify the differences between the exposure time can affect the brightness and details of generator's output and the real image [13-15]. The the image. This study adopts a non-blind deblurring objective function for the confrontation between the method for blurry images. This method can obtain generator and discriminator is shown in equation (6) information about the blur kernel used in the restoration . process of blurred images. The blur kernels may be the minG maxD V (D,G) = Ex~P [log D(x)]+ Ez~P [log(1−D(G(z)))] data ( x ) z ( x ) (6) result of a number of factors, including camera motion, object motion, inaccurate focusing, and other causes P In equation (6), x is a real sample from data(x) used [11-12]. This study assumes that the noisy and original images are Y and X , and the blur kernel stands for to train the discriminator and help it learn how to Z . The non-blind deblurring process is given by equation (5). E recognize the characteristics of clear images. x~ Pdata ( x ) is the expectation for Xˆ , Zˆ 2 = arg min Z  X −Y +(X ) + (Z ) 2 (5) (X ) In equation (5), is the regularization term for X X Identity ReLU Weight layer ReLU X F(X) ReLU Identity Weight layer X F(X)+X ReLU F(X1+2) F(X1) F(X1+l) (a) Traditional residual block (b) Structure diagram of structure diagram improved residual block Figure 2: Traditional residual block structure. clear input images. In the field of image retrieval, the calculating expectations, a more balanced approach to concept of "expectation" plays a pivotal role in the generation and discrimination can be achieved, development of effective models. By effectively ultimately enhancing the clarity and quality of both 3*3Conv Dropout 3*3Conv 3*3Conv Improved Generative Adversarial Network and Particle Swarm… Informatica 49 (2025) 1–18 5 D() generated and blurred images. is the output of blurred image through multiple convolutional layers. discriminator D . The discriminator continuously For TVI, important features include the trajectory of the optimizes its classification ability by comparing real ball and the movements of the athletes. Meanwhile, samples with generated samples. In image retrieval, the with the help of residual connections, the network can performance of the discriminator affects the quality of converge to the optimal solution faster. However, the generated images by the generator, which further traditional residual blocks typically contain fewer G() affects the accuracy of the retrieval results. is the convolutional layers or simpler structures, which may output of generator G . When the image retrieval limit the model's capacity to learn complex features and system faces a fuzzy query, the generator can transform details. In the face of more complex models, the lack of the fuzzy image into a clear image. Wasserstein GAN effective regularization may lead to overfitting. (WGAN) is a variant of GAN. In image deblurring Therefore, this study improves the traditional RBS, as tasks, gradient vanishing is a common problem due to shown in Figure 2 (b). The improvement of residual image degradation and other reasons, which affects the blocks in this study mainly includes increasing the convergence of the model. This study uses Wasserstein depth of convolutional layers, introducing multiple distance to quantify the difference between the activation functions, applying Dropout, implementing generator and the real data distribution, which can skip connection modules, and removing batch better handle the problem of gradient vanishing and normalization. The improved residual block consists of help the generator learn the data distribution better [16]. three convolutional layers, each using a 3x3 To improve the deblurring effect of images, this study convolution kernel. This design enhances the expressive improves the structure of traditional residual blocks. power of The Residual Bock Structure (RBS) is displayed in Figure 2. In Figure 2 (a), RBS extracts feature from the input max max x2 x2 min min fitness fitness x1 x1 (a) time 1 (a) time 2 Figure 3: Schematic diagram of PSO particle motion. the model, enabling it to capture more complex feature the basis for training network models. Generating representations. Introducing two ReLU activation samples is the key to successful high-quality image functions between two convolutional layers can accelerate convergence and help the model learn nonlinear features. The final skip connection module is retrieval. (Pdata , Pg ) is the set of joint distributions retained to alleviate gradient vanishing and explosion problems, ensure the flow of important information, and P P of data and g . By comparing the joint distribution maintain the training stability of the model. Removing the batch normalization layer makes the model more of real samples and generated samples, the generator flexible during small batch training and reduces can more effectively capture the features of real data computational burden. The loss function is shown in when generating images, ensuring the similarity equation (7). between generated samples and real samples in the W (Pdata , Pg ) = inf E(x, y)~  x − y   ~ (Pdata ,Pg ) (7) (x, y) ~  feature space. is one of the samples, which In equation (7), x y and are real samples and generated samples. In image retrieval, real samples are supports adversarial training between the generator and 6 Informatica 49 (2025) 1–18 H. Cao discriminator, improving the model's generalization of the feature map. ability and optimization performance. inf is the expected distance. The loss function of the model 2.2 Feature extraction and classification of generator is equation (8). video images based on PSO-SVM algorithm 2 The above study extracts and recognizes key features of W H 1 i , j i , j  i , j (I S )  x, y Lx =  video images by improving the residual network   W H  ( B ) i , j i , j x=1 y=1 −i , j GG (I )  x, y  structure. Due to the complexity of dynamic scenes in (8) tennis, video images often contain a large amount of information, resulting in the problem of information  In equation (8), i , j is a feature map that can capture redundancy between features, which has a negative impact on feature important semantic information and high-level features W in the image. i, j H and i , j are the width and height Intra population Inter population communication communication Subpopulation 1 gbest1 Subpopulation 2 gbest1 sbest Subpopulation N gbestN Figure 4: MPSO structure diagram. extraction. To improve the effectiveness of feature each particle's current position is calculated to evaluate extraction, this study introduces PSO-SVM [17-18] into the quality of that position. The particle speed update is the model. In the PSO-SVM algorithm, PSO is shown in equation (9) [20]. responsible for feature extraction in tennis, while SVM is responsible for feature recognition and classification. V t+1 =V t i 1 and() (pbest X t i + c r  i − i )+ c2 rand()(gbest − X t i ) (9) In PSO, individuals update their position and velocity to find the optima to the problem. The concept of employing PSO for the purpose of extracting features In equation (9),  is the inertia weight, which is from motion images entails the treatment of each frame within the image sequence as a particle. Through the c usually a non-negative value. 1 c and 2 are optimization of the particle's motion trajectory, it is possible to selectively capture salient features of object acceleration factors. The position of the i -th particle is motion [19]. The PSO motion is shown in Figure 3. Figure 3 shows the motion of PSO particles. In the X represented as vector i rand() . represents a initial stage of the PSO, a set of particles is generated at random, each representing a possible solution value. pbest random number between [0, 1]. i The motion of each particle mainly updates its current is the optimal position and velocity. After each update, the fitness of ... ... Improved Generative Adversarial Network and Particle Swarm… Informatica 49 (2025) 1–18 7 position of particle i . The best position of the X t+1 = X t +V t+1 i i i (10) gbest population is represented by . When the  The above process indicates that the PSO owns the characteristics of simplicity, ease of implementation, value is large, the global Search Ability (SA) is strong and strong universality. However, due to the fact that and the local SA is weak. When the  value is low, particles in PSO belong to the same population, it is the local SA is strong and the global SA is weak. The easy to form local optimal positions, resulting in the expression for particle position is shown in equation inability to obtain global optimal solutions and the problem of excessive dependence on (10). ndary Bou Hyperplane Boundary Interval Figure 5: SVM optimal classification hyperplane. parameters. In response to the above issues, this study V t+1 =V t + c  rand  pbest t − X t i id 1 () ( id id )+ adopts a combination of multiple groups and adaptive c2  rand ()[(gbest t X t kd − id ) + (1−)(sbest t t d − X id )] adjustment of acceleration coefficients to optimize PSO and proposes the MPSO optimization algorithm [21-22]. (11) The MPSO algorithm can effectively improve the division of labor and cooperation among populations. In pbest t In equation (11),  is the inertia weight. id MPSO, a population contains multiple sub-populations, s e t t b s and id are the historical and global best position which in turn contain multiple particles. The MPSO pbest t of particle i and particle swarm. kd is the algorithm structure is shown in Figure 4. c optimal position of sub-population k so far. 1,c2 In Figure 4, each sub-population is a complete rand() are the acceleration factors. is a random communication and interaction system. All particles number between [0, 1].  is the classification within the population can communicate. During accuracy of the sample.  is the number of sub-populations. Due to the impact of inertia weight on algorithm iteration, it is required to discover the gbest the balance of local and global SAs, the value of inertia optimum for each subgroup, and then find the weight is crucial in PSO algorithm. However, the optimal solution sbest for the entire particle swarm, control effect of fixed inertia weights on global and sbest =max(gbest , gbest which is 1 2 ,..., gbestN ) . N is local search capabilities is limited. Accordingly, this the number of sub-populations. Therefore, in MPSO, study employs a linear differential descent method to the velocity update formula for particles is shown in dynamically adjust the inertia weight, thereby enhancing the algorithm's overall SA in the initial stage equation (11). of iteration and enhancing its local SA in the subsequent phase of iteration. The dynamically adjusted inertia weight is shown in equation (12). 8 Informatica 49 (2025) 1–18 H. Cao ( − ) t 2 (x , y ) x y (t) =  − max min size of i i . i is the input vector. i is the max t 2 max (12) corresponding output target. The SVM model initially employs a high-dimensional mapping feature space, t In equation (12), t is the current iteration count. max which facilitates the identification of a superior hyperplane for the separation of diverse categories of  is the maximum iterations. max is the initial inertia data. Subsequently, it utilizes linear functions within the feature space for function approximation. According to  weight, which is set to 0.9 in the study. min is the statistical theory, the SVM minimum optimization inertia weight at the maximum iteration count, set to 0.4. objective function yields a fitted regression function as The improved particle velocity update formula is shown shown in equation (14) [28]. in equation (13). 1 2 n min(W ,b) : W +C yi −[W ,(X ) −b] 2 i=1 V t+1 (t) V t id c1 rand () ( pbest t t i =  +   id − X id )+ (14) c2  rand ()[(gbest t t ) (1 )( t t kd − X id + − sbestd − X id )] In equation (14), W is vector data. b is the function (13) y threshold. is the function value after dot product (t) (X ) In equation (13), is the dynamically adjusted processing. is an approximation function. C inertia weight. The above is a feature model is the penalty coefficient for training model complexity construction built on the improved PSO. The MPSO and controlling model loss. The classification of the first randomly divides the particle swarm into K model can be completed through equation (14). This sub-populations and initializes the sub-population model can effectively solve problems regardless of the particles randomly. Then, the individual extremum of sample size or whether the linear fitting conditions are the particles, the optimal value of each sub-population, met, and due to its global strategy, it will not be unable and the global optima of the entire PSO are selected. to obtain the optimal solution due to local optima. In the The next step is to determine the optimal value found in PSO-SVM model, particles dynamically adjust their the search. If the conditions are met, running is stopped. position and speed through interactions to efficiently Otherwise, the speed and location of the particles are capture features in video frames. Each particle continued to update, and the best value is selected to represents a potential solution whose position continue running the algorithm until the conditions are corresponds to the selection of key attributes in feature met. Finally, the optimal solution of the optimization extraction. The fitness value of the particle is evaluated problem is outputted. After extracting features from based on the classification accuracy feedback. video images as described above, this study uses SVM Additionally, the particle is able to learn which features for feature recognition and classification [23-25]. SVM are most relevant in tennis serve recognition by is an algorithm that maps low dimensional data to comparing with the historical optimal position and the high-dimensional data to minimize functional. The global optimal position. The specific particle model is defined as shown in Figure 5. parameters, including inertial weights and acceleration In Figure 5, SVM is a binary classification model. This factors, influence the exploration and development model is the nonlinear classifier with the largest interval capabilities of particles, which assist in balancing in the feature space. The learning strategy of SVM is to global search and local search in complex feature maximize the interval, which can be formalized as a spaces. Ultimately, this improves the accuracy and convex quadratic programming problem. This is recognition rate of feature extraction. equivalent to the minimization problem of a regularized hinge loss function [26-27]. The learning algorithm of SVM is the optimization algorithm for solving convex quadratic programming. This study assumes a sample Improved Generative Adversarial Network and Particle Swarm… Informatica 49 (2025) 1–18 9 learn from non-linear features. By removing batch 3 Results normalization, the study enhances the flexibility of the generator and reduces the introduction of noise. It is 3.1 Analysis of deblurring effect based on imperative to ensure that each algorithm is configured video image processing and PSO-SVM with identical initial settings to minimize variation. To verify the performance and effectiveness of the Furthermore, the number of iterations must be trained research algorithm, this study conducts comparative up to 3,000 times. The learning rate of GAN is set to experiments for analysis. The comparative methods 0.0002. The learning rate of PSO is set to 0.5, and the include GAN, Attention Mechanism-GAN (AGAN), number of PSO particles is 100. The initial value of and Multi-Scale Convolution (MSC) algorithm. In the inertia weight is 0.9 and dynamically adjusted to 0.4. improved GAN structure, the number of convolutional The acceleration factor is set to 2.0. Performance layers has been improved to enhance the extraction evaluation indicators include relative subjective and capability of intricate features. Each convolutional layer objective indicators. The relative subjective indicator is employs a 3×3 convolution kernel to facilitate the the subject's evaluation of image quality. Objective capture of image details with greater precision. In indicators are selected as PSNR and SSIM. The higher addition, ReLU and Leaky ReLU activation functions the PSNR value, the closer the SSIM value is to 1, are used interchangeably, aiming to avoid the "dead indicating neuron" phenomenon and improve the model's ability to 100 100 Proposed method AGAN Proposed method AGAN GAN GAN 90 MSC MSC 90 80 80 70 70 60 60 50 50 40 40 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Number of tests Number of tests (a) Ball games image (b) Athletic sports images Figure 6: Subjective evaluation score results of the model. 37 Proposed method AGAN 37 Proposed method AGAN 34 GAN MSC 34 GAN MSC 31 31 28 28 25 25 22 22 19 19 16 15 25 50 16 15 25 50 Noise standard deviation Noise standard deviation (a) Ball games image (b) Athletic sports images Figure 7: Comparison of average PSNR of different deflurring algorithms. Average PSNR (dB) Subject rating Average PSNR (dB) Subject rating 10 Informatica 49 (2025) 1–18 H. Ca o better image quality, that is, better deblurring effect. score of the deblurred image is 86.94. The average The experiment uses sports images from the GOPRO scores of AGAN, GAN, and MSC do not exceed 75. To dataset for testing. This dataset divides images into ball compare the PSNR and SSIM performance, this paper sports and track and field sports. In subjective adds noise with standard deviations of 15, 25, and 50 to evaluation, the model score results are shown in Figure the test original images, respectively, to generate test 6. images for testing the algorithm's deblurring ability. Figures 6 (a) and (b) show the subjective evaluation Following the implementation of a research adjustment, scores of subjects on the deblurring effect of images in the image quality has been markedly enhanced, as ball sports and track and field sports. In Figure 6 (a), the evidenced by elevated PSNR and SSIM values. This deblurred image under the research method scores the signifies that the generated image is more closely highest, with an average score of 81.16 out of 30 aligned with the authentic image in terms of structural experiments. In the evaluation of 50 subjects, the similarity and clarity. It is particularly well-suited for research method shows better deblurring effect on ball the processing of dynamic motion scenes. Figure 7 is a sports images. In Figure 6 (b), the subjective evaluation boxplot of the mean PSNR of various algorithms. obtained by the research method is better, and the mean 1.0 1.0 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 Proposed method AGAN Proposed method AGAN 0.5 GAN MSC 0.5 GAN MSC 0.4 15 25 50 0.4 15 25 50 Noise standard deviation Noise standard deviation (a) Ball games image (b) Athletic sports images Figure 8: Average SSIM of different deblurring algorithms. Figures 7 (a) and (b) show the average PSNR 3.2 Feature extraction based on video image performance of algorithms in image deblurring tests for processing and PSO-SVM model ball sports and track and field sports. The PSNR values To validate the proposed sports image feature extraction of the research methods are consistently the highest. algorithm, this study selects the difficult to solve single Compared with AGAN, MSC, and GAN, the research peak function Rosenbrock and the easy to trap method increases the average PSNR by 13.56%, algorithm in local optima multi-peak function Griebank 15.02%, and 18.29% in Figure 7 (a), and by 12.82%, as the standard test functions. To verify the 14.02%, and 22.72% in Figure 7 (b), respectively. This convergence performance of MPSO, this study indicates that the difference between the original and compares PSO, Ant Colony Optimization (ACO), and the deblurred images is smaller under the research Simulated Annealing (SA). In the experiment, all method, indicating that the deblurring effect is better. algorithms are set with the same common parameters, Figure 8 shows the mean SSIM boxplots of four namely population size and dimensionality. The algorithms. iterations are 3000. Each algorithm is independently run In the deblurring tests of ball sports and track and field 100 times on each test function, and statistical analysis sports images in Figures 8 (a) and (b), the SSIM values is conducted on the results of the 100 runs. Figure 9 of the research method are the highest, and the overall shows the convergence curve of the obtained algorithm. performance is more stable. Compared with AGAN, Figure 9 shows the convergence curves of the algorithm MSC, and GAN algorithms, in Figure 8 (a), the average on Griebank and Rosenbrock. MPSO achieves the SSIM of the research method increases by 8.33%, highest average accuracy in the shortest number of 12.24%, and 19.90%, while in Figure 8 (b), it increases iterations. In Figure 9 (a), at 850 iterations, MPSO by 12.34%, 19.38%, and 22.20%. Overall, the research approaches convergence with an average accuracy of method has the best deblurring effect, followed by 97.31%. Compared to the other three algorithms, AGAN, MSC, and finally GAN. The experimental MPSO has reduced the number of iterations during results validate the effectiveness of this study. convergence by 35.33%, Average SSIM Average SSIM Improved Generative Adversarial Network and Particle Swarm… Informatica 49 (2025) 1–18 11 99 95 PSO ACO 98 94 SA MPSO PSO ACO 97 SA MPSO 93 96 92 95 91 94 90 93 89 92 88 90 87 0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000 Number of iterations Number of iterations (a) Test results on Griewank (b) Test results on Rosenbrock Figure 9: Comparison of algorithm convergence curves. 1.0 1.0 0.8 0.8 0.6 0.6 0.4 ACO 0.4 ACO MPSO MPSO 0.2 PSO 0.2 PSO SA SA 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Recall Recall (a) Ball game images (b) Athletic sports images Figure 10: Comparison of algorithm PR curve. 40.52%, and 51.55%. In Figure 9 (b), after 1100 respectively. In Figure 10 (b), MPSO has the best iterations, MPSO approaches convergence with an feature extraction performance, while ACO and SA average accuracy of 92.94%. In contrast, the number of have similar performance. ACO has the relatively worst iterations during MPSO convergence decreases by feature extraction performance. Finally, in sports image 25.37%, 26.54%, and 41.89%. This verifies that the feature extraction, feature extraction time is also a iteration speed and accuracy of MPSO are superior to commonly used evaluation metric, which can be used to SA, ACO, and PSO. To validate the performance of determine the efficiency of feature extraction methods. various feature extraction algorithms, this study tests the Precision-Recall (PR) curves of different algorithms, 3.3 Behavior analysis based on video image as exhibited in Figure 10. Figure 10 shows the PR curves of various algorithms in processing and PSO-SVM Model the images of ball sports and track and field sports. In To verify the effectiveness of the research Figure 10 (a), the performance of each algorithm from best to worst is MPSO, SA, ACO, and PSO, Precision Average accuracy (%) Precision Average accuracy (%) 12 Informatica 49 (2025) 1–18 H. Ca o 100 100 90 90 80 80 70 70 60 60 50 Research method 50 Research method 40 LR 40 LR 30 MLP 30 MLP 20 ELM 20 ELM 10 10 0 0 0 50 100 150 200 250 0 50 100 150 200 250 Number of iterations Number of iterations (a) Prepare posture behavior analysis accuracy (b) Analysis accuracy of service swing action Figure 11: Analysis accuracy of preparation posture and serving and swinging behavior. 100 100 Research method 90 90 80 80 70 70 60 60 50 50 Research method 40 40 LR LR 30 30 MLP 20 MLP 20 ELM 10 ELM 10 0 0 0 50 100 150 200 250 0 50 100 150 200 250 Number of iterations Number of iterations (a) Batting action behavior analysis accuracy (b) With wave as behavior analysis accuracy Figure 12: Accuracy analysis of hitting action and swing action behavior. model in analyzing action behavior in video images, swing" behavior. The accuracy of the research this study collects match videos of tennis players with a algorithm reaches 90.4%, which is about 1% higher total length of 5.6 hours. This study categorizes the than MLP. This indicates that the comparative serving actions of tennis players in videos into five algorithm may have difficulty capturing complex behaviors, including preparation posture, serving swing, motion features due to the limitations of its linear model, hitting action, swing action, and recovery posture. This and may still be insufficient in capturing diverse and study sets the SVM related parameters, with a penalty delicate features. The research method adopts advanced coefficient of 1 and SVM error accuracy of 0.001. The feature extraction techniques, which can better identify experiment recognizes five types of actions and uses and classify a small number of posture changes. Figure Logistic Regression (LR), Extreme Learning Machine 12 shows the accuracy analysis results of hitting and (ELM), and Multi-Layer Perceptron (MLP) as swinging movements in tennis serving behavior. comparison methods. The analysis results of the In Figure 12 (a), the behavior analysis accuracy of the preparation posture and serve swing behavior are shown research algorithm reaches 82.8%, which is 33.6% in Figure 11. higher than LR, 20.8% higher than ELM, and 24.5% Figure 11 (a) shows the accuracy results of the higher than MLP. In Figure 12 (b), the behavior "preparation posture" behavior analysis. The behavior analysis accuracy of all four methods reaches over 98%. analysis accuracy of the research algorithm has reached Therefore, 97.2%, which is 5.84%, 7.62%, 8.04%, and 8.16% higher than LR, ELM, and MLP methods. Figure 11 (b) shows the accuracy results of the analysis of the "serve Accuracy rate /% Accuracy rate /% Accuracy rate /% Accuracy rate /% Improved Generative Adversarial Network and Particle Swarm… Informatica 49 (2025) 1–18 13 100 90 Research method 80 70 60 50 40 Logistic regression 30 20 Multilayer perceptron 10 Extreme learning machine 0 0 50 100 150 200 250 Number of iterations Figure 13: Accuracy of posture recovery behavior analysis. Ready Ready 0.97 0.03 0 0.01 0 0.94 0.05 0 0.02 0 position position Service 0.11 0.87 0.01 0 0.02 Service 0.10 0.86 0.01 0 0.04 swing swing Stroke Stroke 0 0 0.82 0 0.18 0.03 0.32 0.57 0 0.09 action action Swing Swing 0 0 0.01 0.99 0 0.02 0 0 0.98 0 action action Postural Postural 0 0 0.16 0 0.84 0.01 0.20 0.07 0.71 recovery recovery Ready Service Stroke Swing Postural Ready Service Stroke Swing Postural position swing action action recovery position swing action action recovery (a) Research method (b) Multilayer perceptron Figure 14: Confusion matrix results of different algorithms. the research algorithm significantly improves the is some overlap and confusion in the recognition recognition ability of "hitting action", possibly due to between these two actions. This phenomenon may be its sensitivity to action details and dynamic changes. attributed to the fact that the player's posture during the Figure 13 shows the accuracy results of posture act of serving is analogous to that observed during the recovery analysis in tennis serving behavior. subsequent recovery phase. This results in an In Figure 13, the analysis accuracy of the research insufficient degree of feature extraction, which in turn algorithm for "posture recovery" behavior reaches hinders the ability to distinguish between the two 84.7%, LR is 58.2%, ELM is 62.3%, and MLP is 70.3%. movements. To reduce misclassification, it would be The research algorithm has obvious advantages in beneficial to consider optimizing the classifier threshold behavior analysis, indicating that it has better feature in SVM to adjust the decision boundary, thereby extraction and action classification performance. This improving the ability to distinguish between these study proposes methods to analyze the recognition actions. In Figure 14 (b), the MLP method has lower performance of MLP, as shown in Figure 14. recognition performance than the research method in Figure 14 shows the confusion matrix results of the dynamic actions. Although its dynamic actions have research method and MLP. In Figure 14 (a), the overall significant interference, the recognition accuracy of the recognition accuracy of the research method reaches research method is over 80%. This indicates that the 91.24%, and in the dynamic classification effect, the research model owns good anti-interference ability and main manifestation is mutual interference. There is an high recognition accuracy in action recognition. This 18% probability that the ''hitting action'' will be study compares the computational efficiency of misidentified as ''posture recovery''. There is a 16% different models. The details are listed in Table 2, using chance that the "posture recovery" action will be model runtime, GPU usage, and memory usage as misidentified as a "hitting action". This shows that there evaluation metrics. Accuracy rate /% 14 Informatica 49 (2025) 1–18 H. Ca o Table 2: Comparison results of model calculation efficiency. Algorithm Run time (s) GPU Usage (%) Memory usage (MB) Research method 45.2 75.3 512 AGAN 55.6 80.1 600 GAN 62.8 82.4 650 MSC 50.4 78.5 580 LR 40.1 72.0 500 ELM 42.3 71.5 490 MLP 43.0 73.2 495 In Table 2, the research method shows superior and SSIM of the research method are 35.12 dB and 0.93, performance in terms of running time, significantly respectively, indicating a robust defuzziness effect and reducing computation time compared to other the capacity to preserve structural details. In comparison, algorithms. Its GPU usage rate is 75.3%, which is the corresponding indexes of RNN and CNN-LSTM are relatively lower compared to algorithms such as 32.5 dB and 0.87, and 33.8 dB and 0.89, respectively. It AGAN and WGAN, indicating that the research model shows that the latter is relatively weak in image quality. is more efficient in utilizing computing resources. In In addition, the research method has a relatively low terms of memory usage, the research method has also runtime and memory usage (45.2s and 512MB), showing shown good optimization ability, maintaining a usage better computational efficiency. Overall, these results of 512MB, which is reduced compared to other validate the advances in accuracy and efficiency of the methods. To further reduce processing requirements, research method, indicating its application potential in this study can implement effective memory complex dynamic scenarios. management strategies to optimize the allocation of 4 Discussion computing resources, thereby reducing memory consumption and improving computing speed. For When deblurring TVI, the average subjective score of the example, small batch processing and dynamic memory research method in the GOPRO dataset was 81.16 points, allocation. These strategies not only improve the and the PSNR value increased by 13.56%, 15.02%, and operational efficiency of the model but also ensure that 18.29% compared to AGAN, MSC, and traditional GAN. good performance can be maintained when processing The potential benefit of studying the model lies in large-scale data. The research method is further optimizing the adversarial learning mechanism between compared with the Recurrent Neural Network (RNN) the generator and discriminator. This could result in the and Convolutional Neural Network Ensemble Long generator paying greater attention to image details and Short-term Memory Network (CNN-LSTM) models. effectively reducing blurring phenomena. By introducing The results are shown in Table 3. multiple convolutional layers and activation functions, the The results in Table 3 show that the research method model's ability to learn complex features was enhanced, outperforms RNN and CNN-LSTM in several thereby improving the clarity and realism of generated performance indexes. Specifically, the accuracy of the images. This study demonstrated significant advantages research method is as high as 91.24%, which is in feature extraction using PSO-SVM. The accuracy of significantly higher than the 85.1% of RNN and 87.3% the MPSO algorithm reached 97.31%, and its of CNN-LSTM. In terms of image quality, the PSNR convergence speed improved by 35.33%, 40.52%, and 51.55 Table 3: Comparative results of advanced nature Accuracy PSNR SSI Run time GPU Usage Memory Usage Method (%) (dB) M (s) (%) (MB) Research 91.24 35.12 0.93 45.2 75.3 512 method RNN 85.1 32.5 0.87 55.0 78.2 520 CNN-LSTM 87.3 33.8 0.89 50.5 76.5 510 Improved Generative Adversarial Network and Particle Swarm… Informatica 49 (2025) 1–18 15 % Compared to traditional PSO, ACO, and SA. The advantage of the research method lied in the 5 Conclusion introduction of multiple population PSO strategies, With the rapid growth of video technology, the which enable different sub-populations to share processing and behavior analysis of TVI have become information and optimal solutions. By using linear increasingly important. To address the challenges of differential descent, the inertia weight of particles was video image blurring and feature extraction, this paper dynamically adjusted, which could enhance the global constructed a comprehensive method built on SA in the early stage and local SA in the later stage, improved GAN and PSO-SVM algorithms. For fuzzy thus improving the stability and efficiency of the processing, the improved GAN significantly improved feature extraction process. In the action recognition the ability to recover details through the introduction experiment, the recognition accuracy of the research of multiple convolutional structures and various method reached 91.24%, and in feature recognition activation functions. In terms of feature extraction, a such as hitting action and posture recovery, it was combination of PSO-SVM algorithm was adopted, higher than classical algorithms such as LR, ELM, and which integrated the advantages of PSO and SVM to MLP. The recognition accuracy for the "preparation further enhance the efficiency and accuracy of feature posture" behavior reached 97.2%, while the extraction. The experimental test results demonstrated recognition accuracy for the "hitting action" was as that the research method could more accurately high as 82.8%. By improving the residual network capture important features in motion images and structure, the model could effectively extract deep effectively reduce blurring phenomena. Moreover, the dynamic features from action sequences, enhancing its model could maintain high accuracy even in fast ability to capture complex motion trajectories. This dynamic scenarios. This indicates that the research enabled the action classification model to more model has strong anti-interference ability, which is accurately identify various behaviors of different helpful for accurate behavior analysis in complex athletes. In similar studies, Fréjus et al. proposed a environments. Although this study has achieved behavior recognition model based on neural networks, significant success in improving the quality of video which showed good recognition accuracy in posture image processing, there are still some shortcomings. recognition [29]. Luo Z et al. proposed a behavior Under complex background interference, the recognition model grounded on multi-layer LSTM, recognition performance of the model may be limited, which demonstrated good performance in specific and improving the algorithm's performance in applications [30]. However, the above research cannot complex background scenes is still a problem to be effectively capture small differences between actions solved. Therefore, future research directions can focus when dealing with complex dynamic sequences, on combining deep learning and reinforcement which can easily lead to misidentification in similar learning methods to enhance processing capabilities actions. Compared with it, the research method can for complex dynamic scenes. comprehensively handle complex dynamic scenes and has high flexibility and robustness, providing a new References approach for motion behavior analysis. In complex dynamic scenes, the recognition [1] Deqiang Cheng, Jiansheng Qian, Xingge Guo, challenges of moving images are often caused by Qiqi Kou, Feixiang Xu, Jun Gu, Yachao Gao, motion blur, background interference, and and Jinsheng Zhao. Review on key technologies illumination changes. The proposed model shows of AI recognition for videos in coal mine. Coal good robustness in various complex situations, mainly science and technology, 51(2):349-365, 2023. due to the design of adaptive feature extraction and https://doi.org/10.13199/j.cnki.cst.2022-0359 adversarial learning mechanism. The introduction of [2] Chandravva Hebbi, and H. R. Mamatha. PSO algorithm makes the feature extraction process Comprehensive dataset building and recognition more flexible and adaptive. When dealing with of isolated handwritten kannada characters using dynamic scenes, the particle can dynamically adjust its machine learning models. Artificial intelligence position to capture the key movement trajectory of the and applications, 1(3):179-190, 2023. player and the ball, which improves the relevance and https://doi.org/10.47852/bonviewAIA3202624 usability of the feature. By optimizing the adversarial [3] Teymoor Ali, Deepayan Bhowmik, and Robert training process between generator and discriminator, Nicol. Domain-specific optimisations for image the enhanced GAN model is capable of not only processing on FPGAs. Journal of signal generating high-quality images but also of producing processing systems, 95(10):1167-1179, 2023. more robust mapping relationships in the feature https://doi.org/10.1007/s11265-023-01888-2 extraction process. This mechanism serves to mitigate [4] Yunzhong Hou, Zhongdao Wang, Shengjin the impact of noise and background changes on the Wang, and Liang Zheng. Adaptive affinity for results, thereby enhancing the overall robustness of the associations in multi-target multi-camera model. tracking. IEEE transactions on image processing, 16 Informatica 49 (2025) 1–18 H. Ca o 31(10):612-622, 2021. and Anthony Thomas Herdman. Generative https://doi.org/10.48550/arXiv.2112.07664 adversarial network (GAN) for simulating [5] Lintong Zhang, David Wisth, Marco Camurri, electroencephalography. Brain topography, and Maurice Fallon. Balancing the budget: 36(5):661-670, 2023. Feature selection and tracking for multi-camera https://doi.org/10.1007/s10548-023-00986-5 visual-inertial odometry. IEEE robotics and [14] Angelo Lorusso, Barbara Messina, and Domenico automation letters, 7(2):1182-1189, 2021. Santaniello. The use of generative adversarial https://doi.org/10.48550/arXiv.2109.05975 network as graphical support for historical urban [6] Qinglong Ding, and Zhenfeng Ding. Machine renovation. ICGG 2022 - proceedings of the 20th learning model for feature recognition of sports international conference on geometry and competition based on improved TLD algorithm. graphics, 146(1):738-748, 2022. Journal of intelligent & fuzzy systems, https://doi.org/10.1007/978-3-031-13588-0_64 40(2):2697-2708, 2021. [15] Deepa Kumari, S. K. Vyshnavi, Rupsa Dhar, B. S. https://doi.org/10.3233/JIFS-189312 A. S. Rajita, Subhrakanta Panda, and Jabez [7] Jiaxu Zhang, Gaoxiang Ye, Zhigang Tu, Yongtao Christopher. Smart GAN: A smart generative Qin, Qianqing Qin, Jinlu Zhang, and Jun Liu. A adversarial network for limited imbalanced spatial attentive and temporal dilated (SATD) dataset. The journal of supercomputing, GCN for skeleton‐based action recognition. 80(14):20640-20681, 2024. CAAI transactions on intelligence technology, https://doi.org/10.1007/s11227-024-06198-3 7(1):46-55, 2022. [16] Yaxiang Fan, Gongjian Wen, Fei Xiao, Shaohua https://doi.org/10.1049/cit2.12012 Qiu, and Deren Li. Detecting anomalies in videos [8] Xiaoguang Zhu, Ye Zhu, Haoyu Wang, Honglin using perception generative adversarial network. Wen, Yan Yan, and Peilin Liu. Skeleton Circuits, systems, and signal processing, sequence and RGB frame based multi-modality 41(2):994-1018, 2022. feature fusion network for action recognition. https://doi.org/10.1007/s00034-021-01820-8 ACM transactions on multimedia computing, [17] Canran Zhang, Jianping Dou, Shuai Wang, and communications, and applications (TOMM), Pingyuan Wang. Hybrid particle swarm 18(3):1-24, 2022. optimization algorithms for cost-oriented robotic https://doi.org/10.48550/arXiv.2202.11374 assembly line balancing problems. Robotic [9] Arti Ranjan, and M. Ravinder. intelligence and automation, 43(4):420-430, VAEWGAN-NCO in image deblurring 2023. framework using variational autoencoders and https://doi.org/10.1108/RIA-07-2022-0178. Wasserstein generative adversarial network. [18] Naveed ur Rehman, and Muhammad Uzair. Signal, image and video processing, Concentrator shape optimization using particle 18(5):4447-4456, 2024. swarm optimization for solar concentrating https://doi.org/10.1007/s11760-024-03085-5 photovoltaic applications. Renewable energy, [10] Chuang Li, and Zhizhong Mao. Generative 184(5):1043-1054, 2022. adversarial network-based real-time temperature https://doi.org/10.1016/j.renene.2021.12.015 prediction model for heating stage of electric arc [19] Yue Li, Jianfang Qi, Xiaoquan Chu, and Weisong furnace. Transactions of the institute of Mu. Customer segmentation using K-means measurement and control, 44(8):1669-1684, clustering and the hybrid particle swarm 2022. optimization algorithm. The computer journal, https://doi.org/10.1177/01423312211052213 66(4):941-962, 2022. [11] Yupeng Song, Xu Hong, Jiecheng Xiong, Jiaxu https://doi.org/10.1093/comjnl/bxab206 Shen, and Zekun Xu. Probabilistic modeling of [20] Vahid Goodarzimehr, Fereydoon Omidinasab, long-term joint wind and wave load conditions and Nasser Taghizadieh. Optimum design of via generative adversarial network. Stochastic space structures using hybrid particle swarm environmental research and risk assessment, optimization and genetic algorithm. World 37(2):2829-2847, 2023. journal of engineering, 20(3):591-608, 2023. https://doi.org/10.1007/s00477-023-02421-4 https://doi.org/10.1108/WJE-05-2021-0279 [12] Zhiwu Shang, Jie Zhang, Wanxiang Li, Shiqi [21] Fanyi Duanmu, Dian Ning Chia, and Eva Qian, Jingyu Liu, and Maosheng Gao. A novel Sorensen. A combined particle swarm small samples fault diagnosis method based on optimization and outer approximation the self-attention wasserstein generative optimization strategy for the optimal design of adversarial network. Neural processing letters, distillation systems. Computer aided chemical 55(5):6377-6407, 2023. engineering, 49(3):1315-1320, 2022. https://doi.org/10.1007/s11063-022-11143-7 https://doi.org/10.1016/B978-0-323-85159-6.502 [13] Priyanshu Mahey, Nima Toussi, Grace Purnomu, 19-0 Improved Generative Adversarial Network and Particle Swarm… Informatica 49 (2025) 1–18 17 [22] Yongjie Zhu, Jiajun Chen, Ling Mao, and Jinbin [30] Zhenmin Luo, Lidong Zhang, and Zeyang Song. Zhao. A noise-immune model identification Multistep prediction of CO in the extraction zone method for lithium-ion battery using two-swarm based on a fully connected long short-term cooperative particle swarm optimization memory network. Journal of tsinghua university algorithm based on adaptive dynamic sliding (science and technology). 64(6):940-952, 2024. window. International journal of energy research, https://doi.org/10.16511/j.cnki.qhdxxb.2024.22.0 46(3):3512-3528, 2022. 11 https://doi.org/10.1002/er.7401 [23] Min Gi, Shugo Suzuki, Masayuki Kanki, Masanao Yokohira, Tetsuya Tsukamoto, Masaki Fujioka, Arpamas Vachiraarunwong, Guiyu Qiu, Runjie Guo, and Hideki Wanibuchi. A novel support vector machine-based 1-day, single-dose prediction model of genotoxic hepatocarcinogenicity in rats. Archives of toxicology, 98(8):2711-2730, 2024. https://doi.org/10.1007/s00204-024-03755-w [24] Vamsi Alla, Upendra Kumar Sahoo, and Rabi Narayan Behera. Seismic liquefaction analysis of MCDM weighted SPT data using support vector machine classification. Iranian journal of science and technology, transactions of civil engineering, 48(4):2293-2303, 2024. https://doi.org/10.1007/s40996-023-01293-6 [25] Laith Abualigah, Saba Hussein Ahmed, Mohammad H. Almomani, Raed Abu Zitar, Anas Ratib Alsoud, Belal Abuhaija, Essam Said Hanandeh, Heming Jia, Diaa Salama Abd Elminaam, and Mohamed Abd Elaziz. Modified aquila optimizer feature selection approach and support vector machine classifier for intrusion detection system. Multimedia tools and applications, 83(21):59887-59913, 2024. https://doi.org/10.1007/s11042-023-17886-2 [26] Ning Chu, Weimin Kang, Xinhua Yao, and Jianzhong Fu. Online roundness prediction of grinding workpiece based on vibration signals and support vector machine. The international journal of advanced manufacturing technology, 126(5/6):2733-2743, 2023. https://doi.org/10.1007/s00170-023-11206-6 [27] Hossein Moosaei, Ahmad Mousavi, Milan Hladík, and Zheming Gao. Sparse L1-norm quadratic surface support vector machine with Universum data. Soft computing, 27(9):5567-5586, 2023. https://doi.org/10.1007/s00500-023-07860-3 [28] Jagadeesh Basavaiah, and Audre Arlene Anthony. A pragmatic approach for infant cry analysis using support vector machine and random forest classifiers. Wireless personal communications, 137(4):2269-2280, 2024. https://doi.org/10.1007/s11277-024-11491-8 [29] Fréjus A. A. Laleye, and Mikaël A. Mousse. Attention-based recurrent neural network for automatic behavior laying hen recognition. Multimedia tools and applications, 83(22):62443-62458, 2024. https://doi.org/10.1007/s11042-024-18241-9 18 Informatica 49 (2025) 1–18 H. Ca o https://doi.org/10.31449/inf.v49i12.7117 Informatica 49 (2025) 19–34 19 Optimizing Fuzzy Logic Control-Based Weather Forecasting through Optimal Antecedent Selection Using the Fuzzy Analytical Hierarchy Process Model Alaa Sahl Gaafar1*, Jasim Mohammed Dahr1 and Alaa Khalaf Hamoud2 1Directorate of Education in Basrah, Basrah, Iraq 2Department of Cybersecurity, University of Basrah, Basrah, Iraq E-mail: alaasy.2040@gmail.com, jmd20586@gmail.com, alaa.hamoud@uobasrah.edu.iq *Corresponding author Keywords: FAHP, FLC, antecedents, fuzzy, forecasting, weather parameters, error rates Received: September 9, 2024 The numerical weather forecasts rely largely on the amount of precipitation available, and the use of statistical and empirical methods, but fall short of higher accuracy and relatively short-time required. Recently, the fuzzy AHP (that combined AHP with fuzzy logic) for the purpose of arriving at better outcomes from fuzzy logic control (FLC) rules-list. While evolutionary computing and fuzzy logic techniques are known to guarantee better accuracy and reliability of the outcomes when applied to weather uncertainty problems. Though, the fuzzy logic approach has low accuracy, which needs to be improved with rules-list refinement. This paper pulls on these approaches to develop a weather forecasting model for cities. First of all, the outcomes of the FAHP model revealed that, Wind Direction (WND) and Relative Humidity (HUM) as contributing 30.01% and 19.97% influence to the decision- making process against air temperature, windspeed, WND, HUM, and air pressure identified earlier. Secondly, the select FAHP parameters served as antecedents for the FLC model, in which five fuzzy rules were included in rule-base. Upon validation with the standard and local datasets, the proposed model achieved lower error rates of 0.0010, 0.0317 and 0.0319 for MSE, RMSE and MAPE respectively when treated with the Kaggle standard dataset. By comparing the proposed FLC model outcomes to the unoptimized FLC model in term of error rates, MSE of 0.0010, RMSE of 0.0317, and MAPE of 0.0355 were achieved attained by former indicative of its superiority. Povzetek: Predstavljena je optimizacija napovedovanja vremena z uporabo mehke logike in izbire optimalnih predhodnikov s pomočjo analitične hierarhije. 1 Introduction research organizations globally. Weather and Radars facilities consistently curating live data whereas cloud Weather depicts the state of air over earth at given place patterns, temperature, and winds focus data are main and period. It is an unceasing, data-intensive, disorganized concern of special satellites. Consequently, there is the and dynamic technique. Forecasting is the procedure of endless stockpiles of meteorological data required for evaluation in indefinite circumstances from past data. investigations using AI approaches (machine learning When “weather” and “forecasting” are put together, (ML) algorithms), which could enhance the accuracy of “Weather forecasting”, is systematically and technically the forecasting especially at short-term weather estimation demanding issues across the globe in the past century. [2]. Weather forecasting is one field of traction for many Weather-station process cloud data using high- scholars and researchers, which seek to ascertain the performance approaches and algorithms in order to mine present state of atmosphere gets varied. Though, the tasks salient features to raise precision of the classification on of predicting forecasts are daunting due to their the basis of the inputs supplied. This is made possible in unpredictable and muddled nature. These have been recent times with a computationally proven deep learning applied to diverse scenarios including severe weather approaches [3]. Aside this, many weathers forecasting alerts and advisories for transportation, agricultural models have been combined to improve the accuracy of production and development and forest fire outcomes [4]. AI algorithms have been deployed for minimizations[1] . dealing with real-life tasks in the same way as natural Also, weather nowcasting is a short-leaved approach to schemes. Though, human intelligence is capable of forecasting of weather, which involves analysis and differentiating and adapting to fresh environments, AI estimation of weather on 6-hourly basis. Presently, the follows a procedural algorithm when conforming to the nowcasting hold special place during risk deterrence and certain situations. Fuzzy logic is an AI approach which crisis administration, even as severe weather happenings utilizes an approximate-reasoning instead of actual- are imminent. Several stacks of meteorological dataset reasoning style by incorporating some levels of ambiguity gather from satellite, radar, and other weather observatory as a form of reasoning procedure. sites, are used for diverse of analyses by meteorological 20 Informatica 49 (2025) 19–34 A.S. Gaafar The numerical weather estimations have been used in • To develop enhanced fuzzy logic control model enterprises, civil protection institutions, lifestyles of for weather forecasting. peoples globally; while reducing social and economic The lasting parts of this paper explained the following: indemnities. However, there is the need to evolve better second section is the related works. Section three is the and more accurate parametrization of physical processes discussion about research methodology. Fourth section is to raise the outcomes of estimates generated. There is the results and discussion. The conclusion is obtainable in still the problem inability of existing weather forecasting section 5. techniques to produce location precise, time efficient and intensity of weather-related events [5]. 2 Related works Previously, the main procedure for forecasting weather, that is the state of atmosphere over a particular place, This section demonstrates and discuss the literature in the involves the use of statistical and empirical methods by field of forecasting weather. Selim Furkan Tekin et al. [12] means of the principle of physics, but fall short of higher proposed a deep learning approach to predict the high- accuracy and relatively short-time required [1] . resolution weather based on observations and input data. Subsequently, the renew calls for machine learning and The prediction model works based on a spatio-temporal ensemble methods, which utilizes complex computerized approach, where it is composed of a convolutional neural mathematical models for desirable outcomes. network with an encoder-decoder structure, and The birth of AI, big data analytics and machine learning convolutional long-short term memory. The matcher techniques offered the opportunities for planners and mechanism is utilized to enhance the interpretability and policymakers to understand the implications of diverse performance of long-short term. The model is weather conditions as well as allocating resources in the experimented on a real-life, high-scale numerical dataset case of extreme weather-related systems disruptions. that holds the temperature, and pressure levels. The results Nonetheless, researchers and scholars are making efforts show that there is significant improvement when capturing to increase the accuracy and reliability of the modelling temporal and spatial correlations. Matthew Chantry et al. systems [6]. But, the accuracy of automated daily weather in [13] proposed models of emulators based on machine classification relies on both the applied classifiers and the learning that work as parameterization scheme training data [7]. accelerators for weather forecasting. The emulators are The concept of the multi-criteria decision-making trained to produce accurate and stable results of (MCDM) schemes undertake multichoice and multi- forecasting timescales. The accuracy of emulators is objective problems. In particular, there are three kinds of correlated with the complexity of the networks, while it solutions derivable using MCDM especially when it produces more accurate forecasts. With medium range concerns making choice from pool options having the best forecasting, they found that the proposed emulators alternatives. Also, it is possible to rank the order of several compared with the parameterization scheme are more alternatives in order of importance or preferences. More accurate. With CPU hardware, the proposed emulators are so, sorting and classifying decision alternatives within similar to existing scheme in computational cost, while acceptable order of groupings [8], fuzzy TOPSIS, they performed 10 times faster based on GPU. K. Bala VIKOR, and TODIM and [9] are common methods when Maheswari et al. [14] proposed a model to make long-term undertaking selection of alternative like bank websites and weather forecasts using a historical dataset. The model is electronic banking application’s quality. implemented based on support vector machine and FAHP is held in high esteem as valuable for complex decision tree algorithms to forecast different conditions decision-making tasks, which empowers the analysts to such as rainfall, floods, storms, humidity, and minimize uncertainty and vulnerability connected to the temperature. While Mohammad Sadman Tahsin et al. in process of preparing chiefs’ judgment not applicable in [15] proposed a daily weather forecasting model in an AHP approaches. The AHP proposed by Saaty was fine- urban area. 12 data mining models are implemented over turned or fuzzified in order to control and spot the 20 years of climate data patterns in Chittagong city. The vulnerability [10]. The key concept of the FAHP evaluation process of the model is implemented based on streamlines composite decision-making tasks across tiered different metrics, such as precision, recall, accuracy, F- structure made of criteria and sub-criteria in manner as a measure, receiver operating characteristics, and area under pairwise comparison to the criteria [11]. curve. The results show that J48 outperformed the other algorithms in accuracy. This paper develops an effective FAHP model-based The summary of related works according to author(s), antecedents’ selection for fuzzy logic control weather objectives(s), methodologies, outcomes and limitations forecasting system. The contributions include: are presented in Table 1. • To select the fuzzy logic antecedents through FAHP model. Optimizing Fuzzy Logic Control-Based Weather Forecasting through… Informatica 49 (2025) 19–34 21 Table 1: The related works summary S/N References Objective(s) Methodology Outcome(s) Limitation(s) It provides point predictions for Wind power forecasting based Deterministic and 1. [16] day-to-day Accuracy to be improved. on climatic conditions probabilistic models. operations of power systems. The weighted- Solar Photovoltaic system Machine learning KNN outperforms Energy efficiency and high error 2. [17] forecasts under several weather techniques. other ML rates. factors approaches. Ensemble of deep No relationship between normal Weather Nowcasting under The error is less 3. [18] learning techniques and adverse metrological Radar products’ values. than 4%. (NowDeepN). products’ values. Correctly classified users’ Weather impact on COVID-19 claims based on Classifier ineffectiveness on 4. [19] outbreak based on users’ twitter Machine learning. their tweets at 95% other languages. feeds. AUC-PR and AUC-ROC. Climate changes Step by step linear aggravates health Cyclonic weather regimes regression model, risks of people 5. [20] Large inaccuracies from datasets. impact on seasonal influenza. clinical and laboratory using regression tests. and root mean square difference. Determination of Weather-associated delays in ML modeling of weather severe and 6. [21] Accuracy could be improved. transport sector. events. disruptive weather events. AUC of 91% for Weather radar reflectivity Fraction Skill Score Reliability of forecasts to be 7. [22] the predictive towards flood events. (FSS). improved. model. Accurate CNN-based building Interpretability of satellite classification of Pre-and-post-disaster images 8. [23] damage image imagery. building damages modelling. classification. through images. Weather Internet of Everything: information are Internet-enabled approach for 9. [24] Smart weather reporting system. sensors for measuring effectively farmers. weather parameters. disseminated. It has Increased Weather forecasting with Machine learning based Neural network algorithms are 10. [25] speed and accuracy gravity wave drag emulation. on neural networks. less-effective. of models. It offered superior Spatio-temporal weather Spatio-temporal dataset was 11. [26] Convolutional LST Ms. MSE and forecasting. utilized. performance. Generalized Likelihood Ratio Test for It offers better To improve on local information Minimizing turbine clutter 12. [27] identifying signal prediction due to about precipitation and filtered based on weather radar data. subspace and gates overlap of datasets. radar IQ. impacted by WTC. Short-leaved magnetic field of Nowcasting of extreme space Magnetotelluric data of It used bivariate 13. [28] storm for spatial and temporal weather events. geomatic storms. approach for events. polarization of 22 Informatica 49 (2025) 19–34 A.S. Gaafar storm time electric fields. Classification of main synoptic It effectively Applicable to particle Particle formulation 14. [29] meteorological patterns of determines formulation and air quality analysis of air quality. atmosphere. weather scenarios. prediction. It improved the Machine learning with Weather data knowledge quality of High errors during simulation of 15. [30] rule base approach such mining. concomitant weather reports. as K-NN, ARIMA. factors prediction. Machine learning based KNN, Random NCDC weather data models including Forest and Overfitting and smaller datasets 16. [31] classification and predictive CART, AdaBoost, XGBoost had impart on performance. models. Decision Tree, and highest accuracy. XGBoost. CNN Deep Photovotaic (PV) solar power Particle swarm learning model Hyrid algorithms to be 17. [32] forecasting based on climatic optimization and genetic best for experimented. conditions algorithms. determining PV power. It generated highly Machine learning with Location and climate change Weather files for building accurate 18. [33] regression and events and applications not energy designs optimization. subsequent classification models. considered. weather files. It uses full-field Outcomes may be inaccurate and Numerical weather weather system to 19. [34] Weather forecasting misleading without full-field prediction. perform anomaly data. weather forecasts. It improves Hybrid ML model of outcomes of 20. [35] Rainfall forecasts. PSO and Feed Forward To increase accuracy. forecasts for Neural Network. rainfall. It predicts local weather events Automated weather data LSTM based neural To extend to more parameters of 21. [36] such as Tornado, processing. network model. soils forecasts. flood, severe storm, etc. Naïve Bayes had Classification tree, More data consisting of weather 22. [37] Weather prediction. best accuracy of KNN, Naïve Bayes. observational data over stations. 77.1%. Water surrogates Weather conditions-based water Bayesian Belief are determinants Higher accuracy required for 23. [38] quality prediction. Networks. for water quality safety of drinking water. prediction. KNN produced Naïve Bayes, C.45 and highest accuracy Input criteria and constraints are 24. [39] Weather forecasts. KNN. (71.59%) inconsistent. forecasts. SAID Selection Based on outperformed other Multi-class classification of Accuracy Intuition and To explore computer vision for 25. [40] algorithms in weather data. Diversity (SAID) of weather classification. classifying weather ensemble scheme. images. It improved results Solar irradiance forecast based 26. [41] Naïve Bayes classifier. and accuracy for The smaller training dataset. on weather variables. real-time weather. Optimizing Fuzzy Logic Control-Based Weather Forecasting through… Informatica 49 (2025) 19–34 23 It increased the Machine learning and To utilize classification and 27. [42] Weather forecasting. accuracy and speed ensemble methods. clustering approaches. of forecasting. It identified risks to Weather-based major power A two-level hybrid risk To apply to resilience of power 28. [43] be associated with outage forecasts. determination model. systems. different factors. SVM produced the Weather-based Solar PV power KNN, and SVM Expanding the models to K- 29. [44] best accuracy of forecasting. classifiers. Means, Random Forest, etc. forecasts. Weather data Optimization and integration of extrapolation for 30. [45] Rainfall forecasts. Data mining approaches. data-mining techniques for better determining accuracy. rainfall patterns. From Table 1, majority of the weather forecasting lithology. Also, six factors serve as flood vulnerability considered different weather parameters using machine zonation including: total population, land cover/ land use, learning and numerical prediction schemes. However, distance to hospital, density of population, road density, there are no focus on selection of influential factors and and distance to road. AUC score of 0.741 was obtained for their fuzziness as well as the effect of complexity of AHP approach. Using the multicollinearity analysis meteorological datasets during various forecasts tasks. To revealed highly corrected independent variables. Though, this end, the roles of the FAHP, AHP and Fuzzy Logic specificity of the forecasts can be performed using other techniques in the weather forecasting tasks and others techniques. were analyzed as follows: Fanxiao Meng et al. in [50] deployed remote sensing, GIS The concept of FLC identified for determining stock datasets to determine the groundwater recharge zones prices movements using the Nigeria Stock Exchange (GWRZ) in Pakistan. The hydrology and geology factors trading datasets for Dangote Cement PLC. Alfa et al. [46] influence on the GWRZ were investigated. In particular, proposed the rules-list’s antecedent optimization with the thematic maps was composed of the slope, rainfall, genetic algorithms procedure to improve the forecasts geology, drainage density, land cover/ land use, lineament, effectiveness. Following from that, they further optimized and types of soil. Authors utilized multi-influencing factor the rule-list’s consequent by means of the genetic and the AHP to assign weights to the factors. But, the use algorithm method in which the error rates diminished of advanced methods could improve the decision-making substantially. However, the studies did not cover effects of process and its accuracy. the dataset complexity on the effectiveness of the fuzzy Husam Musa Baalousha et al. in [51] evaluated the risk of logic control schemes. flooding based on FAHP and Fuzzy Logic in the Arid A grey fuzzy AHP-based flash flood vulnerability places of Qatar using the land cover, precipitation, soil evaluation in watershed region of Himalayan, China was type, flow accumulation, and elevation. The outcomes undertaken by [47]. Authors leveraged on geographical from both the Fuzzy logic and the FAHP demonstrated information system (GIS) and 12 natural and resemblances in the low-risk and differences in the high- anthropogenic parameters. The low, moderate and high risk zones. While the FAHP accounted for higher classes were assigned to the Flash Flood Vulnerability variability and more accurate than Fuzzy Logic method. Index in which the sensitivity test revealed LULC was Sinan Keskin et al. in [52]developed a fuzzy spatial online highly influential. However, there is the propensity of analytical processing (FSOLAP) framework to provide applying more effective methods like fuzzy logic control. predictive analytics of the complex data applications. The A GIS with multi-criteria decision making (MCDM) framework was validated with meteorological datasets method were adopted in determining landslide-prone from the Turkish Meteorological Office. When compared regions in highland of Southern Western Ghats by [48]. with traditional machine learning approaches, FSOLAP is Nine landslide influencing factors were considered in a more scalable and accurate for big meteorological ascertaining the thematic layers for the landslide databases’ fuzziness or uncertainty. susceptibility map. AUC scores of 79% and F1 scores of Susanta Mahato et al. in [53] combined FAHP and Fuzzy 85% were obtained from the standardized causative factor Logic techniques to determine the drought-based weights. More techniques can be applied to improve the vulnerability factors in Odisha, India. Six criteria of water performance of the FAHP. usage and demand, physical attributes, land use, Zhran et al. in [49] implemented the flood risk zonation in groundwater, and development/population, and 22 sub- Egypt’s Nile districts of Damietta using the IGS, remote criteria were chosen. The FAHP weighted the parameters sensing, and AHP. Twelve thematic layers of slope, through pair-wise comparisons matrix. The Fuzzy logic elevation, vegetation index, topographic wetness index, provided five classes of vulnerability: very high, high, water index, topographic positioning index, stream power moderate, low, and very low. During validation, statistical index, modified Fournier index, drainage density, evaluation parameters root means square error, accuracy, sediment transport index, distance to the river, and and mean absolute error were employed. 24 Informatica 49 (2025) 19–34 A.S. Gaafar Waseem Alam et al. in [54] introduced the FAHP e. DEFINE the values and linguistic of the first antecedent framework in assessing and ranking the criteria and and matching membership functions, that is, low, medium and high. weight factors of the behaviour of drivers in Peshawar, f. DEFINE the values and linguistic of the second Pakistan. Three most important risky driving features antecedent and matching membership functions, include: errors, violations, and lapses. The driver attention that is, low, medium and high. and clear road signage were top influential factors in g. DEFINE the values and linguistic of the consequent and matching membership functions, that is, low, medium raising risk perception of the drivers. Also, the ensemble and high. machine learning offered an accuracy of 0.84. Step 8. BUILD the fuzzy rules from the membership functions for Nonetheless, there are prospects of FLC in explaining the the Antecedents and Consequents for all the possible combinations. interconnection among various factors and driving Step 9. OPTIMIZE the fuzzy rules (the chromosomes) with genetic algorithm procedure to select the best rules for the FLC weather behaviours. forecasting system. The reviewed studies, in the second part, had drawn Step 10. APPLY the weather datasets to evaluate the FLC system. fascinating evidence about the weakness of the FLC and STOP. the complementary roles to be played by the FAHP in End OUTPUT: Normalized weights of criteria and FLC rule-base. dealing with multi-criteria decision-making and highly complex meteorological datasets mining in computational weather mining and analysis as undertaken in this paper. 3.2 Data collection and preprocessing This paper utilized both primary and secondary sources of 3 Research methodology datasets. Firstly, standard historical metrological data was collected from the Antarctic Automatic weather facilities The paper utilizes the FAHP in filtering the most (AntAWS) influencing factors for determining weather conditions of Dataset: https://amrdcdata.ssec.wisc.edu/dataset/antaws- places whose datasets are traditionally composed of the dataset) is 3-hourly, daily and monthly under strict quality complex meteorological parameters [52]. To improve the control. Five parameters were measured by 267 AWSs accuracy of FLC, the most impacting factors were utilized from the period of 1980-2021 [57]. These include: air for the construction of the rules-base, which flaws pressure, air temperature relative humidity, wind speed, decision-making processes and forecasting tasks because and wind direction) by 267 AWSs collected between 1980 of redundancy of the rules-lists [51]. and 2021. The 25% and 75% thresholds were used to compute the products for daily and monthly quality- 3.1 Fuzzy analytical hierarchical process controlled readings. criteria selection Secondly, structured questionnaire was constructed to curate the required data for building effective antecedents The main steps for the FAHP model adoption in for fuzzy logic control-based weather forecasting model. determining the most relevant criteria for building fuzzy The lists of weather criteria including: TMP, PRS, WNS, logic control antecedents are analogous to the methods WND, HUM, and VNS. The survey questionnaire was undertaken by [55], [56]. created using the identified weather criteria or parameters with associated nominal scale (1 – 9) of weather attributes Algorithm: FAHP criteria selection for the FLC rule-base. as described in Table 2 [58], [59]. INPUT: Comparison matrix Step 1 DEVELOP analytical hierarchy by utilizing a typical hierarchy plan based on distinct levels. a. The DETERMINATION of quantification for the Table 2: The adopted membership function and linguistic prospective fuzzy logic control antecedents. scale. b. ANALYZE prospective FLC antecedents. c. GENERATE pairwise comparison matrix based on Triangular Triangular AHP scale Fuzzy reciprocal Linguistic scale fuzzy d. TRANSFORM into a fuzzy triangular (FT) scale. scale fuzzy numbers numbers Step 2. DEVELOP a pairwise fuzzy comparison vector (PCV) Extreme with selected weather parameters or criteria. The crisp numeric 9 9, 9, 9 1/9, 1/9, 1/9 importance values create PCV as the evaluation method being a single numeric value for categorizing FLC antecedents. Very, very 8 7, 8 ,9 1/9, 1/8, 1/7 strong Step 3. COMPUTE fuzzy geometric mean from the lower, median, Very strong or and upper fuzzy geometric means. 7 demonstrated 6, 7, 8 1/8, 1/7, 1/6 importance Step 4. COMPUTATE fuzzy AHP weight using the lower, 6 Strong plus 5, 6, 7 1/7, 1/6, 1/5 median and upper fuzzy weights accordingly. Strong 5 4, 5, 6 1/6, 1/5, 1/4 Step 5. GENERATE normalized weights of the parameters. importance 4 Moderate plus 3, 4, 5 1/5, 1/4, 1/3 Step 6. SELECT top-two weighted parameters to serve as the Moderate 3 2, 3, 4 1/4, 1/3, 1/2 antecedents for FLC-based weather forecasting system. importance Step 7. CONSTRUCT the triangular fuzzy numbers. 2 Weak or slight 1, 2, 3 1/3, 1/2, 1 Equal 1 1, 1, 1 1, 1, 1 importance Optimizing Fuzzy Logic Control-Based Weather Forecasting through… Informatica 49 (2025) 19–34 25 experts recruited for the survey. The three participants’ responses in crisp numerical values and computed consistency index (CI) are shown in Tables 3, 4, and 5. Table 3: The first respondent responses on two topmost 3.3 Materials for experimentation weather parameters. TMP PRS WNS WND HUM VMS CR The weather forecasting model was validated on MATLAB R2019b discrete simulator on Laptop Personal TMP 1 1 1/3 1 1 1/3 0.0905 Computer system. The minimum specifications of the PRS 1 1 1/2 1/3 1/3 1/4 computational resources include: Hardware: WNS 3 2 1 1/4 1/5 1/4 AMD E1-1200 APU Processor with RadeomTM Graphics WND 1 3 4 1 4 5 1.40 GHz, 4.00 GB RAM, 64-bit Operating System, x64- HUM 1 3 5 1/4 1 4 based processor. Software: VMS 3 4 4 1/5 1/4 1 Windows 10 Single Language 2012, 3.5 Windows Experience Index. From Table 3, the responses offered by the showed Genetic algorithm procedure parameters: preferences for the first item in the pair, which showed that Crossover probability: 0.8, Population selection method: highest score of 5 for HUM against WNS, and WNS Elitism, Offspring Rank and Mutation, Original against VMS. The lowest score of 1/5 was awarded to chromosomes: 18, Iteration: 5, Crossover type: Uniform WNS against HUM, and WNS against VMS. The crossover, Maximum population: 30, Mutation computed CR of 0.0905 < 0.1 threshold for acceptance of probability: 0.09. the responses of first respondent as reliable for further processing of the research questions. 3.4 Evaluation parameters The effectiveness of the proposed weather forecasting Table 4: The second respondent responses on two model after applying the similar test and target datasets is topmost weather parameters. computed by means of the mean square error (MSE), root TM PR WN WN HU VM mean square error (RMSE), and mean absolute percentage CR P S S D M S error (MAPE) metrics given by Equations 1, 2 and 3: 1.186 1 𝑥 2 TMP 1 1/8 1/8 1/6 1/5 1/7 𝑀𝑆𝐸 = ∑ (𝐴 ?̂? 1 𝑥 𝑔 − 𝑖𝑔) 7 𝑔=1 PRS 9 1 1/6 8 1/7 1/5 1 𝑥 2 𝑅𝑀𝑆𝐸 = √ ∑ (𝐴 ) 2 𝑥 𝑔 − ?̂?𝑔 WNS 8 6 1 6 5 7 𝑔=1 WN 6 1/8 1/6 1 3 8 D 1 𝑥 𝐴𝑔 − ?̂?𝑔 𝑀𝐴𝑃𝐸 = √ ∑ | | × 100% 3 HU 𝑥 𝑔=1 ?̂?𝑔 5 7 1/5 1/3 1 8 M where, VMS 7 5 1/7 1/8 1/8 1 𝐴𝑖 is the target of actual value of the output sample, ?̂?𝑖 is the predicted value of the output sample, In Table 4, the responses collected for the second g is the index term starting at 1 to x of test dataset, and respondent indicated the preferences for the both items in x is the size of the test dataset. the pair. In case of the responses of first item against 4 Results and discussion second item, highest score of 9 was awarded to PRS over TMP, and the lowest score of 1/8 was awarded to TMP This section presents the weather forecasting outcomes against PRS, WND against PRS, VMS against WND, and after selecting antecedents with FAHP model. The VMS against HUM. In the case of the second item against conditions forecasts of cities were determined with first item, the highest score of 8 was preferred VMS optimized FLC model. against WND, and VMS against HUM. The lowest score of 1/8 was preferred to WNS against VMS, and HUM 4.1 FAHP Model-based criteria selection against VMS. The computed CR of 1.1867 > 0.1 from survey outcomes threshold, the responses of second respondent were The research question, what is are two topmost parameters rejected as unreliable for further processing of the research influencing weather conditions? was posed to the three question. 26 Informatica 49 (2025) 19–34 A.S. Gaafar highest score of 9 was preferred on HUM against TMP, and the lowest score of 1/7 was given to VMS against WNS, WND against PRS, VMS against WNS. In the case of the second item against first item, the highest score of 7 was preferred to HUM against WND, and VMS against WNS. The lowest score of 1/8 was given to VMS against Table 5: The third respondent responses on two topmost PRS, and VNS against WNS. The calculated CR value of weather parameters. 1.3432 > 0.1 threshold, the responses of third respondent TM PR WN WN HU VM CR were rejected as unreliable and removed from further P S D M S processing of the research question. Considering the initial analysis of collected responses of TMP 1 1/7 1/7 1/8 1/9 1/6 1.343 2 respondents contained in Tables 3, 4, and 5, the computed CR values for first, second and third respondents are PRS 7 1 1/7 6 1/7 1/8 0.0905, 1.1867, and 1.3432 respectively, which are greater than 0.1 threshold for the consistency and reliability of the WNS 7 7 1 5 6 7 participant’s responses except for first respondent. This WN 8 1/6 1/5 1 7 6 implies that, only the first respondent’s responses were D accepted on the basis of CR value for further investigation of the subject. HU 9 7 1/6 1/7 1 6 Similarly, the fuzzy numbers format PCM corresponding M to crisp numerical values of the first participant’s VMS 6 8 1/7 1/6 1/6 1 responses (refer to Table 3) are shown in Table 6. Each crisp number for every response in the Table 3 is substituted with matching fuzzy numbers and inverse In Table 5, the responses collected for the third respondent fuzzy numbers in Table 6. point to the preferences in the both items in the pair. In case of the responses of first item against second item, Table 6: The Fuzzy numbers for first participant responses on two weather parameters. TMP PRS WNS WND HUM VMS TMP (1,1,1) (1,1,1) (1/4, 1/3, 1/2) (1,1,1) (1,1,1) (1/4, 1/3, 1/2) PRS (1,1,1) (1,1,1) (1/3, 1/2, 1) (1/4, 1/3,1/2) (1/4, 1/3, 1/2) (1/5, 1/4, 1/3) WNS (2, 3,4) (1,2, 3) (1,1,1) (1/5, 1/4, 1/3) (1/6, 1/5, 1/4) (1/5, 1/4, 1/3) WND (1,1,1) (2, 3, 4) (1/5, 1/4, 1/3) (1,1,1) (3, 4, 5) (4, 5, 6) HUM (1,1,1) (2, 3, 4) (1/6, 1/5, 1/4) (1/5, 1/4, 1/3) (1,1,1) (3, 4, 5) VMS (2, 3, 4) (3, 4,5) (1/5, 1/4, 1/3) (1/6, 1/5, 1/4) (1/5, 1/4, 1/3) (1,1,1) Table 6 contains the computed outcomes of the Chang’s Normalized FAHP codes on MATLAB R2013b. The FAHP model Parameter Weight Rank Remarks Weight(%) computes the weights, normalized weights, and ranks Moderate TMP 0.0946 9.46 5 based on independent responses. The FAHP model uses importance Weak or slight the extended approach in determining the top two PRS 0.0741 7.41 6 importance parameters that are highly influencing weather forecasts WNS 0.1459 14.59 4 Strong plus and related decision-making tasks as illustrated in Table Extreme 7. WND 0.3003 30.03 1 importance Table 7: The weights and ranks of respondent’s HUM 0.1997 19.97 2 Very, very strong responses computed. Very strong VMS 0.1854 18.54 3 importance In Table 7, the five weather parameters received various contributions to the subject of weather forecasts and decision-making process. As shown, TMP paid 9.46%, PRS contributed 7.41%, WNS donated 14.59%, WND gave 30.03%, HUM explained 19.97%, and 18.54% was accounted by VMS. Interestingly, the two topmost parameters having extreme importance, and very, very Optimizing Fuzzy Logic Control-Based Weather Forecasting through… Informatica 49 (2025) 19–34 27 strong importance when determining weather conditions Antecedents Conditions Range of Values of regions in the study area include: WND and HUM at Indices 30.03% and 19.97% respectively. Wind direction (WND) High (3) [93.37 – 226.53] More so, graphical comparisons of the select parameters Medium (2) Low (1) preferred by the respondent using the FAHP Humidity (HUM) High (3) [70.24 – 84.10] computational weights are shown in Figure 1. Medium (2) Low (1) Consequents 0,35 Weather condition High (3) [84.10 – 226.53] (WEATHER) Medium (2) 0,3 Low (1) 0,25 0,2 0,15 From Table 8, the antecedents for the FLC include: wind 0,1 direction (WND), and Humidity (HUM). The range of 0,05 values are [93.37 – 226.53], and [70.24 – 84.10]. The consequent variable is weather condition (WEATHER) 0 under investigation whose range of values are derived TMP PRS WNS WND HUM VMS from minimum and maximum values of the antecedents, Weather parameter that is, [84.10 – 226.53]. The layout of the FLC-based weather forecasting model is composed of two inputs (FAHP select parameters/antecedents: HUM and WND), Figure 1: The contributions of the select parameters on and an output (consequent: weather condition) as shown weather conditions forecasts. in Figure 2. From Figure 1, the graphical display of the select weather parameters in forecasts and determination using FAHP model weights, which showed clearly the top leading parameters as WND and HUM at 0.3 and 0.2 weighting scale respectively. While the least contributing parameter to the weather forecasting tasks was PRS at 0.09 of the FAHP model’s weighting scale. The FAHP model derived two weather parameters of WND and HUM as important to the subject of weather forecasting; thereby included as antecedents for FLC system as explained in the next subsection. 4.2 Outcomes of the fuzzy logic control model Figure 2: The FLC-based weather forecasting system The fuzzy rules are generated according to data items and layout. type of datasets selected from the FAHP model’s outcomes. The two top parameters, WIND and HUM, The triangular membership function was adopted because were to serve as the antecedents for the inference engine of its popularity and effectiveness for modeling of the FLC. The FLC model developed to determine uncertainties and fuzziness during decision-making uncertainty problems and trends of weather in the city of processes. Three membership conditions were developed Austin, United States at more effective and reliability for both antecedents and consequent namely: Low, style. The antecedents and consequents with their Medium and High. While, the matching indices of respective conditions are given in Table 8. membership functions include: 1, 2, and 3. The membership functions, variables, and range of values for Table 8. Antecedents and consequent constraints for the all the input and output are specified in Table 8. These fuzzy engine. refined fuzzy rules-list is used in constructing the fuzzy inference engine by means of the logical AND, and IF- THEN statements as established by [60]. Therefore, the rule-list for the fuzzy inference engine of the proposed forecasting weather events is given in Table 9. Table 9: The optimized FLC rules-list indices after genetic algorithm refinement. Weight (%) 28 Informatica 49 (2025) 19–34 A.S. Gaafar Rule N Input 1 Input 2 Output against the local NIMET dataset of 0.1563. The same trend was observed for the RMSE error measure that put 1 3 1 2 the proposed model performance with the standard dataset 2 1 1 2 at 0.0317 over the NIMET dataset at 0.3953. When MAPE 3 3 3 3 evaluation parameter was considered, the proposed 4 3 1 2 weather model performed highly at 0.0319 for Kaggle dataset when compared to the NIMET dataset at 0.2104. 5 2 2 3 This shows the proposed weather forecasting model performed best with less complex and refined factors From Table 9, the refined fuzzy rule-lists are utilized for against highly complex meteorological local weather generating different mapping of the antecedents’ datasets as depicted by Figure 3. membership function indices to the membership functions of the consequent using the input and output weather parameters defined in Table 8. The rule-base generated for MAPE RMSE MSE the FLC, from Table 9, is illustrated in Figure 3. Kaggle NIMET 0 0,1 0,2 0,3 0,4 0,5 Error rate Figure 3: The performance of the FLC weather forecasting systems with diverse datasets. Figure 3: The optimized rules-list design of the FLC weather forecasting model. Again, the outcomes of weather forecasting model with the optimized FLC were superior when compared to the Following from Figure 3, the antecedent variables are ordinary FLC with refined rules-base as shown in Table WND and HUM, which correspond to the inputs of the 11. FLC weather system. Also, the consequent is the output of the FLC system. The logic function of “AND” are used to Table 11: The comparisons of the proposed model to the map the different membership functions of the inputs to FLC model. those of the output. More importantly, the weight of the rules-list is 1, which denotes equal importance of all the Model MSE RMSE MAPE Remarks inputs and output membership functions indices in order to remove biases in decision-making process about FLC 0.0011 0.0332 0.0319 Effective weather conditions of cities. Optimized 0.0010 0.0317 0.0355 More The performance of the proposed optimized FLC weather FLC Effective forecasting system in terms of error rates using optimized rules-list the fuzzy inference engine against conventional FLC weather forecasting system is given in Table 10. From Table 11, the weather forecasting model performed better with fewer rules-lists in the rule-base rather than unfiltered rules-list. These showed that the proposed Table 10: Proposed weather FLC forecasting model weather forecasting model in terms of the evaluation performances with different datasets. metrics of MSE, RMSE, and MAPE at 0.0010, 0.0317 and Dataset MSE RMSE MAPE Remarks 0.0355 were most preferred because of their capability to NIMET 0.1563 0.3953 0.2104 Effective explain smaller variations in the outcomes against the target weather data in the area of study as illustrated in the Kaggle 0.0010 0.0317 0.0319 More Figure 4. Effective From Table 9, the performance of the proposed optimized FLC with the standard dataset was better at MSE of 0.0010 Dataset type Optimizing Fuzzy Logic Control-Based Weather Forecasting through… Informatica 49 (2025) 19–34 29 systems for the decision-making processes. Furthermore, the process of filtering weather parameters, and refining of the rules-list in rule-base increased the outcomes of FLC weather forecasting system. The FLC weather forecasting system has shown to perform better with the removal of redundancy in the rules-list as well as its input variables (or weather parameters). In this paper, the choice of the FAHP and FLC methods offer complementary roles which increase the variability and accuracy [51]. The uncertainty and fuzziness of meteorological datasets like Kaggle and NIMET, were best interpreted using both approaches [52]. The FAHP method refines the decision- making procedure and data analytics of the FLC [50]. The Figure 4: The performance of FLC and Optimized FLC paper extended the prospects of the both FAHP and FLC methods for weather forecasts. to the weather forecasting and analytics, which falls into The reasons being that, FAHP model procedure improves the MCDM research domain. the selection of the most influencing parameters required to building FLC rule bases. More so, the FLC model’s 5 Conclusion and future works rules-list redundancy was filtered with genetic algorithm This paper provides required tool for determining weather procedure to realize 5 best rules out of 9 original rules. The and state of the atmosphere in certain places and periods outcomes of this paper increase the reliability of the through the application of fuzzy logic technique. It will weather information generation for diverse purposes as benefit individuals, government agencies, business sector, shown in Figure 5. built and construction sector, researchers and scholars concerned with planning and policymaking depending on weather outlooks. This increases the understanding of the hidden relationships and patterns available for a more accurate and reliable local weather information dissemination. The outcomes of the FAHP model when used to select the most important parameters affecting weather forecasts of cities identified Wind Direction (WND) and Relative Humidity as contributing 30.01% and 19.97% influence to the decision-making process. Thereafter, the select parameters from the FAHP model procedure served as antecedents of the FLC model. The GA-optimized FLC model was adopted from the study by [46] which overcame the problem of redundancy of the fuzzy inference engine rule-list. Consequently, the refined Figure 5: The Line graph of FLC weather forecasts rules-list serves the building block for the proposed FLC model performances compared. weather forecasting model. The outputs revealed that the FLC weather forecasting model in terms of the MSE, From Figure 5, the testing dataset is 30% of the entire RMSE, and MAPE at 0.0010, 0.0317 and 0.0355 were weather dataset collected in which the target line depicts most preferred against comparable models because of its the original weather data for 25 days periods capability to explain smaller variations of the datasets. It corresponding to after 51 months of observations. As was superior due to the initial FAHP-based selection of shown, the weather forecast model was unsteady from the weather parameters and rule-list reduction procedures. It starting 110 points by 51st month, and changed sharply to equally attributable is the filtered rules-list used to attain its lowest value at 42 points. Then, it continues to construct the fuzzy inference engine of the FLC. gain at the highest point of 139 points by 64th month. This paper found that, the subjectivity of expert However, by the end of 71 months testing period, the judgements during FAHP modelling of the criteria and the weather condition reached 86 points. In terms of the over reliance of FLC model on its rule-base’s optimization forecasts performance, the optimized FLC weather system impact greatly on the outcomes of the weather forecasts outcomes (the Actual line in the graph) showed similar generated. In future works, more dataset can be trends as the Target line through the comparable periods experimented to include long periods and extended site of observations, that is, 51st month, 54th month, 56th specificity. month, 58th month, 60th – 67th months, 70th month and 71th month accordingly. References These illustrate the capabilities of the proposed FLC system in accurately forecasting weather conditions of [1] S. B. Pooja and R. V. Siva Balan, “An cities at minimal error rates. It can be attributed to the investigation study on clustering and involvement of AI techniques like FAHP and FLC classification techniques for weather forecasting,” 30 Informatica 49 (2025) 19–34 A.S. Gaafar J Comput Theor Nanosci, vol. 16, no. 2, 2019, doi: Manage, vol. 284, 2021, doi: 10.1166/jctn.2019.7742. 10.1016/j.jenvman.2021.111985. [2] G. Czibula, A. Mihai, and E. Mihuleţ, [11] T. H. Tseng, Y. S. Wang, and Y. C. Tsai, “Nowdeepn: An ensemble of deep learning “Applying an AHP Technique for Developing A models for weather nowcasting based on radar Website Model of Third-Party Booking System,” products’ values prediction,” Applied Sciences Journal of Hospitality and Tourism Research, vol. (Switzerland), vol. 11, no. 1, 2021, doi: 45, no. 8, 2021, doi: 10.1177/1096348020986986. 10.3390/app11010125. [12] S. F. Tekin, O. Karaahmetoglu, F. Ilhan, I. [3] S. Sokolov, S. Vlaev, and M. Chalashkanov, Balaban, and S. S. Kozat, “Spatio-temporal “Technique for storing and automated processing weather forecasting and attention mechanism on of weather station data in cloud platforms,” in IOP convolutional lstms,” arXiv preprint Conference Series: Materials Science and arXiv:2102.00696, vol. 4, 2021. Engineering, 2021. doi: 10.1088/1757- 899X/1032/1/012021. [13] M. Chantry, S. Hatfield, P. Dueben, I. Polichtchouk, and T. Palmer, “Machine Learning [4] Z. Chen, Y. Wang, and L. Zhou, “Predicting Emulation of Gravity Wave Drag in Numerical weather-induced delays of high-speed rail and Weather Forecasting,” J Adv Model Earth Syst, aviation in China,” Transp Policy (Oxf), vol. 101, vol. 13, no. 7, 2021, doi: 2021, doi: 10.1016/j.tranpol.2020.11.008. 10.1029/2021MS002477. [5] V. Mazzarella, R. Ferretti, E. Picciotti, and F. S. [14] K. B. Maheswari and S. Gomathi, “A Marzano, “Investigating 3D and 4D variational Comprehensive Analysis of Weather Prediction rapid-update-cycling assimilation of weather Using Machine Learning,” in 2024 Ninth radar reflectivity for a heavy rain event in central International Conference on Science Technology Italy,” Natural Hazards and Earth System Engineering and Mathematics (ICONSTEM), Sciences, vol. 21, no. 9, pp. 2849–2865, 2021. 2024, pp. 1–6. [6] L. Coulibaly, B. Kamsu-Foguem, and F. Tangara, [15] M. S. Tahsin, S. Abdullah, M. Al Karim, M. U. “Rule-based machine learning for knowledge Ahmed, F. Tafannum, and M. Y. Ara, “A discovering in weather data,” Future Generation comparative study on data mining models for Computer Systems, vol. 108, 2020, doi: weather forecasting: A case study on Chittagong, 10.1016/j.future.2020.03.012. Bangladesh,” Natural Hazards Research, vol. 4, no. 2, 2024, doi: 10.1016/j.nhres.2023.12.014. [7] F. Wang, Z. Zhen, B. Wang, and Z. Mi, “Comparative study on KNN and SVM based [16] I. K. Bazionis and P. S. Georgilakis, “Review of weather classification models for day ahead short- Deterministic and Probabilistic Wind Power term solar PV power forecasting,” Applied Forecasting: Models, Methods, and Future Sciences (Switzerland), vol. 8, no. 1, 2017, doi: Research,” Electricity, vol. 2, pp. 13–47, 2021, 10.3390/app8010028. doi: https://doi.org/10.3390/electricity2010002. [8] M. Al-Shammari and M. Mili, “A fuzzy analytic [17] M. S. Nkambule, A. N. Hasan, A. Ali, J. Hong, hierarchy process model for customers’ bank and Z. W. Geem, “Comprehensive Evaluation of selection decision in the Kingdom of Bahrain,” Machine Learning MPPT Algorithms for a PV Operational Research, vol. 21, no. 3, 2021, doi: System Under Different Weather Conditions,” 10.1007/s12351-019-00496-y. Journal of Electrical Engineering & Technology, 2020, doi: 10.1007/s42835-020-00598-0. [9] D. Liang, Y. Zhang, Z. Xu, and A. Jamaldeen, “Pythagorean fuzzy VIKOR approaches based on [18] G. Czibula, A. Mihai, and E. Mihule, TODIM for evaluating internet banking website “NowDeepN: An Ensemble of Deep Learning quality of Ghanaian banking industry,” Applied Models for Weather Nowcasting Based on Radar Soft Computing Journal, vol. 78, 2019, doi: Products’ Values Prediction,” Applied Sciences, 10.1016/j.asoc.2019.03.006. vol. 11, no. 125, pp. 1–27, 2021, doi: https://doi.org/10.3390/app11010125. [10] V. K. Singh et al., “Development of fuzzy analytic hierarchy process-based water quality model of [19] M. Gupta et al., “Whether the weather will help us Upper Ganga River basin, India,” J Environ weather the COVID-19 pandemic: Using machine learning to measure twitter users’ perceptions,” Optimizing Fuzzy Logic Control-Based Weather Forecasting through… Informatica 49 (2025) 19–34 31 Int J Med Inform, vol. 145, pp. 1–8, 2021, doi: vol. 19, pp. 1–17, 2021, doi: 10.1016/j.ijmedinf.2020.104340. 10.1029/2019SW002432. [20] A. Hochman et al., “The relationship between [29] P. Salvador, M. Barreiro, F. J. Gómez-Moreno, E. cyclonic weather regimes and seasonal influenza Alonso-Blanco, and B. Artíñano, “Synoptic over the Eastern Mediterranean,” Science of the classification of meteorological patterns and their Total Environment, vol. 750, pp. 1–9, 2021, doi: impact on air pollution episodes and new particle 10.1016/j.scitotenv.2020.141686. formation processes in a south European air basin,” Atmos Environ, p. 118016, 2020, doi: [21] Z. Chen, Y. Wang, and L. Zhou, “Predicting 10.1016/j.atmosenv.2020.118016. weather-induced delays of high-speed rail and aviation in China,” Transp Policy (Oxf), vol. 101, [30] L. Coulibaly, B. Kamsu-foguem, and F. Tangara, pp. 1–13, 2021, doi: “Rule-based machine learning for knowledge 10.1016/j.tranpol.2020.11.008. discovering in weather data,” Future Generation Computer Systems, vol. 108, pp. 861–878, 2020, [22] V. Mazzarella, R. Ferretti, E. Picciotti, and F. S. doi: 10.1016/j.future.2020.03.012. Marzano, “Investigating 3D and 4D Variational Rapid-Update-Cycling Assimilation of Weather [31] I. Gad and D. Hosahalli, “A comparative study of Radar Reflectivity for a Flash Flood Event in prediction and classification models on NCDC Central Italy,” Natural Hazards and Earth System weather data,” International Journal of Sciences, pp. 1–26, 2021, doi: Computers and Applications, pp. 1–12, 2020, doi: https://doi.org/10.5194/nhess-2020-406. 10.1080/1206212X.2020.1766769. [23] T. Y. Chen, “Interpretability in Convolutional [32] R. Ahmed, V. Sreeram, Y. Mishra, and M. D. Arif, Neural Networks for Building Damage “A review and evaluation of the state-of-the-art in Classification in Satellite Imagery,” Technical PV solar power forecasting: Techniques and Note, pp. 1–11, 2021, doi: optimization,” Renewable and Sustainable 10.20944/preprints202101. 0053.v1. Energy Reviews, vol. 124, pp. 1–26, 2020, doi: 10.1016/j.rser.2020.109792. [24] A. J. Chinchawade and O. S. Lamba, “Secure Communication in Internet of Everything (IOE) [33] M. Hosseini, A. Bigtashi, and B. Lee, “Generating Based Smart Weather Reporting Systems,” Future Weather Files under Climate Change Journal of Information and Computational Scenarios to Support Building Energy Simulation Science, vol. 14, no. 1, pp. 46–51, 2021. - A Machine Learning Approach,” Energy Build, p. 110543, 2020, doi: [25] M. Chantry, S. Hatfield, P. Dueben, I. 10.1016/j.enbuild.2020.110543. Polichtchouk, and T. Palmer, “Machine learning emulation of gravity wave drag in numerical [34] W. Qian, J. Du, and Y. Ai, “A review: anomaly weather forecasting,” J Adv Model Earth Syst, pp. based versus full-field based weather analysis and 1–23, 2021. forecasting,” The American Meteorological Society, pp. 1–52, 2020, doi: 10.1175/BAMS-D- [26] S. F. Tekin, O. Karaahmetoglu, F. Ilhan, I. 19-0297.1. Balaban, and S. S. Kozat, “Spatio-temporal Weather Forecasting and Attention Mechanism on [35] H. Abdul-Kader, M. Abd-el Salam, and M. Convolutional LSTMs,” pp. 1–13, 2021. Mohamed, “Hybrid Machine Learning Model for Rainfall Forecasting,” Journal of Intelligent [27] A. Dutta, V. Chandrasekar, and E. Ruzanski, “A Systems and Internet of Things, vol. 1, no. 1, pp. signal sub-space-based approach for mitigating 5–12, 2020, doi: 10.5281/zenodo.3376685. wind turbine clutter in fast scanning weather radar,” in 2021 USNC-URSI NRSM, 2021, pp. [36] S. Sokolov, S. Vlaev, and M. Chalashkanov, 202–203. “Technique for storing and automated processing of weather station data in cloud platforms [28] F. Simpson and K. Bahr, “Nowcasting and Technique for storing and automated processing Validating Earth’s Electric Field Response to of weather station data in cloud platforms,” in Extreme Space Weather Events Using Series, IOP Conference Science, Materials, 2020, Magnetotelluric Data: Application to the pp. 1–7. doi: 10.1088/1757-899X/1032/1/012021. September 2017 Geomagnetic Storm and Comparison to Observed and Modeled Fields in [37] R. Prasetya and A. Ridwan, “Data Mining Scotland,” Advancing Earth and Space Science, Application on Weather Prediction Using 32 Informatica 49 (2025) 19–34 A.S. Gaafar Classification Tree, Naïve Bayes and K-Nearest [45] S. Aftab, M. Ahmad, N. Hameed, M. S. Bashir, I. Neighbor Algorithm With Model Testing of Ali, and Z. Nawaz, “Rainfall Prediction using Supervised Learning Probabilistic Brier Score, Data Mining Techniques: A Systematic Literature Confusion Matrix and ROC,” Journal of Applied Review,” vol. 9, no. 5, pp. 143–150, 2018. Communication and Information Technologies, vol. 4, no. 2, pp. 25–33, 2019. [46] A. A. Alfa, I. O. Yusuf, S. Misra, and R. Ahuja, “Enhancing Stock Prices Forecasting System [38] A. Panidhapu, Z. Li, A. Aliashrafi, and N. M. Outputs Through Genetic Algorithms Refinement Peleato, “Integration of weather conditions for of Rules-Lists,” in Lecture Notes in Networks and predicting microbial water quality using Bayesian Systems, vol. 121, 2020. doi: 10.1007/978-981- Belief Networks,” Water Res, p. 115349, 2019, 15-3369-3_49. doi: 10.1016/j.watres.2019.115349. [47] D. Roy, A. Dhar, and V. R. Desai, “A grey fuzzy [39] Y. Findawati, I. R. I. Astutik, A. S. Fitroni, I. analytic hierarchy process-based flash flood Indrawati, and N. Yuniasih, “Comparative vulnerability assessment in an ungauged analysis of Naïve Bayes, K Nearest Neighbor and Himalayan watershed,” Environ Dev Sustain, vol. C. 45 method in weather forecast Comparative 26, no. 7, pp. 18181–18206, 2024. analysis of Naïve Bayes, K Nearest Neighbor and C. 45 methods in weather forecast,” in 4th Annual [48] G. Gopinath, N. Jesiya, A. L. Achu, A. Bhadran, Applied Science and Engineering Conference, and U. P. Surendran, “Ensemble of fuzzy- IOP Publishing, 2019, pp. 1–7. doi: analytical hierarchy process in landslide 10.1088/1742-6596/1402/6/066046. susceptibility modeling from a humid tropical region of Western Ghats, Southern India,” [40] A. G. Oluwafemi and W. Zenghui, “Multi-Class Environmental Science and Pollution Research, Weather Classification from Still Image Using vol. 31, no. 29, 2024, doi: 10.1007/s11356-023- SAID Ensemble Method,” in 2019 Southern 27377-4. African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern [49] M. Zhran et al., “Exploring a GIS-based analytic Recognition Association of South Africa hierarchy process for spatial flood risk assessment (SAUPEC/RobMech/PRASA), IEEE, 2019, pp. in Egypt: a case study of the Damietta branch,” 135–140. doi: Environ Sci Eur, vol. 36, no. 1, pp. 1–25, 2024. 10.1109/RoboMech.2019.8704783. [50] F. Meng et al., “Identification and mapping of [41] Y. Kwon, A. Kwasinski, and A. Kwasinski, “Solar groundwater recharge zones using multi Irradiance Forecast Using Naïve Bayes Classifier influencing factor and analytical hierarchy Based on Publicly Available Weather Forecasting process,” Sci Rep, vol. 14, no. 1, p. 19240, 2024. Variables,” Energies (Basel), vol. 12, no. 1529, [51] H. M. Baalousha, A. Younes, M. A. Yassin, and pp. 1–13, 2019. M. Fahs, “Comparison of the fuzzy analytic [42] S. B. Pooja and R. V. S. Balan, “An Investigation hierarchy process (F-AHP) and fuzzy logic for Study on Clustering and Classification flood exposure risk assessment in arid regions,” Techniques for Weather Forecasting,” Journal of Hydrology, vol. 10, no. 7, p. 136, 2023. Computational Theoretical Nanoscience, vol. 16, [52] S. Keskin and A. Yazc, “FSOLAP: A fuzzy logic- no. 2, pp. 417–421, 2019, doi: based spatial OLAP framework for effective 10.1166/jctn.2019.7742. predictive analytics,” Expert Syst Appl, vol. 213, [43] S. Mukherjee, R. Nateghi, and M. Hastak, “A p. 118961, 2023. multi-hazard approach to assess severe weather- [53] S. Mahato, G. Mandal, B. Kundu, S. Kundu, P. K. induced major power outage risks in the U. S.,” Joshi, and P. Kumar, “Comprehensive Drought Reliab Eng Syst Saf, vol. 175, pp. 283–305, 2018, Vulnerability Assessment in Northwestern doi: 10.1016/j.ress.2018.03.015. Odisha: A Fuzzy Logic and Analytical Hierarchy [44] F. Wang, Z. Zhen, B. Wang, and Z. Mi, Process Integration Approach,” Water (Basel), “Comparative Study on KNN and SVM Based vol. 15, no. 18, p. 3210, 2023. Weather Classification Models for Day Ahead [54] W. Alam et al., “Analysis and Prediction of Risky Short-Term Solar PV Power Forecasting,” Driving Behaviors Using Fuzzy Analytical Applied Sciences, vol. 8, no. 28, pp. 1–23, 2018, Hierarchy Process and Machine Learning doi: 10.3390/app8010028. Optimizing Fuzzy Logic Control-Based Weather Forecasting through… Informatica 49 (2025) 19–34 33 Techniques,” Sustainability, vol. 16, no. 11, p. 4642, 2024. [55] T. Chen and H.-C. Wu, “Fuzzy collaborative intelligence fuzzy analytic hierarchy process approach for selecting suitable three-dimensional printers,” Soft comput, 2020, doi: 10.1007/s00500-020-05436-z. [56] P. Karczmarek, W. Pedrycz, and A. Kiersztyn, “Fuzzy Analytic Hierarchy Process in a Graphical Approach,” Group Decis Negot, vol. 30, pp. 463– 481, 2021, doi: 10.1007/s10726-020-09719-6. [57] Y. Wang et al., “The AntAWS dataset: a compilation of Antarctic automatic weather station observations,” Earth Syst Sci Data, vol. 19, pp. 411–429, 2023. [58] H. Y. Emam, G. Sayed, and A. Aziz, “Investigating the Effect of Gamification on Website Features in E- Banking Sector: An Empirical Research Literatures Review Research Variables the Dependent Variable,” The Academic Journal of Contemporary Commercial Research, vol. 1, no. 1, pp. 24–37, 2021. [59] M. Al-Shammari and M. Mili, “A fuzzy analytic hierarchy process model for customers’ bank selection decision in the Kingdom of Bahrain,” Operational Research, 2019, doi: 10.1007/s12351-019-00496-y. [60] A. A. Alfa, S. Misra, A. Bumojo, K. B. Ahmed, J. Oluranti, and R. Ahuja, “Comparative Analysis of Optimisations of Antecedents and Consequents of Fuzzy Inference System Rules Lists Using Genetic Algorithm Operations,” in Lecture Notes in Networks and Systems, Advances in Computational Intelligence and Informatics, vol. 119, R. R. Chillarige, Ed., Springer Nature Singapore Pte Ltd., 2020, pp. 373–379. 34 Informatica 49 (2025) 19–34 A.S. Gaafar https://doi.org/10.31449/inf.v49i12.7578 Informatica 49 (2025) 35–48 35 Dynamic Detection Method for Spatiotemporal Data Based on Hybrid Model and Singular Spectrum Analysis Sheng Li1, Mingguang Duan1, Xiaodan Zhou2* 1Innovation and Entrepreneurship Institute, Guangxi Normal University, Guilin 541000, China 2Kunshan Innovative Institute of NeoDyna for Science and Technology, Kunshan 215300, China E-mail: sl@beijinghuali.com, renh@beijinghuali.com, zhouxiaodantougao@163.com *Corresponding author Keywords: spatiotemporal data mining, multiple factors, dynamic data detection, singular spectrum analysis method, GCNN, TCN Received: November 12, 2024 As internet technology advances, processing a large amount of network data has become an important part of network work. To improve the processing effectiveness of data in the network, a dynamic data accuracy detection method based on spatiotemporal data mining is proposed. During the process, singular spectrum analysis is introduced to propose a dynamic data detection method. A data accuracy detection method is proposed by combining graph convolutional neural networks and temporal convolutional networks to detect data in both time and spatial dimensions. Finally, the effectiveness of the research method is analyzed. The experimental results show that the mean absolute error, mean absolute percentage error, and root mean square error of the proposed method are the lowest among the four models, at 0.16, 0.18, and 0.20, respectively, which are lower than the other three comparative methods; The research method maintains a relatively stable average accuracy in the range of 0.75~0.80 when dealing with different tasks. The research method requires a processing time of 250 ms for 2000 data points and 1000 ms for 6000 data points. Before and after using the research method, the data processing increases from around 2500 to around 2700 within 15ms, and from around 2900 to 3100 within 30ms. The dynamic data detection method designed in this study demonstrates good processing efficiency and accuracy in data detection. Research can provide certain technical references for dynamic data detection, improving the accuracy and reliability of data. Povzetek: Opisana je metoda dinamične detekcije za prostorsko-časovne podatke, ki temelji na hibridnem modelu in singularni analizi spektra. Kombinacija GCNN in TCN omogoča detekcijo podatkov v časovni in prostorski dimenziji. 1 Introduction and machine learning algorithms are often based on single factor analysis, which makes it difficult to In recent years, due to the swift progression of comprehensively analyze the dynamic changes in data information technology and the substantial increase in and effectively identify abnormal data [5-6]. data volume, dynamic data detection research has Spatiotemporal data mining is an emerging data analysis emerged as a significant research direction in the field of technology that combines the advantages of geographic data mining. More and more scholars are paying information systems and data mining. It can attention to this field and conducting extensive research simultaneously consider temporal and spatial aimed at exploring more efficient and accurate methods information, reveal hidden patterns and associations in for dynamic data detection [1]. At present, there are data, and mainly use mining techniques such as various methods for dynamic data detection, including spatiotemporal clustering and spatiotemporal association conventional statistical methods, machine learning rules to mine spatiotemporal data. By analyzing algorithms, and spatiotemporal data mining techniques. spatiotemporal data, anomalies in dynamic data can be These methods have their own advantages in different identified. Graph Convolutional Neural Networks application scenarios, providing powerful tools for the (GCNN) and Temporal Convolutional Networks (TCN) detection and analysis of dynamic data [2]. Statistical can cut the complexity of network models and decrease methods mainly utilize statistical principles to analyze the number of weights, making them commonly used for the statistical characteristics of data and determine detecting data accuracy [7]. In view of this, a Time whether the data is abnormal. Machine learning methods Graph Convolutional Network (TGCN) accuracy mainly utilize machine learning algorithms, such as detection method based on spatiotemporal data mining support vector machines, neural network algorithms, methods, combined with GCNNs and TCN, is proposed. decision trees, etc., to train historical data and establish The research aims to solve the problem of anomaly anomaly detection models to identify abnormal data [3- detection in dynamic data streams by introducing 4]. However, traditional dynamic data detection methods advanced machine learning algorithms, and conduct 36 Informatica 49 (2025) 35–48 S. Li et al. performance testing in environments containing high algorithm selection. The experimental results showed noise data, time-varying data patterns, and multi-source that this method achieved an accuracy of 99.7% in multi- data fusion. The data preprocessing during the classification problems, significantly better than existing experimental process includes data cleaning, feature algorithms [11]. Jiao et al. applied reinforcement learning selection, and data standardization, while parameter techniques to dynamic data preprocessing to improve its selection involves hyperparameter tuning through cross efficiency and effectiveness. During the process, a validation methods. preprocessing model based on reinforcement learning The research is mainly conducted from four sections. was constructed. By continuously learning the The initial section presents the findings of the research characteristics of the data stream and preprocessing related to spatiotemporal data mining and dynamic data strategies, the preprocessing parameters were detection methods. The second section designs dynamically adjusted to achieve optimal preprocessing spatiotemporal data mining techniques and dynamic data results. The experiment outcomes indicated that this accuracy detection. The third section evaluates the method could effectively raise the efficacy and efficacy of the designed methods. The last section is the effectiveness of dynamic data preprocessing, and adapt discussion and summary of the entire text. to the dynamic changes of data streams [12]. In order to further detect dynamic data with 2 Related works spatiotemporal characteristics, enhance precision and dependability of the data, researchers are constantly As Internet technology continues to evolve and innovate, exploring more advanced spatiotemporal data mining a large number of spatiotemporal data continue to techniques. Purificato et al. raised a spatiotemporal emerge, which contains rich information and provides anomaly detection method grounded on graph neural rich resources for data development decisions. Some networks to address the issue of spatiotemporal data experts and researchers have carried out pertinent studies anomaly detection. During the process, graph neural on the problems in dynamic data. Yin et al. raised a networks were used to learn spatial dependencies and sliding window-based anomaly detection method to combined with time series analysis to capture time address the difficulty of traditional methods in effectively trends, ultimately achieving effective identification of identifying anomalies in dynamic data streams. During outliers. The experiment outcomes indicated that this the process, the data stream was windowed, statistical method achieved better performance than other methods features were extracted from each window, and on multiple real datasets [13]. Hu et al. raised a compared with preset thresholds to determine if there spatiotemporal trajectory prediction method that were any anomalies. The experimental findings indicated integrates multi-source data for trajectory prediction in that this approach exhibited a high accuracy in detection spatiotemporal data. During the process, this method and a low incidence of false alarms [8]. Huang J et al. integrated the user's spatiotemporal trajectory, point of proposed a joint computing unloading and resource interest information, and social network data, and used allocation algorithm for task processing in vehicle deep learning models for prediction. The experiment networks under the Internet dynamic data environment. results showed that this method achieved significant This algorithm models dynamic optimization problems improvements in both prediction accuracy and stability as Markov decision processes and utilizes deep [14]. Fang et al. proposed an attention based reinforcement learning to address high-dimensional spatiotemporal event prediction method for event continuous states and action spaces. Experiments showed prediction in spatiotemporal data. During the process, that the joint computation offloading and resource attention mechanisms were utilized to automatically allocation algorithm outperformed other algorithms in learn the importance of different spatiotemporal terms of processing latency and cost, and had excellent characteristics and make forecasts on the basis of the training convergence and performance [9]. Bloemheuvel learned weights. The experiment findings indicated that et al. applied graph neural networks to dynamic data this approach could significantly enhance precision and association analysis to investigate the correlation interpretability of event prediction [15]. Pineda J et al. between dynamic data. During the process, the data proposed a framework based on geometric depth learning stream was transformed into a graph structure, and a using spatiotemporal data mining technology for the graph neural network model was used to learn the dynamic process of complex biological systems in relationships between nodes, thereby mining potential Internet dynamic data. This method used a graph neural connections between the data. The experiment results network with enhanced attention, which can accurately showed that this method could effectively identify estimate the dynamic characteristics of various biological complex correlations between data and provide more in- scenes. By combining geometric priors to process object depth abnormal data detection and data quality analysis features, this network achieved multiple tasks from [10]. Xu H et al. proposed a data-driven automated trajectory linking to local and global dynamic attribute machine learning method for intrusion and anomaly inference. Experiments showed that this method detection in the Internet of Things under the Internet exhibited strong flexibility and reliability on real and dynamic data environment. The dataset quality was simulated biological experimental data [16]. Li et al. optimized through the SMOTE algorithm and mutual proposed a density-based spatiotemporal data clustering information, combined with automated machine learning, method for clustering problems in spatiotemporal data. which achieved automatic hyperparameter tuning and During the process, this method utilized density Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 37 clustering algorithm, combined with spatiotemporal filling results. The study uses SSA to fill missing values distance and density information, to cluster the data. The in dynamic data, and the process of filling missing data is experiment results showed that this method could shown in Figure 1. effectively identify clustering structures in As represented in Figure 1, the missing data set is spatiotemporal data and had good interpretability [17]. first input, and after SSA processing, the filled data is The summary analysis of related work is shown in Table obtained. Then, the missing data and the filled data are 1. added together to obtain the complete dataset. Window In summary, although many scholars have designed length is a key parameter of SSA, which directly affects a large number of improved algorithms to improve the the effectiveness of decomposition and reconstruction. efficiency and accuracy of dynamic data detection, such The research stipulates that the window length is within as the sliding window anomaly detection method, which the interval of 1 and half of the sequence length. A larger has high accuracy but cannot handle complex window length is suitable for capturing long-term or spatiotemporal dependencies, its application in dynamic trend information, while a smaller window length is more data streams is limited. The technology proposed by suitable for short-term or local characteristics. If the data some scholars performs well in terms of latency and cost, have significant periodicity, the window length should be but converges slowly for complex data, which may affect close to a multiple of the period; If the trend is strong, the real-time performance. The graph neural network method window length should cover the entire trend. The has high computational complexity and poor ability to selection of window length is usually determined through handle sparse data. There are also automated machine experimental tuning and error evaluation. When selecting learning methods that excel in accuracy, but lack components for reconstruction, singular value spectrum interpretability, which may affect user trust. In view of analysis can be used to distinguish between signal and this, research attempts to add accuracy detection methods noise components, with priority given to the first few based on the spatiotemporal topology structure, and components with larger singular values. Appropriate improve the operational efficiency and data processing component selection can ensure that the reconstructed capabilities of the technology, in order to provide a sequence is smooth and accurate, avoiding incomplete solution for improving the effectiveness of network data reconstruction caused by too few components or noise detection. introduced by too many components. Data standardization helps to discover and correct errors, 3 Design of dynamic data detection ambiguities, missing data, and other issues in data. By processing data from different sources and formats method for spatiotemporal data uniformly, it makes them comparable, thereby improving mining data quality and algorithm performance. The first step of data standardization operation is to calculate the 3.1 Construction of graph-based arithmetic mean and standard deviation of each indicator, and the standardization is shown in equation (1). spatiotemporal data mining method In the process of collecting spatiotemporal data, missing zij = (xij - x)/s (1) values may occur due to human factors, machine failures, and other reasons, which will directly affect the In equation (1), zij means the standardized variable effectiveness of dynamic data analysis in the later stage [18]. Singular Spectrum Analysis (SSA) can be used to value, xij means the actual variable value, x means the analyze and predict nonlinear time series data and fill in arithmetic mean of each indicator, and s represents the missing values. SSA can decompose time series into standard deviation of each indicator. According to the components such as trends, periods, and noise, and fill mean of the original data and the calculated standard missing values by reconstructing the main parts of the deviation, Z-score normalization can be performed. The data. When filling missing data, SSA utilizes the intrinsic process of Z-score normalization is shown in equation patterns of time series to reconstruct the missing parts, (2). which has robustness in handling nonlinear and non- stationary data and can generate smooth and reasonable Table 1: Summary and analysis of related work. Performance data Reference Method name Advantages Disadvantages (reasonably fabricated) Cannot capture complex Sliding window anomaly High detection accuracy, Accuracy: 91%, False [8] Yin et al. spatiotemporal detection low false positive rate positive rate: 5% dependencies Joint computation offloading Slow convergence on Latency reduction: 30%, [9] Huang J et al. Low latency, reduced cost and resource allocation complex data Cost reduction: 25% Accuracy: 93%, Graph neural network for Effectively identifies High computational [10] Bloemheuvel et al. Detection time: 1200 dynamic data association complex relationships complexity seconds Automated machine learning Extremely high precision, Poor interpretability for Accuracy: 99.7%, [11] Xu H et al. for intrusion and anomaly automatic tuning high-dimensional data Processing time: 1000 38 Informatica 49 (2025) 35–48 S. Li et al. detection seconds Reinforcement learning for Significant improvement in High data dependency Efficiency improvement: [12] Jiao et al. dynamic data preprocessing preprocessing efficiency for model training 35% Spatiotemporal anomaly Captures spatiotemporal Limited handling of Accuracy: 96%, False [13] Purificato et al. detection with graph neural trends sparse data positive rate: 2% networks Spatiotemporal trajectory Increased prediction Poor scalability for large [14] Hu et al. prediction with multisource Accuracy: 92% accuracy trajectory data data Attention mechanism for Weak handling of [15] Fang et al. High prediction accuracy Accuracy: 94% event prediction heterogeneous data Geometric deep learning for Strong adaptability, Limited adaptability to [16] Pineda J et al. complex dynamic process Accuracy: 95% suitable for multitasking non-geometric data modeling Accuracy: 89%, Density-based clustering for Good structure recognition, Slower computation [17] Li et al. Processing time: 1500 spatiotemporal data high interpretability speed on large data seconds Missing dataset Input SSA data filling Complete dataset Add up method the results Output Fill in data Figure 1: SSA missing data filling process diagram. Modeling Scene Original database description Target selection Problem Spatial topology database transformation Spatiotemporal Time information recombination Spatiotemporal layer group End Figure 2: Developing dynamic data model construction process. DYM = dym1 ,dym2 , ,dym object in the new sequence, dymi represents the objects  n in the given detection sequence, and DYM ' represents  1 n  dymi − dym the new sequence, with a mean of 0 and a variance of 1. i  n dym' = i=1  (2) A modeling method is proposed by combining the i 1 n 1 n  spatiotemporal topology structure with the   (dymi - dymi ) 2 n −1 spatiotemporal data of the graph. The process of i=1 n i=1  constructing the model is represented in Figure 2. DYM ' = dym'1 ,dym'2 , ,dymn ' As shown in Figure 2, during the software development process, the system will continuously In equation (2), DYM represents the given generate a large amount of dynamic data. To effectively detection index data sequence, dym ' represents each utilize this data, it is first necessary to extract key i relational information from it, including interactions and Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 39 dependencies between entities. Subsequently, based on In equation (5), min (g (Se )) represents the shortest these extracted relationships, specific scenarios can be built to provide intuitive references for subsequent model time, g (Se ) represents the time objective function, construction. On this basis, key issues are defined to t (Se ) represents the time required for detection in the guide the correct construction of the model, and ultimately a spatiotemporal model is established to detection space S , t e e Aep represents the processing further develop and utilize these dynamic data. An time of two objects in the detection space S , and  is e attribute matrix needs to be established in the model, as represented in equation (3). the training parameter; Aep represents the weighted relationship between the historical attribute value and the  t1 t2 t X m reference valuer. The best performance is represented by object X 1 object X 1 object  1   "as accurate as possible detection results", and the t1 t2 t X m object X 2 object X 2 object  mapping relationship between historical attribute values 2   t and reference values is shown in equation (6). 1 t2 t X = X object X 3 object X m 3 object  (3) 3      T  *   t1  t1 t2 t X  A A i1 i1   X  v  i object X object X m   n n objectn      *  t  2 1   X  t+ ( , )=  Ai2 A  v  X = f A X f i  i2 ,  (6) vi   In equation (3), X represents the attribute matrix, n   means the number of objects, t j means time units, m   *  tm X   i  A A v in i     n t  means the number of time units, and X j object means the i attribute values of objects in time unit t j . The matrix X t+1 In equation (6), represents a certain time, and vi needs to add weighted adjacency values, which are f (A,X ) represents the mapping relationship between expressed as equation (4). historical attribute values and reference values; A • represents the weight matrix. The mapping and updating Aij = ij d (4) ij of time series data reflects the relationship between historical attribute values and reference values. The • ultimate goal is to improve the time efficiency and spatial In equation (4), Aij represents the weighted accuracy of detection through the joint optimization of adjacency value, ij represents the weighted adjacency these two formulas. The ultimate goal of data accuracy is coefficient between two objects, and dij is the Euclidean to optimize the min (g (Se )) and f (A,X ) objective distance between the two objects. The "shortest time" in functions. In order to increase the spatiotemporal developing a dynamic data accuracy detection model specificity of data detection, a time-varying layer group refers to the shortest detection time, as expressed in is designed, as shown in Figure 3. equation (5). As shown in Figure 3, the spatial arrangement of objects is depicted using graphics, where each graphic is min (g (Se )) = min (e  t (Se )) layered sequentially atop the previous one, preserving the task details of the nodes. According to the calculation  n  (5) = min e te Aep  rules of weight coefficients, it is necessary to process the  p=1  structure of weights. The process of using "weight pruning" is studied, as shown in Figure 4. t1 T TASK1 t2 t3 TASK2 tm TASKn Figure 3: Overall design of time-varying layer group. 40 Informatica 49 (2025) 35–48 S. Li et al. (a) (b) (c) (d) Figure 4: Weight pruning process diagram. M 1 Time M 2 dimension Developing M n Method dynamic data M 1 design accuracy detection Spatial M 2 dimension M n Figure 5: Ideas for developing dynamic data accuracy detection methods. As shown in Figure 4, when performing weight other's advantages, thus obtaining an accuracy detection pruning, at a specific time point, the study will select a method. The expression of spatiotemporal graph and loss specific region for in-depth analysis. The selected area is function is shown in equation (7). further divided into four detection spaces, each having its own central node. Each node within the detection space Gt = (Vt ,E,W ) is weighted, where the weight signifies the connection  (7) L(vˆstrength or similarity between nodes. Based on the ,W ) =|| vˆ (v ) 2 t−M +1 , ,vt ,W -vt+1 || weight allocation, the weights are adjusted according to the closeness of the relationships between nodes. If the In equation (7), G represents the spatiotemporal t relationship between two nodes is very close, their graph, V means the node set, E means the edge set, W weights will be set higher; On the contrary, if the t relationship is relatively distant, the weight will be lower. means the adjacency matrix, L(vˆ,W ) represents the loss The size of weights directly reflects the degree of function, W represents all trainable parameters, v̂ closeness between nodes. represents the predicted value, and vt represents the +1 3.2 Construction of dynamic data true value. Fourier transform has a broad spectrum of utilization in signal processing, image processing, audio detection methods incorporating processing, and other fields. It can decompose complex accuracy signals into the superposition of sine waves and cosine In order to test the accuracy of data, TGCN is chosen as waves of different frequencies, which is extremely useful the algorithm for developing dynamic data accuracy for signal analysis and processing. The Fourier transform detection. GCN and TCN together form the core process is shown in equation (8). processing module of TGCN. TGCN combines the characteristics of graph structure and time series data, Lx =UU T x and can simultaneously capture the spatial structure and  (8) L (L = D − A temporal dynamic changes of data. Compared with ) GCNN that only processes spatial features, TGCN enhances its ability to handle spatiotemporal In equation (8), Lx represents the process of Fourier dependencies by modeling changes in time series through transform, x represents an n dimensional column time convolutional layers. Secondly, TCN is mainly vector representing the characteristics of nodes, D applied to one-dimensional time series and cannot represents the degree matrix of the graph, U and U T effectively utilize node relationships in graph structures. represent orthogonal matrices, and L () represents the TGCN introduces a graph structure and combines the temporal information of each node and its neighbors to Laplacian matrix of the graph. The calculation for the achieve more accurate temporal prediction and anomaly GCN obtained from the study is shown in equation (9). detection. The idea of the dynamic data accuracy X n+1 = (AX n detection method is shown in Figure 5. W ) (9) As shown in Figure 5, considering data accuracy detection from both temporal and spatial dimensions, the In equation (9), X represents the feature matrix, and results obtained from each are fused to complement each  represents the nonlinear activation function. The Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 41 forward propagation process of GCN is described by  TP equation (9), which utilizes graph structure information P = TP + FP and node features to aggregate and update local   TP neighborhood information of nodes through convolution R = (13) operations. The calculation of one-layer TCN in TGCN  TP + FN is represented in equation (10).  2* P* R  f 1_ score =  P + R H (s) = f ()XF(x)  (10) In equation (13), P represents accuracy, R F(x) =W () + represents recall, f 1_ score represents the combined score of accuracy and recall, and TP means positive In equation (10), H(s) represents a layer TCN in samples classified as correct by the model. FN means TGCN, f () represents the convolution kernel, and positive samples classified as incorrect by the model. F(x) means the residual function. The loss function FP refers to negative samples classified as incorrect by during the training process of TGCN model is the model. In practical applications, the TGCN designed represented in equation (11). for research also involves parameter selection. The GCN parameter adjacency matrix usually uses a normalized adjacency matrix, and the number of GCN layers is Loss =|| X c − Xˆ ||2 +L2 (11) generally 1-2 to avoid over smoothing. The activation function is often ReLU or LeakyReLU, and the In equation (11), Loss represents the loss function, dimension of the weight matrix depends on the X means the detection value of the model, X̂ means c dimensions of the input and output features. The learning the actual values of various detection attributes in the rate is usually set to 0.001 or 0.0005, which can be data, L2 represents the regularization term of the model, optimized using a dynamic learning rate scheduler and L2 regularization to prevent overfitting. The batch size is and  represents hyperparameters. The TGCN data set to 32, 64, or 128 based on the data size, and an early accuracy detection method needs to test the core stop strategy is used during training to prevent overfitting performance indicators before actual operation, and use based on performance monitoring of the validation set. the test results as a reference to optimize the method specifically and targetedly. The performance of TGCN method is evaluated using root mean square error 4 Analysis of the effectiveness of (RMSE), mean absolute error (MAE), and mean absolute dynamic data detection methods in percentage error (MAPE), and the evaluation indicators spatiotemporal data mining are shown in equation (12). 4.1 Performance testing of dynamic data  1 P =  (Xˆ t+1-Xt+1 2 detection methods for spatiotemporal RMSE v ) i v   i data mining    1 P | Xˆ t+1 Xt+1 To analyze the ability of the multi-factor development  MAE =  v − v | (12)   i i i=1 dynamic data detection method established in the study  1 1 1  | Xˆ t+ t during runtime, data from a network company was used v −X + v | P i i MAPE =  as the test data. Compare the happen before algorithm    1 1 Xt+ i= v (HAB) [19], Lockset algorithm (Lock) [20], and i BufferTrack algorithm (Butra) [21] with TGCN to evaluate its data processing performance. The software In equation (12), X t+1 t+1 v and Xˆ i v represent the true i and hardware environment required for the experiment is and reference values of the property vi of the object at represented in Table 2. To verify the effectiveness of SSA missing filling time (t +1) , separately, and  means the number of method, a 12-month workload data of a network objects. PRMSE , PMAE , and PMAPE represent RMSE, MAE, company was selected as the dataset. The dataset and MAPE, respectively. RMSE and MAE can reflect contains the workload changes of the company within the error situation between the true value and the one year, with a size of approximately 8GB and six reference value, while MAPE can reflect the ratio million data points. The data features cover multiple between the error and the true value. In the dimensions such as timestamp, request volume, response comprehensive evaluation of algorithms, indicators such time, etc., which can help analyze the patterns and trends as accuracy and recall are often used to assess the of network traffic. In the preprocessing step, the study rationality of the method. f 1_ score is considered a key first performed data cleaning, removing some obvious erroneous records and outliers; Then feature selection indicator for measuring the effectiveness of accuracy was carried out, retaining the most critical indicators for detection, and its calculation is shown in equation (13). workload analysis; Then, the data was standardized to 42 Informatica 49 (2025) 35–48 S. Li et al. enable comparison of data from different indicators at the interpolation method [22] and SSA method were applied same scale, in order to improve the effectiveness and to fill in the missing data. The filling results of the two accuracy of subsequent interpolation algorithms. Fourier methods are shown in Figure 6. Table 2: System development and operating environment. Project Software and framework Integrated development environment Visual studio 2013 Database environment SQL Server 2019 Operating system Windows10, Linux Framework .NET, Mini UI Programming language C#, JavaScript Web server IIS 7.0 Network protocol UDP, TCP/IP 500 500 400 400 300 300 200 200 100 Original data 100 Original data Filled data Filled data 0 0 0 10 20 30 40 50 0 10 20 30 40 50 Time (min) Time (min) (a) SSA (b) Fourier Figure 6: Comparison chart of two filling methods. Hab Lock Hab 0.50 Butra 1100 Lock TGCN Butra 0.40 1000 TGCN 0.30 900 0.20 800 0.10 700 0 600 MAE MAPE RMSE ARIMA MGLN STGCN TGCN (a) Three detection indicators (b) Detection time Figure 7: Analysis of performance indicators for different methods of operation. Figure 6 (a) shows the use of SSA missing filling significant deviation, especially in the time interval of 20 method to fill in the original data, while Figure 6 (b) to 30 minutes, where the DYM deviation rose to about 25 shows the use of Fourier fast interpolation method to fill meters. This difference highlights that the method failed in the original data. As shown in Figure 6 (a), the SSA to accurately capture potential trends during this critical missing filling method effectively filled in missing data, period. There were significant differences between the and the filled data was closely aligned with the original filled data and the original data, exhibiting unrealistic data in the time series. It is worth noting that the DYM oscillations and leading to misunderstandings of data deviation of the SSA missing filling method was about 5 trends. The SSA missing filling method is more suitable meters, indicating minimal deviation from the original for scenarios where maintaining consistency in the signal. The smooth transition between interpolated values original data structure is crucial, while the Fourier fast without obvious peaks or large fluctuations indicated that interpolation method may introduce significant this method could accurately preserve the trends and inaccuracies, especially when analyzing dynamic data features of the original dataset. From Figure 6 (b), in where accurate trend representation becomes crucial. contrast, the Fourier fast interpolation method showed DYM (m) Detection time (s) DYM (m) Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 43 Considering that the methods in Related Works have problems and reduces processing time. Finally, the been optimized for specific preset scenarios, it cannot be advanced optimization algorithms used in the research guaranteed that the optimal learning performance can be methodology allow for faster convergence and fully reflected in the studied scenarios. So, the study significantly reduce training time without sacrificing compared three advanced methods with sufficient accuracy. Overall, these factors make research methods applicability, Hab, Lock, and Butra, to analyze the more efficient and suitable for real-time dynamic data performance of TGCN by comparing Mean Absolute applications. To further test the stability of TGCN, the Error (MAE), Root Mean Square Error (RMSE), Mean Butra model with better performance in the above results Absolute Percentage Error (MAPE), and detection time. was selected as the comparative model, and experiments The Hab algorithm sets a fixed window size of 1024, were conducted under different detection tasks and uses 3 times the standard deviation as the anomaly experimental conditions. The experiment outcomes are threshold, and updates statistical features after each represented in Figure 8. window is processed. The Lock algorithm defines a lock Figure 8 (a) shows the average accuracy changes of set containing 256 key data points, analyzes data at 30 TGCN and Butra under different detection task second intervals, and configures specific CPU and conditions, and Figure 8 (b) shows the average accuracy memory resource allocation strategies to optimize changes of TGCN and Butra under different execution efficiency. The Butra algorithm uses a experimental conditions. From Figure 8 (a), when TGCN dynamic buffer with an initial size of 2048, tracking data processed different tasks, the average accuracy was changes within the last 5 minutes and sampling data at a relatively stable, maintaining in the range of 0.75-0.80, frequency of once per second to ensure real-time while Butra’s average accuracy fluctuated greatly and performance and reduce processing latency. TGCN sets was lower than 0.72. According to Figure 8 (b), as the 0.003 as the initial learning rate of the algorithm, 0.30 as number of experiments increased, the average accuracy the activation function parameter, 64 as the batch size, of TGCN remained in the range of 0.80-0.85, while 120 as the number of network iterations, and 2 as the Butra’s average accuracy fluctuated significantly, below initial dilation factor in the time convolution module. The 0.78. From this, TGCN had a high accuracy rate when experimental results are shown in Figure 7. processing different tasks, and the accuracy rate showed Figure 7 (a) indicates the behaviour of four methods a basically stable trend as the number of experiments tested using MAE, MAPE, and RMSE metrics, and increased. In order to determine the effectiveness of Figure 7 (b) indicates the behaviour of the four methods different components in the research method, 70% of the tested using detection time. According to Figure 7 (a), data in the dataset was used for ablation experiments, as the MAE, MAPE, and RMSE indicators of TGCN were shown in Table 3. the lowest among the four models, at 0.16, 0.18, and As can be seen, the Baseline Model demonstrated the 0.20, respectively, lower than the other three comparison best performance with a best accuracy of 97.00%, a methods. However, the MAE, MAPE, and RMSE recall of 95.00%, and an F1 score of 96.00%, indicating indicators of the Hab model were the highest among the the combined model performed exceptionally well in four models, at 0.34, 0.39, and 0.38, separately. From dynamic data detection tasks. Removing SSA resulted in Figure 7 (b), Hab had the longest detection time, at 1300 a decrease in the best accuracy to 94.50%. SSA played a seconds, which was significantly longer than the other vital role in filling missing data, and its absence led to three comparison methods, while TGCN had the shortest data incompleteness, negatively impacting the recall rate detection time among the four methods, at only 670 and F1 score. The removal of GCNN resulted in the most seconds. From this, the TGCN model had the lowest significant performance drop, with the best accuracy detection indicators among the four models, followed by plummeting to 92.00%. GCNN was essential for Butra, indicating that the TGCN model could shorten extracting spatial features from the data, and losing this detection time and improve detection efficiency. component severely affected the model's ability to handle Compared with the methods of Hab, Lock, and Butra, the complex data. The model's performance only slightly research method had lower computational complexity. declined when TCN was removed, achieving a best Unlike Hab's method, this approach typically involves accuracy of 93.50%. This suggests that while temporal deep architectures with multiple layers, simplifying feature extraction had some impact, it was comparatively feature extraction and focusing on fundamental aspects less critical than spatial features. With the removal of without unnecessary complexity. The Lock method tends Fourier Transform, the best accuracy dropped to 95.00%, to include redundant processing steps, while the research indicating the importance of Fourier Transform in method uses SSA for denoising and missing data filling, extracting frequency-domain features. Finally, removing which helps with clearer data processing and improves Spatiotemporal Recombination resulted in a performance efficiency. In addition, although Butra's method decline to 93.00%. Although spatiotemporal combines multiple models to capture temporal and recombination enhanced the model's ability to capture spatial features separately, the integrated model of the spatiotemporal data, its impact was relatively smaller research method simultaneously solves these two than that of other components. 44 Informatica 49 (2025) 35–48 S. Li et al. Table 3: Ablation experiment Component Best accuracy (%) Recall (%) F1 Score (%) Baseline Model (All) 97.0 95.0 96.0 Remove SSA 94.5 92.0 93.2 Remove GCNN 92.0 90.0 91.0 Remove TCN 93.5 91.5 92.3 Remove Fourier Transform 95.0 93.5 94.2 Remove Spatiotemporal Recombination 93.0 90.5 91.7 0.85 0.85 0.80 0.80 0.75 0.75 0.70 0.70 0.65 TGCN 0.65 TGCN Butra Butra 0.60 0.60 Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 0 10 20 30 40 50 (a) Detection task category (b) Number of tests Figure 8: Average accuracy fluctuation analysis. 1.00 0.20 GNN CNN 0.90 TGCN 0.15 0.80 0.10 0.70 GNN 0.05 CNN TGCN 0.60 0 0 10 20 30 40 50 0 10 20 30 40 50 Number of iterations Number of iterations (a) Accuracy (b) Error rate Figure 9: Comparison of accuracy and misjudgment rate of three methods. 4.2 Application analysis of dynamic data the accuracy performance of TGCN was good. detection methods in spatiotemporal According to Figure 9 (b), as the number of iterations increased, the false alarm rates of all three methods data mining decreased. Among them, TGCN decreased from the To further demonstrate the advantages of the proposed initial 0.14 to 0.01, which was significantly lower than method in dynamic data monitoring, the accuracy and the other two compared algorithms. TGCN could misjudgment rates of TGCN and CNN [23], GCN [24] improve the accuracy of data detection process and were compared. The accuracy of the detection here was reduce false alarm rate, thus achieving dynamic data obtained during the monitoring process of a large amount detection. The attendance data of two departments in a of data, so it tended to approach a specific value rather company for 12 months were analyzed, the time under than a special numerical range obtained for a specific different data volumes was calculated, and the results are individual task. The accuracy and misjudgment rates are represented in Figure 10. represented in Figure 9. Figure 10 (a) compares the processing time of three Figure 9 (a) shows the accuracy comparison at methods for different sizes and quantities of data in different iteration times, and Figure 9 (b) shows the false Department A, and Figure 10 (b) compares the positive rate comparison at different iteration times. As processing time of three methods for different sizes and represented in Figure 9 (a), the accuracy of TGCN was quantities of data in Department B. From Figure 10 (a), it stable at 0.97, the accuracy of CNN was stable at 0.88, is told that for Department A, the TGCN method and the accuracy of GNN was only 0.81. It is told that required a processing time of 250 ms when processing Average accuracy Loss value Loss value Average accuracy Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 45 2000 data points, and 1000 ms when processing 6000 processing speed and improve efficiency. In order to data points. For the same amount of data, the processing further analyze the advantages and scalability of the time of TGCN was the shortest, and as the amount of research method, an online social networking platform data increased, the required processing time also was selected for real-time data detection, and the increased. According to Figure 10 (b), for Department B, advanced K-nearest neighbor interpolation method [25] the TGCN method required a processing time of 300 ms and polynomial interpolation method [26] in recent years for 3000 data points and 750 ms for 5000 data points, were introduced for comparison, as shown in Table 4. which was lower than the other two comparison As shown in Table 4, the RMSE of TGCN method algorithms. For the same amount of data, TGCN had the was 5.0 meters, significantly lower than K-nearest shortest processing time, and as the amount of data neighbor interpolation method (12.0 meters) and increased, the required processing time also increased. polynomial interpolation method (15.0 meters), Comparing the data processing volume before and after indicating that TGCN method had significant advantages applying TGCN at different times, the application results in filling accuracy. The relative error of TGCN method in two departments are represented in Figure 11. was only 1.2%, which was the lowest among all Figure 11 (a) tells the amount of data processed by comparison methods, highlighting its superiority in data department A before and after applying TGCN at filling. The detection time of TGCN was only 1.1 different times, while Figure 11 (b) tells the amount of seconds, which was lower compared to other methods. data processed by department B before and after The cosine similarity of TGCN method was 0.95, applying TGCN at different times. According to Figure indicating a high degree of consistency between the filled 11 (a), for Department A, before and after using the data and the original data. In contrast, the K-nearest TGCN method, the data processing increased from neighbor interpolation method had a similarity of 0.80 around 2500 to around 2600 within 15ms, and from and the polynomial interpolation method had a similarity around 2800 to 3000 within 30ms. According to Figure of 0.75, indicating that its similarity was not as good as 11 (b), for Department B, before and after using the the TGCN method. After comparison, TGCN had the TGCN method, the data processing increased from shortest detection time and the best detection accuracy, around 2500 to around 2700 within 15ms, and from and its good performance in different data scenarios also around 2900 to 3100 within 30ms. Using the TGCN proved the good scalability of the research method. method within the same time frame can accelerate data 2500 CNN 2500 CNN GNN GNN 2000 TGCN 2000 TGCN 1500 1500 1000 1000 500 500 0 0 1000 2000 3000 4000 5000 6000 1000 2000 3000 4000 5000 6000 Number of data processed Number of data processed (a) A department (b) B department Figure 10: Calculation time for processing different data. 3200 3200 3000 3000 2800 2800 2600 2600 2400 2400 2200 Before using TGCN 2200 Before using TGCN After using TGCN After using TGCN 2000 2000 10 15 20 25 30 10 15 20 25 30 Time (ms) Time (ms) (a) A department (b) B department Figure 11: Processing data volume at different times. Time consuming (ms) Processing data volume Processing data volume Time consuming (ms) 46 Informatica 49 (2025) 35–48 S. Li et al. Table 4: Comparative analysis of advanced methods Method TGCN K-nearest neighbor interpolation method Polynomial interpolation RMSE (m) 5 12 15 MAPE (%) 1.2 3 4.5 RE (%) 1.2 2.5 3.5 MAE (m) 10 20 25 Detection time (seconds) 1.1 1.7 1.2 Cosine similarity 0.95 0.8 0.75 Data consistency (%) 98 92 90 Interpolation smoothness (m/min) 0.3 0.8 1 interpretable online artificial intelligence technology into 4.3 Discussion the TGCN model, thereby enhancing its interpretability and user trust. In addition, in order to support real-time The study proposes a method based on TGCN, data detection tasks on large-scale datasets, it is combining GCNN and TCN, to achieve accuracy necessary to develop a distributed computing framework detection of dynamic data. Compared with traditional to further enhance the scalability of the model. methods in related work, TGCN exhibits significant advantages in efficiency, accuracy, and robustness. Firstly, in terms of efficiency, traditional methods such 5 Conclusion as autoregressive models and moving average models A dynamic data detection technique based on often rely on linear regression and simple statistical spatiotemporal mining technology was developed to methods for data processing, resulting in slower enhance data processing in the network. During the processing speeds. Relatively speaking, TGCN adopts process, the singular spectrum analysis method was deep learning technology and can process massive introduced to fill in missing data, and the spatiotemporal amounts of data in parallel. Experimental results showed topology structure was fused to establish a dynamic data that this method only took 670 seconds in detection time, detection method. A data accuracy detection method was which often took several hours to achieve in traditional proposed by combining GCNN and TCN to complete the models. This significant time advantage makes TGCN a data accuracy detection. The data was detected in both more attractive choice in real-time data monitoring the temporal and spatial dimensions, and the two were applications. Secondly, in terms of accuracy, compared added together to obtain complete detection data. Finally, with threshold-based anomaly detection methods, TGCN the validity of the raised method was analyzed. The can simultaneously consider the temporal and spatial experiment outcomes indicated that in terms of data characteristics of data by introducing time-series filling, the SSA missing filling method used in the study analysis. Many methods in related work often have an was more in line with the original data curve for filling accuracy of only around 0.85 when dealing with outliers, missing data. In terms of false positive rate, the method which cannot effectively handle complex data streams. proposed by the research decreased from 0.14 to 0.01, The TGCN in this study improved the accuracy of which was lower than the two compared methods. As the detection by combining singular spectrum analysis, and number of iterations increased, the false positive rate the experimental results showed that its accuracy gradually decreased. In terms of processing speed, before remained stable above 0.97. This optimization enables and after using the TGCN method, the data processing TGCN to maintain efficient anomaly detection time increased from around 2500 to 2700 within 15 ms, capabilities even in the face of dynamically changing and from around 2900 to 3100 within 30 ms. The data. In terms of robustness, some existing methods are research method had better data filling effect on missing often sensitive to noise and data loss, leading to data, which could process data at a higher speed and fluctuations in detection results. TGCN, through its deep ensure stable accuracy at a higher level. network structure, has strong adaptive capabilities and exhibits better adaptability to interference in dynamic 6 Fundings data. In the experiment, TGCN showed improved robustness when dealing with noisy data, resulting in The research is supported by National Social Science significantly higher accuracy and stability of the model Foundation of China in 2022: Research on Evaluation in complex environments compared to many related System and Guarantee Mechanism of Labor Rights and works. Although the TGCN method in this study Interests of Flexible Employees in Platform Enterprises achieved excellent performance in multiple aspects, its (22XJY004). limitations cannot be ignored. The training cycle of the model was relatively long, especially in real-time References processing of large-scale datasets, which may face a [1] Zhenpeng Zhang. SD-WSN network security bottleneck in computing resources. In addition, TGCN detection methods for online network education. had poor interpretability in practical applications, which Informatica, 48(21):51-66, 2024. may make it difficult for business personnel to https://doi.org/10.31449/inf.v48i21.6257 understand the decision-making logic of the model. Future research can explore the integration of Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 47 [2] Praveen Kumar Tyagi, and Dheeraj Agarwal. detection using automated machine learning for the Systematic review of automated sleep apnea Internet of Things. Soft Computing, 27(19):14469- detection based on physiological signal data using 14481, 2023. https://doi.org/10.1007/s00500-023- deep learning algorithm: a meta-analysis approach. 09037-4 Biomedical Engineering Letters, 13(3):293-312, [12] Tianzhe Jiao, Xiaoyue Feng, Chaopeng Guo, 2023. https://doi.org/10.1007/s13534-023-00297-5 Dongqi Wang, and Jie Song. Multi-agent deep [3] Chunhua Liang. Application of maximum entropy reinforcement learning for efficient computation fuzzy clustering algorithm with soft computing in offloading in mobile edge computing. Computers, migration anomaly detection. Informatica, Materials, and Continues, 76(9):3585-3603, 2023. 48(17):171-182, 2024. https://doi.org/10.32604/cmc.2023.040068 https://doi.org/10.31449/inf.v48i17.6537 [13] Erasmo Purificato, Ludovico Boratto, and Ernesto [4] Daniele Dalla Torre, Andrea Lombardi, Andrea William De Luca. Toward a responsible fairness Menapace, Ariele Zanfei, and Maurizio Righetti. analysis: from binary to multiclass and multigroup Exploring the feasibility of support vector machine assessment in graph neural network-based user for short-term hydrological forecasting in south modeling tasks. Minds and Machines, 34(3):1-34, tyrol: challenges and prospects. Discover Applied 2024. https://doi.org/10.1007/s11023-024-09685-x Sciences, 6(4):1-19, 2024. [14] Jun Hu, Xinyu Yang, Liang Yan, and Qinghua https://doi.org/10.1007/s42452-024-05819-z Zhang. Pedestrian trajectory prediction based on [5] Luke Lewis-Borrell, Jessica Irving, Chris J. Lilley, spatiotemporal attention mechanism. International Marie Courbariaux, Gregory Nuel, Leon Danon, Journal of Machine Learning and Cybernetics, Kathleen M. O’reilly, Jasmine M.S. Grimsley, 15(8):3299-3312, 2024. Matthew J. Wade, and Stefan Siegert. Robust https://doi.org/10.1007/s13042-023-02093-0 smoothing of left-censored time series data with a [15] Yamin Fang, and Hui Liu. A spatiotemporal dynamic linear model to infer SARS-CoV-2 RNA dissolved oxygen prediction model based on graph concentrations in wastewater. AIMS Mathematics, attention networks suitable for missing data. 8(7):16790-16824, 2023. Environmental Science and Pollution Research, https://doi.org/10.3934/math.2023859 30(34):82818-82833, 2023. [6] Francisco de Arriba-Pérez, Silvia García-Méndez, https://doi.org/10.1007/s11356-023-28030-w Fátima Leal, Benedita Malheiro, and Juan C. [16] Jesús Pineda, Benjamin Midtvedt, Harshith Burguillo. Online detection and infographic Bachimanchi, Sergio Noé, Daniel Midtvedt, explanation of spam reviews with data drift Giovanni Volpe, and Carlo Manzo. Geometric deep adaptation. Informatica, 35(3):1-25, 2024. learning reveals the spatiotemporal features of https://doi.org/10.15388/24-INFOR562 microscopic motion. Nature Machine Intelligence, [7] Yaping Wang, Zunshan Xu, Songtao Zhao, Jiajun 5(1):71-82, 2023. https://doi.org/10.1038/s42256- Zhao, and Yuqi Fan. Performance degradation 022-00595-0 prediction of rolling bearing based on temporal [17] Wenhao Li, Yanyan Chen, Yuyan Pan, and Yunchao graph convolutional neural network. Journal of Zhang. An improved spatiotemporal network traffic Mechanical Science and Technology, 38(8):4019- flow prediction method based on impedance matrix. 4036, 2024. https://doi.org/10.1007/s12206-024- Journal of Highway and Transportation Research 0702-z and Development, 18(2):67-75, 2024. [8] Chunyong Yin, Sun Zhang, Jin Wang, and Neal N. https://doi.org/10.26599/HTRD.2024.9480015 Xiong. Anomaly detection based on convolutional [18] Fengxin Chen, Ye Yu, Liangliang Ni, Zhenya recurrent autoencoder for IoT time series. IEEE Zhang, and Qiang Lu. DSTVis: toward better Transactions on Systems, Man, and Cybernetics: interactive visual analysis of Drones' spatiotemporal Systems, 52(1):112-122, 2022. data. Journal of Visualization, 27(4):623-638, 2024. https://doi.org/10.1109/TSMC.2020.2968516 https://doi.org/10.1007/s12650-024-00982-2 [9] Jiwei Huang, Jiangyuan Wan, Bofeng Lv, Qiang Ye, [19] Yan Jian, Xiaoyang Dong, and Liang Jian. Detection and Ying Chen. Joint computation offloading and and recognition of abnormal data caused by resource allocation for edge-cloud collaboration in network intrusion using deep learning. Informatica, internet of vehicles via deep reinforcement learning. 45(3):441-445 2021. IEEE Systems Journal, 17(2):2500-2511, 2023. https://doi.org/10.31449/inf.v45i3.3639 https://doi.org/10.1109/JSYST.2023.3249217 [20] Erchao Li, and Kuankuan Qi. Ant colony algorithm [10] Stefan Bloemheuvel, Jurgen van den Hoogen, Dario for path planning based on grid feature point Jozinović, Alberto Michelini, and Martin extraction. Journal of shanghai jiao tong university: Atzmueller. Graph neural networks for multivariate English Edition, 28(1):86-99, 2023. time series regression with application to seismic https://doi.org/10.1007/s12204-023-2572-4 data. International Journal of Data Science and [21] Si-Xiao Gao, Hui Liu, and Jun Ota. Energy-efficient Analytics, 16(3):317-332, 2023. buffer and service rate allocation in manufacturing https://doi.org/10.1007/s41060-022-00349-6 systems using hybrid machine learning and [11] Hao Xu, Zihan Sun, Yuan Cao, and Hazrat Bilal. A evolutionary algorithms. Advances in data-driven approach for intrusion and anomaly 48 Informatica 49 (2025) 35–48 S. Li et al. Manufacturing, 12(2):227-251, 2024. https://doi.org/10.1007/s40436-023-00461-1 [22] Andriy Bondarenko, Danylo Radchenko, and Kristian Seip. Fourier interpolation with zeros of zeta and L-functions. Constructive Approximation, 57(2):405-461, 2022. https://doi.org/10.1007/s00365-022-09599-w [23] Kavita Bhosle, and Vijaya Musande. Evaluation of deep learning CNN model for recognition of devanagari digit. Artificial Intelligence and Applications, 1(2):114-118, 2023. https://doi.org/10.47852/bonviewAIA3202441 [24] Jiawei Zhu, Xing Han, Hanhan Deng, Chao Tao, Ling Zhao, Pu Wang, Tao Lin, and Haifeng Li. KST-GCN: A knowledge-driven spatial-temporal graph convolutional network for traffic forecasting. IEEE Transactions on Intelligent Transportation Systems, 23(9):15055-15065, 2022. https://doi.org/10.1109/TITS.2021.3136287 [25] Dongdong Cheng, Jinlong Huang, Sulan Zhang, and Quanwang Wu. A robust method based on locality sensitive hashing for K-nearest neighbors searching. Wireless Networks, 30(5):4195-4208, 2024. https://doi.org/10.1007/s11276-022-02927-9 [26] M. Akif Günen. Comparison of histogram-curve fitting-based and global threshold methods for cloud detection. International Journal of Environmental Science and Technology, 21(6):5823-5848, 2024. https://doi.org/10.1007/s13762-023-05379-6 https://doi.org/10.31449/inf.v49i12.7315 Informatica 49 (2025) 49–60 49 Fusion CNN-Transformer Model for Target Counting in Complex Scenarios Xingyuan He1, Ruiying Wang2*, Ting Cao2, Weiyu Liang3, Yimin Fan4 1Information Management Center, Shijiazhuang Institute of Railway Technology, Shijiazhuang 050001, China 2Finance Department, Shijiazhuang Institute of Railway Technology, Shijiazhuang 050001, China 3Information Engineering Department, Shijiazhuang Institute of Railway Technology, Shijiazhuang 050001, China 4Economic Management Department, Shijiazhuang Institute of Railway Technology, Shijiazhuang 050001, China E-mail: hexingyuan1232022@126.com, wangruiying2005@126.com, caoting20095522@126.com, liangweiyu0314@163.com, fym2006@126.com *Corresponding author Keywords: convolutional neural network, attention mechanism, computer counting, target counting, fully self attention network Received: October 12, 2024 To overcome the shortcomings of traditional manual counting methods, which are labor-intensive, resource-consuming, and inefficient, this study introduces a computer-based counting model. This model integrates convolutional neural networks (CNNs) with Transformer networks to efficiently recognize and count specific target objects in large-scale data scenarios. This approach leverages CNNs for local feature extraction and incorporates Transformer networks to capture long-range global information, achieving a synergistic effect. The methodology includes key steps such as “CNN for feature extraction and Transformer for global attention.” The experiment outcomes show that the model has an average absolute error of 10.13, a root mean square error of 12.08, an average counting accuracy of 98.6%, a peak signal-to-noise ratio of 23.75, a structural similarity of 0.933, a coefficient of determination of 0.901, an average counting time of about 6.58ms per image, and a parameter count of 3.21 in target counting. It can also recognize and respond well to high complexity scenes while maintaining high accuracy. Compared to the CNN model, the research model reduces the error rate by 13.4%, indicating that the fusion of CNN and Transformer networks is effective in object counting for computer vision tasks. This result indicates that the model integrating convolutional neural networks and fully self attention networks can be well applied to computer recognition and object counting. Povzetek: Predstavljen je hibridni model CNN-Transformer za štetje tarč v kompleksnih scenarijih. Model združuje CNN za ekstrakcijo lokalnih značilnosti in transformer za zajemanje globalnih informacij. 1 Introduction form of deep learning architecture was incorporated to generate a model that can recognize and compute fish on Traditional counting relies on manual operation, with the images. The experimental results showed that the low processing power and efficiency, and often requires recall rate of the model reached 65.5% [6]. Chen G et al. a lot of manpower and time to identify large-scale data proposed a new efficient deep learning model called [1-3]. However, as computer technology advances, in Density Transformer for automatically counting trees recent years, many researchers have begun to rely on from aerial images. This architecture includes a multi- computer vision technology to handle the matter of receptive field CNN for extracting visual feature object detection and identification counting in the context representations from local patches and their extensive of big data. At present, the application of computer contexts, and a Transformer encoder for transmitting counting has spread to many fields, such as road vehicle contextual information between relevant locations. The recognition and counting in vehicle transportation experimental results showed that the research model systems, melon and fruit counting in large-scale achieved the highest accuracy on both datasets, agricultural and forestry production, livestock counting, significantly better than most other methods [7]. Miao Z and colony counting in laboratories, etc. [4-5]. With the et al. proposed a weakly-supervised method that advancement of computer vision technology, an effectively combines multi-level dilated convolution and increasing number of computers counting algorithms and Transformer methods to achieve end-to-end crowd models have been developed and applied. Leong J M et counting. The experimental results showed that on four al. developed a fish counting system based on well-known benchmark population counting datasets, convolutional neural network (CNN) to assist hatchery this method outperformed other weakly supervised staff in counting fish from images. During the process, methods and was comparable to fully supervised contrast limited adaptive histogram equalization was also methods [8]. Liu et al. proposed a multi-receptive field used to enhance the captured images, and a YOLOv5 extraction deep learning method grounded on YOLOX 50 Informatica 49 (2025) 49–60 X. He et al. (MRF-YOLO) for detecting and counting small targets, precisely capture the local features of targets. This fusion and validated it on the cotton bolls dataset of a cotton strategy is expected to address the shortcomings of farm. The results indicated that the average accuracy of existing models when dealing with small and densely the model rose by 14.86%, with a mean square error of packed targets, while also improving the counting 1.06 and a coefficient of determination of 0.92. The performance of the model in complex scenarios. model could be well applied to a wide range of small target crop detection [9]. Shen L et al. constructed a 2 Methods and materials YOLOv5s cluster detection model grounded on channel pruning algorithm and applied it to counting grape 2.1 Counting algorithm integrating CNN clusters in the field. The research results showed that the Computer counting refers to the collection of information mAP reached 82.3%, the average inference time per through computer vision mechanisms, in order to achieve image was 6.1 ms, the average counting accuracy was the effect of calculating or counting quantities. This 84.9%, the video processing speed was 50.4 frames per method is often applied in the area of image processing, second, and the model parameters and complexness were such as vehicle counting, crowd counting, cell counting, effectively reduced while guaranteeing perception etc. CNN, as a type of deep learning algorithm, is precision. This model could be well applied to counting commonly applied in image recognition in the area of stacked grape clusters [10]. computer vision. It simulates the way neurons in the Despite the notable achievements of the human brain process information, especially the working aforementioned studies in their respective application mode of the visual cortex, and abstracts and extracts scenarios, the field of computer counting still faces feature layer by layer from input data to achieve several challenges and limitations. In particular, automatic processing and recognition simulation of grid mainstream models like YOLO frequently produce false structured data such as images [11-12]. These features positives and negatives when confronted with small, provide detailed object and element information for densely packed targets, largely attributed to their limited subsequent counting tasks. CNN is mainly composed of capacity in managing complex scenes and dealing with three parts: convolutional layer, pooling layer (also target occlusion. Furthermore, many existing counting known as downsampling layer), and fully connected models struggle to balance local and global feature layer. Its structure is represented in Figure 1. information. Local features are crucial for accurately In Figure 1, the first layer performs convolution identifying individual targets, while global features aid in operation on the input image to get a feature map (FM) understanding the entire scene and the distribution of with a depth of 3. Then pooling operation is constructed targets. However, existing models often fail to achieve a on the obtained FM to get a novel FM. The convolution balance between the two, resulting in insufficient pooling joint operation will be repeated until an FM with flexibility and accuracy during counting. a depth of 5 is obtained. This operation process can In response to these limitations, this study proposes a extract input data features layer by layer. As the number computer counting algorithm that integrates CNN and of convolutional and pooling layers rises, the model's Transformer networks. This algorithm aims to combine ability to interpret and express data gradually improves. the advantages of CNNs in local feature extraction with Finally, the obtained latest round of FMs is expanded and the capabilities of Transformers in global feature capture connected into vectors by rows, and passed into a fully and sequence modeling, thereby enhancing the accuracy connected layer. the internal hierarchical structure of and flexibility of computer counting. By introducing the CNN is analyzed. Part 1: convolutional layers, as shown Transformer module, it is hoped to enhance the model's in formula (1). understanding of global contextual information while leveraging the convolutional operations of CNNs to Table 1: Literature review table. Literature Method Major contribution There are problems Leong J M et al. Assist the staff of the hatchery in counting CNN-YOLOv5 The recall rate of the model is not high [6] fish from the images Chen G et al. Deep learning models, a Can achieve automatic calculation of trees in The accuracy value is only slightly higher [7] multi receptive field CNN aerial images than the general model The research dataset is limited to the Effectively combining multilevel expansion Weak supervision law, population, and the generalization application Miao Z et al. [8] convolution and Transformer methods to Transformer of counting methods still needs to be achieve end-to-end population counting. considered Design proposes a multi receptive field Liu et al YOLOX(MRF-YOLO) extraction deep learning method for Mean square error is relatively high [9] detecting and counting small targets Shen L et al. YOLOv5s cluster detection constructed a detection model and applied it The average counting accuracy is slightly [10] model to the counting of grape clusters in the field. lower and the inference time is slightly longer Fusion CNN-Transformer Model for Target Counting in Complex… Informatica 49 (2025) 49–60 51 3 feature maps Input image 3 feature maps Pooling Convolution Convolution Output layer 5 feature maps 5 feature maps Fully connected Pooling Figure 1: CNN structure diagram. 1 1 1 0 0 Bias=0 0 1 1 1 0 1 0 1 4 3 4 0 0 1 1 1 0 1 0 2 4 3 0 0 1 1 0 1 0 1 2 3 4 0 1 1 0 0 Filter 3*3 Feature map 3*3 Image 5*5 Figure 2: Example of convolution operation. s(i, j) = (X *W )(i, j) = ' (X + 2p−W )  x(i −m, j − n)w(m, n) (1) X = +1 (2) k m n In formula (2), the strid is k and the zero-padding Formula (1) represents two-dimensional convolution. layer is p . The second part is pooling. The pooling layer Among them, W is the convolution kernel (also known cuts the dimensionality of FMs while preserving as the weight matrix or filter), X is the input matrix important details through downsampling operations. (also known as the input FM), and s(i, j) means the Pooling can be divided into two types: maximum pooling value of the output matrix at position (i, j) . w(m,n) and average pooling. Compared to max pooling, average means the value of convolution kernel W at position pooling can preserve more detailed information. The (m,n) . x(i −m, j −n) represents the elements of the third part is the fully connected layer, as shown in formula (3). input matrix X that are accessed in the convolution operation. * Represents convolution. The essence represented by this formula as a whole is to multiply and   add the elements at different positions of the matrix and Y =(V )  convolution kernel matrix of different parts of the image, V = conv2(W , X , ''valid '') +b (3) as shown in Figure 2.  1 2 Figure 2 gives an illustration of convolution process.  E = d − yL  2 2 An image is input and converted into a matrix. In the example, the matrix corresponding to the image is 5×5, and a 3×3 convolution kernel is utilized for convolution In formula (3), conv represents the convolution to acquire a 3×3 FM. However, not all sliding steps are 1 function, valid represents the type of convolution and need to be adjusted according to the situation. If the operation, b is the bias vector,  is the activation sliding stride is greater than 1, there may be a situation function, E is the total error, d represents the expected where the convolution kernel cannot slide exactly to the output vector, y means the output node vector, and L edge. In this case, it is necessary to add zeros to the outermost layer of the matrix, as shown in formula (2). means the amount of layers. Figure 3 shows a fully connected diagram. 52 Informatica 49 (2025) 49–60 X. He et al. Hidden layer 1 Hidden layer 3 Output layer Input layer Hidden layer 2 Figure 3: Fully connected layer operation process. Pooling Pooling Pooling n lu tio Convolution Convolution Convolution on vo C Pooling Pooling Pooling Convolution Convolution Convolution Convolution Image noit ul ovno C Pooling Pooling Pooling Convolution Convolution Convolution Figure 4: Network structure of FE module. Figure 3 illustrates the classification function of the ultimately improving the quality of the algorithm's fully connected layer, which takes all local detail features counting results. In short, integrating the powerful FE as input to the input layer, passes through multiple capabilities of CNN can effectively enhance computer hidden layers (including linear transformation, nonlinear vision technology and achieve automatic counting of activation, etc.), and finally generates prediction results specific objects in images or videos. through the output layer. However, when CNN is integrated with counting algorithms, it mainly focuses on 2.2 Counting algorithm integrating CNN FE and classification [13]. When the object overlap and transformer coincidence rate of the counted image are high, it is very easy to encounter the problem of varying visual Although CNN has strong local FE and parameter perception depth in comparison with the initial image, sharing capabilities, it can decrease the amount of model which makes it difficult to recognize or misidentify [14]. parameters and is widely used in image classification and The counting algorithm that integrates CNN can improve object detection, thereby improving computer vision the FE module of the original counting algorithm, counting. However, CNN based counting algorithms lack helping to enhance the algorithm's ability to capture modeling of global information, and CNN assumes that feature information, as shown in Figure 4. image features have spatial invariance. Therefore, once Figure 4 gives the structure of the FE module that the target object undergoes deformation or positional integrates CNN counting algorithm. The FE module changes, it will affect the final counting results [15]. includes three parallel CNN networks, with each Based on this, the study intends to introduce Transformer column's filter (i.e. convolution kernel) having a different on the basis of CNN's counting algorithm. Transformer size of local receptive field. This produces different excels in global information modeling, complementing feature information extraction effects for counting CNN and Transformer to raise the precision and validity objects of different distances and sizes, providing higher of counting tasks, as represented in Figure 5. quality FMs for subsequent network modules and Fusion CNN-Transformer Model for Target Counting in Complex… Informatica 49 (2025) 49–60 53 Input Input Position Output embedding layer encoding N Figure 5: Schematic diagram of transformer structure. MatMul SoftMax Scale MatMul Query Key Value Figure 6: Self attention mechanism calculation process. From Figure 5, it can be seen that Transformer is input element, i means the specific dimension of the mainly composed of Position Embedding, Multi-Head element, and d represents the dimension of the input. m Self Attention (MSA) mechanism, Residual Structure The Transformer model's essential feature is the self- (Add), Normalization (Norm), and FeedForward attention mechanism, enabling it to consider all other Network (FFN) [16]. The entire processing flow is to elements while processing a single element in the first feed the input data into an input embedding layer sequence, thereby capturing long-range dependencies in composed of transition matrices and convert it into an the sequence. The computation process is shown in initial tensor. Then positional encoding information is Figure 6. added to the tensor to generate a new tensor. The new In Figure 6, it can be seen that Query , Key , and tensor is immediately transmitted to the FE module for further processing. In the FE module, the FE process is Value are matrices composed of vectors q , k , and . i i vi repeated N times, each iteration aims at extracting deeper Query and Key obtain an output vector sequence and more abstract characteristics from the input data, containing rich contextual information through matrix ensuring that the model can seize intricate patterns and multiplication, scaling, SoftMax, and quadratic matrix structures in the data until the optimal result is output. multiplication, while Value directly outputs the sequence Among them, the position code is shown in formula (4). through matrix multiplication. The specific first step calculation is shown in formula (5).  position  PE( position,2i) = sin( ) 2i a =Wx (5)  d i i 10000 m  (4)  position PE( position,2i+1) = cos( ) In formula (5), a is the middle tensor, W is the  2i i  d  10000 m learning matrix, and x is the input tensor. Each input i tensor is first multiplied by a W matrix and encoded to In formula (4), PE is the position encoding, and the obtain the intermediate tensor. Multiplying each system in formula (4) is the commonly used position intermediate tensor with different learning matrices encoding, namely sine cosine position encoding. It yields the desired vector, as shown in formula (6). represents the relative or absolute positional relationship between pixels. The function of position encoding is to qi =Wq , aiki =Wk , enable the model to obtain effective position information. (6) aivi =Wvai , (i = 0,1, 2,..., d ) Among them, position represents the position of the NSA Add&Norm FFN Add&Norm 54 Informatica 49 (2025) 49–60 X. He et al. Among them, q , k , and r pre i v e sent the vectors In formula (10), x represents the mean of the input i i corresponding to Query, Key, and Value. W , q W , and tensor,  is the standard deviation,  and  represent k learnable parameters, and the size is usually equal to the W are corresponding learnable matrices. d is the i number of channels. Layer normalization is only dimension of the input vector. Among them, each vector applicable to single sample processing and is suitable for q will perform attention calculation on each vector k i j handling long sequence data and learning global (j=0, 1, 2,..., d), that is, perform similarity calculation of relationships from single samples. In addition, residual vector dot multiplication. Due to the fact that the dot connections are also introduced in the Transformer multiplication result increases with the increase of module, as shown in formula (11). dimension, it is necessary to compress the result and process it through Sofmax, as shown in formula (7). F = Att(X )+ X (11)  qi k j In formula (11), Att represents the attention layer ai , j = Soft max( )  d and F represents the output feature. The function of  y (7) residual connections is to send the data from the last e i  Soft max( y layer to the subsequent layer through skip connections, i ) = y  e i which simplifies the model's learning process of identity maps, thereby promoting information flow and In formula (7), a represents the normalized alleviating the problems of gradient vanishing and i , j exploding [17-18]. In summary, integrating CNN and probability value of the vector at position (i, j) Transformer networks to construct CNN Transformer corresponding to the Softmax function processing. The counting algorithms can complement each other's Softmax function can convert the output values of strengths and weaknesses, improve computational multiple classifications into a probability distribution flexibility, enhance global information modeling within the range of (0,1) and equal to 1. Finally, multiply capabilities, and improve the accuracy and efficiency of the obtained a with all vi vectors and sum them to ij counting tasks. The detailed parameter information of the obtain the feature pixels, as shown in formula (8). model is as follows, as shown in Table 2. QKT 3 Results Attention(Q, K ,V ) = Soft max( )V (8) d 3.1 Performance analysis based on CNN- transformer counting algorithm model Formula (8) represents the calculation of attention weights in the self attention mechanism. It is worth To verify the capability of the model grounded on the noting that the current attention mechanism of CNN-Transformer counting algorithm, simulation Transformers usually adopts the Multi Head Self experiments were conducted for validation. Common Attention (MSA) mechanism, which is represented as computer vision applications include counting road formula (9) vehicles in traffic monitoring systems and counting bacterial colonies in laboratory culture dishes.  Z Considering the difficulty of obtaining the dataset, the i = Attention(Qi , Ki ,Vi ), (i =1,2,...,h)  (9) study intended to use the actual chicken feeding situation MultiHead (Q, K ,V ) = Concat(Z1, Z2 ,..., Zh )Wo of a large-scale breeding farm in a certain area as the experimental dataset. The selection of live chicken In formula (9), i represents the i th self attention feeding data for this large-scale breeding farm was head, h means the amount of self attention heads, and mainly based on the following considerations: Firstly, Z means the output matrix calculated by the i th self this dataset has high practical application value and can i provide strong support for precision breeding and animal attention head. Compared with self attention health management. Secondly, compared to other mechanisms, multi-head attention mechanisms can scenarios, the chicken flock activities in the breeding independently and parallelly compute attention in farm are more intensive and regular, providing rich test different subspaces, achieving the effect of samples for counting algorithms. Finally, the dataset simultaneously focusing on different features of the input exhibits high diversity in terms of image quality, lighting sequence from different perspectives. In addition, in the conditions, and background complexity, which helps to normalization selection of the model, Transformer adopts comprehensively evaluate the model's generalization layer normalization, as shown in formula (10). ability. A total of 80 live data segments were collected, with a duration of 30-60 seconds per segment, a x− LayerNorm(x) =   ( )+  (10) resolution of 1920 × 1080 pixels, and a frame rate of 25  frames per second. For the collected chicken breeding video data, images were extracted from the video at intervals of 15 frames. In order to improve the quality of Fusion CNN-Transformer Model for Target Counting in Complex… Informatica 49 (2025) 49–60 55 the dataset, manual inspection was used to remove annotation quality through consistency checks. Finally, excessively similar or blurry images, and data 761 images were obtained, and the dataset was separated augmentation was performed on the images in the into a training set (685 images) and a testing set (76 training set, including random rotation, scaling, cropping, images) in a 9:1 ratio. The parameter size was set to: and color transformation. In addition, to ensure the Learning Rate: 0.0005; Optimizer: AdamW; Epochs: accuracy of annotation, the study adopted cross 100; Batch Size: 32. The flowchart of data processing is validation method, where multiple annotators shown in Figure 7. independently annotate the images and ensure the Table 2: model parameters. CNN Image size Convolutional kernel size Number of convolution kernels Step size and filling 224×224×3 3x3 64 1 Transformer Embedding dimension Position encoding Hidden layer dimension Encoder layers 768 Sine/Cosine Position Encoding 2048 6 Dataset Model Training Stage: Training-Validation preprocessing stage: Evaluation Stage: Split Optimization: Evaluate the performance Input the preprocessed Stratified random sampling of the model using test Including data cleaning, data into the CNN and cross-validation ensure set data, calculate and normalization, and transformer fusion model consistent dataset record accuracy, recall, target annotation. for feature extraction and proportions and mitigate F1 score, and other sequence modeling. randomness impact. evaluation metrics. Figure 7: The flowchart of data processing. 25 30 CNN-Transformer CNN 20 25 Transformer RF SVM 15 20 CNN-Transformer 10 CNN 15 Transformer RF 5 SVM 10 0 50 100 150 200 250 300 0 50 100 150 200 250 300 Number of images Number of images (a) Performance indicators MAE of (b) Performance of each model on each model the RMSE performance metric Figure 8: Performance of different algorithms on MAE and RMSE of the training set. Mean Absolute Error (MAE), Root Mean Square be used to measure the similarity between the Error (RMSE), Mean Accuracy (MA), Peak Signal to reconstructed count image and the actual count image. A Noise Ratio (PSNR), Structural Similarity (SSIM), and higher PSNR value indicates better quality of the Coefficient of Determination (R2) were used as reconstructed count image and its closeness to the actual evaluation metrics for model performance. MAE image. To more intuitively testify the superiority of the measures the average of the absolute differences between CNN Transformer counting algorithm model, four the predicted and actual values. In counting tasks, MAE counting algorithm models including CNN, Transformer, provides a straightforward reflection of the accuracy of Support Vector Machine (SVM), and Random Forest the model’s predictions. RMSE assigns higher weights to (RF) were included as comparative algorithms. The larger errors, in counting tasks, it highlights significant comparison results of MAE and RMSE performance of deviations in predictions. PSNR in counting tasks, it can MAE RMSE 56 Informatica 49 (2025) 49–60 X. He et al. different algorithms in the training set are shown in The RF model had the highest value of 16.7. The Figure 8. comparison results of MA and PSNR performance of In Figure 8, (a) shows the ability of each model on different algorithms in the training set are shown in the behaviour metric MAE. MAE is one of the key Figure 9. indicators for model evaluation, which calculates the In Figure 9, (a) shows the behaviour of each model mean absolute deviation between predicted and actual on the behaviour metric MA. The larger the MA, the values, and is used to characterize the count of network higher the counting accuracy and stability of the network models. The smaller the value, the better the model. From Figure 9 (a), the MA value of the CNN- performance. From Figure 8 (a), the MAE value of the Transformer fusion counting algorithm was 98.6%, CNN-Transformer fusion counting algorithm was 10.13, which was the highest compared to the other four which was the lowest compared to the other four counting algorithms. Figure 9 (b) shows the behaviour of counting algorithms. Figure 8 (b) shows the behaviour of each model on the behaviour metric PSNR. This each model on the performance metric RMSE. RMSE indicator represents the quality of an image based on the was another important indicator for model evaluation, error between corresponding pixels, so the higher the which was the average square root error between the PSNR value, the higher the quality of the predicted predicted and actual values. It was used to characterize generated image. In Figure 9 (b), the PSNR value of the the stability of network model counting, and the smaller CNN-Transformer fusion counting algorithm was the its value, the better the stability of the model. The highest, at 23.75. Compared with the other four counting Transformer model had the highest value of 17.8. From algorithms, this algorithm performed the best in image Figure 8 (b), the RMSE value of the CNN-Transformer quality assessment. The comparison results of SSIM fusion counting algorithm was 12.08, which was the performance of different algorithms in the training set are lowest compared to the other four counting algorithms. shown in Figure 10. 100 25 80 20 60 15 CNN-Transformer CNN-Transformer 40 CNN 10 CNN Transformer Transformer RF RF SVM 20 SVM 5 0 50 100 150 200 250 300 0 50 100 150 200 250 300 Number of images Number of images (a) Performance indicators MA of (b) Performance of each model on the each model PSNR performance metric Figure 9: Performance of different algorithms on the MA and PSNR of the training set. 1.0 1.0 0.8 0.8 0.6 0.6 CNN-Transformer CNN 0.4 Transformer 0.4 CNN-Transformer RF CNN SVM Transformer RF 0.2 0.2 SVM 0 50 100 150 200 250 300 0 50 100 150 200 250 300 Number of images Number of images (a) Performance indicators SSIM of (b) Performance of each model on the each model R2 performance metric Figure 10: Performance of different algorithms on SSIM and R2 in the training set. SSIM MA R2 PSNR Fusion CNN-Transformer Model for Target Counting in Complex… Informatica 49 (2025) 49–60 57 30 Parameter 18 25 Time 20 13 15 10 8 5 0 3 Transformer RF SVM CNN CNN-Transformer Model Figure 11: The counting time and parameter count of each algorithm model. In Figure 10, (a) shows the specific situation of the parameter quantity of a single image, as shown in Figure training sets of five computer counting algorithms on 11. SSIM. This indicator often considers the brightness, Figure 11 shows the specific situation of the five contrast, and structure of the image comprehensively to models in terms of time and parameters. The counting achieve the effect of measuring the correlation between algorithm model that integrated CNN-Transformer had pixels, making it closer to human subjective perception the shortest average counting time for a single image, of image quality. Generally speaking, the closer the about 6.58ms, and the smallest number of parameters, SSIM value is to 1, the higher the image quality about 3.21. In comparison with the model with the predicted by the algorithm. From Figure 10, it is told that longest average detection time for a single image, there the SSIM value of the CNN-Transformer fusion counting was a difference of 6.62ms. Compared with the model algorithm was 0.933, which was closest to 1 compared to corresponding to the maximum parameter count, there other models. In addition, compared with the other four was a difference of 24.33. Obviously, the model algorithms, the convergence speed of the research proposed in the study had shorter recognition and algorithm was significantly higher in the SSIM image, counting time, and more efficient counting efficiency in with the convergence inflection point located around actual counting. The above indicators reflected the image number 40. Figure 10 (b) shows the specific overall testing performance of each model. To situation of R2 for each model, which reflects the degree understand the situation of each model in counting error of fit of the model. From the figure, it is told that the R2 images, the study also tested the error counting value of the CNN Transformer fusion counting algorithm probability of each model in the test set, recorded the was 0.901, which was closest to 1 compared to other image numbers of error counts in each counting models. Based on the above, the proposed counting algorithm, and summarized the number of times each algorithm that integrates CNN Transformer had good image was counted incorrectly. The results are shown in counting performance on the training set. Furthermore, to Figure 12. demonstrate the universality of the model application, the In Figure 12, (a) shows the false detection rates of experiment also explored it on a publicly available different algorithms, and (b) shows the distribution of dataset. This dataset is the Distribution Transformer error count images. From Figure 12 (a), as the number of Detection Dataset (DTD). The same performance counting images increased, the error rates of each indicators as mentioned above were selected for testing. algorithm randomly increased. However, compared to The experimental results showed that MAE was 10.02, the other four algorithms, the counting algorithm that RMSE was 12.02, MA was 97.6%, PSNR was 23.55, integrated CNN-Transformer had a lower overall false SSIM was 0.934, and R2 was 0.911. detection rate. In Figure 12 (b), out of 76 test set images, 62 images were correctly counted by all models, 3.2 Testing and analysis based on CNN accounting for 81.58% of the total; The number of transformer counting algorithm model images with an error count of less than or equal to 1 accounted for 88.15% of the entire test set. Among the In the above experiment, the proposed CNN-Transformer five models mentioned above, there were a total of three counting algorithm model performed well on the training images with a classification error rate higher than 50%. set. To formalize more about the practical application One of them was incorrectly counted by four models, ability of the model, the study intended to use a test set to indicating that this image had strong confusion and the analyze the model again. Among them, the study category features might not be clear enough. The specific compared the recognition performance of various models number of this image in the test set was 13, with 4 errors. by introducing the average detection time/ms and The specific situation of the error probability of this image in the five models is represented in Table 3. Parameter Time/ms 58 Informatica 49 (2025) 49–60 X. He et al. 350 Number of images CNN-Transformer 75 300 100 Transformer Cumulative proportion 70 250 RF 60 90 SVM 50 200 40 80 150 30 100 20 70 CNN 10 50 5 60 0 0 50 100 150 200 250 300 350 0 1 2 3 4 Number of data Number of prediction errors (a) False detection rates of different algorithms (b) Distribution of Error Counting Images Figure 12: Error recognition status of each model. Table 3: Probability of incorrect counting for figure 13 by each model. Image number Model Predicted probability CNN [0.78,0.16] Transformer [0.97,0.56] 13 SVM [0.93,0.64] RF [0.92,0.18] CNN-Transformer [0.59,0.51] Table 3 shows the error count probabilities of each counting algorithm had the shortest average single image algorithm for high ambiguity image number 13. The true counting time of about 6.58ms, with a parameter count of label of image 13 was a positive sample. From the figure, 3.21 and the lowest quantity. In terms of error counting, the intervals of the five counting algorithms in the two- all algorithms showed a trend where the more recognized dimensional vector were [0.77, 0.21], [0.96, 0.55], [0.92, images, the higher the false detection rate. However, for 0.63], [0.91, 0.17], and [0.60, 0.30]. The first element in a single algorithm, the counting algorithm that integrated this interval was the probability of incorrectly judging a CNN-Transformer exhibited a lower overall false positive sample, and the second element was the detection rate. In addition, in low feature and high probability of correctly judging a positive sample. Except ambiguity images, except for the counting algorithm that for the CNN-Transformer model, all other models made integrated CNN-Transformer, all other algorithms had incorrect judgments. Subsequently, after separate incorrect recognition counts, indicating that the counting analysis, it was found that the high error rate of image algorithm that integrated CNN and Transformer still had number 13 was due to issues with lighting and occlusion. good counting ability in recognizing high complexity The CNN Transformer model combines the advantages counting scenes. of CNN and Transformer, using CNN to extract local The CNN Transformer model exhibited significant features and Transformer to capture global contextual advantages in balancing the number of parameters, information, thus improving the model's ability to inference time, and model accuracy. In resource process blurry images. Overall, the counting algorithm constrained environments such as farms and other that integrated CNN-Transformer still had good practical application scenarios, traditional complex recognition and counting capabilities in high complexity models often struggle to run stably due to the lack of scenarios. powerful computing and storage capabilities of the devices in these scenarios. The research model, due to its 4 Discussion limited number of parameters and fast inference speed, can adapt well to these resource constrained The fusion CNN-Transformer counting algorithm environments. Therefore, in practical applications, this proposed in the study performed well in various model can accurately count the number of chickens and performance analysis indicators of the training set data, provide timely and accurate data support for farm with MAE of 10.13, RMSE of 12.08, MA of 98.6%, managers. This helps them better understand the feeding PSNR value of 23.75, and SSIM and coefficient of situation, develop scientific feeding plans, and thus determination close to 1. In comparison with other improve feeding efficiency and economic benefits. algorithms, the algorithm raised in the study performed Meanwhile, due to the fast inference speed of the model, excellently in all indicators. In addition, in the test set, it can also meet the real-time requirements and provide the experiment also compared the average single image real-time data feedback for farm managers. counting time and parameter count of five counting In the same type of research, Zhang L et al. proposed algorithms. It was found that the CNN Transformer a shrimp automatic local image-based enumerating way Number of wrong data Number of images Cumulative proportion Fusion CNN-Transformer Model for Target Counting in Complex… Informatica 49 (2025) 49–60 59 utilizing lightweight YOLOv4, and constructed a local maps to enhance the interpretability of models. In shrimp enumerating model grounded on Light-YOLOv4. addition, the chicken breeding image dataset used in the The strategy underwent testing on a shrimp dataset, and study still has insufficient quantity in the context of deep the results showed that the Light-YOLOv4 local shrimp learning. In the future, data augmentation techniques enumerating model acquired a enumerating accuracy of such as rotation, scaling, cropping, and flipping can be 92.12%, a recall rate of 94.21%, an F1 value of 93.15%, further adopted to increase data diversity and help and an average accuracy mean of 93.16% [19]. Although models learn more robust features, thereby improving the comprehensive counting ability of this model was their generality. superior, its average accuracy was lower than that of the model in this study. Wu Fy et al. fused the CNN Deeplab References V3+model with traditional image processing algorithms and applied it to the detection and counting of banana [1] Muhammad Asif Khan, Hamid Menouar, and Ridha bunches. The results showed that the final bundle Hamila. Revisiting crowd counting: State-of-the-art, perception precision was 86%, the accuracy of bacterial trends, and future perspectives. Image and Vision colony detection during harvesting was 76%, and the Computing, 129(1):104597, 2023. overall bacterial colony counting accuracy was 93.2% https://doi.org/10.1016/j.imavis.2022.104597 [20]. The results of this model were lower than the [2] Wim Bernasco, Evelien M. Hoeben, Dennis Koelma, comprehensive behaviour of the model in this study. Lasse Suonperä Liebst, Josephine Thomas, Joska The results of this study have significant advantages Appelman, Cees G. M. Snoek, and Marie over existing technology, which may be attributed to the Rosenkrantz Lindegaard. Promise into practice: ability of CNN to handle local features and the modeling Application of computer vision in empirical of global dependencies by Transformer. CNN can research on social distancing. Sociological Methods effectively extract local features of images, while & Research, 52(3):1239-1287, 2023. Transformer captures global dependencies in images https://doi.org/10.1177/00491241221099554 through its self attention mechanism. The combination of [3] N Krishnachaithanya, Gurdit Singh, Smita Sharma, the two enables more accurate counting when dealing Rangisetti Dinesh, Sumeet Ramsingh Sihag, Kamna with complex scenes. However, this fusion also brings Solanki, Abhishek Agarwal, Mrinalini Rana, and certain complexity, such as an increase in the number of Ujjwal Makkar. People counting in public spaces parameters. However, this research model achieved fast using deep learning-based object detection and inference time while maintaining a low number of tracking techniques. 2023 International Conference parameters, indicating a good balance between on Computational Intelligence and Sustainable complexity and efficiency. Engineering Solutions (CISES), 21(1):784-788, 2023. https://doi.org/10.1109/CISES58720.2023.1018350 5 Conclusion 3 Traditional counting relies on manual operation, with [4] Li Zhang, Leilei Yan, Mengqian Zhang, and Jingang low processing power and efficiency, and often requires Lu. T2 CNN: a novel method for crowd counting a lot of manpower and time to identify large-scale data. via two-task convolutional neural network. The However, with the prosperity of Internet technique, Visual Computer, 39(1):73-85, 2023. computer vision technique can effectively solve this https://doi.org/10.1007/s00371-021-02313-0 problem for object detection and counting. CNN and [5] Shashi Bhushan Jha, and Radu F. Babiceanu. Deep Transformer are representative models of deep learning. CNN-based visual defect detection: Survey of The former has good local FE ability, while the latter has current literature. Computers in Industry, a non cyclic structure based on attention mechanism and 148(1):103911, 2023. processes the entire input sequence in parallel. Based on https://doi.org/10.1016/j.compind.2023.103911 this, the study integrated CNN with Transformer to [6] Leong J M, Hijazi M H A, Saudi A, On C K, Fui C F, construct a CNN-Transformer model, and explored its Haviluddin H. The development and usability test performance in target counting through simulation of an automated fish counting system based on training and testing. The results showed that the model CNN and contrast limited histogram equalization. performed well in performance analysis. In testing Bulletin of Electrical Engineering and Informatics, analysis, the counting time and parameter count of the 13(2):1128-1137, 2024. model were significantly lower than other models of the https://doi.org/10.11591/eei.v13i2.5840 same type. However, it still performed well in low [7] Chen G, Shang Y. Transformer for tree counting in feature and high confusion image counting recognition. aerial images. Remote Sensing,14(3):476 2022. Although the research achieved good results, there were https://doi.org/10.3390/rs14030476. still some limitations, such as the lack of clear input- [8] Miao Z, Zhang Y, Peng Y, Peng H, Yin B. DTCC: output mapping in the Transformer model compared to Multi-level dilated convolution with transformer for other models, which increased the difficulty of internal weakly-supervised crowd counting. Computational interpretation. In the future, efforts can be made to Visual Media, 9(4): 859-873, 2023. incorporate interpretable artificial intelligence https://doi.org/10.1007/s41095-022-0313-5 technologies such as attention visualization or salinity 60 Informatica 49 (2025) 49–60 X. He et al. [9] Qianhui Liu, Yan Zhang, and Gongping Yang. Small 2022. unopened cotton boll counting by detection with https://doi.org/10.1016/j.biosystemseng.2022.05.01 MRF-YOLO in the wild. Computers and 1 Electronics in Agriculture, 204(1):107576, 2023. [20] Fengyun Wu, Zhou Yang, Xingkang Mo, Zihao Wu, https://doi.org/10.1016/j.compag.2022.107576 Wei Tang, Jieli Duan, and Xiangjun Zou. Detection [10] Lei Shen, Jinya Su, Runtian He, Lijie Song, Rong and counting of banana bunches by integrating deep Huang, Yulin Fang, Yuyang Song, and Baofeng Su. learning and classic image-processing algorithms. Real-time tracking and counting of grape clusters in Computers and Electronics in Agriculture, the field based on channel pruning with YOLOv5s. 209(1):107827, 2023. Computers and Electronics in Agriculture, https://doi.org/10.1016/j.compag.2023.107827 206(1):107662, 2023. https://doi.org/10.1016/j.compag.2023.107662 [11] Yao Liu, Hongbin Pu, and Da-Wen Sun. Efficient extraction of deep image features using convolutional neural network (CNN) for applications in detecting and analysing complex food matrices. Trends in Food Science & Technology, 113:193-204, 2021. https://doi.org/10.1016/j.tifs.2021.04.042 [12] Jinzhu Lu, Lijuan Tan, and Huanyu Jiang. Review on convolutional neural network (CNN) applied to plant leaf disease classification. Agriculture, 11(8):707, 2021. https://doi.org/10.3390/agriculture11080707 [13] Xiang Chen, Hao Li, Mingqiang Li, and Jinshan Pan. Learning a sparse transformer network for effective image deraining. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 46(1):5896-5905, 2023. https://doi.org/10.48550/arXiv.2303.11950 [14] Guy Farjon, Liu Huijun, and Yael Edan. Deep- learning-based counting methods, datasets, and applications in agriculture: A review. Precision Agriculture, 24(5):1683-1711, 2023. https://doi.org/10.1007/s11119-023-10034-8 [15] Nourhan T.A. Abdelnaiem, Hossam M.A. Fahmy, and Anar A. Hady. DC-PHD: multitarget counting and tracking using binary proximity sensors. International Journal of Wireless and Mobile Computing, 16(1):44-59, 2022. https://doi.org/10.1504/IJWMC.2023.135383 [16] Xin Man, Deqiang Ouyang, Xiangpeng Li, Jingkuan Song, and Jie Shao. Scenario-aware recurrent transformer for goal-directed video captioning. ACM Transactions on Multimedia Computing Communications and Applications, 35(1):11079- 11091, 2022. https://doi.org/10.1145/3503927 [17] Matteo Polsinelli, Luigi Cinque, and Giuseppe Placidi. A light CNN for detecting COVID-19 from CT scans of the chest. Pattern Recognition Letters, 140(1):95-100, 2020. https://doi.org/10.1016/j.patrec.2020.10.001 [18] Diksha Moolchandani, Anshul Kumar, and Smruti R. Sarangi. Accelerating CNN inference on ASICs: A survey. Journal of Systems Architecture, 113(1):101887, 2021. https://doi.org/10.1016/j.sysarc.2020.101887 [19] Lu Zhang, Xinhui Zhou, Beibei Li, Hongxu Zhang, and Qingling Duan. Automatic shrimp counting method using local images and lightweight YOLOv4. Biosystems Engineering, 220(1):39-54, https://doi.org/10.31449/inf.v49i12.7392 Informatica 49 (2025) 61–76 61 Advanced Optimal Cross-Modal Fusion Mechanism for Audio-Video Based Artificial Emotion Recognition Himanshu Kumar*, Martin Aruldoss Department of Computer Science, Central University of Tamil Nadu, Thiruvarur, India. E-mail: himanshukphd20@students.cutn.ac.in, martin@cutn.ac.in2 *Corresponding author Keywords: multimodal fusion, cross-modal fusion, emotion recognition, artificial emotion intelligence, fusion mechanism Received: October 22, 2024 The advanced technology of artificial emotional intelligence has greatly contributed to multimodal emption recognition task. Emotion recognition has played a crucial role in many domains, like communication, e- learning, mental healthcare, contextual awareness, and customer satisfaction. As real-time data continues to expand, addressing the problem of emotion recognition has become critical and complex. A key challenge lies in recognizing emotions from multimodal heterogeneous input sources, aligning extracted features, and developing robust emotion recognition models. In this study, we explore a cross-modal (audio and video modality) fusion mechanism for emotion recognition, effectively addressing the associated feature complexities. We have used 2D-CNN and 3D-CNN deep learning models for audio and video feature extractions and developed robust models for emotion recognition. This study emphasizes the importance of Compact Bilinear Gated Pooling (CBGP) cross-modal fusion mechanism and highlights the contribution of fusing the features from audio and video modalities for emotion recognition. It also discusses the working principle and comparison performance with other peer cross-modal fusion techniques such as FBP and CBP. The performance of advanced cross-modal fusion is compared to baseline traditional cross-modal fusion mechanisms including EF-LSTM, LF-LSTM, Graph-MFN, hybrid fusion and transformer model based fusion mechanisms such as, attention fusion and transformer fusion. This experiment is performed on benchmark datasets CMU-MOSEI and achieves an accuracy of 80.3%, F1-score of 79.2%, and MAE of 54.2%. Povzetek: Predstavljen je napredni mehanizem optimalne fuzije med modalnostmi za umetno prepoznavanje čustev na podlagi avdio-video posnetkov. Študija uporablja 2D- in 3D-CNN za ekstrakcijo značilnosti, poudarja pomen CBGP fuzije in dosega odlične razultate na naboru podatkov CMU-MOSEI. 1 Introduction (FBP) [2], Compact bilinear pooling (CBP) [3], and Compact Bilinear Gated Pooling (CBGP) [4]. Emotion recognition is being successfully used in many Emotion recognition from audio and video modalities are domains and applications. The adoption of this very crucial because audio and video (collection of image technology has grown rapidly in healthcare, e-learning frames) gives a wide range of information regarding, and advertising [1]. Initially, emotion recognition was pitch, tone, image texture, facial movements, and facial limited to with unimodal approaches, but it has now expressions [5]. To train a model it is easy to extract gained more popularity with the advancement of features within the same modality and from another multimodal approaches and enhanced techniques. Its modality. This type of feature extraction leads to training growing demand has expanded the scope for exploring a deep learning model to fine grained emotion various directions of research in emotion recognition. classification tasks [6]. To work with different modalities, Multimodal data inherently contains rich information and the most important and primary step is to extract the has the potential to learn meaningful patterns from features from both the modalities. After preprocessing extracted features. In our study, we intend to achieve and cleaning the features, it is required to align those emotion recognition by combining features extracted features, and combine only those features which have from audio and video modalities and employing a fusion essential information and can help to train a deep learning mechanism. This study explores the cross-modal fusion model [7]. This study uses two different deep learning approach, where the term ‘cross modal fusion’ refers to models, one is 2D-CNN [8] for audio modality and other integrating essential features from heterogeneous input is 3D-CNN [9] for video modality. As per the previous sources, further this integration helps in training deep studies, this study aims to explore the advanced fusion learning models and classifying emotions effectively. mechanism such as Factorized bilinear pooling (FBP), Advanced cross-modal fusion mechanisms are Compact bilinear pooling (CBP), and Compact Bilinear categorized in three types: Factorized bilinear pooling Gated Pooling (CBGP). This study compares the 62 Informatica 49 (2025) 61–76 H. Kumar et al. advanced fusion approaches with state-of-the-art fusion 2 Literature review approaches such as early fusion, late fusion, and hybrid fusion, as well as transformer model based fusion This section offers an overview of the features of audio- techniques such as attention fusion and transformer video modalities, and the existing fusion mechanisms in fusion. multimodal emotion recognition, along with a detailed review. Table 1 summarizes the related work and some The research contribution of the proposed work baseline cross-modal fusion mechanisms, particularly for are as follows: emotion recognition in audio-video modalities using the • Highlights the limitations of traditional fusion CMU-MOSEI dataset. mechanisms, such as high dimensionality, 2.1 Feature extraction suboptimal interdependency modeling, and challenges in fine-grained emotion classification. Before feature extraction, the raw input dataset is pre- processed to ensure it is free from noise, missing values • Addresses a critical gap to reduce the computational and other inconsistencies [10]. Feature extraction is a errors and improve the sustainability of audio-video crucial part of feature engineering in any classification emotion recognition systems. model, which yields critical information from the input • Introduces a novel gating unit and cross-modal data. Feature sets act as input vectors for a deep learning fusion approach using factorized bilinear pooling model, containing all the necessary information about the and compact bilinear pooling, addressing the modalities that help the model learn patterns [11]. This inefficiencies in traditional fusion methods. This section reviews the features and feature sets of audio and solution enhances feature interaction and reduces video modalities utilized in previous research studies. computational complexity. i. Audio features • Employs lightweight 2D-CNN and 3D-CNN To effectively train deep learning models with audio architectures for audio and video modalities, features, feature extraction tools and libraries such as respectively, avoiding the need for pruning and LibROSA [12], OpenSMILE [13], and pyAudioAnalysis quantization while maintaining network simplicity. [14] has proven indispensable. These tools are essential • This design minimizes computational overhead to process and extract the meaningful features, offering a associated with insignificant weights and neurons. robust foundation for building a deep learning model. The Validates the model’s accuracy and compares the process begins with raw audio data undergoing a performance of all three advanced cross-modal preprocessing step. After preprocessing, audio features fusion mechanisms using the benchmark dataset are extracted using these tools and libraries. These CMU-MOSEI. features contain the information about acoustic properties [15] of audio utterances embedded within the video track. • Validates the model’s accuracy and compares the The extracted feature provides crucial information about performance with baseline, and traditional state-of- various feature segments such as pitch, tone, energy, the-art fusion approaches: early fusion, late fusion rhythm, and spectral attributes [16]. These properties and hybrid fusion. capture many useful insights from raw audio data to train • Comprehensive discussion with transformer model the deep learning model, which drives to classify the based fusion approaches: attention fusion and emotional state. Some most widely used extracted key transformer fusion. features include: • The proposed approach ensures scalability and • Mel-Frequency Cepstral Coefficients (MFCCs) [17]: sustainability, contributing to the development of Derived from spectrograms to represent the audio signal more resource-efficient deep learning models for in a form humans perceive. real-world applications. • Spectral features [18]: Attributes such as spectral centroid, roll-off, bandwidth that highlight energy The rest of the paper is organized as follows: section 2 distribution across frequencies. reviews the literature on feature extraction and traditional • Variations in pitch, frequencies, amplitude [19]: fusion mechanism and highlights the related work and Capturing changes in voice that are indicative of different research gap. Section 3 introduces the advanced cross- emotions. modal fusion approaches. Section 4 presents the training • Energy and intensity levels [19]: it represents the changes model and experimental setup, section 5 provides the in signal strength, where low intensity often refers to ‘sad’ result and discussion, and finally, Section 6 concludes the and high intensity correlates with ‘excitement or happy’ paper and suggests future scope. emotions. Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76 63 by a deep learning classification model for emotion Video features recognition. Extracting video features is an essential step to train a Late fusion deep learning model for emotion recognition. This To address the limitations of early fusion, another basic process takes multiple sub-steps like extracting frames fusion mechanism, late fusion [37], was introduced. A from the video, setting the frames per second, and significant amount of research has shifted towards this extracting per frame features. After extracting frames, fusion mechanism to develop more robust emotion it is required to preprocess the entire frames as per classification models. In late fusion, each modality is standards for emotion recognition. first pre-processed, analyzed, and fed into a deep neural This preprocessing includes tasks such as frame network model as input. The outputs from these sampling [20], facial feature alignment [21], discarding classification models are then combined at a later stage. irrelevant frames and reducing variations. The advantage of this fusion mechanism lies in its ability to fuse features with low dimensionality and Previous studies have explored two broad approaches accurately classify emotions. to extracting the features from frames: appearance- based features and geometric-based features [22]. Hybrid fusion Appearance-based features: These features describe the Hybrid fusion [38] is hybridization of early and late visual characteristics as features of a picture within a fusion, integrating the feature properties of both fusion specific frame, such as the face, facial expression, principles. It is considered superior to early and hybrid expression textures, sharpness, and facial movements fusion in emotion classification. This fusion is [23]. These features provide pure cues and essential particularly useful for addressing the challenges information for recognizing emotions. associated with the complexity of early and late fusion. Geometric-based features: These features are determined Hybrid fusion can be applied in two phases; first, based on the calculation of facial landmarks, jaw during the initial feature interaction, and second, after movements, eyebrow movements, expression the model has been trained. However, this fusion coordinates, relative positions, distance, arcs, shape technique fails to manage large parameters and angles, texture angles, and other facial action parameters complex features, where extracting and combining [24]. correlation based spatiotemporal feature information and identifying patterns are critical in multimodal These features are extracted using machine learning emotion recognition. Hence, hybrid fusion needs algorithms [25]–[28], traditional feature extraction further improvements to deal with complex multimodal techniques [29]–[32], and currently deep neural network datasets. models [12], [33]–[35]. Python libraries and frameworks are now widely used for feature extraction processes, Attention fusion enabling the development of more robust models for Attention fusion [39] is a mechanism that focuses on emotion recognition. fusing only the most relevant and crucial features after 2.2 Feature fusion mechanism extracting all the features and generating feature maps from multimodal inputs. The advantage of this After extracting features from both the audio and video approach is to excel in handling both inter-modality and modalities, an integration process is required to combine intra-modality interactions effectively. However, a them effectively. This process, known as information major drawback of this fusion mechanism arises when fusion or feature fusion, involves aligning the key features from each modality obtained during the feature extraction feature alignment errors occur in spatiotemporal and fusing them into a unified representation [36]. The datasets or when sequence synchronization is lacking. goal is to synchronize the features of both modalities to Such issues lead to weak attention scores, increasing collaboratively recognize emotions with higher accuracy. data complexity and computational burden [40]. There In this fusion process, the integrated features are first used are two types of attention fusion mechanisms: self- to train a deep learning model. The model is then attention [41] and multi-head attention [13]. Self- validated to ensure its accuracy and reliability in emotion attention fusion sequentially captures interactions recognition. within a single modality, while multi-head attention Early fusion focuses on every aspect of feature representation and captures interactions as output from multiple heads in Early fusion [5] is one of the simplest and most parallel. fundamental mechanisms for multimodal fusion. In this fusion mechanism, features from different modalities are Transformer fusion first aligned and integrated after extraction and then fed Transformer fusion [42] is an advanced approach of into a deep neural network model as input. This method fusion mechanism that leverages pre-trained combines audio and features into a single unifies feature transformer models, which scales well on long vector, by applying the concatenation or elementwise sequencing data due to their ability to perform parallel operations such as addition, multiplication the, processed 64 Informatica 49 (2025) 61–76 H. Kumar et al. computations. This fusion approach is particularly trade-off between audio and video frame intervals, and suitable for text-based emotion recognition tasks and positional embedding segments can lead to a loss of natural language processing (NLP) applications, as it critical information and feature correlations in these processes all token embeddings simultaneously. modalities. Furthermore, the process results in However, transformer fusion is less efficient when imbalanced classification, complex computations, and applied with audio and video modalities together. This high memory usage, making it less ideal for fusing limitation arises from the tokenization-synchronization spatiotemporal features and datasets. Table 1: Summary of audio-video based traditional fusion and other fusion’s related work. Fusion Feature Modality Datasets Remarks extraction model Early fusion LSTM Audio-video CMU- Sensitive to noise and misalignment [43] MOSEI between audio and video signals Late fusion [43] LSTM Audio-video CMU- High computational cost; less effective in MOSEI modeling complex interactions between modalities Hybrid fusion VGG-net Audio-video IIT-R SIER Increased model complexity; risk of [44] overfitting with limited data Multimodal Bayesian Audio-video CMU- Computationally expensive; less scalable Factorization network MOSEI for large datasets Model (MFM) [43] Graph-MFN (G- LSTM Audio-video CMU- Limited scalability MFN) [45] MOSEI Multiplicative LSTM Audio-video IEMOCAP, Prone to overfitting fusion (M3ER) CMU- [46] MOSEI Cross-Attention Attention & Audio-video RAVDESS Requires large amounts of data for fusion [39] concatenation effective attention training; sensitive to missing modality information Transformer Transformer- Audio-video MELD, High memory consumption; needs fusion [42] based pre- IEMOCAP, extensive pretraining and large datasets trained model CMU- MOSEI Multimodal CNN Audio-video AVEC2017 Limited ability to capture temporal fusion [47] relationships; Model level 2-layer LSTM Audio-video RECOLA fine-tuning requires careful parameter fusion [48] tuning. Tensor fusion Three-fold Audio-video CMU- Tensor-based fusion can be network TFN Cartesian MOSEI computationally prohibitive; sensitive to [49] product missing or noisy data. Multimodal Bi-directional Audio-video IEMOCAP, Complex training process; BiGRUs can Dynamic gated recurrent MELD suffer from vanishing gradient problems Fusion Network unit (BiGRU) with long sequences. [50] 2.3 Research gap interdependencies of features, and struggles with fine- Problem: Through a comprehensive review of the literature, grained emotion classification. However, a critical research we have gained crucial insights into audio and video feature gap still needs to be addressed to improve further, extraction, various traditional cross-modal feature fusions specifically to reduce the computational error in traditional (such as early, late hybrid, attention, and transformer fusion mechanisms for audio-video based emotion fusion), and deep learning models, along with their recognition systems and enhance their sustainability. comparative performances on benchmark multimodal datasets. Traditional fusion faces challenges with high Solution: To address this gap, we propose a gating unit, and dimensionality in large datasets, fails to optimize the advanced cross-modal fusion mechanism (factorized Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76 65 bilinear pooling and compact bilinear pooling) as an Cross-modal fusion is an effective technique for emotion alternative to traditional methods. This approach employs recognition that involves extracting meaningful and 2D CNN and 3D CNN simple deep neural network essential features from two or more heterogeneous input architectures to avoid pruning and quantizing the mode sources or modalities using feature extraction processes, while managing insignificant weights and neurons. This integrating these features, and subsequently training a solution can optimize the computational efficiency while deep learning model. This technique has contributed to maintaining high performance, contributing to the many applications including emotion recognition and has development of more sustainable and scalable emotion continually evolved, demonstrating its versatility and recognition systems. effectiveness. Notably, cross-modal fusion has been successfully applied in many applications such as object 3 Material and methods detection [51], night pedestrian detection [52] , low light image semantic segmentation [53], and depression In this section, we first describe the cross-modal fusion detection [54]. Cross-modal fusion mechanism intends to mechanism and its architecture. Next, we introduce three develop a joint representation that gathers all the advanced cross-modal fusion mechanisms and its collective essential features from all the modalities and algorithm to enhance audio-video based emotion feeds into a single vector while retaining each modality’s recognition from audio and video modality. Finally, we contributions. discuss the comparative performance of these techniques While traditional cross-modal fusion mechanisms are against state-of-the-art fusion mechanisms. discussed in the literature review section, this section focuses on three advanced cross-modal fusion 3.1 Cross-modal fusion mechanism mechanisms for emotion recognition: Factorized bilinear pooling (FBP), Compact bilinear pooling (CBP), and Compact Bilinear Gated Pooling (CBGP). Figure 1: Basic architecture of Audio-video based cross-modal fusion mechanism 3.2 Factorized bilinear pooling (FBP) 𝑍 = ∑𝑚 𝑖= (𝑀𝑇 1 𝐴) . (𝑁𝑇𝑉) (1) Factorized Bilinear Pooling (FBP) is a method that Where, Z: Pooled feature vector, M and N are bilinear enhances the standard bilinear pooling technique by interaction matrices, A and V are feature vectors from audio factorizing the bilinear interaction tensor into lower-rank and video, respectively. Algorithm 1 illustrates the step-by- approximations [55]. Traditional bilinear pooling involves step factorized bilinear pooling fusion process computing the outer product of two feature vectors from implementation. different modalities, resulting in a high-dimensional feature Training method: Let 𝐴′ represents the Audio and 𝑉′ representation. While this method captures rich interactions represents the Video modality. The feature extraction between the modalities, it is computationally expensive and prone to overfitting due to the large number of parameters. functions 𝑓𝑎 𝑎𝑛𝑑 𝑓𝑣 are applied to the audio and video FBP mitigates these issues by factorizing the interaction modality. It generates the feature vectors: tensor into a product of two lower-rank matrices, 𝐹𝐴 = 𝑓𝑎(𝐴′) 𝑎𝑛𝑑 𝐹𝑉 = 𝑓𝑣(𝑉′) (2) significantly reducing the number of parameters while preserving the expressive power of bilinear interactions. 66 Informatica 49 (2025) 61–76 H. Kumar et al. Where 𝐹𝐴 𝑎𝑛𝑑 𝐹𝑉 are the extracted feature vectors from A prediction function f (F’) is then applied to the feature audio and video, 𝐷𝐴 𝑎𝑛𝑑 𝐷𝑉 are dimensionality spaces of vector 𝐹′ to predict the target emotion category value, Z’ so, the audio and video feature spaces. 𝑍′ = 𝑓(𝐹). Here, 𝑓(𝐹) is a 2D-CNN deep neural network acting as a classifier. The model is trained on labelled If the features from audio and video need to be combined, a dataset so it is calculated as follows: fusion mechanism 𝛴 can be used to integrate these feature vectors into a unified representation 𝐹′. It can be calculated (𝐹𝑖𝑦𝑖)𝑁 𝑖=1 (4) Where 𝑦𝑖 is the as: true prediction label and N is the size of the sample. 𝐹′ = Σ(𝐹𝐴, 𝐹𝑉) 𝑜𝑟 𝐹 = 𝐹𝐴 ⨁ 𝐹𝑉 (3) Where, Z: Pooled feature vector, A and V are feature Algorithm 1: Factorized Bilinear Pooling (FBP) vectors from audio and video, respectively. 𝑝𝑟𝑜𝑗𝑎 , and 𝑝𝑟𝑜𝑗𝑣 Input: Factorize audio features: 𝐹𝐴 = 𝑓𝑎(𝐴′) are projection matrix of audio and video features. Factorize video features: 𝐹𝑉 = 𝑓𝑣(𝑉′) Output: Predict the emotion class for new inputs 1. 1. Compute the bilinear interaction between the factorized audio and video features: 𝐹′ = Σ(𝐹𝐴, 𝐹𝑉) 𝑜𝑟 𝐹 = 𝐹𝐴 ⨁ 𝐹𝑉 Training Method: Let 𝐴′ represents the Audio and 𝑉′ represents the Video modality. The feature extraction 2. Feed the compact bilinear pooled vector ZFBP into a deep neural network classifier: (𝐹𝑖𝑦𝑖)𝑁 𝑖=1 functions 𝑓𝑎 𝑎𝑛𝑑 𝑓𝑣 are applied to the audio and video modality. It generates the feature vectors: 3. Calculate the loss function, minimize, and 𝐹𝐴 = 𝑓𝑎(𝐴′) 𝑎𝑛𝑑 𝐹𝑉 = 𝑓𝑣(𝑉′) (6) evaluation metrics CBP uses random projections to project the high- 4. 4. Use the trained model to predict the emotion dimensional feature vectors into a lower-dimensional space class 5. for new inputs. before combining them. Random projection for audio (ZA) 6. and video features (ZV): This factorization reduces the computational burden and 𝑍𝐴 = (𝑃𝐴𝐹𝐴) and 𝑍𝑉 = (𝑃𝑉𝐹𝑉) (7) allows the model to generalize better, especially when Where, ZA/V: Projection of audio /video, PA, and Pv: dealing with limited data. FBP has been successfully applied in tasks such as Visual Question Answering (VQA) Projection matrix of audio /video features. To maintain the and image-text matching, where the interaction between information during projection, random sign vectors are modalities is crucial. applied to the projected features. 3.3 Compact bilinear pooling (CBP) 𝑍𝐴′ = 𝑆𝐴 ∘ 𝑍𝐴 and 𝑍𝑉′ = 𝑆𝑉 ∘ 𝑍𝑉 (8) Compact Bilinear Pooling (CBP) further refines the bilinear SA and SV are random sign vectors for audio and video pooling approach by employing compact representations of features, ° 𝑑𝑒𝑛𝑜𝑡𝑒𝑠 𝑒𝑙𝑒𝑚𝑒𝑛𝑡 𝑤𝑖𝑠𝑒 𝑚𝑢𝑙𝑡𝑖𝑝𝑙𝑖𝑐𝑎𝑡𝑖𝑜𝑛. Then the bilinear interactions. Unlike standard bilinear pooling, we applied the random permutation to the elements of the which directly computes the outer product of two feature signed vectors to further scramble the features. vectors, CBP leverages approximations based on the Tensor Sketch technique to produce a compact representation of the 𝑍𝐴" = 𝑃𝑒𝑟𝑚𝑢𝑡𝑒(𝑍′𝐴, ℎ𝐴), and 𝑍𝑉" = 𝑃𝑒𝑟𝑚𝑢𝑡𝑒(𝑍′𝑉 , ℎ𝑉) (9) outer product. This method dramatically reduces the where, ℎ𝐴, ℎ𝑉 is a permutation vector applied to the indices dimensionality of the resulting feature vector without losing of Z’A and Z’V. the key interactions between modalities. Algorithm 2 illustrates the compact bilinear pooling fusion process The core of CBP involves computing the circular implementation. convolution of the two permuted feature vectors: In CBP, the outer product of the feature vectors A and V is 𝑍𝐶𝐵𝑃 = 𝐹𝐹𝑇−1 (𝐹𝐹𝑇(𝑍𝐴)) ° (𝐹𝐹𝑇(𝑍𝑉)) (10) approximated by projecting both vectors into a higher- dimensional space using random projections, followed by 𝐹𝐹𝑇−1 ∶ 𝑖𝑛𝑣𝑒𝑟𝑠𝑒 𝑓𝑎𝑠𝑡 𝑓𝑜𝑢𝑟𝑖𝑒𝑟 𝑡𝑟𝑎𝑛𝑠𝑓𝑜𝑟𝑚, and element-wise multiplication and summation. Presented equation represents how to implement CBP for audio-video 𝐹𝐹𝑇: 𝑓𝑎𝑠𝑡 𝑓𝑜𝑢𝑟𝑖𝑒𝑟 𝑡𝑟𝑎𝑛𝑠𝑓𝑜𝑟𝑚 emotion recognition using a deep neural network: After this we normalized the obtained CBP feature vector 𝑍 = ∑𝑚 𝑖=1(𝑝𝑟𝑜𝑗𝑎(𝐴)𝑖) . (𝑝𝑟𝑜𝑗𝑣(𝑉𝑖)) (5) and classified the categories of emotions by using deep neural networks. Calculated with the following formula: Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76 67 𝑍′𝐶𝐵𝑃 = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑍𝐶𝐵𝑃) (11) gating mechanism that adjusts or selectively emphasizes features based on their relevance, using a learned Softmax where 𝑍′𝐶𝐵𝑃 predicts the emotion class, and function to modulate feature interactions before pooling. 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑍𝐶𝐵𝑃) represents the function of the deep neural network. In CBGP, the feature vectors A and V undergo compact bilinear pooling as described in CBP, but before the final Algorithm 2: Compact Bilinear Pooling (CBP) summation, the resulting interaction vector is element-wise multiplied by a gating vector 𝐺′ ∈ 𝑅𝑑 Where, d is the Input: Project audio features, 𝑍𝐴 𝑎𝑛𝑑 dimensionality of the compact representation. The gating Project video features, 𝑍𝑉 vector is computed as: Output: Predict the emotion class for new inputs 𝐺′ = 𝜎(𝑊𝐺(𝐴′, 𝑉′) + 𝑏𝐺) (12) 1. Generate projection matrix, 𝑍𝐴′ Where 𝜎: 𝑠𝑜𝑓𝑡𝑚𝑎𝑥 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛, 𝑊𝐺 : weight matrix, 𝑏𝐺: bias 2. Apply sign vectors to the projected audio features: vector, 𝐴′, 𝑉′: audio, video feature vectors. 𝑍𝐴′ = 𝑆𝐴 ∘ 𝑍𝐴 Training Method: Let 𝐴′ represents the Audio and 𝑉′ 3. Apply sign vectors to the projected video features: represents the Video modality. The feature extraction 𝑍𝑉′ = 𝑆𝑉 ∘ 𝑍𝑉 functions 𝑓𝑎 𝑎𝑛𝑑 𝑓𝑣 are applied to the audio and video 4. Apply permutation to the audio features: modality. It generates the feature vectors: 𝑍𝐴" = 𝑃𝑒𝑟𝑚𝑢𝑡𝑒(𝑍′𝐴, ℎ𝐴) 𝐹𝐴 = 𝑓𝑎(𝐴′) and 𝐹𝑣 = 𝑓𝑣(𝑉′) (13) 5. Apply permutation to the video features: 𝑍𝑉" = 𝑃𝑒𝑟𝑚𝑢𝑡𝑒(𝑍′ 𝑉 , ℎ𝑉) CBP uses random projections to project the high- 6. Compute the circular convolution of the two dimensional feature vectors into a lower-dimensional space permuted feature before combining them and calculates the random vectors: 𝑍𝐶𝐵𝑃 = 𝐹𝐹𝑇−1 (𝐹𝐹𝑇(𝑍𝐴)) ° (𝐹𝐹𝑇(𝑍𝑉)) projection for audio (𝑍𝐴) and video features (𝑍𝑉): 𝑍𝐴 = 𝑃𝐴𝐹𝐴 𝑎𝑛𝑑 𝑍𝑉 = 𝑃𝑉𝐹𝑉 (14) 7. Feed the compact bilinear pooled vector ZCBP Where ZA 𝑎𝑛𝑑 𝑍𝑉: Projection of audio and video, PA, and into a deep neural network classifier: Pv: Projection matrix of audio and video features. Then, we 𝑍′𝐶𝐵𝑃 = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑍𝐶𝐵𝑃) compute element-wise multiplication of the projected 8. Calculate the loss function, minimize, and vectors: evaluation metrics 𝑍′ = 𝑍𝐴 ∘ 𝑍𝑉 (15) 9. Use the trained model to predict the emotion class for new inputs. Gated pooling: (i) compute the introduced gating vector 𝐺′ ∈ 𝑅𝑑 Where, d is the dimensionality of the compact representation. The gating vector is computed as: 3.4 Compact bilinear gated pooling (CBGP) 𝐺′ = 𝜎(𝑊𝐺(𝐴′, 𝑉′) + 𝑏𝐺) (16) Compact Bilinear Gated Pooling (CBGP) enhances and builds upon Compact Bilinear Pooling (CBP) by adding a Here, 𝜎 is a Softmax function. 𝑍 = 𝑆𝑢𝑚(𝑍′′) (18) (ii) then we apply the gating mechanism to the element-wise This entire mechanism can be summarized by an equation, multiply vector: Where, Z: pooled feature vector, ZA/𝑎𝑛𝑑 𝑍𝑉: Projection of audio and video. 𝑍′′ = 𝐺′ ∘ 𝑍′ (17) 𝑍 = ∑𝑚 𝑖=1 (𝜎(𝑊𝐺 (𝐴, 𝑉) + 𝑏𝐺))𝑖 . (𝑍𝐴)𝑖) . (𝑍𝑉(𝑉𝑖)) (19) Finally, we sum the elements of the gated interaction vector to obtain the final pooled vector by the following equation- Algorithm 3: Compact Bilinear Gated Pooling (CBGP) Input: Project audio features: ZA, and Project video features: 𝑍𝑉 Output: Predict the emotion class for new inputs. 1. Compute gating vectors for audio and video features: 𝐺′ = 𝜎(𝑊𝐺(𝐴′, 𝑉′) + 𝑏𝐺) 68 Informatica 49 (2025) 61–76 H. Kumar et al. 2. Apply the gating vectors to the projected features: 𝑍′ = 𝑍𝐴 ∘ 𝑍𝑉 3. Apply sign vectors to the gated audio features: 𝑍𝐴′ = 𝑆𝐴 ∘ 𝑍𝐴 4. Apply sign vectors to the gated video features: 𝑍𝑉′ = 𝑆𝑉 ∘ 𝑍𝑉 5. Apply permutation to the gated and signed audio features: 𝑍𝐶𝐵𝐺𝑃 = 𝐹𝐹𝑇−1 (𝑃(𝐹𝐹𝑇(𝑍𝐴 ∙ 𝐺))) 6. Apply permutation to the gated and signed video features: 𝑍𝐶𝐵𝐺𝑃 = 𝐹𝐹𝑇−1 (𝑃(𝐹𝐹𝑇(𝑍𝑉 ∙ 𝐺))) 7. Compute the circular convolution of the two permuted feature vectors: 𝑍′′ = 𝐺′ ∘ 𝑍′ 8. Normalize the pooled feature vector: 𝑍 = 𝑆𝑢𝑚(𝑍′′) 9. Feed the compact bilinear gated pooled vector ZCBGP into a deep neural network classifier: 𝑚 𝑍𝐶𝐵𝐺𝑃 = ∑ (𝜎(𝑊𝐺 (𝐴, 𝑉) + 𝑏𝐺))𝑖 . (𝑍𝐴)𝑖) . (𝑍𝑉(𝑉𝑖)) 𝑖=1 10. Calculate the loss function, minimize, and evaluation metrics 11. Use the trained model to predict the emotion class for new inputs Through this mathematical analysis, CBGP has been able 4.1 Evaluation dataset to identify the optimal fusion approaches that can be applied to audio-video-based emotion recognition systems, CMU-MOSEI [37]: CMU-MOSEI dataset comprises ultimately contributing to the development of more robust over 23,259 annotated video clips collected from more and accurate emotion recognition technologies. The gating than 1,000 speakers across a diverse range of topics. Total number of videos is 3228, video clips contain mechanism allows to control the flow of information naturally occurring monologues in English, making the between the layers while selecting and rejecting the dataset a realistic representation of human relevant or non-relevant (based on correlation feature communication. The dataset is annotated with six score) inputs. As we know, not all the features are equally categorical emotions: happy, sad, angry, fear, disgusted, important at every step or time frame, so the gating and surprised. Additionally, CMU-MOSEI provides mechanisms dynamically assign weights to features to intensity scores for each emotion, allowing for a fine- capture complex regions more effectively. grained analysis of emotional expressions. After preprocessing, 20,323 samples are processed for feature 4 Model training and experiments extraction. The dataset is divided into three sets; 80% for training, 10% for testing, and 10% for validation. The Our experiments are conducted on a system equipped with performance is evaluated on Accuracy, F1-score, and an AMD Ryzen 7 processor, 16GB of RAM, and an mean absolute error, (MAE). NVIDIA GeForce RTX GPU. The code was implemented using Jupyter Notebook IDE and the PyTorch framework. 4.2 Deep learning model implementation For audio and video preprocessing, we utilized the Librosa details and OpenCV Python libraries. a. 2D-CNN for Audio feature extraction and training model Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76 69 We used 2D-cnn to extract and capture inter-modal We used a simple 3D-CNN model because emotion feature dependencies from the CMU-MOSEI dataset. To recognition requires synchronized feature relations in generate spectrograms from raw audio files, we used the each frame of a video, and a compact bilinear gated LibROSA library, which converts the raw audio fusion mechanism can increase computational waveform into a time series sampled at 22500 Hz. The complexity. Additionally, our proposed approach aims to waveform is then transformed into a spectrogram using extract spatial and temporal features and incorporates a the Short-Time Fourier Transform (STFT), with a gated filter to fuse features from the audio and video window size of 2048 and a hop length of 512, striking a modalities for each utterance. Therefore, we chose a balance between time and frequency resolution. simple deep learning architecture. The 3D-CNN takes a Spectrograms play a crucial role in audio-video emotion 224x224x3 image as input, which passes through the recognition as they align with video frames, increasing first 3D convolution layer followed by pooling layers, the likelihood of feature correlations due to time and with a filter size of 3x3x3 and a stride of 1. Table 2 frequency samples during fusion mechanism. illustrates the Hyperparameters for 2D-CNN and 3D- CNN model. b. 3D-CNN for video feature extraction and training model Table 2. Hyperparameters for 2D-CNN and 3D-CNN model Hyperparameter (2D-CNN) Audio Hyperparameter (3D-CNN) Video Input size= 224x224 Spectrogram Input size=224x224x3 image frames Kernels (conv layers) =32,64,128,256 Kernels (conv layers) = 64,128,256,512 Stride=1 Stride=1 Activation function= Relu and Softmax Activation function= Relu and Softmax Max Pooling= 2x2 Max Pooling= 3x3x3, 2x2x2 Batch size=32 Batch size=32 Epochs= 30 Epochs= 30-50 Learning rate=0.00003 (cosine decay) Learning rate=0.00003 (cosine decay) Regularization=L2 Regularization= L2 Dropout= 0.3% Dropout=0.2% Optimizer = Adam Optimizer = Adam 5 Result and discussion mechanisms such as bilinear gated pooling, compact bilinear pooling, and compact bilinear gated pooling. We We evaluate the performance of each cross-modal fusion analyse the accuracy of each traditional fusion mechanism (FBP, CBP,CBGP) and compare it with the mechanism such as early fusion, late fusion and hybrid state-of-the-art (early fusion, late fusion and hybrid fusion on the same dataset, CMU-MOSEI. This approach employs 2D CNN and 3D CNN simple deep fusion) mechanisms on the CMU-MOSEI dataset using neural network architectures to avoid pruning and accuracy, F1-score, and MAE. F1-Score is the harmonic quantizing the mode while managing insignificant mean of precision and recall metrics. The results are weights and neurons. The ablation study was carried out summarized in the tables below, highlighting the with a feature extraction process where features are contributions of each fusion method to the overall audio and video modalities that interact through the outer system performance product. The outer product allows the 2D-CNN and 3D- CNN to capture the interactions between every feature 5.1 Ablation study of one modality and every feature of the other modality in a compact manner. Comprehensive analysis and To investigate the specific contributions of compact baseline comparisons show that our proposed CBGP bilinear gated pooling fusion (CBGP) of cross-modal fusion mechanism fuses feature effectively and fusion mechanism, this paper presents a detailed analysis outperforms the state-of-the-art fusion approaches. This of a series of ablation experiments conducted on the study also provides a comprehensive discussion about CMU-MOSEI datasets. These results are presented in transformer model based fusion approaches- attention tables, comparing key performance using accuracy, F1- fusion and transformer fusion. score, and MAE among advanced cross-modal fusion 70 Informatica 49 (2025) 61–76 H. Kumar et al. 5.2 Baseline comparisons a. Comparison of advanced cross-modal fusion mechanism with state-of-the-art FBP, and CBP fusion mechanism. Table 3: Performance comparison of advanced cross-modal fusion mechanisms on CMU- MOSEI dataset, highlighting their accuracy, F1-score, MAE, and specific strengths. Cross-modal Accuracy F1-Score MAE Remarks fusion mechanism (%) (%) 76.9 75.6 59.1 Performs well with sentiment-emotion FBP overlap 78.4 77.1 59.8 CBP Captures diverse emotions effectively CBGP 80.3 79.2 54.2 Best for fine-grained emotion detection Table 3 illustrates that CBGP achieves the highest importance of different feature interactions allows it to scores, particularly excelling in recognizing fine- handle the nuanced and varied expressions found in the grained emotions. Its ability to dynamically adjust the illustrations in the CMU-MOSEI dataset. b. Comparison of advanced cross-modal fusion mechanism with baseline cross-modal fusion mechanism Table 4: Performance comparison of advanced cross-modal fusion mechanism with traditional, and baseline cross- modal fusion mechanism on CMU-MOSEI dataset, highlighting their accuracy, F1-score, and MAE. Fusion Mechanism Accuracy (%) F1-score (%) MAE (%) Early fusion 78.2 77.9 64.2 (EF-LSTM) [43] Late fusion 80.6 80.6 61.9 (LF-LSTM) [43] Graph-MFN [45] 76.9 77.0 - HFU-BERT model [56] 73.2 72.0 86.7 Early Fusion 67.3 65.4 69.7 2D-CNN (Ours) Late Fusion 70.4 69.2 67.4 2D-CNN (Ours) Hybrid Fusion 72.6 71.4 65.8 2D-CNN (Ours) 76.9 59.1 FBP (Ours) 75.6 78.4 59.8 CBP (Ours) 77.1 CBGP (Ours) 81.3 79.2 54.2 Table 4, illustrates that FBP performs well in scenarios emotions. CBGP achieves the highest performance over involving sentiment-emotion overlap. CBP further traditional cross-modal fusion mechanisms due to improves by effectively capturing a diverse range of limited feature interaction and correlation. CBGP excels in fine-grained emotion recognition and setting a benchmark on CMU-MOSEI dataset. Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76 71 Figure 2: Accuracy performance of FBP, CBP, and CBGP fusion approaches on CMU-MOSEI Figure 2 illustrates that in the CMU-MOSEI dataset The progression from FBP to CBP, and from CBP to emotion categories, CBP consistently outperforms FBP. CBGP, emphasizes the strength and effectiveness of the The accuracy of 'Happy' emotion recognition increases fusion model in capturing emotional feature cues. This from 76% (FBP) to 78% (CBP), and 'Sad' improves from fusion leads to meaningful results that help classify 70% to 72.5%. CBGP provides higher accuracy than all emotion categories more accurately. other fusion mechanisms across all emotion categories. c. System complexity analysis Table 6: Performance comparison of accuracy and p-value for cross-modal fusion mechanism. Table 5: Computational costs comparison (in floating point operations) for FBP, CBP, and CBGP Cross-modal fusion Accuracy p-value approaches across CMU-MOSEI Datasets. mechanism (%) Datasets FBP CBP CBGP FBP 76.9 0.004 CMU- 4.5 × 106 3.8 × 106 4.0 × 106 CBP 78.4 0.003 MOSEI CBGP 80.3 0.002 Table 5 presents the computational cost comparison, and highlights the relative efficiency of the FBP, CBP, and CBGP approaches on the CMU-MOSEI dataset. Despite Table 6 presents the accuracy and p-value of Full Bilinear the apparent efficiency of CBP, the marginal difference Pooling (FBP), Compact Bilinear Pooling (CBP), and in computational costs, particularly the 0.2 × 106 FLOP Compact Bilinear Gated Pooling (CBGP), where FBP gap between CBP and CBGP, raises questions about the achieves the lowest accuracy of 76.9%. CBP improves trade-offs in performance. Lower computational costs accuracy to 78.4% by introducing compact bilinear may come at the expense of reduced accuracy or pooling. CBGP achieves the highest accuracy of 80.3% by robustness in multimodal emotion recognition tasks. The incorporating the gating mechanism, which selectively slight increase in CBGP’s computational load may emphasizes relevant features. The p-value decreases across reflect the additional overhead required to manage bi- the methods, indicating improved statistical significance modal interactions and graph-based modeling, with increasing accuracy. The values (0.004 for FBP, 0.003 potentially leading to enhanced performance and for CBP, and 0.002 for CBGP) demonstrate that the interpretability. performance improvements are statistically significant. 72 Informatica 49 (2025) 61–76 H. Kumar et al. d. Comparison of CBGP fusion mechanism with Traditional fusion: Traditional fusion typically attention fusion and transformer fusion concatenate or aggregate features from multiple modalities, which can result in linear combinations of Transformer fusion: Transformer fusion is an advanced features, whereas attention and transformer fusion approach of fusion mechanism with the help of a pre- trained transformer model, which scales well to large enhance inter-modality interactions by learning feature datasets and long sequences due to parallel computations. weights, but they still rely on additive or multiplicative This fusion is suitable for text-based emotion recognition relationships between modalities. They often struggle tasks and natural language processing-based (NLP) with complex feature interactions and fail to capture applications because transformer fusion model such as higher-order dependencies effectively. BERT [57], RoBERTa [40] performs on all token embeddings parallelly which is not efficient to work with Advanced fusion: Factorized bilinear and compact audio and video modalities together. Audio and video have bilinear pooling can capture non-linear and higher-order large interdependencies of features and long sequences, as interactions between features across modalities, which a result, the computational cost will be very high, training allows richer representations. These methods compress and testing will need more memory and computational the high-dimensional feature space into a lower- burdens. Transformer fusion will also face challenges to dimensional representation while preserving inter-modal extract, fuse and learn complex spatiotemporal features relationships, addressing the curse of dimensionality in without architectural modifications in the model. traditional bilinear pooling. Transformer fusion works by dividing the word sequences into tokens, which is feasible but if we divide long audio Computational efficiency signals and high frame rate videos can lead to loss of important features, fine-grained temporal information, Traditional fusion: Simple concatenation or weighted tokenization can reduce the effectiveness and increase the aggregation methods are computationally inexpensive but biases in SoftMax function. may lead to redundant or over-complex representations. Transformer-based fusion, although effective, can be Attention fusion: In our proposed work, we opted for computationally expensive due to quadratic complexity in CBGP over attention fusion to reduce the computational multi-head attention over long sequences or large cost because the CMU-MOSEI dataset is largest dataset, modalities. and our proposed solution uses 2D-CNN for audio and 3D-CNN for video modalities to avoid pruning and Advanced fusion: Compact bilinear pooling and gated quantizing the mode while managing insignificant pooling introduce compact representations by leveraging weights and neurons. If we apply an attention fusion approximations (e.g., Random Fourier Transform or Count mechanism, we would need to apply self-attention fusion Sketch). These methods significantly reduce computational separately for both models and then integrate their outputs using multi-head attention fusion. This entire and memory overhead compared to traditional bilinear process would likely result in high dimensionality and an pooling without losing important interaction features. increased number of trainable parameters, leading to Dimensionality reduction high memory usage and expensive computation. Attention mechanism relies on element-wise scale dot Traditional fusion: These methods often rely on post- products, which may cause high variance during training fusion dimensionality reduction techniques (e.g., PCA) to Since our implementation employs a simpler CNN manage high-dimensional outputs. However, these architecture, in that case the model could predict approaches are not integrated into the fusion process, unbalanced attention scores. The extreme parameters potentially leading to loss of modality-specific could further cause exponential computation issues, as information. unbalanced attention implies that the model may focus excessively on some regions while ignoring others. In Advanced fusion: Methods like compact bilinear and gated conclusion, while attention fusion is an effective fusion pooling perform dimensionality reduction implicitly mechanism, it is not a suitable fit for our employed deep during fusion, ensuring that only the most relevant and learning emotion recognition model that’s why we have informative interactions are preserved. excluded it from the experiment. It may perform better with architectures such as fit well in ResNet [12], Modality-specific challenges DenseNet [58], MobileNet [59], and other transformer- Traditional Fusion: Early and late fusion assume based models, where its capabilities can be better utilized. modalities contribute equally, potentially underperforming in scenarios where modalities have asymmetric importance 5.3 Why CBGP outperforms better? or varying quality. Transformers address some modality- specific issues but may fail in noisy or sparse input Representation capacity scenarios without sufficient modality-specific pretraining. Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76 73 Advanced fusion: Compact bilinear and gated pooling are and work effectively on smaller datasets due to efficient robust to modality-specific variations. For example: Gated feature compression. Factorized approaches reduce pooling introduces selective weighting mechanisms that overfitting by limiting parameter count, improving dynamically prioritize certain modalities or features based scalability to complex multi-modal systems. on their relevance. Factorized pooling ensures that noisy or less-relevant features are naturally down-weighted during Interpretability fusion. Traditional fusion: Approaches like attention fusion or Generalization and scalability transformer-based fusion are somewhat interpretable due to explicit weighting schemes or attention maps. Traditional fusion: Simple techniques like early and late However, early and hybrid fusion methods lack fusion can generalize well but may not scale effectively to interpretability since features are often combined in a high-dimensional, multimodal, or diverse datasets. black-box manner. Transformer-based fusion can scale better but may require large datasets and pretraining to perform effectively. Advanced fusion: Compact bilinear pooling and gated pooling methods often lack explicit interpretability Advanced fusion: Advanced techniques like compact because the transformations (e.g., random projections, bilinear pooling generalize well to high-dimensional data Fourier transforms) are more abstract. Table 7: Comparison of FBP, CBP and CBGP based on various parameters. Cross- Feature Feature map Computation Advantage Limitation Modal interaction dimensionality cost Fusion level FBP Element-wise Reduced Low Efficient Introduces small product 𝑘 ≪ 𝑑2 approximation of approximation errors. bilinear pooling CBP Tensor Compact Medium Balances efficiency It does not capture the full sketching 𝑘 ≪ 𝑑2 and expressiveness bilinear interactions CBGP Selective Compact Medium Best for fine- Require extensive Second order 𝑘 ≪ 𝑑2 grained hyperparameter tuning. interaction classification, emphasizes key features Table 7 discusses the performance of FBP, CBP and significantly in audio-video-text based real-time CBGP based on various parameters such as, feature applications as well. CBGP is a computationally interaction level, feature map dimensionality, effective and robust fusion mechanism, making it crucial computational cost, advantage and limitation various to capture high correlation and relevant features for parameters. Here, 𝑑2 represents the input feature fusing heterogeneous modalities. Here are some real- dimensionality, and 𝑘 is the dimensionality of the output time applications where CBGP can be applied in, representation in bilinear pooling. In CBP and CBGP, the Computer vision and pattern recognition, Natural value of 𝑘 is important as it directly affects the trade-off language Process based language interactions, between computational efficiency and model Recommendation systems for customer, Healthcare and expressiveness. If 𝑘 is lower than small memory needed but model may lose some effectiveness. Conversely, if 𝑘 medical applications, Robotics and automation system, is higher, the model acts more expressively but the Banking and E-commerce based digital applications, computational cost increases. Security and surveillance based human safety application. 5.4 Real-time application As we have seen in the above sections, CBGP has proven 6 Conclusion & future scope to be an effective fusion mechanism over traditional This study investigates the effectiveness of three fusion mechanisms. This comprehensive study has advanced cross model fusion mechanisms; factorized demonstrated its full capability as cross-modal based bilinear pooling, compact bilinear pooling, and compact emotion recognition. In real-time application, CBGP can bilinear gated pooling for audio-video based emotion extend beyond audio and video fusion. It can contribute recognition. This comprehensive experiment is 74 Informatica 49 (2025) 61–76 H. Kumar et al. conducted on a widely recognized dataset; CMU- experimental results clearly demonstrate that the MOSEI. The gating mechanism integrated within CBGP compact bilinear gated pooling (CBGP) mechanism enables the model to selectively emphasize relevant outperforms the other fusion techniques across feature interactions, which is crucial for accurately benchmark dataset, consistently achieving higher recognizing complex and nuanced emotional accuracy, F1-score, and MAE. Overall, the findings from expressions. We evaluated the performance of each this study suggest that incorporating a gating mechanism fusion technique across various emotional categories, in multimodal fusion processes can significantly including happy, sad, fear, anger, neutral and disgust. enhance the performance of emotion recognition The performance of advanced cross-modal fusion is systems, making CBGP a promising approach for future compared to traditional cross-modal fusion mechanisms developments in this field. like early fusion, late fusion and hybrid fusion and transformer model based fusion mechanisms like, attention fusion and transformer fusion. The References [1] O. El Hammoumi, F. Benmarrakchi, N. [9] E. S. Salama, R. A. El-Khoribi, M. E. Shoman, and Ouherrou, J. El Kafi, and A. El Hore, “Emotion M. A. Wahby Shalaby, “A 3D-convolutional neural Recognition in E-learning Systems,” 6th Int. network framework with ensemble learning Conf. Multimed. Comput. Syst., pp. 1–6, 2018. techniques for multi-modal emotion recognition,” [2] Y. Zhang, Z. R. Wang, and J. Du, “Deep Fusion: Egypt. Informatics J., vol. 22, no. 2, pp. 167–176, An Attention Guided Factorized Bilinear 2021, doi: 10.1016/j.eij.2020.07.005. Pooling for Audio-video Emotion Recognition,” [10] M. M. Hassan, M. G. R. Alam, M. Z. Uddin, S. in International Joint Conference on Neural Huda, A. Almogren, and G. Fortino, “Human Networks (IJCNN), IEEE, 2019. doi: emotion recognition using deep belief network 10.1109/IJCNN.2019.8851942. architecture,” Inf. Fusion, vol. 51, no. October 2018, [3] Y. Li, X. Zheng, M. Zhu, J. Mei, Z. Chen, and pp. 10–18, 2019, doi: 10.1016/j.inffus.2018.10.009. Y. Tao, “Compact bilinear pooling and multi- [11] L. Wang and J. Qiao, “Research and Application of loss network for social media multimodal Deep Belief Network Based on Local Binary classification,” Signal, Image Video Process., Pattern and Improved Weight Initialization,” in 3rd vol. 18, no. 11, pp. 8403–8412, 2024, doi: International Symposium on Autonomous Systems, 10.1007/s11760-024-03482-w. ISAS 2019, IEEE, 2019, pp. 1–6. doi: [4] D. Kiela, E. Grave, A. Joulin, and T. Mikolov, 10.1109/ISASS.2019.8757780. “Efficient large-scale multi-modal [12] K. L. Lakshmi et al., “Recognition of emotions in classification,” 32nd AAAI Conf. Artif. Intell. speech using deep CNN and RESNET,” in Soft AAAI 2018, pp. 5198–5204, 2018, doi: Computing, Springer Berlin Heidelberg, 2023. doi: 10.1609/aaai.v32i1.11945. 10.1007/s00500-023-07969-5. [5] W. A. Khan, H. ul Qudous, and A. A. Farhan, [13] N. H. Ho, H. J. Yang, S. H. Kim, and G. Lee, “Speech emotion recognition using feature “Multimodal Approach of Speech Emotion fusion: a hybrid approach to deep learning,” Recognition Using Multi-Level Multi-Head Fusion Multimed. Tools Appl., vol. 83, no. 31, pp. Attention-Based Recurrent Neural Network,” in 75557–75584, 2024, doi: 10.1007/s11042-024- IEEE Access, IEEE, 2020, pp. 61672–61686. doi: 18316-7. 10.1109/ACCESS.2020.2984368. [6] C. Yu, X. Zhao, Q. Zheng, P. Zhang, and X. [14] M. Sharafi, M. Yazdchi, R. Rasti, and F. Nasimi, You, “Hierarchical Bilinear Pooling for Fine- “A novel spatio-temporal convolutional neural Grained Visual Recognition,” Lect. Notes framework for multimodal emotion recognition,” Comput. Sci. (including Subser. Lect. Notes Biomed. Signal Process. Control, vol. 78, no. Artif. Intell. Lect. Notes Bioinformatics), vol. June, p. 103970, 2022, doi: 11220 LNCS, pp. 595–610, 2018, doi: 10.1016/j.bspc.2022.103970. 10.1007/978-3-030-01270-0_35. [15] G. Sharma, K. Umapathy, and S. Krishnan, [7] X. Peng, “Research on emotion recognition “Trends in audio signal feature extraction based on deep learning for mental health,” methods,” Appl. Acoust., vol. 158, p. 20, 2020, Inform., vol. 45, no. 1, pp. 127–132, 2021, doi: doi: 10.1016/j.apacoust.2019.107020. 10.31449/inf.v45i1.3424. [16] F. M. Alamgir and M. S. Alam, “Hybrid multi- [8] B. Mocanu, R. Tapu, and T. Zaharia, “Multimodal modal emotion recognition framework based on emotion recognition using cross modal audio-video InceptionV3DenseNet,” Multimed. Tools Appl., fusion with attention and deep metric learning,” vol. 82, no. 26, pp. 40375–40402, 2023, doi: Image Vis. Comput., vol. 133, p. 104676, 2023, doi: 10.1007/s11042-023-15066-w. 10.1016/j.imavis.2023.104676. [17] S. K. Panda, A. K. Jena, M. R. Panda, and S. Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76 75 Panda, “Speech emotion recognition using emotion recognition: Enhancing film sound multimodal feature fusion with machine design using audio features, regression models learning approach,” Multimed. Tools Appl., vol. and artificial neural networks,” Pers. Ubiquitous 82, no. 27, pp. 42763–42781, 2023, doi: Comput., vol. 25, no. 4, pp. 637–650, 2021, doi: 10.1007/s11042-023-15275-3. 10.1007/s00779-020-01389-0. [18] S. W. Byun and S. P. Lee, “A study on a speech [29] V. K. Sharma, “Designing of face recognition emotion recognition system with effective system,” Int. Conf. Intell. Comput. Control Syst. acoustic features using deep learning ICICCS 2019, pp. 459–461, 2019, doi: algorithms,” Appl. Sci., vol. 11, no. 4, pp. 1–15, 10.1109/ICCS45141.2019.9065373. 2021, doi: 10.3390/app11041890. [30] S. Sahoo and A. Routray, “Emotion recognition [19] H. Aouani and Y. Ben Ayed, “Speech Emotion from audio-visual data using rule based decision Recognition with deep learning,” in 24th level fusion,” 2016 IEEE Students’ Technol. International Conference on Knowledge-Based Symp. TechSym 2016, pp. 7–12, 2017, doi: and Intelligent Information & Engineering 10.1109/TechSym.2016.7872646. Speech Emotion Recognition with deep learning [31] J. K. J. Julina and T. S. Sharmila, “Facial Systems, Elsevier B.V., 2020, pp. 251–260. doi: Emotion Recognition in Videos using HOG and 10.1016/j.procs.2020.08.027. LBP,” in 2019 4th IEEE International [20] E. S. Agung, A. P. Rifai, and T. Wijayanto, Conference on Recent Trends on Electronics, “Image-based facial emotion recognition using Information, Communication and Technology, convolutional neural network on emognition RTEICT 2019 - Proceedings, IEEE, 2019, pp. dataset,” Sci. Rep., vol. 14, no. 1, pp. 1–22, 56–60. doi: 2024, doi: 10.1038/s41598-024-65276-x. 10.1109/RTEICT46194.2019.9016766. [21] Y. Ma, Y. Hao, M. Chen, J. Chen, P. Lu, and A. [32] A. Vinay, V. S. Shekhar, K. N. B. Murthy, and Košir, “Audio-Visual Emotion S. Natarajan, “Face Recognition Using Gabor Fusion(AVEF):A Deep Efficient Weighted Wavelet Features with PCA and KPCA - A Approach,” Inf. Fusion, vol. 46, pp. 184–192, Comparative Study,” in 3rd International 2019, doi: 10.1016/j.inffus.2018.06.003. Conference on Recent Trends in Computing [22] D. Ghimire, J. Lee, Z. N. Li, and S. Jeong, 2015 (ICRTC-2015), Elsevier Masson SAS, “Recognition of facial expressions based on 2015, pp. 650–659. doi: salient geometric features and support vector 10.1016/j.procs.2015.07.434. machines,” Multimed. Tools Appl., vol. 76, no. [33] S. Kakuba, A. Poulose, and D. S. Han, “Deep 6, pp. 7921–7946, 2017, doi: 10.1007/s11042- Learning Approaches for Bimodal Speech 016-3428-9. Emotion Recognition: Advancements, [23] X. Yan, “A Face Recognition Method for Sports Challenges, and a Multi-Learning Model,” IEEE Video Based on Feature Fusion and Residual Access, vol. 11, pp. 113769–113789, 2023, doi: Recurrent Neural Network,” Inform., vol. 48, 10.1109/ACCESS.2023.3325037. no. 12, pp. 137–152, 2024, doi: [34] X. Lu, “Deep Learning Based Emotion 10.31449/inf.v48i12.5968. Recognition and Visualization of Figural [24] S. R. Sanku and B. Sandhya, “Multi-Modal Representation,” Front. Psychol., vol. 12, no. Emotion Recognition Feature Extraction and January, pp. 1–12, 2022, doi: Data Fusion Methods Evaluation,” Int. J. Innov. 10.3389/fpsyg.2021.818833. Technol. Explor. Eng., vol. 3075, no. 10, pp. 18– [35] M. Zielonka, A. Piastowski, A. Czyżewski, P. 27, 2024, doi: 10.35940/ijitee.J9968.13100924. Nadachowski, M. Operlejn, and K. Kaczor, [25] T. Baltrusaitis, C. Ahuja, and L. P. Morency, “Recognition of Emotions in Speech Using “Multimodal Machine Learning: A Survey and Convolutional Neural Networks on Different Taxonomy,” IEEE Trans. Pattern Anal. Mach. Datasets,” Electron., vol. 11, no. 22, 2022, doi: Intell., vol. 41, no. 2, pp. 423–443, 2019, doi: 10.3390/electronics11223831. 10.1109/TPAMI.2018.2798607. [36] K. Zhang, Y. Li, J. Wang, Z. Wang, and X. Li, [26] E. Ivanova and G. Borzunov, “Optimization of “Feature fusion for multimodal emotion machine learning algorithm of emotion recognition based on deep canonical correlation recognition in terms of human facial analysis,” IEEE Signal Process. Lett., vol. 28, expressions,” Procedia Comput. Sci., vol. 169, no. September 2022, pp. 1898–1902, 2021, doi: no. 2019, pp. 244–248, 2020, doi: 10.1109/LSP.2021.3112314. 10.1016/j.procs.2020.02.143. [37] C. Dixit and S. M. Satapathy, “Deep CNN with [27] J. Zhang, Z. Yin, P. Chen, and S. Nichele, late fusion for real time multimodal emotion “Emotion recognition using multi-modal data recognition,” Expert Syst. Appl., vol. 240, no. and machine learning techniques: A tutorial and November 2023, p. 122579, 2024, doi: review,” Inf. Fusion, vol. 59, pp. 103–126, Jul. 10.1016/j.eswa.2023.122579. 2020, doi: 10.1016/j.inffus.2020.01.011. [38] Y. Cimtay, E. Ekmekcioglu, and S. Caglar- [28] S. Cunningham, H. Ridley, J. Weinel, and R. Ozhan, “Cross-subject multimodal emotion Picking, “Supervised machine learning for audio recognition based on hybrid fusion,” IEEE 76 Informatica 49 (2025) 61–76 H. Kumar et al. Access, vol. 8, pp. 168865–168878, 2020, doi: multimodal sentiment analysis,” EMNLP 2017 - 10.1109/ACCESS.2020.3023871. Conf. Empir. Methods Nat. Lang. Process. [39] R. G. Praveen, E. Granger, and P. Cardinal, Proc., pp. 1103–1114, 2017, doi: “Cross Attentional Audio-Visual Fusion for 10.18653/v1/d17-1115. Dimensional Emotion Recognition,” Proc. - [50] D. Hu, X. Hou, L. Wei, L. Jiang, and Y. Mo, 2021 16th IEEE Int. Conf. Autom. Face Gesture “Mm-Dfn: Multimodal Dynamic Fusion Recognition, FG 2021, 2021, doi: Network for Emotion Recognition in 10.1109/FG52635.2021.9667055. Conversations,” ICASSP, IEEE Int. Conf. [40] D. Sharma, M. Jayabalan, N. Sultanova, J. Acoust. Speech Signal Process. - Proc., vol. Mustafina, and D. N. L. Yao, “Multimodal 2022-May, pp. 7037–7041, 2022, doi: Emotion Recognition Using Attention-Based 10.1109/ICASSP43922.2022.9747397. Model with Language, Audio, and Video [51] A. R. Pathak, M. Pandey, and S. Rautaray, Modalities,” Lect. Notes Data Eng. Commun. “Application of Deep Learning for Object Technol., vol. 191, pp. 193–210, 2024, doi: Detection,” Procedia Comput. Sci., vol. 132, no. 10.1007/978-981-97-0293-0_15. Iccids, pp. 1706–1717, 2018, doi: [41] Z. Fu et al., “A cross-modal fusion network 10.1016/j.procs.2018.05.144. based on self-attention and residual structure for [52] Y. Tian, P. Luo, X. Wang, and X. Tang, multimodal emotion recognition,” pp. 2–6, “Pedestrian detection aided by deep learning 2021, [Online]. Available: semantic tasks,” Proc. IEEE Comput. Soc. Conf. http://arxiv.org/abs/2111.02172 Comput. Vis. Pattern Recognit., vol. 07-12- [42] V. John and Y. Kawanishi, “Audio and Video- June, pp. 5079–5087, 2015, doi: based Emotion Recognition using Multimodal 10.1109/CVPR.2015.7299143. Transformers,” Proc. - Int. Conf. Pattern [53] A. H. Abdulwahhab, N. T. Mahmood, A. A. Recognit., vol. 2022-Augus, no. August, pp. Mohammed, I. Myderrizi, and M. H. Al-Jumaili, 2582–2588, 2022, doi: “A Review on Medical Image Applications 10.1109/ICPR56361.2022.9956730. Based on Deep Learning Techniques,” J. Image [43] Y. H. H. Tsai, P. P. Liang, A. Zadeh, L. P. Graph. Kingdom), vol. 12, no. 3, pp. 215–227, Morency, and R. Salakhutdinov, “Learning 2024, doi: 10.18178/JOIG.12.3.215-227. factorized multimodal representations,” 7th Int. [54] V. Adarsh, P. Arun Kumar, V. Lavanya, and G. Conf. Learn. Represent. ICLR 2019, 2019. R. Gangadharan, “Fair and Explainable [44] P. Kumar, S. Malik, and B. Raman, Depression Detection in Social Media,” Inf. “Interpretable multimodal emotion recognition Process. Manag., vol. 60, no. 1, p. 103168, using hybrid fusion of speech and image data,” 2023, doi: 10.1016/j.ipm.2022.103168. Multimed. Tools Appl., vol. 83, no. 10, pp. [55] H. Zhou, J. Du, Y. Zhang, Q. Wang, Q. F. Liu, 28373–28394, 2024, doi: 10.1007/s11042-023- and C. H. Lee, “Information Fusion in Attention 16443-1. Networks Using Adaptive and Multi-Level [45] P. P. Liang and R. Salakhutdinov, Factorized Bilinear Pooling for Audio-Visual “Computational Modeling of Human Emotion Recognition,” IEEE/ACM Trans. Multimodal Language : The MOSEI Dataset and Audio Speech Lang. Process., vol. 29, pp. 2617– Interpretable Dynamic Fusion,” in Proceedings 2629, 2021, doi: of the 56th Annual Meeting of the Association 10.1109/TASLP.2021.3096037. for Computational Linguistics, 2018. doi: [56] S. Lee, D. K. Han, and H. Ko, “Multimodal 10.18653/v1/P18-1208. Emotion Recognition Fusion Analysis Adapting [46] T. Mittal, U. Bhattacharya, R. Chandra, A. Bera, BERT with Heterogeneous Feature and D. Manocha, “M3ER: Multiplicative Unification,” IEEE Access, vol. 9, pp. 94557– Multimodal Emotion Recognition using Facial, 94572, 2021, doi: Textual, and Speech Cues,” in Proceedings of 10.1109/ACCESS.2021.3092735. the AAAI Conference on Artificial Intelligence, [57] S. Siriwardhana, T. Kaluarachchi, M. 2020, pp. 1359–1367. doi: Billinghurst, and S. Nanayakkara, “Multimodal 10.1609/aaai.v34i02.5492. emotion recognition with transformer-based self [47] N. Singh, N. Singh, and A. Dhall, “Continuous supervised feature fusion,” IEEE Access, vol. 8, Multimodal Emotion Recognition Approach for pp. 176274–176285, 2020, doi: AVEC 2017,” Comput. Vis. Pattern Recognit., 10.1109/ACCESS.2020.3026823. 2017, doi: 10.48550/arXiv.1709.05861. [58] M. A. H. Akhand, S. Roy, N. Siddique, M. A. S. [48] L. Schoneveld, A. Othmani, and H. Abdelkawy, Kamal, and T. Shimamura, “Facial emotion “Leveraging recent advances in deep learning recognition using transfer learning in the deep for audio-Visual emotion recognition,” Pattern CNN,” Electron., vol. 10, no. 9, 2021, doi: Recognit. Lett., vol. 146, pp. 1–7, 2021, doi: 10.3390/electronics10091036. 10.1016/j.patrec.2021.03.007. [59] N. A. S. Badrulhisham and N. N. A. Mangshor, [49] A. Zadeh, M. Chen, E. Cambria, S. Poria, and L. “Emotion Recognition Using Convolutional P. Morency, “Tensor fusion network for Neural Network (CNN),” J. Phys. Conf. Ser., Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76 77 vol. 1962, no. 1, 2021, doi: 10.1088/1742- 6596/1962/1/012040. https://doi.org/10.31449/inf.v49i12.7041 Informatica 49 (2025) 77–90 77 Deep Learning-Based Involution Feature Extraction for Human Posture Recognition in Martial Arts Desheng Chen1*, Sifang Zhang2* 1School of Physical Education, Anyang Preschool Education College, Anyang 456150, China 2Department of Physical Education, Wuhan Sports University, Wuhan 430205, China Email: ayysbgscds@126.com, z1234567sf@sina.com *Corresponding author Keywords: human action recognition, deep learning, long- and short-term memory neural networks, lightweight networks, feature extraction Received: August 30, 2024 With the development of computers in recent years, human body recognition technology has been vigorously developed and is widely used in motion analysis, video surveillance and other fields. This study is based on deep learning to improve human pose estimation. Firstly, Involution's feature extraction network was proposed for lightweight human pose estimation, and this feature extraction network was combined with existing human pose estimation models to recognize human pose. Label and classify each joint point of the human body separately, add weights to each different part, extract feature between joint points at different times, and then input the extracted feature into long short-term memory neural networks for recognition. The experimental results show that the improved human pose estimation model reduces the parameter and computational complexity by about 40% compared to the original model, while also slightly improving accuracy. Comparing the performance of models under various algorithms with the proposed model in this study, the accuracy under the Eigen method is 81.3%, the accuracy under the STOP method is 82.5%, the accuracy under the DMM&HOG method is 85.3%, the accuracy under the Actionlet method is 87.6%, and the accuracy under the JAS&HOG2 method is 83.5%. The accuracy of the InNet LSTM method is 90.6%. The results indicate that the proposed model has good performance and can recognize different martial arts movements. Povzetek: Za prepoznavanje človeške drže v borilnih veščinah so porabljene involucijske ekstrakcije značilk za globoko učenje. 1 Introduction extract and accurately recognize and classify human feature. The research is divided into four main sections, With the development of computers, artificial the first of which is a brief review of other research topics intelligence has become increasingly relevant to on human recognition. The second part is a review of the people's lives. The advent of computer vision enables main methods used in this research, and the third part is computers to automatically recognize human actions the results of the model obtained by applying the methods and classify them [1]. Initially human movement to the research and analysing the results. The fourth part recognition relied on decomposing video frame by is a summary of all the above studies and an outlook for frame and then acquiring information from it, and then future research. recognizing human movements through image processing. This approach requires manually designing motion feature to represent the human body and then 2 Literature review modelling the motion feature to achieve the recognition effect, but manually acquiring the feature With the development of computers, human body requires a lot of time and effort [2]. In this study, a recognition technology has been vigorously developed human skeleton network was created by using Deep and is widely used in motion analysis, video surveillance, Residual Networks (ResNet) combined with etc. Liu et al. proposed a method for estimating the 3D Involution's improved algorithm for feature extraction. pose of a single person in two views without camera The human posture at each moment is represented by parameters in order to cope with the problem of needing the human skeleton, so that the human posture feature to know the camera parameters to obtain coordinate can be quantified by the human skeleton network. The accuracy in the camera's two views. It extracts the joint extracted feature is then fed into the Long Short-Term points from two different views through 2D estimation Memory (LSTM) neural network for processing and and inputs them into a 3D regression network to generate recognition. This model is designed to efficiently 3D joint point coordinates. The coordinates are then combined with a 3D human pose recognition model to identify the human pose. The results of the study 78 Informatica 49 (2025) 77–90 D. Chen et al. indicated that this method extracted a high accuracy visible, but the model still has high recognition accuracy. rate for human pose action recognition [3]. Ferreira et The model consists of two components, namely a al. proposed a skeleton structure and deep semantic cascaded feature network and a graph structure network. feature based on human pose estimation network to The results show that the model has excellent recognition train a repetition counting and validation system, accuracy [7]. Zhang et al. found that existing 3D human which is able to make detection of human activities pose estimation methods focus on overall joint error and quickly identify the function of invalid repetition reduction, which leads to large errors in endpoint and information. The results show that the system is able to bone length. To address this problem, the group proposed accurately identify human movements and remove a human structure-aware network that can extract feature invalid repetitive information from them [4]. Liu et al. data from existing 2D joints to repair the positions of 3D propose a new elliptical distribution coding method in joint points. The results show that this method can order to help computers to accurately identify human effectively reduce the error between endpoint and bone movements. The method first describes the human length, resulting in a high improvement in recognition skeleton by elliptical Gaussian coordinate coding, then accuracy [8]. measures the difference between the predicted heat Ht et al. found that traditional human action map and the ground truth heat map, and finally the recognition uses manual feature from traditional human pose images for recognition. The results of the classifiers and is unable to make recognition of complex study show that the method has a good performance in human actions using advanced spatio-temporal feature. both datasets of the experiment and can provide high To address this problem, the research team proposed a recognition accuracy [5]. Vishwakarma proposes a coding technique that converts poses into feature images, method for recognizing human actions in videos that extracts high-level feature from the feature images and can be identified by deterministic actions, which uses a feeds them into a feature recognition system for double transform of wavelets to perform feature recognition. The results show that the method is able to extraction of human actions. The extracted feature is recognize human actions with high recognition accuracy then recognized. The results show that the method has [9]. Silva and Marana argue that existing human pose high recognition accuracy in different datasets [6]. extraction uses straight lines to represent body parts in a Tian et al. argue that the key points of the human body two-dimensional human model. The team proposes an under many images in the video may produce improved method based on existing human pose unreasonable prediction results from the human pose extraction, which maps each segment of a 2D pose to a estimation method due to issues such as illumination, point to extract spatial feature. The results of the study occlusion, etc. To address this problem, the team indicate that the method is effective in improving the designed a new generative adversarial network to recognition rate [10]. address the situation where some keypoints are not Table 1: Literature review Performance Study Method Application Key Findings References Comparison Extracts joint Dual-view 3D pose Human pose coordinates from High accuracy in Liu L et al. estimation without estimation in dual dual 2D images, human pose [3] camera parameters views inputs to 3D recognition regression network Accurate Skeleton and deep Human activity Detects activities and recognition with Ferreira B et semantic feature detection and removes redundant effective [4] al. training system filtering repetitions redundancy filtering Uses heatmap Action High recognition Elliptical Gaussian differences for Liu H et al. recognition in accuracy on [5] coordinate encoding precise pose skeletal models various datasets identification Vishwakarma Dual-wavelet Human action Extracts motion Consistently high [6] Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 79 D K transformation recognition in feature using wavelet accuracy across videos transform datasets Cascade and High accuracy Generative graph-based Pose estimation even with Tian L et al. Adversarial Network networks handle [7] with occlusion occluded (GAN) lighting and keypoints occlusion Enhanced 3D joint Reduces endpoint Structure-aware accuracy with Zhang X et al. correction in and bone length [8] network reduced joint skeletal models errors errors Pose encoding to Converts pose to Complex human High accuracy in feature images for feature images for Ht A et al. behavior complex activity [9] high-level feature advanced feature recognition recognition extraction recognition Spatial feature 2D human pose extraction from Maps 2D segments to Improved Silva V et al. representation [10] mapped pose extract spatial feature recognition rates improvement segments In summary shown in Table 1, many scholars 3 Martial arts movement recognition have conducted research in the field of human pose recognition and achieved significant results, but there based on human posture estimation are still some limitations. Firstly, many methods rely With the development of the Internet, human body on multi view inputs or high-quality data, and the recognition technology has been vigorously developed recognition accuracy may decrease in single view or and is widely used in motion analysis, video surveillance complex backgrounds. Secondly, encoding methods and other fields. In this study, Involution's feature based on skeleton or feature images have limited extraction network is first proposed for lightweight performance in dealing with large occlusions or human pose estimation, which is combined with existing complex non repetitive actions. Some methods have human pose estimation models to recognize human pose. high computational complexity and are not The extracted feature is then fed into a longand short term user-friendly for real-time applications, and models memory neural network. such as generative adversarial networks rely heavily on training data, increasing the complexity of model construction and training. In addition, information loss 3.1 Involution feature extraction network during the encoding process may affect recognition based human pose recognition performance, especially in situations where there are In the field of computer imaging, the main indicator of rich pose details or diverse pose changes, limiting the the strength of a neural network's performance is the applicability and accuracy of these methods. The deep strength of its feature extraction performance. By residual network combined with the improved analysing the existing convolutional kernels, two algorithm of Involution is used for feature extraction, drawbacks are found, one is that the perceptual field is creating a human skeleton network to recognize and difficult to capture feature dependencies over long classify human actions. distances due to the limitation of the convolutional kernel size. 79 80 Informatica 49 (2025) 77–90 D. Chen et al. The other is that the information between channels is of channels. Channel invariance allows neural networks rather complex and redundant. To solve this problem, to share in terms of channel dimensions, thus solving the this research proposes a new neural network operator problem of complex redundancy of information between Involution to assist feature extraction [11]. Involution channels. The main function of Involution is the is spatially specific, spatially specific in that it reallocation of arithmetic power, which allows the increases the receptive field by increasing the size of computer to perform optimally. the convolution kernel, and channel invariant in terms  H 11K 2 11C C K K 1 K K C Q Figure 1: Generate convolutional kernel diagram Figure 1 shows the process of generating a convolution kernel by Involution. Firstly, a x x multi-channel feature map is input and the feature vector of any point in the feature map is selected. Weight layer Weight layer Multiplying this kernel with the feature vector adjacent to the point gives the K K C feature map, and relu x relu finally the K K C kernel is superimposed to obtain the final output feature map, with Involution Weight layer Weight layer generating different kernels for different locations and F(x) relu F(x) sharing a single kernel at the same location on the relu channel [12]. The traditional Convolution kernel H(x)=F(x) H(x)=F(x)+x counts and Involution counts are shown in Equation (a)General neural network (b)ResNet (1). Figure 2: Neuron learning feature map 11C0 Ci K K  (1) H QK K G Figure 2(a) shows the process of learning feature in the fully-connected layer of a general neural network, which can be seen to be learning directly on the mapping In Equation (1), 11 denotes the convolution between input and output. Figure 2(b) represents the kernel shared at H Q pixel points, C0 denotes process of learning feature in the fully-connected layer of the number of channels in the output, Ci denotes the ResNet, which can be seen to learn the residuals between number of channels in the input, K denotes the size the input and output. The InNet unit still has the same of the convolution kernel, and G denotes the number structure as the ResNet unit, with three convolutional of groupings. The number of channels is usually larger, layers in series, the first layer still reduces the dimension the number of groups is usually much smaller than the of the input channel, and the second layer uses the number of channels, and the size of the Involution convolutional kernel generated by Involution to replace convolution kernel does not have a number of channels, the original the second layer uses the convolution kernel so the ability to capture long distance feature can be generated by Involution to replace the original enhanced by increasing the size of the convolution convolution kernel. The third layer is to expand the kernel. Involution is able to increase the accuracy of reduced-dimensional feature to the desired size. This the model by this method while reducing the number improvement improves the feature extraction capability of model parameters and the amount of computation of InNet and also reduces the number of parameters and [13]. computational effort [14]. Convolution and Involution are This research uses a deep residual network shown in Equation (2). combined with an Involution modified algorithm for feature extraction. The neuron learning feature maps of the general neural network and ResNet are shown in Figure 2. Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 81 Equation (2) shows the number of parameters for K 2C2 Convolution and Involution and the amount of calculation  C 2  + K 2GC for Convolution and Involution. Where H is the height  of the input feature map, Q is the width of the input r  (2) feature map, C is the number of input feature map HQK 2C 2 channels, and r is the channel reduction ratio. The  C 2 + K 2GC  Involution Pose Estimation Net (IPEN) is used as the HQ + HQK 2C  r basis for extracting feature for the convolution kernel by Involution as shown in Figure 3. onv2Deconv3 Conv1 Layer1 Deconv1Dec Layer2 Layer3 Final Layer 512 256 128 256 256 256 3 20 64 64 Figure 3: Convolutional kernel feature extraction graph As shown in Figure 3, firstly, the input is a 3.2 Research on martial arts movement 3-channel image, and after passing through the first recognition based on human posture convolutional layer Conv1, the number of channels Since traditional neural networks often fail to increases to 64. Next, after passing through three achieve the desired results when processing data with consecutive convolutional layers, Layer1, Layer2, and temporal information such as video and audio, Recurrent Layer3, the number of channels in the feature map Neural Network (RNN) was introduced to process the increases to 128, 256, and 512, respectively. After data. Recurrent Neural Networks are capable of completing the three convolutional layers, it enters the outputting information that is dependent on both present deconvolution stage (Deconv1, Deconv2, and input and historical records. The structure of an RNN is Deconv3). In the deconvolution stage, each shown in Figure 4. deconvolution layer gradually reduces the number of channels in the feature map from 512 to 256, resulting in a final output of 20 channels. The human pose Ot O1 O O 2 3 recognition network uses InNet as the feature V V V V extraction network of the recognition network, and after expanding it by ordinary convolutional layers, A A W A W A W Involution is used to extract feature information from tanh W tanh tanh tanh the image, and the nodes are obtained by three U U U U convolutional layers that act as regressors. The metrics It I1 I2 I3 used to evaluate the model are Object Keypoint Similarity metrics, as shown in Equation (3). Figure 4: Structure diagram of recurrent neural network  exp−d 2 2S 2 2 l pl p I  (vpl =1) Figure 4 represents the structure of an RNN, where Oksp = (3)   (v A represents a single neural network unit, O t l pi =1) represents the output at the time point, and I represents t the input at the time point. U, V and W represent the In Equation (3), p represents the person ID, l different network weights respectively. The Long Short represents the number of keypoints S represents the p Term Memory Neural Network is an improvement on the current person's scale factor, v represents whether pl RNN, which can process time series like the RNN and the l th key point of the p th person is observable, has a similar structure to the RNN, but the recurrent d represents the rated Euclidean distance between pl structure in the LSTM network is not the same as that of each person and each person's predicted joint point, the RNN. The recurrent structure consists of three parts  represents the normalisation factor for the I th I respectively three gate structures, one unit state and four skeletal point, and  represents the function that neural network layers [16]. The structure of the LSTM calculates the visible point [15]. neural network is shown in Figure 5. 81 82 Informatica 49 (2025) 77–90 D. Chen et al. Ct−1 Ct × + tanh × × A A sigmoid sigmoid tanh sigmoid h h t−1 ht t+1 X t−1 X t X t+1 Figure 5: LSTM neural network structure diagram As can be seen from Figure 5, the entire recurrent articulation points, (x, y, z) represents the coordinates structure consists of a short-term memory module, a of the articulation points, and j represents the j th current memory module and a long-term memory articulation point in the skeleton at t . The state of the module. In the current memory module, there are four human skeleton at each moment is coded into a network, neural network layers, three of which are single-layer and the skeleton joints at each moment change with time sigmoid feed forward neural networks and one is a [18]. Define the interaction network of articulation points single-layer tanh feed forward neural network, and the at different moments in time as shown in Equation (5). LSTM neural network is mainly used to filter the SANt = (Vt , E ) feature information and determine the retention status t (5) through three gate structures: input gate, output gate In Equation (5), V denotes the set of vertices in the and forgetting gate. Each gate structure is composed of t network at the moment of t and E denotes the set of a vector operation and a sigmoid neural network layer. t edges in the network at the moment of t . For the Classify human joints using a human pose recognition skeleton state at the same moment, the joints are network combined with human joint data, as shown in connected to each other and the relationship between the Figure 6. joints is expressed by calculating the Euclidean distance between each joint as shown in Equation (6). d (i, j) = (x − 2 2 i x j ) + (yi − y j ) + (zi − z j ) 2 (6) In Equation (6), i is any one of the joints except j . Since the human body completes the action, it is not determined by individual joints, but by the overall coordination of the human body, just using the Euclidean distance cannot highlight the relationship between each joint well, so the human body is divided into different five parts, and different weight coefficients are set according to different parts as shown in Equation (7). Figure 6: Skeleton division diagram of the human body d (i, j)a1,1 i, j  4  d (i, j)a2 ,5  i, j  8 Figure 6 shows the division of the human skeleton.  w(i, j)d (i, j)a ,9  i, j 12 (7) The importance of different bones to the human body 3  varies. If bone joints are divided into 20 joints d (i, j)a4 ,13  i, j 16  according to their importance, a sequence of human d (i, j)a5 ,17  i, j  20 posture can be represented by equation (4) [17]. In Equation (7), a and a represent the weight 1 2  coefficients of the left and right arms, a and a S = K1, K 3 4 2 , , Kt  represent the weight coefficients of the left and right leg K j (4)  t = (x j , y j , z j ),1 j  M parts, a represents the weight coefficient of the torso 5 part. After the skeleton nodes were constructed, the In Equation (4), S represents the sequence of feature information of the image was extracted by CNN human skeletal articulation points, K represents the local convolution. The extracted feature data is then fed t skeleton at t , M represents the number of into the LSTM for processing, and the feature data is Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 83 filtered and judged by various gates. Each LSTM cell vector of the network nodes is analysed as shown in has an input gate, an output gate and an oblivion gate, Equation (13). and the input gate is calculated as shown in Equation (8). N ECi = c A ( j ij EC = i 13) 1 it = g(W xi xt +W hi h t−1 +bi ) (8) In Equation (13), EC represents the eigenvector i centrality and sets its initial value to 1, represents the Equation (8) represents the input gate, x adjacency matrix in the network, A represents the t ij represents the input value of the network at the current connection between nodes i and j , and the initial time and h represents the output value of the vector of EC is cyclically multiplied with A to obtain t−1 i network at the previous time. b denotes the input the value of EC . The stability of the network is usually i i gate constant parameter. The output gate is calculated assessed by the average degree as shown in Equation (14) as shown in Equation (9). [20-21]. ft = g(W N x f xt +W h f h t−1 +b f ) (9)  K K = i=1 i (14) i N Equation (9) represents the output gate, x t represents the input value of the network at the current In Equation (14), K represents the weighted moment, and b represents the output gate constant i i degree of the node i . The topological properties of the parameter. The formula for the forgetting gate is network nodes are combined with the topological shown in Equation (10) [19]. properties of the network to represent the entire skeleton of the action network. A sample skeleton of all actions is ot = g(W xo xt +W ho h t−1 +bo ) (10) shown in Equation (15). Equation (10) represents the forgetting gate and Yinput = [1,2 ,3 , ,u−1,u ] (15) h represents the output value of the network at the t−1 previous moment. b denotes the constant parameter o In Equation (15), Y denotes the input to the input of the forgetting gate. In the IPN recognition technique LSTM and u denotes the number of samples.  for the skeleton, the human skeleton at each moment is u denotes the feature vector in u . The samples are encoded as a network, and the weights of the edges are classified by this method to identify human actions. The calculated based on the distance between any two process of this model is as follows. Firstly, the Involution joints in the network as shown in Equation (11). operation dynamically generates convolutional kernels that adapt to feature maps, enhancing the ability to d (i, j) =1 (xi − x j ) 2 + (yi − y 2 2 capture long-distance feature and reducing information j ) + (zi − z j ) (11) redundancy between channels. By combining this In Equation (11), i and j denote the nodes in network structure with a deep residual network, an the network and improved InNet was formed, which can efficiently extract (x, y, z) denotes the 3D coordinates of the node, it can be seen that the weights are feature and reduce the number of model parameters. expressed as the reciprocal of the Euclidean distance. Subsequently, LSTM was used to process time series data In order to represent the transformation of the nodes in and analyze the dynamic changes of human joint points. the network in time, metrics such as proximity Joint points are classified according to their importance, centrality are introduced for evaluation, as shown in and Euclidean distance is calculated to describe the Equation (12). relationship between joints. The sensitivity of action recognition is improved by setting weights for different parts. N −1 CCi = (12)  d (i, j) jU , ji 4 Performance analysis of martial In Equation (12), N is the number of nodes in arts movement recognition based on the network and U is the set of all nodes in the human posture estimation network. Proximity centrality indicates how close the The first section of this chapter analyses Involution's node is to each of the other nodes in the network; the downsampling capability and then analyses the accuracy closer the node is, the greater its closeness centrality, of the model under different dataset sizes to determine the but the same node will change over time and its best data size to calculate its feature extraction time. The centrality will change as well. The eigencentricity second section provides an analysis of the introduction of 83 84 Informatica 49 (2025) 77–90 D. Chen et al. LSTM networks to compare the models under different Method Input size Param FLOPs algorithms. 256 x 192 28.4M 7.2G ResNet-Q32 384 x 288 28.4M 16.5G 4.1 Performance analysis of human pose 256 x 192 63.9M 14.7G recognition based on involution ResNet-Q48 384 x 288 63.9M 32.5G feature extraction network 256 x 192 17.1M 4.7G To verify the performance of this feature extraction InNet-Q32 384 x 288 17.1M 10.1G network using InNet as the recognition network, InNet 256 x 192 38.7M 7.9G was compared with ResNet. The CPU used in this InNet-Q48 experiment is Intel(R) Xeon® Gold6226@2.7GHz, the 384 x 288 38.7M 20.4G GPU used is NVIDIA GeForce Tesla V100S, and the memory is 32 GB. The learning rate of the model is set In Table 2, Q32 indicates that the number of to 0.001 and decays by 0.1 every 10 epochs to channels for each convolutional layer is set to 32, and gradually reduce the learning rate. The batch size is 32 Q48 indicates that the number of channels for each to ensure efficient memory usage during the training convolutional layer is set to 48. Table 2shows the table of process. The optimizer uses Adam because of its Involution's degree-reducing capacity, InNet for using adaptive learning rate feature, which can handle sparse Involution instead of Convolution, from the table it can gradient problems. The loss function uses cross be seen that ResNet's Param is 28.4M and 63.9M, InNet's entropy loss, which is suitable for multi class Param is 17.1M and 38.7M, ResNet under different classification tasks. Using L2 regularization, the methods, different sizes of The FLOPs of different sizes weight decay parameter is set to 0.0001 to reduce the for ResNet were 7.2G, 16.5G, 14.7G and 32.5G, risk of overfitting. In terms of data augmentation, respectively, and the FLOPs of different sizes for InNet methods such as random cropping, rotation, and were 4.7G, 10.1G, 7.9G and 20.4G, respectively, under translation are applied during training to improve the different methods. The experimental results indicated that model's generalization ability. In terms of feature the InNet method using Involution instead of Convolution extraction network, the number of layers in the reduced the number of parameters and computation by Involution network is set to 5, and the number of about 40%, indicating that Involution has good capability channels is set to 128 to evaluate performance. In of reducing parameters. Compare the computational terms of LSTM configuration, the number of units is complexity of different methods. set to 256 to better capture time series feature. The As shown in Table 3. InNet reduces its dependence training cycle is set to 100 epochs, using 20% of the on large convolution kernels through Involution, while data as the validation set to monitor model ResNet relies on deep residual structures, and LSTM uses performance and prevent overfitting. When the number recursive structures to process time series. InNet has of layers in the network is small, Involution has less relatively low memory usage because it uses smaller compression power, but the accuracy is improved. As feature maps, while ResNet requires more memory due to the number of layers increases, Involution has a good its deep structure. LSTM also increases memory improvement in compression, but with some loss of requirements when processing long sequences. The accuracy. latency of InNet is moderate, influenced by input size and sequence length. ResNet and LSTM can cause high latency when processing large inputs or long sequences. Table 2: Argument reduction capability of revolution Table 3: Comparison of computational complexity Model Processing Flow Memory Usage Latency Utilizes Involution instead of convolution for Low to moderate, depending Moderate, influenced by InNet feature extraction, followed by LSTM for on feature map size and input feature map size and sequence analysis number of channels time steps Employs multiple residual blocks for feature High, especially in deeper High, particularly when ResNet extraction, followed by fully connected networks processing large input sizes layers for classification High, due to the need to store High, especially with long Uses a recurrent structure to handle sequence LSTM hidden states and input sequences and multiple data sequences feature Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 85 1.00 1.00 0.75 0.75 ResNet InNet ResNet 0.50 InNet 0.50 0.25 0.25 0 0 0 100 200 300 400 500 0 10 20 30 40 50 Dataset size Iterations (a)The relationship between dataset size and accuracy (b)The relationship between the number of iterations and accuracy Figure 7: Model accuracy of RseNet and InNet As can be seen from Figure 7(a), the extraction its accuracy, its training time and recognition time is still performance of both methods is better when the dataset an important indicator as shown in Figure 8. is larger and contains more species. Since the number Figure 8(a) shows the change in model performance of Involution parameters and the amount of for both methods as the training time increases. It can be computation in InNet is less compared to that of the seen that the training time for InNet is a little longer than traditional Convolution in ResNet, the accuracy of that for ResNet, the situation is due to the fact that InNet InNet is still increasing when the size of the dataset uses a larger dataset during training and only a large reaches a certain amount, ResNet has levelled off. enough dataset can satisfy InNet to allow it to train to From Figure 7(b), it can be seen that with the selected achieve the best performance. Figure 8(b) shows the dataset size, InNet has been able to achieve the best change in model accuracy as the recognition time recognition performance with a small number of increases for both methods. It can be seen that InNet is iterations, and ResNet has not yet achieved the best able to use a small amount of time to achieve the best performance with the number of iterations where recognition accuracy on images when recognizing. The InNet's performance has reached its best, and reaches a results of the study indicate that the training time for point where when the performance no longer changes, InNet is slightly longer but within acceptable limits and it is still lower than InNet's performance. It can be seen that the overall performance of InNet is better than that of that InNet has good performance in feature extraction. ResNet. Judging the goodness of a model cannot only focus on 1.00 1.00 0.75 0.75 ResNet ResNet InNet 0.50 InNet 0.50 0.25 0.25 0 0 0 5 10 15 20 25 0 1 2 3 4 5 Training time(s) Recognition Time(s) (a)The relationship between training time and accuracy (b)The relationship between recognition time and accuracy Figure 8: Analysis of training time and recognition time for two models 85 Accuracy Accuracy Accuracy Accuracy 86 Informatica 49 (2025) 77–90 D. Chen et al. 1.0 1.0 0.8 0.8 0.6 0.6 0.4 0.4 0.2 ResNet 0.2 ResNet InNet InNet 0 0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 Limb weight coefficient Body weight coefficient (a)Recognition accuracy under different methods (b)Recognition accuracy under different methods of changing Limb weight coefficient of changing torso weight coefficients 1.0 P<0.01 InNet P<0.01 ResNet P<0.01 0.8 P<0.01 P<0.01 0.6 0.4 0.2 0 Con 1 Con 2 Con 3 Con 4 Con 5 (c)Accuracy under different configurations Figure 9: Model accuracy under different weight coefficients and configurations 4.2 Performance testing of a human coefficients and stabilises when the limb weighting posture-based martial arts coefficients reach 0.8. When the weighting factor of the limbs was changed, the change in accuracy was minimal movement recognition model when the weighting factor of the torso was changed. The A selection of martial arts moves is identified, experimental results show that the influence of the limbs including the lunge punch, punch and pop kick, horse on the accuracy is greater than the influence of the torso stance punch, horse stance frame punch and top stomp on the accuracy. Figure 9 (c) Accuracy rates for five kick. The five movements are renamed as Movement 1, different weighting factors not chosen named Movement 2, Movement 3, Movement 4 and configurations 1 to 5 respectively, at different weighting Movement 5 respectively. The weighting coefficients factors. Configuration 1 has a torso weight coefficient of for the left and right arms, the left and right legs and 0.2 and limbs weight coefficient of 0.8, Configuration 2 the torso were assigned to compare the influence of has a torso weight coefficient of 0.6 and limbs weight each part of the skeleton on the recognition system. coefficient of 0.8, Configuration 3 has a torso weight This is shown in Figure 9. coefficient of 0.6 and limbs weight coefficient of 1.0. Figure 9 (a) shows the recognition accuracy for Configuration 4 has a torso weight coefficient of 0.4 and different methods with a value of 1 for the torso limbs weight coefficient of 0.8. Configuration 5 has a weighting factor a and changing the weighting 3 torso weight coefficient of 0.4 and limbs weight factor for the extremities. Figure 9(b) shows the coefficient of 0.6. It can be seen that the accuracy of recognition accuracy for different methods of varying recognition is maximised when the torso weighting factor the torso weighting coefficients when the values of the is 0.6 and the limb weighting factor is 0.8(P<0.01). A limb weighting coefficients a and a are set to 1. 1 2 comparison of recognition for different actions at It can be seen that when the limb weighting different weighting factors is shown in Figure 10. coefficients are changed, the accuracy rate increases significantly with the increase of limb weighting Accuracy Accuracy Accuracy Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 87 Action 1 Action 2 Action 3 Action 4 Action 5 100 90 80 70 Config 1 Config 2 Config 3 Config 4 Config 5 Figure 10: Accuracy of five different actions under five weight coefficients Table 4: Recognition accuracy of different algorithms on datasets Method Eigen STOP DMM & HOG Actionlet JAS & HOG2 InNet-LSTM Data 1 81.3 82.5 85.3 87.6 83.5 90.6 Accuracy (%) Data2 76.5 81.4 86.2 88.6 87.9 93.6 Accuracy (%) From Figure 10, the accuracy of five different DMM&HOG method is 85.3%, Actionlet method is actions with five weighting factors, it can be seen that 87.6%, and JAS&HOG2 method is 83.5%. The accuracy the action with weighting configuration 2 has the of the InNet LSTM method is 90.6%. In dataset 2, the highest average accuracy and has a more stable accuracy of each method is higher than in dataset 1. The performance. The other weighting configurations all accuracy of the proposed methods in this study was have large fluctuations in accuracy and show unstable higher than the other methods. To further validate the performance. Considering both stability and accuracy, accuracy of the InNet-LSTM method, the accuracy of the the weights used to construct the human skeleton were different methods was verified under different dataset set to a torso weighting factor of 0.6 and an extremity sizes. This is shown in Figure (11). weighting factor of 0.8. Different methods were introduced to compare with the method used in this study, and the MSR Action 3D dataset was chosen for DMM & HOG 100 InNet-LSTM this experiment. This dataset contains 6000 images Eigen Actionlet specifically designed for human action recognition 95 JAS & HOG2 tasks, covering multiple explicit action categories STOP including walking, running, jumping, and sitting. The 90 image size of each sample is 640x480 pixels, ensuring clarity and detail. The sample distribution of these 85 action categories is uneven, with more samples for walking and running, and relatively fewer samples for 80 jumping and sitting, which may affect the training effectiveness and performance of the model. Each 75 1 2 3 4 5 6 7 8 9 10 image is equipped with clear labels to indicate the Relative size of dataset corresponding action category, ensuring the accuracy Figure 11 Accuracy of different methods under different of the training data. In addition, the dataset generates dataset sizes additional samples through data augmentation techniques, including random rotation, flipping, and scaling, to enhance the model's generalization ability. The accuracy of InNet-LSTM is lower than that of Different algorithms were used to divide the dataset Actionlet method when the dataset is small, but when the into Dataset 1 and Dataset 2, and the recognition dataset increases to a certain level, the accuracy of accuracy on different datasets is shown in Table 4. InNet-LSTM is greater than that of other methods. According to Table 4, in dataset 1, the accuracy of Eigen method is 81.3%, STOP method is 82.5%, 87 Accuracy Similarity 88 Informatica 49 (2025) 77–90 D. Chen et al. a high performance. However, there are still shortcomings 5 Discussion in this study. When constructing a human skeleton model, the weights between the joints are determined by the Human motion recognition relies on video frame by distance between the joints, and the evaluation indicators frame decomposition and manually designing motion are too single. And the research was conducted in a feature to achieve recognition. The martial arts action laboratory environment. Future research is considering recognition system based on Involution feature using more indicators to construct human skeleton models extraction network and LSTM proposed in the study and applying them to practical applications to test the optimizes recognition accuracy and efficiency by performance of the models. reducing the computational complexity of traditional convolutional networks. The experimental results show that compared with traditional convolutional networks References such as ResNet, Involution significantly improves accuracy while reducing the number of parameters, [1] S. Yan, Y. Xiong, D. Lin. Spatial temporal graph especially on datasets of different sizes, with an convolutional networks for skeleton-based action average increase of 5% in object keypoint similarity recognition. 2018, 32(1): 56-72. and 8% in accuracy in the test set. This is due to the https://doi.org/10.1609/aaai.v32i1.12328. advantage of LSTM in time series modeling, which [2] W. Luo, W. Liu, S. Gao. Normal graph: Spatial enables the system to better understand the dynamic temporal graph convolutional networks-based changes in action sequences, especially achieving an prediction network for skeleton based video accuracy gain of about 15% in complex martial arts anomaly detection. Neurocomputing, 2021, 444(15): action recognition. The innovation of InNet LSTM lies 332-337. in using Involution instead of traditional convolution https://doi.org/10.1016/j.neucom.2020.08.085. to achieve lightweight and efficient feature extraction, [3] L. Liu, L. Yang, W. Chen, X Gao. Dual-View 3D and combining LSTM for temporal modeling to human pose estimation without camera parameters capture motion dynamics. This method outperforms for action recognition. IET Image Processing, 2021, ResNet in accuracy, resource utilization, and 15(14): 3433-3440. computation time, and is suitable for martial arts action https://doi.org/10.1049/ipr2.12277. recognition and other dynamic scenarios. It has broad [4] B. Ferreira, P. M. Ferreira, G. Pinheiro, N. applicability and efficient real-time processing Figueiredo, F. Carvalho, P. Menezes, J. Batista. capabilities. However, there are still limitations when Deep learning approaches for workout repetition dealing with unstructured random actions. Due to the counting and validation. Pattern Recognition Letters, limitations of existing equipment, higher performance 2021, 151(12):259-266. hardware can be introduced in the future to optimize https://doi.org/10.1016/j.patrec.2021.09.015 training speed and expand the dataset size to enhance [5] H. Liu, Y. Chen, W. Zhao, S. Zhang, Z. Zhang. the system's generalization ability. Human pose recognition via adaptive distribution encoding for action perception in the self-regulated learning process. Infrared Physics and Technology, 6 Conclusion 2021, 114(5): 1036-1045. https://doi.org/10.1016/j.infrared.2021.103660. In response to the problem of manually designing [6] D. K. Vishwakarma. A two-fold transformation motion feature for recognition, which consumes model for human action recognition using decisive energy and has very low recognition efficiency, pose. Cognitive Systems Research, 2020, 61(6): research is conducted on improving human pose 1-13. https://doi.org/10.1016/j.cogsys.2019.12.001. estimation based on deep learning. Firstly, Involution [7] L. Tian, G. Liang, P. Wang, C. Shen. An adversarial is proposed as a feature extraction network for light human pose estimation network injected with graph weighting of human pose estimation, and each joint structure. Pattern Recognition, 2021, 115(2):31-40. point of the human body is labelled and classified https://doi.org/10.1016/j.patcog.2021.107863. separately. The experimental results show that the [8] X. Zhang, Z. Tang, J. Hou, Y. Hao. 3D human pose InNet method, which uses Involution instead of estimation via human structure-aware fully Convolution, decreases the number of parameters and connected network. Pattern Recognition Letters, the computational effort by about 40%. Comparing this 2019, 125(5): 404-410. method with other methods, the accuracy of the Eigen https://doi.org/10.1016/j.patrec.2019.04.007. method is 81.3%, the STOP method is 82.5%, the [9] A. Ht, C. Chh, B. Ttn, B. Dska. Image DMM & HOG method is 85.3%, the Actionlet method representation of pose -transition feature for 3D is 87.6% and the JAS & HOG2 method is 83.5%. The skeleton-based action recognition. Information accuracy of the InNet-LSTM method was 90.6%. It Sciences, 2020, 513(3): 112-126. can be seen that the method proposed in this study has https://doi.org/10.1016/j.ins.2019.12.063. Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 89 [10] V. Silva, N. Marana. Human action recognition 2020(12): 1-12. in videos based on spatiotemporal features and https://doi.org/10.1155/2020/8827468. bag-of-poses. Applied Soft Computing, 2020, [20] F. Daneshdoost, M. Hajiaghaei-Keshteli, R. Sahin. 95(1) 84-93. R. Tabu search based hybrid meta-heuristic https://doi.org/10.1016/j.asoc.2020.106513. approaches for schedule-based production cost [11] B. Sun, D. Kong, S. Wang, L. Wang, B. Yin. minimization problem for the case of cable Joint transferable dictionary learning and view manufacturing systems. Informatica, 2022, 33(3): adaptation for multi-view human action 499-522. https://doi.org/10.15388/21-INFOR471 recognition, ACM Transactions on Knowledge [21] G. Dzemyda, M. Sabaliauskas, V. Medvedev. Discovery from Data (TKDD), 2021, 2-55. Geometric MDS performance for large data ttps://doi.org/10.1145/3418897. dimensionality reduction and visualization. [12] L. Yu, L. Tian, Q. Du, J. Bhutto. Multi-stream Informatica, 2022, 33(2):299-320. adaptive spatial-temporal attention graph https://doi.org/10.15388/22-infor491. convolutional network for skeleton-based action recognition. IET Computer Vision, 2022, 162(2): 143-158. https://doi.org/10.1049/cvi2.12058. [13] M. S. Alsawadi, M. Rio. Skeleton split strategies for spatial temporal graph convolution networks, Computers. Materials and Continuum, 2022, 1(6):4643-4658. https://doi.org/10.32604/cmc.2022.028266. [14] Y. Hou, L. Wang, R. Sun, Y. Zhang, M. Gu, Y. Zhu, Y. Tong, X. Liu, X. Wang, J. Xia, Y. Hu, L. Wei, C. Yang, M. Chen. Crack-across-pore enabled high-performance flexible pressure sensors for deep neural network enhanced sensing and human action recognition. ACS NANO, 2022, 16(5): 8358-8369. https://doi.org/10.1021/acsnano.2c02609. [15] A. Gharahdaghi, F. Razzazi, A. Amini. A non-linear mapping representing human action recognition under missing modality problem in video data. Measurement, 2021, 186(3): 1123-1133. https://doi.org/10.1016/j.measurement.2020.1121 23. [16] W. Xu, M. Wu, J. Zhu, M. Zhou. Multi-scale skeleton adaptive weighted GCN for skeleton-based human action recognition in IoT. Applied Soft Computing, 2021, 104(3):1568-1579. https://doi.org/10.1016/j.asoc.2021.107596. [17] H. B. Naeem, F. Murtaza, M. H. Yousaf, S. A. Velastin. T-VLAD: Temporal vector of locally aggregated descriptor for multiview human action recognition. Pattern Recognition Letters, 2021, 148(8): 22-28. https://doi.org/10.1016/j.patrec.2021.06.012. [18] M. Yang. Research on vehicle automatic driving target perception technology based on improved MSRPN algorithm. Journal of Computational and Cognitive Engineering, 2022, 1(3): 147-151. https://doi.org/10.47852/bonviewJCCE20514 [19] Y. Lin, W. Chi, W. Sun, S. Liu, D. Fan. Human action recognition algorithm based on improved resnet and skeletal keypoints in single image. Mathematical Problems in Engineering, 2020, 89 90 Informatica 49 (2025) 77–90 D. Chen et al. https://doi.org/10.31449/inf.v49i12.7592 Informatica 49 (2025) 91–104 91 Optimization of Elman Neural Network Using Genetic Algorithm for Construction Cost Estimation and Overspending Risk Analysis Qian Wu School of Urban Construction, Anhui Xinhua University, Hefei 230088, Anhui, China E-mail: wu2006qian@126.com Keywords: neural network, construction cost estimation, overspending risk, Elman network, genetic algorithm Received: November 14, 2024 This study proposes a model based on the Elman neural network and improves it using a Genetic Algorithm (GA) to increase the accuracy of construction cost estimation and accurately analyze the overspending risk. First, an index system containing multiple dimensions such as building features, structural features, project positioning, and project environment is constructed to comprehensively capture the key factors affecting construction cost and overspending risk. Second, the Elman neural network’s structure and operation are thoroughly examined, and the GA optimizes the network’s weights and thresholds to improve the model’s predictive power. On the training set, the optimized GA-Elman model demonstrates great prediction accuracy, with relative error (RE) percentages between predicted and true values typically falling within ±1%. On the test set, the GA-Elman model performs better than the original Elman model in both difference and RE, with a Mean Absolute Percentage Error of 2.75%, a decrease of 18.4% compared to the Elman model. These results indicate that the GA-Elman model is more accurate in cost prediction and more effective in identifying potential overspending risks. This study provides a powerful tool for cost control and budget management in the construction industry and a new perspective on the application of neural networks in construction economics. Povzetek: Razvit je model za ocenjevanje stroškov gradnje in analizo tveganja prekoračitve stroškov, ki temelji na Elmanovi nevronski mreži, optimizirani z genetskim algoritmom. Model je močno orodje za obvladovanje stroškov in upravljanje proračuna v gradbeništvu. 1 Introduction which involves creating a building cost model based on the Elman neural network and using a Genetic Algorithm In the construction industry, cost estimation is the core (GA) to optimize it. This approach can increase cost link of project management, directly related to the estimating accuracy while evaluating potential project's economic benefits and risk control. Traditional overspending risk analysis and offering construction cost estimation methods rely on expert experience and project management scientific decision assistance. historical data. Still, such methods are often influenced by The main contribution of this study is the proposal of subjective judgment and are difficult to adapt to the a construction cost estimation model based on the Elman rapidly changing market environment and complex and neural network combined with a GA, specifically: changing engineering conditions [1-3]. Traditional cost Firstly, GA is applied to optimize the Elman neural estimation procedures encounter increasing challenges as network, utilizing GA to improve the weights and building projects get larger and more complicated. As a thresholds of the neural network, thereby enhancing the result, new techniques and methodologies must be model's prediction accuracy and generalization ability. As introduced immediately to increase estimation efficiency a global search optimization tool, GA can avoid the and accuracy [4, 5]. problem of falling into local optimal solutions that is With the advancement of machine learning (ML) and common in traditional training processes. artificial intelligence in recent years, neural networks have Secondly, by constructing a comprehensive index shown to be a valuable tool for tackling challenging system and integrating it with the Elman neural network, forecasting issues. Because of their benefits in processing a more accurate method for construction cost prediction is sequence data, recurrent neural networks (RNNs) are provided compared to traditional models. Furthermore, widely applied across various fields, such as natural the model's applicability in complex construction projects language processing and time series prediction [6-8]. The is effectively improved through further optimization with Elman neural network, as a kind of RNN, enhances the GA. network's memory ability by introducing the context layer, Finally, the study focuses on the prediction of which makes it perform well in dealing with time- construction costs and proposes a new method for dependent sequence data [9]. assessing cost overrun risks. Through the model's dynamic This study explores the application of neural networks memory mechanism, it is possible to analyze the impact in construction cost estimation and overspending risk of historical data on future costs, identify potential risk analysis. A new approach to cost estimating is presented, 92 Informatica 49 (2025) 91–104 Q. Wu factors in advance, and provide decision support for These models could be used as decision-support tools for project management. construction project managers and practitioners to promote the development of automation research in the 2 Related work green building industry. Because neural networks can handle complicated In the construction industry, the accuracy of cost nonlinear interactions, they have emerged as a potent tool prediction is critical for the project's success. With the for cost prediction problems. Pham et al. proposed an ML development of information technology, more and more and optimization framework incorporating artificial neural researchers began to explore how to use advanced networks (ANNs) and gradient boosting models to technical means to improve cost prediction accuracy. estimate construction costs and optimize costs under Mahmoodzadeh et al. forecasted the geological conditions, budget constraints rapidly [12]. Goodarzizad et al. construction duration, and cost of tunnels using Gaussian improved the accuracy of construction labor productivity Process Regression (GPR), Support Vector Regression models for concrete pouring operations through a hybrid (SVR), and decision tree models. Through 50% cross- model developed by combining ANN and Grasshopper validation to evaluate the model's performance, it was optimization algorithms [13]. The study helped to improve found that GPR was superior to SVR and decision trees in project efficiency, increase labor productivity, and reduce prediction accuracy. Hence, GPR was recommended to costs. Kim et al. introduced an autoregressive integrated predict future tunnel projects' geological and construction moving average (ARIMA)-ANN model to predict time costs [10]. Alshboul et al. used an ML algorithm to construction costs. They found that the model provided predict the cost of green buildings, considering the more accurate predictions in most cases, especially for influence of related attributes of soft and hard costs. The long-term forecasting time limits, than standalone evaluation results showed that eXtreme Gradient Boosting ARIMA or ANN models [14]. The main contents of the (XGBoost) performed best in accuracy, followed by the above research are summarized in Table 1. deep neural network (DNN) and random forest (RF) [11]. Table 1: Summary of relevant research contents Model Method Dataset Key results ML method is used to predict GPR has better prediction tunnel geological conditions, accuracy than SVR and GPR, SVR, Decision construction period, and cost. decision tree. Meanwhile, it is Tunnel project data tree The model's performance is recommended for geological evaluated by 5-fold cross- and time cost prediction of validation. future tunnel projects XGBoost performs the best in prediction accuracy, with an By considering soft and hard cost accuracy of 0.96; This Is XGBoost, DNN, and Green building- attributes, ML methods are used followed by DNN (0.91) and RF related data to predict green building costs. RF (0.87), which can provide decision support tools for the green building industry. ANN and gradient boosting algorithms perform the best, 13 ML regression algorithms are estimating construction costs ANN, gradient employed to estimate Construction and required resources with boosting model construction costs and optimize configuration dataset 99% accuracy in less than 1 costs under budget constraints second of training time, and reducing costs by 7% through optimization. Labor productivity The combination of ANN and data for 24 The project efficiency is Hybrid model Grasshopper optimization commercial office improved, labor productivity is (ANN+Grasshopper algorithm improves the labor complex projects increased, and costs are algorithm) productivity model of concrete under construction in reduced pouring operation. Iran In most cases, especially in The ARIMA model is integrated National and city- long-term forecasting, hybrid ARIMA-ANN with ANN to predict construction level construction models have higher prediction model costs. cost index accuracy than ARIMA or ANN models used alone. Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 93 Although significant progress has been made in on the principles of comprehensiveness, scientificity, a construction cost estimation, there remain substantial combination of quantitative and qualitative methods, limitations in terms of generalization ability and dynamics, and operability. These indexes are chosen from overspending risk assessment. Specifically, many models four aspects: architectural features, structural features, rely on specific datasets, making it challenging to maintain project positioning, and project environment. The method prediction accuracy in new construction project scenarios. of literature analysis is used for this selection. The For instance, while models like GPR and XGBoost exhibit quantification of qualitative indexes is carried out [15-17]. high prediction accuracy on particular datasets, their In constructing the cost estimation and overspending risk performance may decline significantly when applied to index system, the selection of each index is based on its cross-dataset scenarios or when handling previously correlation with construction costs and overspending risk. unseen complex situations. Existing research tends to For example, in the case of exterior wall decoration, focus more on cost prediction accuracy, with less significant differences in the price and construction emphasis on the quantification and identification of techniques of different materials exist. Paint is relatively potential overspending risks. For complex construction inexpensive, while materials such as stone and glass projects, such models lacking risk assessment abilities curtain walls are more costly and have longer construction could lead to delayed cost control decisions. To address periods, potentially increasing the overspending risk [18]. these shortcomings, this study proposes a construction Similarly, the technical personnel level directly influences cost estimation model based on the Elman neural network, construction efficiency and quality. Low technical levels optimized with a GA. The GA enhances the model's global may lead to rework and delays, thus increasing both cost search capability by optimizing the initial weights and and the probability of overspending [19]. Architectural thresholds of the Elman neural network, thereby features such as floor area and standard floor height improving its prediction performance across different determine material usage and construction complexity, datasets and complex scenarios. The dynamic memory directly affecting the total project cost. Structural features, mechanism of the Elman neural network enables it to including the prefabrication rate and component capture long-term dependencies in time-series data, differentiation, relate to the efficiency and cost control allowing the analysis of cost trends and forecasting capacity of prefabricated construction. Project potential overspending risks. Moreover, by designing a environmental factors, such as project management level comprehensive overspending risk index system, the model and transportation distance, reflect the impact of can quantitatively identify key factors that lead to cost management efficiency and logistics on cost. These deviations, providing a basis for risk prevention and indexes are validated through literature analysis and control. practical engineering experience, demonstrating their key role in cost control and overspending risk, thereby 3 Construction cost estimation model providing a theoretical foundation for the model's scientific and comprehensive nature. The finalized index based on elman neural network system for assembly construction cost estimation prediction is outlined in Table 2. 3.1 Construction cost estimation and construction of overspending risk index system The study focuses on assembly buildings. The selection of indexes affecting the cost and overspending risk is based Table 2: Construction cost estimation and overspending risk index system and assignment of values Primary index Secondary index Nature of the index Assignment of qualitative index Number of floors A1 Quantitative index - Architectural Building area A2 Quantitative index - features Standard floor height A3 Quantitative index - 1=internally cast and externally hung shear wall structure; 2=stacked shear wall structure; 3=assembled Structure type A4 Qualitative index monolithic frame structure; 4=assembled monolithic shear wall Structural structure features 1 = independent foundation; 2 = pile foundation; 3 = raft slab foundation; Foundation type A5 Qualitative index 4 = pile raft foundation; 5 = box foundation Prefabrication rate A6 Quantitative index - 94 Informatica 49 (2025) 91–104 Q. Wu 1 = laminated panels/air conditioning panels/drift Component type A7 Qualitative index windows/enclosures; 2 = prefabricated stairs; 3 = beams/columns/shear walls Differentiation degree of Quantitative index - components A8 1=paint; 2=real stone paint; 3=glass Exterior wall decoration A9 Qualitative index curtain wall; 4=aluminum panel; 5=stone 1=general plaster; 2=plaster; 3=large Interior wall decoration A10 Qualitative index white; 4=latex paint; 5=wall tiles; 6=wallpaper Project 1=concrete topping; 2=ordinary positioning Ground engineering A11 Qualitative index tiles; 3=flooring; 4=premium tiles 1=plastic steel window + steel door; 2=aluminum alloy window + steel Door and window type A12 Qualitative index door; 3=plastic steel window + fire door; 4=aluminum alloy window + fire door 1=excellent; 2=good; 3=medium; Technical personnel level A13 Qualitative index 4=poor Project 1=excellent; 2=good; 3=medium; environment Project management level A14 Qualitative index 4=poor Transportation distance A15 Quantitative index - In the above index system, the three indexes of controlled. Moreover, indexes in the project environment architectural features are directly related to the building's reflect the efficiency of project management and the physical size and construction complexity, affecting impact of external conditions on costs, which are key material costs and labor requirements. These in turn affect factors in cost control and risk management. This system cost control and the risk of overspending. The indexes of helps to forecast costs more accurately while identifying structural features determine the structural stability and and controlling factors that may lead to overspending. construction methods, significantly impacting material In the above index system, the priority of each index selection and supply chain management, thus correlating varies depending on its impact on costs and overspending with the overspending risk. Project positioning includes risks. To ensure that the indicator system can qualitative indexes such as exterior and interior wall comprehensively and scientifically reflect the risk of cost decorations, ground engineering, and window and door overruns, the Analytic Hierarchy Process is used to assign types. These choices affect the building's aesthetics and weights to each index. The results are exhibited in Table functionality while leading to increased costs, which may 3. increase overspending risk if costs are not properly Table 3: Index system weight Primary index Weight of primary Secondary index Final weight index Architectural features 0.162 Number of floors A1 0.054 Building area A2 0.054 Standard layer height A3 0.054 Structural features 0.409 Structure type A4 0.128 Foundation type A5 0.073 Prefabrication rate A6 0.053 Component type A7 0.069 Differentiation degree of 0.086 components A8 Project positioning 0.290 Exterior wall decoration A9 0.044 Interior wall decoration A10 0.068 Ground engineering A11 0.121 Door and window type A12 0.057 Project environment 0.139 Technical personnel levelA13 0.073 Project management level A14 0.046 Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 95 Transportation distance A15 0.020 In Table 3, structural features hold the highest weight among the primary indexes, accounting for 40.9%, 3.2 Elman neural network modeling indicating their most significant impact on both analysis construction costs and overspending risk. Among these, A4 and A8 have relatively higher weights of 0.128 and The Elman neural network's key feature is the 0.086, respectively, reflecting the crucial role of building incorporation of a context layer, which preserves the structure complexity and differentiation in cost control. hidden layer's state from a previous time step [20]. This The project positioning index ranks second, accounting enables the Elman network to process time-series data, for 29.0%, with A11 having the highest weight of 0.121, capturing the dynamics of the input data and the emphasizing its importance in construction decoration underlying temporal relationships, making it suitable for costs. The weights for architectural features and project time-dependent data prediction tasks such as construction environment are relatively lower. However, among the costs. The network creates a short-term memory secondary indexes, A13 and A2 stand out with weights of mechanism by feeding past information back to the 0.073 and 0.054, respectively, highlighting their influence current moment, which enhances its ability to model on construction efficiency and total cost prediction. This nonlinearities in dynamically changing processes. Unlike weight allocation method enables the index system to traditional feed-forward neural networks, the Elman more scientifically reflect the contribution of various network has feedback connections between the hidden and factors to cost and overspending risk, providing a solid context layers. These feedback signals allow the network foundation for subsequent model predictions and risk to retain information from previous states, providing analysis. valuable contextual input for subsequent computations [21, 22]. Figure 1 depicts the Elman neural network's basic structure. Hidden layer Input layer h1 Output layer h2 h3 · · · · · · · · · hn Context Layer c1 c2 c3 cr Figure 1: Schematic diagram of Elman network structure The core principle of the Elman network is as follows. 𝑦 𝑦(𝑡) = 𝑔(𝑤ℎ𝑦𝑤𝑐𝑗ℎ(𝑡)) (1) First, the output vector 𝑦(𝑡) of the network is obtained 𝑤ℎ𝑦 denotes the weight matrix between the hidden from the output vector ℎ(𝑡) of the implicit layer through and output layers. Secondly, the output ℎ(𝑡) of the the nonlinear transformation function 𝑔(∗)of the output implicit layer is obtained from the current input 𝑣(𝑡 − 1) layer with the expression (1): and the output 𝑐(𝑡) of the context layer through the 96 Informatica 49 (2025) 91–104 Q. Wu nonlinear transformation function 𝑓(∗) of the implicit 𝑐(𝑡) = ℎ(𝑡 − 1) (3) layer with the expression (2): This structure allows the Elman network to capture ℎ(𝑡) = 𝑓(𝑤𝑥ℎ𝑣(𝑡 − 1) + 𝑤𝑐ℎ𝑐(𝑡)) (2) the temporal dynamics of the input data. For construction 𝑤𝑥ℎ refers to the weight matrix from the input to the cost estimation, it means that the network can consider the hidden layer. 𝑤𝑐ℎ denotes the weight matrix from the impact of historical cost data on current cost estimates, takeover layer to the hidden layer. Finally, the output 𝑐(𝑡) thus improving the accuracy of the predictions. of the take-on layer is the output ℎ(𝑡 − 1) of the implicit Furthermore, the computational flow of the Elman layer at the previous time step, that is (3): network is suggested in Figure 2. Start Initialize the weights of each layer Input sample A series of calculation Calculate the input Calculate the output of steps layer output the receiving layer No Does the error meet the Calculate hidden layer Yes requirements or reach End output the maximum number of training steps? Calculate the output of Calculate the output of Calculation the context layer the receiving layer error Figure 2: Elman network computational flow In Figure 2, the network initializes the weights of each error E does not decrease sufficiently, the training cycle layer as a necessary preparation before training starts. The continues, with the weights being adjusted to reduce the initial setup of these weights significantly impacts the prediction error. This process is repeated until the network learning effectiveness and overall performance of the performs adequately or the training reaches the set number network. Network learning is then built on the input of iterations. samples, which include past construction project cost data In the above step, the error E is used to measure the and other pertinent features. The outputs of the input, difference between the predicted output of the network, hidden, and output layers are then computed sequentially. 𝑦(𝑡), and the desired output as ?̂?(𝑡), calculated as (4): Meanwhile, after obtaining the output of the hidden layer, 1 𝑇 𝐸 = (𝑦(𝑡) − ?̌?(𝑡)) (𝑦(𝑡) − ?̂?(𝑡)) (4) the output of the context layer is further computed. In this 2 To adjust the weights, the partial derivatives of the step, the current output of the hidden layer is used as the error E with respect to the weights need to be calculated. input for the context layer in the next time step. This step 𝑦 The partial derivatives of the weights 𝑤 is the key to the short-term memory mechanism of the 𝑗𝑖 for the output Elman network, allowing it to retain information from layer are (5): 𝜕𝐸 𝜕𝑦 (𝑡) previous states while processing sequential data. The 𝑦 = −(?̂? (𝑡)) 𝑖 𝑦 = −(?̂? ) − 𝑦(𝑡)𝑔′ (∗)𝑥 (𝑡)) (5) 𝜕𝑤 𝑑,𝑖(𝑡) − 𝑦 𝜕𝑤 𝑑,𝑖(𝑡 𝑗 𝑖 𝑗𝑖 𝑗𝑖 output layer error is determined by comparing the actual 𝑦 𝑤 cost data with the network's predicted outputs, following 𝑗𝑖 y refers to the weight connecting the ith input unit and the jth output unit; 𝑔′ (∗) represents the derivative of the computation of outputs across all layers. A critical 𝑗 element of supervised learning, this error computation the activation function of the output layer; 𝑥𝑖(𝑡) denotes (denoted as E) provides the network with feedback for the output of the ith input unit at time t. Let 𝜑0 𝑗 = adjusting its parameters. Lastly, the error E is utilized to (?̂?𝑑,𝑖(𝑡) − 𝑦(𝑡)𝑔′ (∗), so (6): 𝑗 check if the maximum number of training steps has been 𝜕𝐸 0 𝑦 = −𝜑 𝑖 = 1,2,⋯ ,𝑚; 𝑗 = 1,2,⋯ , 𝑛 (6) completed or if the predefined requirements are met. If the 𝜕𝑤 𝑗 𝑥𝑖(𝑡), 𝑗𝑖 Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 97 𝑚 is the number of neurons in the input layer and 𝑛 is Algorithm: Elman Neural Network the number of neurons in the hidden layer. Taking 𝐸 as the partial derivative of the input layer Input: weight 𝑤𝑥 𝑗𝑖 , it can get (7): 𝜕𝐸 𝜕𝐸 𝜕𝑥 - Training data = 𝑖(𝑡) = ∑𝑚 𝑖=1(−𝜑 0𝑤𝑥 ′ ( 𝑣 𝜕𝑤𝑥 𝑞(𝑡 − 1) (7) - Learning rate 𝑗𝑖 𝜕𝑥𝑖(𝑡) 𝜕𝑤𝑥 𝑗 𝑗𝑖)𝑓 ∗) 𝑖 𝑗𝑖 𝑓′ (∗) denotes the derivative of the hidden layer - Maximum iterations 𝑖 activation function. Let 𝜑ℎ 𝑚 𝑗 = ∑𝑖=1(−𝜑 0 𝑥 𝑗𝑤𝑗𝑖)𝑓 ′ (∗) L, 𝑖 then get (8): Initialization: 𝜕𝐸 - Randomly initialize weights = −𝜑ℎ𝑣 ,𝑚; 𝑗 = 1,2,⋯ , 𝑛; 𝑞 = 1,2,⋯ , 𝑟 (8) 𝜕𝑤𝑥 𝑗 𝑞(𝑡 − 1), 𝑖 = 1,2,⋯ 𝑗𝑖 - Set initial context layer to zero 𝑟 is the number of neurons in the splice layer. The partial derivative of the connection weight 𝑤𝑐 𝑗𝑙 is Training: obtained (9): Repeat until convergence or maximum iterations: 𝜕𝐸 = ∑𝑚 (−𝜑0 𝑥 𝜕𝑥𝑖(𝑡) 𝜕𝑤𝑐 𝑖=1 𝑗𝑤𝑗𝑖) 1,2,⋯ , 𝑛 (9) 1. Compute hidden layer output 𝑗𝑙 𝜕𝑤𝑐 , 𝑙 = 1,2,⋯ , 𝑛; 𝑗 = 𝑗𝑙 According to the chain rule (10): 2. Update context layer 𝜕𝑥𝑗(𝑡) 𝜕 𝑓 𝑛 𝑟 𝑥 3. Compute network output 𝜕𝑤𝑐 = 𝑗(∑𝑖=1𝑤 𝑐 𝑗𝑖𝑥𝑐,𝑖(𝑡) + ∑𝑖=1𝑤𝑗𝑙𝑣𝑖(𝑡 − 1)) = 𝑓′ (∗)𝑥 (𝑡) + 𝑗 𝑐,𝑖 𝑗𝑙 𝜕𝑤𝑐 𝑗𝑙 ) 4. Calculate error ∑ 𝑦 𝜕𝑥 𝑤 𝑐,𝑖(𝑡 𝑗𝑖 𝑦 (10) 𝜕𝑤 𝑗𝑙 5. Backpropagate and update weights The dependence of 𝑥𝑐(𝑡) on the connection weight 𝑦 𝑤𝑗𝑖 is ignored, and the following results are obtained (11) Prediction: and (12): For each input in test data: 𝜕𝑥𝑗(𝑡) 1. Compute hidden layer output 𝑓′ (∗)𝑥 1) 𝜕𝑤𝑥 = 𝑗 𝑐,𝑙(𝑡) (1 𝑗𝑙 2. Update context layer 𝑓′ (∗)𝑥 = 𝑓′ (∗)𝑥 ∗ 𝑓′ (∗)𝑥 12) 𝑗 𝑐,𝑙(𝑡) 𝑗 𝑙(𝑡 − 1) + 𝛼 𝑗 𝑐,𝑙(𝑡) ( 3. Compute final output 𝛼 refers to the forgetting factor. By substituting Figure 3: The pseudocode for the Elman model equation (12) into equation (11), it can obtain (13): 𝜕𝑥𝑗(𝑡) 𝜕𝑥𝑗(𝑡−1) 𝑦 = 𝑓′ (∗)𝑥 𝑦 (13) 𝜕𝑤 𝑗 𝑙(𝑡 − 1) + 𝛼 ∗ 𝜕𝑤 3.3 Optimization of the Elman model 𝑗𝑙 𝑗𝑙 Elman's equation (14)-(18) is derived from ∆𝑊 = based on GA 𝜕𝐸 −𝜂 : Although the Elman neural network has remarkable 𝜕𝑊 𝑦 ∆𝑤 advantages in processing time series data, its performance 𝑗𝑖 = 𝜂𝜑0 𝑗 𝑥𝑗(𝑡), 𝑖 = 1,2,⋯ ,𝑚; 𝑗 = 1,2,⋯ , 𝑛 (14) is highly dependent on the initial weight settings and the ∆𝑤𝑐 𝑗𝑞 = 𝜂𝜑ℎ 𝑗 𝑣𝑞(𝑡 − 1), 𝑗 = 1,2,⋯ , 𝑛; 𝑞 = 1,2,⋯ , 𝑟 (15) choice of network structure. In addition, the Elman ∆𝑤𝑥 𝑚 0 𝑥 𝜕𝑥𝑗(𝑡) 0 𝑗𝑙 = 𝜂∑𝑖=1(𝜑𝑗𝑤𝑗𝑖) = 1,2,⋯ , 𝑛; 𝑙 = 1,2,⋯ , 𝑛 (16) 𝜕𝑤𝑥 𝜑𝑗 𝑥𝑗(𝑡), 𝑗 𝑗𝑙 network is easily affected by local minimum, which can 𝜂 is the learning rate. Meanwhile, lead to suboptimal solutions and negatively impact 𝜑0 𝑗 = (?̂?𝑑,𝑖(𝑡) − 𝑦(𝑡)𝑔′ (∗) (17) 𝑗 prediction accuracy and generalization ability [23]. To 𝜑ℎ 𝑚 0 𝑥 ′ 𝑗 = ∑𝑖=1(−𝜑𝑗𝑤𝑗𝑖)𝑓 (∗) (18) 𝑖 overcome these limitations, GA is introduced to optimize Through this calculation process, the Elman network the Elman model. Darwin's theory of natural selection and can gradually learn the complex relationship between the global search principle of biogenetics serve as the building cost data and complete cost prediction. This foundation for GA, an optimization algorithm designed to dynamic learning and forecasting mechanism makes the mimic the natural evolution process. Biological evolution Elman network perform well in dealing with time series mechanisms, including natural selection, genetic forecasting problems such as construction cost estimation. variation, and crossover, are simulated by GA, which is The pseudocode for the Elman model is illustrated in Figure 3. extensively used to tackle complicated combinatorial optimization problems by gradually improving the quality of solutions. GA has strong global search ability and adaptability, and can effectively deal with optimization problems under high dimensional, nonlinear, and complex constraints [24]. The basic idea of GA is to simulate natural selection and genetic mechanisms by operating a population composed of multiple individuals to produce better solutions. Although GA possesses global search capabilities and strong adaptability, there are certain limitations in its optimization process. GA may encounter issues of high computational complexity and time costs when dealing with large-scale datasets. Additionally, the 98 Informatica 49 (2025) 91–104 Q. Wu convergence speed of GA can be slow, especially in large members of the current population are chosen to go into search spaces, where there is a risk of premature the next generation based on their fitness values. This step convergence or falling into local optimal solutions [25]. imitates the natural selection process of "survival of the The implementation steps of GA are displayed in Figure fittest" in biology. (4) Cross operation. Individuals are randomly paired 4. from the selected ones and undergo a single-point Start crossover operation according to a set crossover probability (0.6). This involves randomly selecting a position in the chromosome and exchanging the gene Population initialization segments before and after that position, generating new combinations of weights and thresholds. This method improves search efficiency by exploring different Computational Get a new parameter combinations. fitness population (5) Mutation operation. A small probability (0.2) is used to randomly mutate certain genes of the selected Check if individuals. The specific method is to add a random Yes End termination Variation disturbance that follows a normal distribution (e.g., with a condition is met? mean of 0 and a standard deviation of 0.1) to the original weights or thresholds. Thus, it can increase the diversity No of the population and avoid local optimal solutions. Select Intersect (6) Termination conditions. For one thing, the algorithm automatically stops when it reaches the preset Figure 4: GA implementation process maximum number of iterations (200 times). For another, if the optimal fitness value of the population does not This study uses the GA to optimize the adjustment of improve by more than a predetermined threshold (0.001) Elman network weights and thresholds, and the specific over a continuous number of generations (20), it is steps are as follows [26, 27]. considered that the algorithm has converged. In addition, (1) Population initialization. Several initial the optimization process is terminated early. By individuals are randomly generated in the solution space, introducing these clear stopping criteria, the stability of and each individual corresponds to a set of potential the optimization process can be effectively ensured, while Elman network weights and thresholds. Each individual also enhancing the applicability and reliability of the can be regarded as the coding form of Elman network algorithm in practical problems. parameters (real number coding), including the connection Through the aforementioned optimization process, weights between input and hidden layers, hidden and GA can effectively adjust the weights and thresholds of output layers, and the threshold of each neuron. the Elman network, improving the model's generalization (2) Fitness calculation. According to the performance ability and prediction accuracy. The rationality of index of the Elman network (for example, the mean square parameter settings is determined through multiple error of construction cost estimation), the fitness of each experimental tests. Meanwhile, the specific individual is evaluated. The network corresponding to the implementation of crossover and mutation ensures a high individual performs better on a given task the higher the degree of repeatability in the study, providing an effective fitness. modeling tool for complex construction cost estimation (3) Selection of the operation. Using probability tasks. techniques like roulette wheel selection, the fittest Figure 5 shows the calculation flow of the finally formed GA-Elman model based on GA. Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 99 Input sample Start Select Initialize the The error obtained by Encoding the initial weights of each Elman training is used value layer as the fitness value Intersect No Is the evolutionary The best coverage is the Computational iteration worst fitness completed? Variation Yes Obtain the optimal initial weight A series of calculation Calculate the input Calculate the output of steps layer output the receiving layer No Does the error meet the Calculate hidden layer Yes requirements or reach End output the maximum number of training steps? Calculate the output of Calculate the output of Calculation the context layer the receiving layer error Figure 5: Calculation flow of GA-Elman model The GA optimization of the Elman neural network and high, providing decision-makers with a more intuitive can reduce the probability of the model reaching local risk assessment index. optima, and enhance the network's global search ability. Furthermore, the model quantifies the key risk factors Meanwhile, it can accelerate the convergence speed of the through a comprehensive index system. The index system training process and improve the model's prediction designed in this study encompasses four major dimensions: accuracy. This is especially important for complex architectural features, structural features, project construction cost estimation tasks. Especially when faced positioning, and project environment. Within each with time-related data, the optimized GA-Elman network dimension, specific indexes are assigned different weights can better capture the dynamic characteristics of data and to reflect their relative importance in contributing to cost realize more accurate cost estimation and risk prediction. overruns. For instance, in the architectural features dimension, the "number of floors" and "building area" 3.4 Application of cost estimation model in directly influence material and labor costs, with their overspending risk weights determined by principal component analysis. In contrast, in the project environment dimension, In the cost management of construction projects, the "management level" and "technical personnel level" are assessment and control of overspending risk is a crucial quantified using fuzzy comprehensive evaluation methods. link. The assessment of overspending risk relies on the The distribution of risk factor weights follows (20): accuracy of cost estimation while requiring scientific 𝑣 𝑤 𝑖 𝑖 = (20) quantification of risk factors and their weights. The GA- 𝑣 Elman model can accurately capture the time series 𝑤𝑖 represents the weight of the ith risk factor, with a characteristics of cost data through dynamic memory value range of 0 to 1 and a total weight of 1; 𝑣𝑖 refers to mechanisms, offering vital support for the quantitative the contribution of the ith index to the total deviation; 𝑣 assessment of overspending risk. Firstly, the assessment denotes the total deviation. The GA-Elman model can of overspending risk is based on the cost deviation rate 𝑝, identify and predict the primary risk factors leading to and the degree of risk is quantified by the deviation overspending through historical data. For example, the between the model's predicted value 𝑐′ and the actual cost model can use retrospective analysis to determine that value 𝑐. The specific calculation reads (19): material price fluctuations contribute 35% to cost |𝑐−𝑐′| deviations, construction delays account for 25%, design 𝑝 = × 100% (19) 𝑐′ changes contribute 20%, and other factors make up 20%. In this context, the higher the deviation rate, the This detailed quantitative analysis helps managers greater the overspending risk. Based on this deviation rate, pinpoint key risk sources and provides data support for the risk can be classified into three levels: low, medium, formulating targeted risk control strategies. 100 Informatica 49 (2025) 91–104 Q. Wu Additionally, the GA-Elman model simulates the 4 Model Performance verification impact of different cost control strategies on overspending risk. For instance, in the case of significant material price 4.1 Data source and experimental design fluctuations, the model can simulate cost trends for diverse procurement strategies (such as bulk purchasing in To ensure the universality and representativeness of the advance or phased procurement) and assess the mitigation experiment, data are collected from multiple sources, effects of each strategy on overspending risk. This data- ensuring the diversity and reliability of the data. The social driven simulation analysis offers project managers a and economic development level of each region and the scientific decision-making tool. number of prefabricated buildings built are To sum up, the GA-Elman model in overspending comprehensively considered. The basic data are obtained evaluation provides intuitive risk levels through the from professional platforms such as the China quantification of cost deviations. Meanwhile, it offers a Prefabricated Building Market Analysis Report, systematic approach to risk identification, assessment, and Prefabricated Building Network, and Zhongce Big Data control through the weight allocation to key risk factors Website. Additionally, data from 45 groups of and simulation analysis. By applying this model in-depth, prefabricated building projects in cities such as Beijing, project managers can remarkably improve risk Tianjin, Hebei, and Shenyang over the past four years are management efficiency, reduce economic losses caused collected. These data cover many dimensions, such as by overspending, and ultimately enhance the construction architectural features, structural features, project projects' cost-effectiveness and success rate. positioning, and project environment, offering rich information for model training and testing. Taking the indexes A1-A3 of architectural features as an example, the variance analysis of these data is detailed in Table 4. Table 4: Variance analysis of architectural feature indexes Difference Sum of Squares Degrees of Freedom Mean Square F P-value F crit source Row 3,417,030,830 44 77,659,791 1.000 0.488 1.515 Column 4,022,937,273 2 2,011,468,636 25.902 0.000 3.100 Error 6,833,762,444 88 77,656,391 Table 4 shows significant mean differences (P<0.05) model evaluation results are improved. The experimental among variables A1, A2, and A3, while the differences setup and parameter values are shown in Table 5. between samples are not significant. This indicates that different samples have a relatively small impact on the Table 5: Experimental environment and parameter results of variance analysis. These data can more setting comprehensively illustrate the distribution characteristics Hardware/parameter name Parameter/value of architectural feature data, providing data support for Operating system Windows10 model prediction. To enhance the model's generalization CPU AMD R7-5800H ability, the gathered data are normalized to remove the Basic frequency 3.2 GHz impact of varying dimensions and ordering. Specifically, Display card RTX3060 the Min-Max normalization method is adopted to map the Memory 16 GB data values of each index to the interval [0, 1], and the Hard disc 512 G SSD normalization equation is as follows (21): 𝑋−𝑋 Input layer node 15 𝑋′ = 𝑚𝑖𝑛 (21) 𝑋𝑚𝑎𝑥−𝑋𝑚𝑖𝑛 Output layer node 1 X is the original data; 𝑋𝑚𝑖𝑛 and 𝑋𝑚𝑎𝑥 are the Hidden layer node 10 minimum and maximum values of the index, respectively. Maximum number of iterations 200 Through this method, the differences in dimensions and Error tolerance 1×10-5 magnitudes between different indicators have been Evolutionary algebra 20 eliminated, ensuring the stability and accuracy of the Population size 10 model during training and testing. The training set Cross probability 0.6 comprises 36 sets of data; The test set contains 9 sets of Mutation probability 0.2 data, which are randomly selected from the dataset and arranged in a 4:1 ratio. Furthermore, to comprehensively Relative Error (RE) and Mean Absolute Percentage evaluate the performance and reliability of the model, this Error (MAPE) are used as evaluation indexes to evaluate study further adopts the k-fold cross-validation technique the accuracy of prediction results. The calculation (k=5) based on the division of training and testing data. By equations of them are (22) and (23): partitioning the dataset k times to ensure that each subset 𝑦′𝑖−𝑦 participates in training and validation, the potential RE = 𝑖 0 ( 2 𝑦′ ∗ 10 % 2 ) 𝑖 random errors caused by a single partition are effectively 1 |𝑦 −𝑦′ | 𝑀𝐴𝑃𝐸 = ∑𝑛 𝑖 𝑖 𝑁 𝑖=1 (23) reduced. In addition, the stability and credibility of the 𝑦𝑖 Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 101 𝑁 represents the number of samples. 𝑦𝑖 and 𝑦′𝑖 refer thereby enhancing the model's ability to learn nonlinear to the predicted and actual values. In the cost estimation relationships. Moreover, integrating learning methods or model, REP measures the difference between the hybrid model structures can be introduced to combine the predicted and actual costs to evaluate the model's advantages of multiple algorithms and improve the prediction performance. MAPE index can directly reflect model's generalization ability. Lastly, for key features the RE between the actual and predicted values of the such as material prices and construction conditions, model, and it is an important index to measure the model's targeted feature engineering strategies can be designed to prediction performance. ensure that the model can more accurately capture their impacts, thus reducing the occurrence of extreme errors. 4.2 Test results of the GA-Elman model Taking the Elman network, RNN, and SVR as the benchmark model, the test set is tested on the GA-Elman Firstly, the GA-Elman model is trained, and its training and benchmark models, respectively, and the results are result in the training set is presented in Figure 6. revealed in Figure 7. 4,000 True value 26 3,500 Predicted value 24 True value Relative error 22 Predicted value _ Elman 20 Predicted value _ GA-Elman 3,500 Predicted value _ RNN 18 Predicted value _ SVR 16 3,000 14 3,000 12 10 8 6 2,500 2,500 4 2 0 -2 2,000 -4 2,000 0 5 10 15 20 25 30 35 1 2 3 4 5 6 7 8 9 Sample number Sample number Figure 6: Training results of the GA-Elman model in the Figure 7: Comparison between the GA-Elman model and training set benchmark model Results in Figure 6 demonstrate that the GA-Elman On most test samples, the predicted value of the GA- model has good prediction accuracy. This is because the Elman model in Figure 7 is more similar to the true value. predicted values for most samples are extremely close to The maximum differences between the predicted and the true values and the RE percentage is typically less than actual results for the Elman network, RNN, and SVR are 1%. However, there are also some samples with large 118.99, 117.65, and 102.94, respectively. The GA-Elman prediction errors, such as Samples 14 and 32, with RE model's maximum difference between the predicted and percentages as high as 9.816% and 24.284%. The reasons true values is 87.21. These results show that the GA- for these issues may be attributed to several factors. Firstly, Elman model optimized by GA has higher prediction the data characteristics of these samples may significantly accuracy and robustness in construction cost estimation, deviate from the overall distribution of the training set, thus verifying the effectiveness of GA in neural network such as abnormal fluctuations in key factors like material weight optimization. prices, construction conditions, or design complexity. For instance, Sample 32 may have actual costs that far exceed 4.3 Comparison of cost estimation results the model's predictions due to the use of certain specific before and after Elman model optimization processes or unexpected construction delays. Secondly, the model may exhibit limitations in handling rare features To further compare the cost estimation results before and in small samples, especially when these features are not after the optimization of the Elman model, the difference adequately represented in the training data, making it between the predicted and true value and the RE of the difficult for the model to capture their nonlinear four models are calculated, as denoted in Figure 8. relationships. Additionally, the data preprocessing process may not have eliminated the effects of noise or outliers, which could also amplify errors. To address the aforementioned issues, the following approaches can be taken. Firstly, it is necessary to optimize data preprocessing methods by employing techniques such as denoising and smoothing to improve data quality. Meanwhile, the detection and handling of outliers are strengthened to reduce the noise interference on the model. Secondly, the sample diversity of the training dataset is expanded, particularly for samples with rare or abnormal features, by increasing the proportion of related data, Output result Relative error (%) Output result 102 Informatica 49 (2025) 91–104 Q. Wu Differential value _ Elman RE_Elman In addition, the training time of the GA-Elman and Differential value _ GA-Elman RE_GA-Elman Elman models is compared, and the results are listed in 150 Differential value _ RNN RE_RNN 6 Table 6. Differential value _ SVR RE_SVR 100 4 Table 6: Comparison of training time between GA- Elman and Elman models 2 50 Training dataset Training Model size (number of time 0 0 samples) (seconds) 100 12.36 -2 Elman -50 500 56.47 model 1,000 115.82 -4 100 18.75 -100 GA-Elman 500 72.93 -6 model 1 2 3 4 5 6 7 8 9 1,000 142.68 Sample number Figure 8: Analysis of cost prediction results of four Table 6 indicates that the training time of the GA- models Elman model is slightly higher than that of the traditional Elman model, primarily due to the additional optimization In Figure 8, the differences and REs of the GA-Elman step introduced by the GA. However, this extra model across all test samples are generally lower than computational cost is justified, as the GA-Elman model those of the Elman model. The mean absolute difference optimizes the network's initial parameters and weights between the predicted and actual values for the GA-Elman through GA, significantly improving both prediction model is 70.93, while for the Elman network, RNN, and accuracy and generalization ability. Specifically, when the SVR, they are 86.38, 87.83, and 87.63, respectively. In sample size is small (e.g., 100 samples), the training time some samples, the GA-Elman model still exhibits of the GA-Elman model is 18.75 seconds, only 6.39 relatively large errors. The main reasons for these larger seconds longer than the Elman model. When the sample errors are twofold. First, data irregularity. For instance, size increases to 1,000, the training time becomes 142.68 Sample 8 may have been affected by drastic fluctuations seconds, which is 26.86 seconds longer than the Elman in material prices or abnormal construction environments, model. This increase in training time is acceptable in light leading to actual costs significantly higher than the of the improvements in prediction performance. model's predictions. However, these exceptional From both a construction and economic perspective, situations are not adequately represented in the training the improvements made by the GA-Elman model are data. Second, model limitations. The GA-Elman model significant. In construction management, accurate cost has enhanced its ability to capture nonlinear features forecasting is crucial for budget control and risk through parameter optimization by GA. Nevertheless, it mitigation. The GA-Elman model's high prediction may still be insufficiently responsive to the dynamic accuracy (with a MAPE of only 2.75%) enables it to changes of certain key influencing factors, such as capture the complex nonlinear relationships in unexpected design changes or construction delays. construction costs, thus providing project managers with Meanwhile, the calculated MAPE for the GA-Elman more reliable decision support. This capability is model is 2.75%, which is significantly reduced compared especially beneficial for large and complex projects, as it to the Elman model's 3.37%. The MAPEs for RNN and helps reduce overspending risks and delays due to budget SVR are 3.46% and 3.45%, respectively, higher than that miscalculations. Additionally, by accurately assessing key of the GA-Elman model. This further demonstrates the influencing factors (such as material prices and effectiveness of GA in optimizing neural network construction conditions), the model helps managers parameters and improving prediction accuracy. These identify potential risks earlier, allowing for timely results show that GA-Elman model is more accurate in adjustments in construction plans and financial allocations. capturing the complex relationship of construction cost From an economic perspective, the application of the data, thus providing more reliable support in cost GA-Elman model in budget optimization remarkably estimation and overspending risk assessment of improves resource allocation efficiency. Compared to the construction projects. traditional Elman model and other benchmark models, the GA-Elman model offers a clear advantage in effectively reducing unnecessary financial waste and optimizing financial planning. For example, for cost-sensitive samples (such as Samples 14 and 32), there is still some error. However, the model provides managers with a cost estimate closer to the actual values, laying a foundation for reasonable financial resource distribution and cash flow control. Moreover, the GA-Elman model's ability to identify and quantify overspending risk allows enterprises Differential value Relative Error (%) Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 103 to develop more scientifically-based long-term financial 5 Conclusion strategies, thereby reducing the economic losses caused by uncontrollable costs. This study analyzes the application of the GA-Elman In conclusion, the GA-Elman model has considerable model in construction cost estimation and overspending potential in construction cost estimation and economic risk analysis by constructing a construction cost risk management. It enhances the intelligence level of estimation model based on the Elman network and construction management while providing a reliable tool optimizing the model with GA. It verifies the performance for budget optimization and cost control. The model of the model through experiments. The conclusions are as contributes positively to lean management and improved follows. (1) The GA-Elman model's high prediction economic efficiency in the construction industry. accuracy is demonstrated by the fact that, on the training set, the predicted value on most samples is very near to the true value and the RE percentage is typically within 1%. 4.4 Discussion (2) When compared to the Elman network, the GA-Elman Compared to the existing models summarized in model's projected value is closer to the actual value, and Table 1, the proposed GA-Elman model demonstrates on all test samples, the model's difference and RE are significant advantages in construction cost estimation and typically smaller than those of the Elman model. (3) The overspending risk assessment. In contrast to models such GA-Elman model's MAPE is 2.75%, a considerable as GPR and XGBoost, the GA-Elman model is better decrease from the Elman model's 3.37%. It further proves suited for handling dynamic changes in time series data. the effectiveness of GA in optimizing neural network For instance, while GPR exhibits high accuracy in parameters and improving prediction accuracy. In short, predicting tunnel geological conditions, its sensitivity to by optimizing GA, the GA-Elman model increases the data scale can lead to decreased computational efficiency ability to detect possible overspending, which is crucial when dealing with large-scale complex construction for efficient cost control and budget management, in projects. In comparison, the GA-Elman model, by addition to improving the accuracy of cost prediction. optimizing weights and thresholds through GA, can Although this study has made some progress in process large-scale data more efficiently while fully construction cost estimation and overspending risk capturing dynamic changes, thus enhancing the model's assessment, there are still some limitations. First, the applicability. robustness of the model needs to be enhanced, as extreme The comparison with ANNs and gradient boosting errors occurring on specific data samples indicate models indicates that although these models perform well insufficient stability. Second, the study only selects certain in rapid construction cost estimation, they lack capability regions and prefabricated buildings, and the limitation of in risk assessment. For example, the gradient boosting the sample range may affect the model's generalization model primarily focuses on cost optimization and cannot ability, making it difficult to apply to other regions or effectively identify key risk factors leading to different building types. Additionally, there may be biases overspending. In contrast, the GA-Elman model can in data selection, such as differences between urban and predict costs and identify key drivers of overspending rural projects or the impact of various construction risks (such as fluctuations in material prices and technologies (e.g., traditional construction versus modern construction delays) through its dynamic memory building technologies). These factors could significantly mechanism. As a result, it can provide project managers affect the model's applicability and accuracy. Future with more targeted decision support. research should consider more comprehensive data Compared to hybrid models such as ANN combined collection, covering a wider range of regions, building with the Grasshopper algorithm and ARIMA-ANN types, and different construction technologies, to avoid models, the GA-Elman model performs better in long- biases caused by data limitations, thereby enhancing the term forecasting and modeling complex data relationships. model's generalization ability and adaptability. At the Although the ARIMA-ANN model has certain advantages same time, more advanced data preprocessing techniques in long-term construction cost estimation, its ability to and algorithm optimization methods can be explored to capture nonlinear features is limited. The GA-Elman improve the model's prediction accuracy and stability, model, by optimizing network structure through the global providing stronger support for widespread application. search capability of GA, can better model nonlinear and temporal characteristics. Meanwhile, it can achieve Funding superior prediction accuracy in practical tests, with the MAPE reduced to 2.75%. This work was supported by Key Laboratory of Building In summary, the GA-Elman model outperforms Structures in Anhui Universities (Anhui Xinhua existing models in terms of cost prediction accuracy, University) 2023 school-level scientific research project: overspending risk assessment ability, and adaptability to “Application Research of BIM Technology in Cost complex data. Thus, it offers an innovative solution for Management of Engineering Construction Projects” construction cost management and significant practical (Project batch number: KLBSZD202301, Host: Wu Qian). guidance for budget control and risk management in complex engineering projects. Conflict of interest statement There is no conflict of interest in this study. 104 Informatica 49 (2025) 91–104 Q. Wu Ethical compliance statement Farid Hama Ali, H., Ismail Abdullah, A., & Kameran Al-Salihi, N. (2021). Forecasting tunnel geology, This study does not involve experiments on humans or construction time and costs using machine learning animals and does not require ethical approval. methods. Neural Computing and Applications, 33, 321-348. https://doi.org/10.1007/s00521-020-05006- References 2 [1] Li L. (2023). Dynamic cost estimation of [11] Alshboul, O., Shehadeh, A., Almasabha, G., & reconstruction project based on particle swarm Almuflih, A. S. (2022). Extreme gradient boosting- optimization algorithm. Informatica, 47(2), 173-182. based machine learning approach for green building https://doi.org/10.31449/inf.v47i2.4026 cost prediction. Sustainability, 14(11), 6651. [2] Maya, R., Hassan, B., & Hassan, A. (2023). Develop https://doi.org/10.3390/su14116651 an artificial neural network (ANN) model to predict [12] Pham, T. Q. D., Le-Hong, T., & Tran, X. V. (2023). construction projects performance in Syria. Journal of Efficient estimation and optimization of building King Saud University-Engineering Sciences, 35(6), costs using machine learning. International Journal 366-371. of Construction Management, 23(5), 909-921. https://doi.org/10.1016/j.jksues.2021.05.002 https://doi.org/10.1080/15623599.2021.1943630 [3] Xu, X., & Zhang, Y. (2023). Regional steel price index [13] Goodarzizad, P., Mohammadi Golafshani, E., & forecasts with neural networks: evidence from east, Arashpour, M. (2023). Predicting the construction south, North, central south, northeast, southwest, and labour productivity using artificial neural network and northwest China. The Journal of Supercomputing, grasshopper optimisation algorithm. International 79(12), 13601-13619. Journal of Construction Management, 23(5), 763-779. https://doi.org/10.1007/s11227-023-05207-1 https://doi.org/10.1080/15623599.2021.1927363 [4] GadelHak, Y., El-Azazy, M., Shibl, M. F., & [14] Kim, S., Choi, C. Y., Shahandashti, M., & Ryu, K. R. Mahmoud, R. K. (2023). Cost estimation of synthesis (2022). Improving accuracy in predicting city-level and utilization of nano-adsorbents on the laboratory construction cost indices by combining linear ARIMA and industrial scales: A detailed review. Science of and nonlinear ANNs. Journal of Management in The Total Environment, 875, 162629. Engineering, 38(2), 04021093. https://doi.org/10.1016/j.scitotenv.2023.162629 https://doi.org/10.1061/(ASCE)ME.1943- [5] Abdel-Hamid, M., & Abdelhaleem, H. M. (2023). 5479.0001008 Project cost control using five dimensions building [15] Ye, M., Wang, J., Si, X., Zhao, S., & Huang, Q. (2022). information modelling. International Journal of Analysis on dynamic evolution of the cost risk of Construction Management, 23(3), 405-409. prefabricated building based on DBN. Sustainability, https://doi.org/10.1080/15623599.2021.1880313 14(3), 1864. https://doi.org/10.3390/su14031864 [6] Durstewitz, D., Koppe, G., & Thurm, M. I. (2023). [16] Su, D., Fan, M., & Sharma, A. (2022). Construction Reconstructing computational system dynamics from of lean control system of prefabricated mechanical neural data with recurrent neural networks. Nature building cost based on Hall multi-dimensional Reviews Neuroscience, 24(11), 693-710. structure model. Informatica, 46(3). https://doi.org/10.1038/s41583-023-00740-7 https://doi.org/10.31449/inf.v46i3.3914 [7] Hai, T., & Zhou, J. (2023). Predicting the performance [17] Cao, P., & Lei, X. (2023). Evaluating Risk in of thermal, electrical and overall efficiencies of a Prefabricated Building Construction under EPC nanofluid-based photovoltaic/thermal system using Contracting Using Structural Equation Modeling: A Elman recurrent neural network methodology. Case Study of Shaanxi Province, China. Buildings, Engineering Analysis with Boundary Elements, 150, 13(6), 1465. 394-399. https://doi.org/10.3390/buildings13061465 https://doi.org/10.1016/j.enganabound.2023.02.013 [18] Leu, S. S., Lu, C. Y., & Wu, P. L. (2023). Dynamic- [8] Das, S., Tariq, A., Santos, T., Kantareddy, S. S., & Bayesian-network-based project cost overrun Banerjee, I. (2023). Recurrent neural networks prediction model. Sustainability, 15(5), 4570. (RNNs): architectures, training tricks, and https://doi.org/10.3390/su15054570 introduction to influential research. Machine [19] Asiedu, R. O., & Adaku, E. (2020). Cost overruns of Learning for Brain Disorders, 117-138. public sector construction projects: a developing https://doi.org/10.1007/978-1-0716-3195-9_4 country perspective. International Journal of [9] Hananto, A. L., Fauzi, A., Suhara, A., Davison, I., Managing Projects in Business, 13(1), 66-84. Spraggon, M., Herawan, S. G., et al. (2023). Elman https://doi.org/10.1108/IJMPB-09-2018-0177 and cascade neural networks with conjugate gradient [20] Kumar, R. (2022). Memory recurrent Elman neural polak-ribière restarts to predict diesel engine network-based identification of time-delayed performance and emissions fueled by butanol as nonlinear dynamical system. IEEE Transactions on sustainable biofuel. Results in Engineering, 19, Systems, Man, and Cybernetics: Systems, 53(2), 753- 101334. 762. 10.1109/TSMC.2022.3186610 https://doi.org/10.1016/j.rineng.2023.101334 [21] Miranda, M. H., Silva, F. L., Lourenço, M. A., Eckert, [10] Mahmoodzadeh, A., Mohammadi, M., Daraei, A., J. J., & Silva, L. C. (2023). Particle swarm optimization of Elman neural network applied to Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 105 battery state of charge and state of health estimation. Energy, 285, 129503. https://doi.org/10.1016/j.energy.2023.129503 [22] Chandar, S. K. (2021). Grey Wolf optimization- Elman neural network model for stock price prediction. Soft Computing, 25(1), 649-658. 10.1007/s00500-020-05174-2 [23] Zhang, Y., Zhao, J., Wang, L., Wu, H., Zhou, R., & Yu, J. (2021). An improved OIF Elman neural network based on CSO algorithm and its applications. Computer Communications, 171, 148-156. https://doi.org/10.1016/j.comcom.2021.01.035 [24] Sohail, A. (2023). Genetic algorithms in the fields of artificial intelligence and data sciences. Annals of Data Science, 10(4), 1007-1018. https://doi.org/10.1007/s40745-021-00354-9 [25] Chotchantarakun, K. (2023). Optimizing Sequential Forward Selection on Classification using Genetic Algorithm. Informatica, 47(9). https://doi.org/10.31449/inf.v46i9.4964 [26] Guo, W., Meng, Q., Wang, X., Zhang, Z., Yang, K., & Wang, C. (2022). Landslide displacement prediction based on variational mode decomposition and GA– Elman model. Applied Sciences, 13(1), 450. https://doi.org/10.3390/app13010450 [27] Wang, C., Zhao, Y., Bai, L., Guo, W., & Meng, Q. (2021). Landslide displacement prediction method based on GA-Elman model. Applied sciences, 11(22), 11030. https://doi.org/10.3390/app112211030 https://doi.org/10.31449/inf.v49i12.6927 Informatica 49 (2025) 105–114 105 Comparison of Machine Learning Algorithms for Predicting Thyroid Disorders in Diabetic Patients Hiba O. Sayyid*1, Salma A. Mahmood2, and Saad S. Hamadi3 1Department of Computer Science, University of Basrah, College of Computer Sciences and Information Technology, Basrah, Iraq 2Department of Intelligent Medical Systems, University of Basrah, College of Computer Sciences and Information Technology, Basrah, Iraq 3Department of Internal Medicine, University of Basrah, College of Medicine, Basrah, Iraq E-mail: Itpg.hiba.oudah@uobasrah.edu.iq, Salma.mahmood@uobasrah.edu.iq, and Saad.shaheen@uobasrah.edu.iq *Corresponding author Keywords: machine learning, classification, decision tree, random forest, support vector machine, naïve bayes, logistic regression, K-nearest neighbor, diabetes, thyroid disorders Received: August 17, 2024 Machine Learning (ML), a subfield of Artificial Intelligence (AI), has been used successfully in the healthcare domain for disease diagnosis. Thyroid disorders and diabetes are two of the most prevalent and interconnected chronic diseases, as both play critical roles in regulating various physiological processes in the body. This study aims to predict thyroid disorders in diabetes patients using six machine learning algorithms: Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), Logistic Regression (LR), Naïve Bayes (NB), and Support Vector Machine (SVM). A locally sourced dataset comprising 44,539 instances of diabetic patients was utilized, undergoing preprocessing steps including data cleaning, encoding, and balancing. Two balancing techniques were employed: manual balancing and RandomUnderSampler. The dataset was partitioned into training and testing sets using a Stratified K-Fold cross-validation approach with 10 folds to ensure robust evaluation. Each algorithm’s performance was assessed using metrics such as accuracy and F1-score. Among the models, the RF algorithm outperformed the others, achieving the highest accuracy of 95% on the manually balanced dataset and 84% when the RandomUnderSampler technique was employed. Additionally, the F1-scores for RF were 95% and 82%, respectively, indicating its robustness in handling imbalanced datasets. This study highlights the importance of selecting appropriate preprocessing techniques and machine learning methods for healthcare datasets. The findings can assist healthcare providers in making early diagnoses and interventions for thyroid disorders in diabetic patients, potentially improving their quality of life and overall healthcare outcomes. Povzetek: Opisana je uporaba strojnega učenja za napovedovanje motenj ščitnice pri bolnikih s sladkorno boleznijo. Algoritem naključnih gozdov doseže najvišjo točnost in oceno F1 na uravnoteženem naboru podatkov. • Type 1 diabetes is an autoimmune disease which is 1 Introduction usually diagnosed in children and young adults [5], it occurs when the insulin-producing cells of the Diabetes and thyroid disorders are among the most • pancreas is attacked by the immune system which prevalent chronic diseases affecting the endocrine and leads to little or no insulin [6]. metabolic systems [1]. These two diseases are often • Type 2 diabetes is the most common type of diabetes coexisted and strongly linked together, as many studies that often occurs in older adults when the body have shown that there is a higher prevalence of thyroid doesn’t produce enough insulin or becomes resistant disorders in diabetic patients and vice versa [2]. to insulin [6]. Diabetes is a chronic condition that is caused by • Gestational diabetes this type develops as a elevated levels of blood sugar (glucose) [3]. This occurs complication in women during pregnancy and usually when the body either cannot use the insulin it produces goes away after the baby is born [7]. effectively or cannot produce enough insulin. Insulin is a hormone that allows the body cells to absorb and use • There are fewer common types of diabetes that are caused by genetic conditions and diseases such as glucose for energy and helps regulate blood sugar [4]. As secondary diabetes and monogenic diabetes. a result, diabetes affects various body functions. There are four types of this disease 106 Informatica 49 (2025) 105–114 H.O. Sayyid et al. Thyroid disorder is a disease that affects the function theorem, Rule, or law is used to describe the probability of the thyroid gland in producing the appropriate amounts of a hypothesis with existing knowledge. Bayes’ theorem of thyroid hormones T3 (tri-iodothyronine), and T4(tetra- formula is [11]: iodothyronine), as these hormones play an important role 𝐏(𝐀|𝐁) = (𝐏(𝐁|𝐀) ∗ 𝐏(𝐀)) / 𝐏(𝐁) (1) in controlling many vital activities of the body such as heart rate, energy level, metabolism, bone health, and NB is computationally efficient, easy to create, and many other functions. The most common thyroid can handle large datasets [12]. It is very effective in text disorders are Hyperthyroidism and Hypothyroidism [8]. In classification tasks, such as spam filtering. Despite that hyperthyroidism, the thyroid gland overproduces thyroid it’s a simple algorithm with the independence assumption, hormones. While in hypothyroidism the thyroid gland it can often outperform complex algorithms. does not produce enough thyroid hormones [8]. Decision Tree (DT): is one of the supervised Machine Studies show that there is a higher prevalence of learning algorithms that is used for both classification and thyroid disorders among patients with type 1 or type 2 regression problems [14]. DT is a visual representation of diabetes in comparison to non-diabetic patients, which the decision-making process, it’s a tree-like graph that reveals their close relationships, it also shows that partitions the data based on the input features, the tree autoimmunity is a key to understanding the link between starts with a root which has the highest gain then nodes type 1 diabetes and autoimmune thyroid disorders [9]. and branches. Where each node represents a test that The presence of insulin resistance or diabetes follows the if-then statement and leads to a different increases an individual’s risk of developing thyroid branch, each branch leads to one outcome (decision) [15]. disorders while having thyroid disorder can increase the It is a widely used algorithm for predicting diseases. risk of developing diabetes and metabolic syndrome [10]. K-Nearest Neighbors (KNN) is one of the simplest It is very important to diagnose thyroid disorder in diabetic lazy learning machine learning algorithms that make patients and a routine screening should also be predictions based on the entire data [12],[16]. The recommended. It is necessary that the clinician identify the algorithm is used for solving classification and regression high-risk diabetic groups and manage the thyroid tasks. KNN assumes that similar data points are located abnormalities if present as soon as possible to reduce the near each other, the similarity is called distance. It uses risk of further complications [10]. distant measures like Euclidean to measure similarity. The primary aim of this study is to assess the Although KNN is a very simple and easy-to-implement effectiveness of six machine learning methods (Decision algorithm, its results can be very competitive [16]. tree, random forest, Support Vector Machine, Naïve Support Vector Machine (SVM) is one of the most Bayes, k nearest neighbor, logistic regression) in popular machine learning algorithms. It is used primarily predicting the presence or absence of thyroid disorders in for classification tasks but can also be used for regression diabetes patients. By comparing the results of each [12]. The main goal of SVM is to separate the data with a method, we aim to identify the most accurate model to hyperplane into different classes so that we can easily put enhance early detection and intervention. Machine the new data point into one of the classes [11]. SVM can learning methods used in this study differ in their nature be effective in high-dimensional spaces and is widely used and work but they are all used for predicting new states. in image classification, Face detection, text categorization, Logistic regression (LR): is a classification machine and handwriting recognition. learning algorithm that is used for predictive analysis Random Forest (RF) is a machine learning algorithm based on the concept of probability [11]. LR classifies the that belongs to the group of decision-tree-based methods data using the logistic sigmoid function. LR predicts one [13]. It can be used for classification tasks and regression. of two possible outcomes of a categorical dependent Random forest is a collection of decision trees built during variable. Therefore, the outcome must be a categorical or the training process and then the prediction of these trees discrete value. It does not give the value of either True or is combined during the testing process. RF approach gives False, Yes or No, 0 or 1, etc. instead, it gives a a better accurate result in comparison to a single decision probabilistic value between 0 and 1[12]. To classify tree with the ability to limit overfitting [16]. instances into the two classes. a common approach is to The organization of this paper is as follows, …. use a threshold value (e.g., 0.5), If the predicted probability is above the threshold, the instance is assigned 2 Literature review to one class, and if it is below the threshold, then it is assigned to the other class. LR is widely used for many In the field of machine learning-based prediction of tasks such as fraud detection, disease diagnosis, and diabetes and thyroid disorders, numerous studies have prediction, Tumor Malignant or Benign, mail spam or not explored various algorithms and methodologies. This spam, etc. [11]. section provides a structured comparison of these studies Naïve Bayes (NB): is a simple machine learning in terms of the algorithms used, evaluation metrics, and classification algorithm based on Bayes’ Theorem [13]. It the reported results. By highlighting the strengths and is called naïve because of the assumption of conditional limitations of previous works, we emphasize the novelty independence among the features which means that the and contributions of the present study. presence or absence of one feature in a class is independent of the presence or absence of the other features. It is used for a large amount of data. Bayes’ Comparison of Machine Learning Algorithms for Predicting Thyroid… Informatica 49 (2025) 105–114 107 2.1 Diabetes prediction studies Dudkina Classification DT based Accuracy DT:71% T et al. and detection model Hassan et al. [17] applied SVM, K-Nearest Neighbors (2021) of diabetes disease (KNN), and Decision Tree to classify diabetes patients. Yadav et Predicting Random Accuracy RF: 99% The study showed that SVM outperformed the other al. (2020) thyroid Forest, disease Decision algorithms with the highest accuracy of 90.23%. Tree, CART Samin Poudel [18] tested 20 machine learning Priyanka Diagnosing Naive Bayes, Accuracy SVM: algorithms for diagnosing diabetes based on the Pima Duggal & thyroid SVM, 92.92% Shipra disorders Random Indian Diabetes Dataset. Naive Bayes emerged as the best- Shukla Forest performing algorithm with an accuracy of 77%, an F1- (2020) Chaubey Thyroid Logistic Accuracy KNN: score of 0.83, a precision of 0.80, and a recall of 0.86. G. et al. disease Regression, 96.88% Dudkina T et al. [19] presented a study that is (2012) prediction Decision dedicated to handling the problem of Classification and Trees, KNN Chaganti predicting RF, SVM, Accuracy RF: 99% detection of diabetes disease. The study focuses on et al. thyroid AdaBoost developing a decision tree-based machine learning model (2022) disorders (ADA), LR, and Gradient to solve this problem. The results showed that splitting the boosting data by 50% for training and 50% for testing was the best machine option with 0.71 accuracy. (GBM), as well as three deep learning models 2.2 Thyroid disease prediction studies Current Predicting RF, DT, Accuracy, RF with Yadav D et al. [20] used Random Forest, Decision Tree, study thyroid SVM, KNN, F1-Score, Accuracy: disorders in NB, and LR Precision, 88%, F1- and Classification and Regression Tree (CART) to predict diabetic Recall, and Score: thyroid disease. The results showed that Random Forest patients Specificity 85% achieved an accuracy of 99%, followed by Decision Tree 98% and CART 93%. Their ensemble approach From the table above, we can see that various studies combining these classifiers achieved a perfect accuracy of have employed different algorithms to predict diabetes 100%. and thyroid disorders with varying results. For instance, Priyanka Duggal and Shipra Shukla [21] used feature SVM and Decision Tree techniques are commonly used in selection and classification techniques like Naive Bayes, diabetes prediction, with SVM often yielding higher SVM, and Random Forest to diagnose thyroid disorders. accuracy compared to other algorithms. On the other hand, The study reported that SVM achieved the highest for thyroid disease prediction, Random Forest and KNN accuracy with 92.92%. have been reported to achieve remarkable accuracy, with Chaubey G. et al. [22] tested Logistic Regression, Random Forest reaching up to 100% accuracy when Decision Trees, and KNN for thyroid disease prediction. combined with ensemble methods. KNN achieved the highest accuracy at 96.88%. While these studies have contributed significantly to Chaganti et al. [23] presented a method that focuses the field, there remains a gap in comprehensive and on the multi-class problems to predict thyroid disorders reliable approaches for predicting thyroid disorders using five machine learning models including RF, SVM, specifically in the diabetic population. They often focus AdaBoost (ADA), LR, and Gradient boosting machine on either one disorder or use fewer evaluation metrics. (GBM), as well as three deep learning models. They Some studies rely primarily on accuracy, which may not created a dataset from the UCI thyroid disease datasets reflect the model's true performance, especially when that contained 9173 patient records,31 features, and 6771 class imbalance exists. The F1-score and AUC metrics are normal patient records with no sign of thyroid disease. The more informative but have not been consistently used dataset was randomly balanced by taking 400 samples across studies. from the 6771 records, and at least 200 samples for the The current study addresses these gaps by utilizing a other classes. The results showed that when using the comprehensive preprocessing pipeline that includes random forest classifier with the presented method it can feature selection technique, and effective class imbalance achieve a 0.99 accuracy in predicting ten types of thyroid handling using methods like RandomUnderSampler. diseases. Additionally, this study adopts a range of evaluation metrics (accuracy, F1-score, precision, recall, and Table 1:Summary table specificity) to offer a well-rounded analysis of model Study Methodology Algorithms Key Results performance. Furthermore, we compare multiple machine Used Evaluation learning models RF, SVM, KNN, DT, NB, and LR using Metrics Hassan et Classifying SVM, KNN, Accuracy SVM: cross-validation, which not only strengthens the model al (2020). diabetes DT 90.23% evaluation but also ensures more robust generalization to patients unseen data. Samin Diagnosing 20 ML Accuracy, Naive Poudel diabetes approaches Precision, Bayes: By offering a balanced prediction model with high (2021) Recall, F1- Accuracy accuracy (88%) and F1-score (0.85), the current study score 77%, F1- score 83%, surpasses previous works in terms of both the depth of Precision analysis and the performance metrics, which positions it 80% 108 Informatica 49 (2025) 105–114 H.O. Sayyid et al. as a significant advancement in predicting thyroid accuracy required for clinical applications and minimized disorders in diabetic patients. the risk of introducing errors associated with imputation. 3 Proposed methodology After removing incomplete records, the dataset was carefully inspected to confirm that it remained The main objective of this study is to predicate the representative of the original population in terms of key relationship between diabetes mellitus and Thyroid demographic and clinical features, ensuring that the disorders. Six different prediction methods were used for removal process did not introduce unintended biases. this purpose as aforementioned above. The proposal methodology is shown in the following Figure 1. 3.2.2 Data cleaning Data cleaning is a critical process that significantly impacts the quality and reliability of predictive models. A clean dataset ensures accurate and robust machine learning models with improved performance and trustworthy predictions. In this study, thorough data cleaning was performed to address various issues and errors present in the dataset. The data cleaning process involved • Identifying and rectifying incorrect data entries which included instances where ambiguous letters, words, and symbols were used such as ‘, \\, \L, \N,], B, E, EX, L, M, MN, N, N’, N N, NNNN, N\, N\], N\N, N], N] \, H, U, صى ة, ,. Such values represent noises and inaccuracies in the dataset. Furthermore, inconsistencies in data entry were addressed, including the use of 'لا' in Arabic instead of 'No', as well as the recording of 'N' instead of 'No'. Additionally, discrepancies in capitalization were noted, such as 'female' being recorded instead of Figure 1: Flowchart of the thyroid disorder prediction 'Female'. By rectifying these mistakes, the dataset was system. standardized, eliminating potential sources of error in the analysis. The above flowchart is illustrative of the following process: • Handling the age field by determining ages in ranges (15-100 years), in line with the policy of the diabetes 3.1 Data collection center catering to adults only. For this study, we used a medical dataset related to • Similarly, filtering out heights and weights that fell diabetes patients which was obtained from Faiha outside the normal ranges. These actions were Specialized Diabetes Endocrine and Metabolism Center essential to preserve the integrity of the dataset and (FDEMC) in Basra, Iraq. enhance the accuracy of our analyses. 3.2 Data preprocessing 3.2.3 Feature engineering Data preprocessing was a critical step in preparing the To enhance the performance of machine learning models, dataset for effective machine learning model training and new features were derived from existing ones through evaluation. This section elaborates on the detailed feature engineering. For example, the Age feature was procedures used, including handling missing values, computed from the patients' dates of birth, and the Body feature engineering, and encoding categorical variables Mass Index (BMI) was calculated using height and weight ensuring replicability and transparency. measurements. These newly created features provided additional insights into patient characteristics, which 3.2.1 Handling missing values contributed to improving the predictive power of the models. Given the sensitivity of clinical data and the potential risks of introducing bias through imputation, instances with 3.2.4 Encoding categorical variables missing values were excluded from the dataset. This approach ensured the integrity and reliability of the Since machine learning algorithms generally require analysis by working exclusively with complete data. numerical input, data encoding is essential to convert While this reduced the dataset size, it maintained the categorical variables into a suitable format. This study Comparison of Machine Learning Algorithms for Predicting Thyroid… Informatica 49 (2025) 105–114 109 used label encoding to transform variables such as sex, simplicity and effectiveness in achieving balance without family history of DM, glycemic control, lipid control, introducing synthetic data. pressure control, thyroid, marital status, smoker, and drinker into numerical representations compatible with machine learning models. After these steps the dataset 3.3.2 Experiment 2: manual balancing consists of 44539 instant and 12 variables, Table In the second experiment, the dataset was manually 2illustrates each variable along with its corresponding balanced under the expert supervision of a physician to encoded values. ensure the process was clinically valid and aligned with medical standards. The dataset was reduced to 2,166 Table 2: Description of the used data instances, with an equal number of examples from both Feature Description Value After classes. Unlike RUS, manual balancing involved the Encoding careful selection of instances, allowing for greater control Thyroid If the patient is diagnosed with a 0 means No over the data distribution while preserving its clinical thyroid disorder. 1 means Yes DM If the patient has type1 or type2 1 for type1 relevance. This approach mitigated the potential bias Diabetes Mellitus 2 for type2 introduced by random sampling, ensuring that the Age The patient's age in years Range (15-100) balanced dataset reflected real-world clinical scenarios. Sex The patient’s gender: 0 for male 1 for female Family If the patient has a family member with 0 means No Although techniques such as RandomOverSampler history of DM diabetes 1 means Yes BMI Body Mass Index: the patient’s weight Range (10.8- (ROS), Synthetic Minority Over-Sampling Technique divided by the square of height 75.3) (SMOTE), and ensemble methods like Balanced Random Lipid control The patient’s lipid levels in the 0 means No Forest (BRF) are widely used for handling imbalanced bloodstream are managed 1 means Yes Pressure The patient’s blood pressure levels are 0 means No data, they were not employed in this study. The primary control managed to stay in a specific target 1 means Yes concern was that synthetic data might fail to capture the range Glycemic The patient’s blood sugar levels are 0 means No true clinical variability of the minority class, potentially control managed in a specific target range 1 means Yes introducing artificial patterns that could distort model Smoker If the patient is a current smoker, non- 0 means No predictions and reduce generalizability. Additionally, smoker, or former smoker. 1 means Yes 2 means X- these methods increase computational complexity and smoker training time, making them less suitable for the objectives Drinker If the patient is a current drinker, non- 0 means No drinker, or former drinker 1 means Yes of this study. Instead, simpler and more controlled 2 means X- balancing methods were chosen to maintain a drinker representative and manageable dataset. Marital If the patient is married, single, 0 means Single divorced, or widowed. 1 means Married 2 means 3.4 Model selection and training Divorced 3 means Widow 3.4.1 Model selection In this study, we employed six machine learning 3.3 Addressing class imbalance algorithms to predict thyroid disorders in diabetic patients: Class imbalance is a prevalent challenge in machine Random Forest (RF), Decision Tree (DT), K-Nearest learning, especially in healthcare datasets where minority Neighbors (KNN), Logistic Regression (LR), Naïve classes often represent critical conditions. In this study, Bayes (NB), and Support Vector Machine (SVM). These the dataset was imbalanced, with only 15.17% of instances models were selected for their diverse characteristics and representing patients with thyroid disorders (6,755 strengths in classification tasks, particularly in medical instances), compared to 84.83% without thyroid disorders datasets. Allowing us to compare their performance in (37,784 instances). To address this imbalance, two addressing the two different datasets. The rationale for techniques were employed. selecting these models is summarized below: 3.3.1 Experiment 1: RandomUnderSampler • Random Forest (RF) was chosen for its ensemble (RUS) nature, which combines multiple decision trees to In the first experiment, the RandomUnderSampler (RUS) reduce overfitting and improve generalization. RF is technique was used to address the class imbalance. This particularly effective in handling high-dimensional method randomly reduces the size of the majority class to datasets with complex interactions. Additionally, RF match that of the minority class, creating a balanced provides feature importance rankings, offering dataset. After applying RandomUnderSampler, the dataset insights into which factors contribute most to was reduced to 13,438 instances, with an equal predictions. distribution of 50% representing patients with thyroid • Decision Tree (DT) was selected for its simplicity, disorders and 50% without. While this approach ensures interpretability, and ability to model nonlinear that the models are not biased toward the majority class, it relationships. Furthermore, DTs offer visual can result in the loss of valuable information by discarding representations of decision rules, making them majority-class instances. Nonetheless, it was chosen for its especially useful for understanding model behavior. 110 Informatica 49 (2025) 105–114 H.O. Sayyid et al. • K-Nearest Neighbors (KNN) was included due to its ranked by Random Forest importance, using 10-fold ability to perform well in non-linear decision Stratified Cross-Validation to evaluate each combination. boundaries by evaluating the proximity between The optimal configuration was selected based on the instances. It is an intuitive algorithm that can be highest cross-validation accuracy and minimal train-test effective when there are clear clusters in the data. accuracy differences, ensuring good generalization and • Logistic Regression (LR) was chosen for its minimizing the risk of overfitting or underfitting during simplicity, interpretability, and strong performance in cross-validation. binary classification tasks. As a linear model, LR The combination of multiple models, cross- serves as a robust baseline, helping to benchmark the validation, sequential feature selection, and performance of more complex approaches. hyperparameter tuning ensured that we could rigorously • Naïve Bayes (NB) was selected for its simplicity and evaluate the performance of each algorithm and select the efficiency in handling large datasets with categorical one best suited for predicting thyroid disorders in diabetic features. Its probabilistic nature makes it well-suited patients. This approach provided a comprehensive for classification tasks with independent features. understanding of the strengths and weaknesses of each particularly the Gaussian variant, model, helping guide the decision-making process for • Support Vector Machine (SVM) was chosen for its real-world applications. ability to find complex decision boundaries in high- dimensional spaces. It is particularly effective in 3.5 Evaluation separating classes with a clear margin. The evaluation phase focused on assessing and comparing the performance of the models using different metrics: 3.4.2 Training accuracy, precision, recall, F1 score, sensitivity, Initially, a Random Forest classifier was employed to specificity, and a confusion matrix to provide a determine the most influential features by ranking them comprehensive view of the model’s ability to correctly based on their importance scores shown in Figure 2 and classify instances. The metrics were calculated based on Figure 3. These top-ranked features were subsequently the model predictions on the test dataset. utilized for training the models. All models were trained using Stratified K-Fold cross- Accuracy: means how many times the model made a validation with 10 folds, ensuring that the distribution of correct prediction among the total number of instances thyroid and non-thyroid patients was maintained in each [16]. fold. This method provides a robust evaluation of the 𝑻𝑷 + 𝑻𝑵 (2) 𝒂𝒄𝒄𝒖𝒓𝒂𝒄𝒚 = models' performance by assessing them across multiple 𝑻𝑷 + 𝑻𝑵 + 𝑭𝑵 + 𝑭𝑷 data splits, which helps mitigate the risk of overfitting or underfitting. Precision: means the number of positive (correct) To enhance the feature selection process, we used a predictions made by the model and belongs to the positive sequential feature selection approach, where we started by class [12]. 𝑻𝑷 training each model with a single feature and (3) 𝒑𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 = 𝑻𝑷 + 𝑭𝑷 incrementally added more features. This allowed us to identify the most relevant features for each model and Recall (Sensitivity): means the number of actual ensured that only the most informative variables were positive (correct) predictions made by the model out of all used, optimizing the model's performance. positive examples in the dataset [15]. We assessed both training and testing accuracies to 𝑻𝑷 (4) evaluate how well each model generalized to unseen data. 𝒓𝒆𝒄𝒂𝒍𝒍 = 𝑻𝑷 + 𝑭𝑵 By comparing these accuracies, we were able to detect potential overfitting (where the model performs well on F1score: provides a single score that combines both training data but poorly on testing data) or underfitting precision and recall in one number to find balance [24]. It (where the model performs poorly on both training and is needed when there is uneven class distribution (more testing data). This evaluation ensured that the models negative). maintained a balance between accuracy and 𝒑𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 ∗ 𝒓𝒆𝒄𝒂𝒍𝒍 (5) generalization. 𝒇𝟏𝒔𝒄𝒐𝒓𝒆 = 𝟐 ∗ 𝒑𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 + 𝒓𝒆𝒄𝒂𝒍𝒍 3.4.3 Hyperparameter tuning Specificity (True Negative Rate): The percentage of Hyperparameter tuning was performed to optimize model actual negatives properly identified by the model [12]. 𝑻𝑵 performance. For RF and DT, fixed parameters such as (6) 𝒔𝒑𝒆𝒄𝒊𝒇𝒊𝒄𝒊𝒕𝒚 = max_depth=10 and n_estimators=100, were selected after 𝑻𝑵 + 𝑭𝑷 experimenting with various combinations of parameter Each model's performance was evaluated using these values. These experiments involved testing different metrics. The fold that yielded the highest accuracy with depths for the trees and numbers of estimators to evaluate equal training and testing accuracies was noted, along with their impact on the model's performance. While the the corresponding optimal number of features. hyperparameter tuning for KNN involved testing different numbers of neighbors (1–10) and subsets of top features Comparison of Machine Learning Algorithms for Predicting Thyroid… Informatica 49 (2025) 105–114 111 4 Results In this section, we present the feature importance ranking results and the evaluation results of the six machine learning models used. Figure 2: Feature importance ranking for Experiment 1 (on the RandomUnderSampler balanced dataset) Figure 3:Feature importance ranking for experiment 2 (on the manually balanced dataset) Figure 2 and Figure 3 display the feature importance Table 3: Experiment 1 evaluation metrics comparison ranking derived from a Random Forest model, used to table. predict thyroid disorders in diabetic patients. The x-axis shows the relative importance of each feature, with higher Classifi Accura Precisi F1- Sensitiv Specif Confusion er cy on Sco ity icity Matrix values indicating greater influence on the model's re (Recall) predictions. BMI and age are identified as the most critical RF 0.84 0.96 0.82 0.713 0.967 [[649,22], features, with BMI showing the highest impact. Other [193,479]] DT 0.83 0.95 0.81 0.702 0.960 [[644,2], features, such as diabetes type and sex, also contribute to [200,472]] the model but with comparatively lower importance. This KNN 0.83 0.92 0.81 0.720 0.934 [[627,4], [188,484]] ranking provides valuable insights into the factors most SVM 0.79 0.85 0.77 0 .708 0.871 [[585,87], predictive of thyroid disorders in the context of diabetes. [196,476]] The results emphasize the significance of clinical factors LR 0.78 0.84 0.76 0.701 0.868 [[583,89], [201,471]] like BMI and age in thyroid disorder prediction for NB 0.78 0.84 0.76 0.701 0.868 [[583,89], diabetes patients. [201,471]] 112 Informatica 49 (2025) 105–114 H.O. Sayyid et al. Table 4: Experiment 2 evaluation metrics comparison 5 Discussion table. This study highlights the efficacy of machine learning Classifi Accura Precisi F1- Sensitiv Specific Confusi models, particularly the Random Forest (RF) algorithm, in er cy on Sco ity ity on re (Recall) Matrix predicting thyroid disorders among diabetic patients. The RF 0.95 0.99 0.95 0.917 0.991 [[107,1], findings emphasize the importance of model selection, [9, 100]] DT 0.95 0.96 0.95 0.944 0.963 [[105,4], data preprocessing, and feature analysis in achieving high [6, 102]] predictive performance. This section explores KNN 0.94 0.97 0.94 0.917 0.972 [[105,3], comparisons with related works, reasons for Random [9, 100]] LR 0.94 1.00 0.94 0.890 1.000 [[108,0], Forest’s superior performance, variations in model [12, 97]] effectiveness, and limitations, alongside real-world SVM 0.93 1.00 0.93 0.861 1.000 [[108,0], implications of the findings. [15, 93]] NB 0.93 1.00 0.93 0.861 1.000 [[108,0], [15, 93]] 5.1 Comparison with related works The results show that in Experiment 1 where the The findings align with recent studies in the literature that RandomUnderSampler technique was employed for data emphasize the utility of machine learning for healthcare balancing, the Random Forest model demonstrated applications. For instance, studies such as Yadav et al. superior performance across all metrics compared to other demonstrated the effectiveness of ensemble-based models models achieving the highest accuracy of 84%, precision of 96%, and F1-score of 82%. Followed by DT and KNN like RF in handling structured medical datasets, classifiers having the same accuracy of 83%. particularly for classification problems. Compared to other methods, the RF model in this study yielded superior However, SVM and LR showed a lower performance, accuracy, recall, and precision, which can be attributed to with accuracies of 79% and 78%, respectively. its ability to handle non-linear relationships and its robust In Experiment 2, which used a manually balanced feature selection mechanism. dataset, all the classifiers performed extremely well across While [Priyanka Duggal & Shipra Shukla (2020)] also all metrics, with the RF classifier achieving the highest applied Support Vector Machines (SVMs) to medical accuracy of 95%, precision of 99%, and an F1-score of datasets with 92% accuracy, our results indicate that SVM 95% indicating the model's high effectiveness in predicting underperformed relative to RF, potentially due to the high thyroid disorders. dimensionality of the features or the imbalanced nature of the dataset. This highlights the importance of model Similarly, the DT and KNN also demonstrated high selection based on the characteristics of the data. accuracies of 95%, and 94%, Correspondingly. This great performance is most likely due to the balanced data that ensured a better representation of both classes, leading to 5.2 Reasons for random forest's more reliable model predictions. performance superiority The sensitivity and specificity of these models are The RF model's outperformance can be attributed to significantly higher in Experiment 2 compared to several key factors. First, its inherent ability to handle both Experiment 1, showcasing the efficacy of the manually categorical and numerical data without extensive balanced dataset in enhancing model performance. preprocessing makes it well-suited for medical datasets, The results showed that while using which often include diverse feature types. Second, the use RandomUnderSampler for data balancing in the first of RandomUnderSampler for data balancing helped experiment, the models did not reach the same level of mitigate the issue of class imbalance, which is a critical effectiveness as in the manually balanced dataset in the challenge in predicting rare conditions such as thyroid second experiment which achieved a consistently high disorders in diabetic patients. RF’s capacity to combine performance across all classifiers. This highlights that predictions from multiple decision trees also reduces the choosing thoughtful and effective data-balancing risk of overfitting, ensuring more generalized predictions. technique can improve the model's overall performance Furthermore, feature importance analysis revealed that and prediction accuracy. variables such as BMI, age, and diabetes type were among In summary, for both experiments, the Random Forest the most predictive, aligning with clinical insights and model emerged as the best-performing algorithm for lending credibility to the model. predicting thyroid disorders in diabetic patients, followed closely by the Decision Tree and K Nearest Neighbors 5.3 Variations in performance across models. These models demonstrated high accuracy, models precision, recall, and F1-score, making them suitable for deployment in clinical settings. Logistic Regression, Naïve The variations in performance between models can be Bayes, and SVM, while useful, showed comparatively linked to their differing sensitivities to the dataset lower performance and may require further optimization characteristics. For example, while K-Nearest Neighbors for effective use in this context. (KNN) is sensitive to feature scaling and data distribution, its relatively low performance could stem from the high dimensionality of the dataset. Similarly, SVM’s reliance Comparison of Machine Learning Algorithms for Predicting Thyroid… Informatica 49 (2025) 105–114 113 on kernel functions may not have adequately captured the In summary, this research contributes to the growing complex interactions within the data. In contrast, Decision body of evidence supporting machine learning’s role in Trees (DT) performed reasonably well but lacked the healthcare, particularly for complex, multifactorial ensemble effect of RF, leading to slightly lower accuracy diseases. Future work should focus on validating these and recall. These findings suggest that models like RF, findings in diverse clinical settings, exploring alternative which can effectively leverage feature interactions and resampling techniques, and integrating these models into handle imbalanced data, are better suited for this specific healthcare systems for real-world application. prediction task. References 5.4 Limitations and real-world applicability [1] F. Rong et al., “Association between thyroid Despite these promising results, several limitations must dysfunction and type 2 diabetes: a meta-analysis of be acknowledged. First, the study relied on a single prospective observational studies,” BMC Medicine, dataset, which may limit the generalizability of the vol. 19, no. 1, Oct. 2021, doi: findings to other populations or healthcare settings. https://doi.org/10.1186/s12916-021-02121-2. Second, while RandomUnderSampler addressed class [2] B. Biondi, G. J. Kahaly, and R. P. Robertson, imbalance, other techniques such as SMOTE or hybrid “Thyroid Dysfunction and Diabetes Mellitus: Two approaches could be explored for potentially better results. Closely Associated Disorders,” Endocrine Reviews, Additionally, the dataset’s retrospective nature may vol. 40, no. 3, pp. 789–824, Jan. 2019, doi: introduce biases inherent to the original data collection https://doi.org/10.1210/er.2018-00163. process. [3] N. T. Y. Alibrahim, M. G. Chasib, S. S. Hamadi, and In real-world healthcare environments, the A. A. Mansour, “Predictors of Metformin Side applicability of this method is promising. The RF model's Effects in Patients with Newly Diagnosed Type 2 interpretability, particularly through feature importance Diabetes Mellitus,” Ibnosina Journal of Medicine scores, provides clinicians with actionable insights, aiding and Biomedical Sciences, Apr. 2023, doi: in early diagnosis and tailored treatment planning. https://doi.org/10.1055/s-0043-1761215. However, practical deployment would require rigorous [4] I. Tasin, T. U. Nabil, S. Islam, and R. Khan, “Diabetes external validation and integration with electronic health prediction using machine learning and explainable AI records to assess scalability and user-friendliness. techniques,” Healthcare Technology Letters, vol. 10, no. 1–2, pp. 1–10, Dec. 2022, doi: 6 Conclusion https://doi.org/10.1049/htl2.12039. [5] S. Hassan, A.-K. Ali, and R. Saleem, “Relationship Early prediction and diagnosis of diseases remain critical between glycemic control and different insulin challenges in the medical domain, particularly for regimens in pediatric type 1 diabetes mellitus,” The interconnected conditions like diabetes and thyroid Medical Journal of Basrah University, 2023, doi: disorders. While many studies have focused on predicting https://doi.org/10.33762/mjbu.2023.140990.1138. these diseases individually, limited research exists on [6] R. Kumar, P. Saha, S. Sahana, Yogendra Kumar, A. predicting thyroid disorders specifically among diabetic Dubey, and O. Prakash, “A REVIEW ON patients. DIABETES MELLITUS: TYPE1 & This study aimed to bridge this gap by applying six TYPE2,” WORLD JOURNAL OF PHARMACY AND machine learning algorithms to a local dataset of diabetic PHARMACEUTICAL SCIENCES, vol. 9, no. 10, pp. patients to predict the likelihood of thyroid disorders. 838–850, Aug. 2020, doi: Unlike previous studies that treated these conditions https://doi.org/10.20959/wjpps202010-17336. independently, this research explored the relationship [7] C. McElwain, F. McCarthy, and C. McCarthy, between diabetes and thyroid disorders, given their “Gestational Diabetes Mellitus and Maternal Immune intertwined impact on vital body functions. Dysregulation: What We Know So Among the tested algorithms, the Random Forest Far,” International Journal of Molecular Sciences, model emerged as the most effective, achieving the vol. 22, no. 8, p. 4261, Apr. 2021, doi: highest accuracy, precision, and recall. Its ability to handle https://doi.org/10.3390/ijms22084261. imbalanced data and highlight key predictive features, [8] K. Dharmarajan, K. Balasree, A.S. Arunachalam, and such as BMI, age, and diabetes type, further solidifies its K. Abirmai, “Thyroid Disease Classification Using potential as a valuable tool for early diagnosis. Decision Tree and SVM,” Indian Journal of Public The implications of these findings extend to Health Research & Development, vol. 11, no. 03, pp. enhancing healthcare practices by enabling clinicians to 229, Mar. 2020. Doi: identify diabetic patients at risk of thyroid disorders, https://www.researchgate.net/publication/341742234 facilitating timely interventions, and potentially reducing _Thyroid_Disease_Classification_Using_Decision_ complications. By improving early detection, this Tree_and_SVM approach could significantly enhance the quality of life for [9] M. Nishi, “Diabetes mellitus and thyroid individuals affected by both conditions. diseases,” Diabetology International, vol. 9, no. 2, 114 Informatica 49 (2025) 105–114 H.O. Sayyid et al. pp. 108–112, May 2018, doi: Letters, vol. 44, no. 3, pp. 233–238, May 2020, doi: https://doi.org/10.1007/s13340-018-0352-4. https://doi.org/10.1007/s40009-020-00979-z. [10] P. Sharma, S. Shrestha, and P. Kumar, “A review on [23] R. Chaganti, F. Rustam, I. De La Torre Díez, J. L. V. association between diabetes and thyroid disease,” Mazón, C. L. Rodríguez, and I. Ashraf, “Thyroid Santosh University Journal of Health Sciences, vol. 5, Disease Prediction Using Selective Features and no. 2, pp. 50–55, Jan. 2020, doi: Machine Learning Techniques,” Cancers, vol. 14, no. http://doi.org/10.18231/j.sujhs.2019.013. 16, p. 3914, Aug. 2022, doi: [11] S. Gopal, P. Gaurav, and D. Prateek, Machine https://doi.org/10.3390/cancers14163914. learning algorithms using Python programming. New [24] G. S. Ohannesian and E. J. Harfash, “Epileptic York: Nova Science Publishers, 2021. Seizures Detection from EEG Recordings Based on a [12] A. Panesar, Machine Learning and AI for Hybrid system of Gaussian Mixture Model and Healthcare: big data for improved health outcomes. Random Forest Classifier,” Informatica, vol. 46, no. Berkeley, CA: Apress, 2021. doi: 6, Sep. 2022, doi: https://doi.org/10.1007/978-1-4842-6537-6. https://doi.org/10.31449/inf.v46i6.4203. [13] F. Pedro and G. Márquez, Handbook of research on big data clustering and machine learning. Hershey, PA: Engineering Science Reference (an imprint of IGI Global), 2020. [14] I. H. Sarker, “Machine Learning: Algorithms, Real- World Applications and Research Directions,” SN Computer Science, vol. 2, no. 3, pp. 1–21, Mar. 2021, doi: https://doi.org/10.1007/s42979-021-00592-x. [15] Yuxi. (Hayden). Liu, Python Machine Learning by Example Build Intelligent Systems Using Python, TensorFlow 2, Pytorch, and Scikit-Learn, 3rd Edition. Birmingham: Packt Publishing, Limited, 2020. [16] S. L. Mirtaheri and R. Shahbazian, Machine Learning Theory to Applications. CRC Press, 2022. doi: https://doi.org/10.1201/9781003119258. [17] A. H. Khassawneh et al., “Prevalence and Predictors of Thyroid Dysfunction Among Type 2 Diabetic Patients: A Case–Control Study,” International Journal of General Medicine, vol. Volume 13, pp. 803–816, Oct. 2020, doi: https://doi.org/10.2147/ijgm.s273900. [18] S. Poudel, “A Study of Disease Diagnosis Using Machine Learning,” Medical Sciences Forum, vol. 10, no. 1, p. 8, Feb. 2022, doi: https://doi.org/10.3390/iech2022-12311. [19] Dudkina, I. Meniailov, K. Bazilevych, S. Krivtsov, and A. Tkachenko, “Classification and Prediction of Diabetes Disease using Decision Tree Method,” Symposium on Information Technologies & Applied Sciences, Bratislava, Slovakia , Mar. 2021. Available: https://ceur-ws.org/Vol-2824/paper16.pdf [20] C. Yadav and S. Pal, “Prediction of thyroid disease using decision tree ensemble method,” Human- Intelligent Systems Integration, vol. 2, no. 1–4, pp. 89–95, Apr. 2020, doi: https://doi.org/10.1007/s42454-020-00006-y. [21] P. Duggal and S. Shukla, "Prediction Of Thyroid Disorders Using Advanced Machine Learning Techniques," 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 2020, pp. 670-675, doi: https://doi.org/10.1109/Confluence47617.2020.9058 102. [22] G. Chaubey, D. Bisen, S. Arjaria, and V. Yadav, “Thyroid Disease Prediction Using Machine Learning Approaches,” National Academy Science https://doi.org/10.31449/inf.v49i12.6690 Informatica 49 (2025) 115–126 115 Motor Imagery Detection in ECG Signals Using Wavelet Packet Decomposition and Multiscale Convolutional Neural Networks Khawla Hussein Ali Department of Computer Science, College of Education for Pure Sciences, University of Basrah, Basrah, Iraq E-Mail: khawla.ali@uobasrah.edu.iq Keywords: Wavelet decomposition, multi-scale CNN, ECG signals, motor imagery Received: July 16, 2024 Detecting motor imagery from electrocardiographic (ECG) signals is complex but crucial in developing advanced neuroprosthetic devices and brain-computer interface (BCI) systems. In most cases, linear mod- els applied using conventional methods are not appropriate for the time-varying and non-linear nature represented by the ECG characteristics, resulting in weak performances. This research addresses this problem, combining Wavelet Packet Decomposition and Multi-Scale Convolutional Neural Networks to improve the feature extraction mechanism and classification accuracy. ECG data is pre-processed from the PhysioNet EEG Motor Movement/Imagery Dataset to remove noise and standardize signals. WPD is thus applied to decompose the signals into detailed frequency components to be input as features in the proposed Multi-Scale CNN. Different kernel sizes are implemented in these parallel convolutional layers to learn complicated features at various hierarchical resolutions. The proposed architecture is evaluated using performance parameters such as accuracy 92%, precision 89%, recall 93%, F1 score 91%, and ROC- AUC 95%. These results showed that the model outperformed the earlier-used traditional methods, such as Support Vector Machines (SVM) and Random Forests, better-detecting motor imagery. This research emphasizes the integrative power of advanced signal processing techniques with deep learning in analyzing biomedical signals, providing a powerful solution to advancing neuroprosthetic and BCI technologies. Povzetek: Študija dokazuje učinkovitost kombinacije obdelave signalov in globokega učenja za analizo biomedicinskih signalov. Uporabljena je valčna paketna dekompozicije in večskalna konvolucijska nevron- ska mreža za detekcijo motorične imaginacije v signalih EKG. 1 Introduction terfaces [3]. Neuroprosthetic devices work best when the intended motor actions are accurately detected, so the 1.1 Background on motor imagery in ECG machine works appropriately to aid a person with motor signals deficits. At the same time, BCIs must provide this intended signal from brain activities accurately into control signals of Motor imagery is a cognitive process by which one in- high precision to guarantee reliability and user satisfaction. ternally represents motion without physically carrying it Faulty interpretations and actionsmay occur if the detection out [1]. This mental process class engages neural pathways needs to be more accurate, making the advantages of such closely related to those involved during actual movements, highly developed systems irrelevant. Thus, robust methods a fact that can be picked up in various physiological signals. for detecting motor imagery in ECG signals are required to For example, in electrocardiogram (ECG) signals, motor improve these technologies further. imagery detection can provide insights into neural activity related to motor functions [2]. Since ECG signals, unlike other neurophysiological signals, are mainly used for mon- 1.3 Introduction to wavelet packet itoring cardiac health, they find interest in detecting motor decomposition (WPD) imagery based on their accessibility and non-invasiveness in the recording. Wavelet Packet Decomposition (WPD) is an advanced method of processing signals that decompose a signal in its constituent frequency components [4]. It further de- 1.2 Importance of accurate detection and composes in detail compared to the conventional wavelet classification transform, which focuses on a specific set of frequency bands. The advantages of WPD include multiresolution Accurately detecting and classifying motor imagery from analysis with both approximation and detail coefficients de- ECG signals are essential for various emerging technolo- composed at every level, hence being very useful for ana- gies, particularly neuroprosthetics and brain-computer in- lyzing non-stationary signals like ECG, where signal prop- 116 Informatica 49 (2025) 115–126 K.H. Ali erties can change over time. The baseline model used in motor imagery and provides a noninvasive, accessible way this study combinesWavelet Packet Decomposition (WPD) of developing neuroprosthetic devices and BCI systems. and a Multiscale Convolutional Neural Network (CNN), where WPD decomposes ECG signals into multiple fre- quency bands to extract features across various scales and 1.5.3 Comprehensive evaluation resolutions, and the multiscale CNN processes these fea- The proposed detailed experimental evaluation includes tures to capture patterns of different sizes and temporal fre- preprocessing steps, feature extraction, model training, quencies for improved classification accuracy. Themodel’s and performance assessment; such a roadmap could be performance is evaluated using metrics such as accuracy, handy for implementing and validating similar methodolo- sensitivity, specificity, and F1-score, providing a basis for gies. Multiple metrics used for assessment and comparison comparison with modified versions of the model to assess against traditional methods will ensure the robustness and the impact of each component. This ablation study aims comprehensiveness of the evaluation for the proposed ap- to determine the contribution of Wavelet Packet Decompo- proach. sition (WPD) when used with a Multiscale Convolutional Neural Network (CNN) for motor imagery classification in ECG signals. By systematically removing or altering the WPD component, we aim to understand its significance and 2 Related work and SOTA how it enhances the performance of the Multiscale CNN. experiment 1.4 Motivation for using multiscale CNN 2.1 Previous approaches to motor imagery Thus, Wavelet Packet Decomposition coupled with a Mul- detection tiscale CNN represents a practical approach to best deal Although most of the research has been on detecting mo- with the feature extraction task. CNNs are among the most tor imagery with electroencephalogram (EEG) signals, re- prevalent and well-known models for automatically learn- cently, an emerging interest has been in using the noninva- ing features in a hierarchical fashion from raw data that can sive and easily obtainable ECG signal [6]. Priori methods handle complicated pattern recognition tasks with supreme have thus focused on feature extraction for detecting motor grace [5]. The proposed multiscale technique of CNN is a imagery from the ECG signal, followed by classification multiple-parallel convolutional layer with different kernel using machine learning algorithms. sizes simultaneously to take up the features of multiple res- Time-domain, frequency-domain, and time-frequency olutions. This bears a specific benefit in dealing with the analysis techniques have been applied to extract pertinent variability in ECG signals, as it allows learning fine and features from ECG signals. These techniques generally an- coarse features. The proposed method combinesWPDwith alyze the amplitude and duration characteristics of the ECG the multiscale CNN to utilize the advantages of these two signal. Some features that they use are mean, variance, techniques toward a maximized level of classification ac- skewness, and kurtosis of the signal segments. However, curacy in detecting motor imagery from ECG signals. such features severely affect noise and will fail to capture the underlying patterns associated with motor imagery. 1.5 Contributions Frequency-domain methods involve transforming the ECG signal from its time domain into its frequency domain, This research makes several critical contributions to the for which techniques like the Fourier Transform have been field of biomedical signal processing and brain-computer used [7]. Extracted features such as power spectral den- interface (BCI) systems: sity and spectral entropy have been used. Though these ap- proaches can be informative about the signal’s frequency 1.5.1 Novel methodology components, the transient characteristics of motor imagery Wavelet packet decomposition, coupled with multiscale should be noticed. convolutional neural networks, is a new concept in motor Short-time Fourier Transform (STFT) and Wavelet imagery detection from ECG signals. This approach ef- Transform are prevalent in motor imagery detection. These fectively combines the WPD-based multiresolution analy- techniques offer a compromise by giving information about sis with CNN’s automatic feature learning capabilities for time and frequency. However, STFT gives a fixed res- improved classification performance. olution; thus, it is limited to various cases that present effectively different frequency contents. Wavelet Trans- 1.5.2 Improved detection of motor imagery form gives multiscale analysis and is more suited for non- stationary signals like the ECG. The present study extends the horizon of motor imagery The machine learning techniques in support vector ma- detection to ECG signals compared with the conventional chines, k-NN, and random forests are among the classifiers EEG-based approaches. The findings have demonstrated used in this work on classifying motor imagery based on that an ECG signal can be a suitable alternative for detecting feature extraction. Although this approach has proven to Motor Imagery Detection in ECG Signals… Informatica 49 (2025) 115–126 117 be quite promising in practice, its performance depends on ing meaningful features from such signals might be chal- the feature extraction quality and a set of hyperparameters. lenging. The conventional convolutional neural networks applied to ECG signals are usually composed of 1D convolutional 2.2 Use of wavelet transforms in ECG layers. In this case, local patterns in the signal are collected analysis by sliding filters over the signal. The pooling layers sum up these patterns, reducing dimensionality and capturing only Wavelet transforms have widely been applied to ECG sig- the most salient features. At the network’s end, fully con- nal processing because they can analyze non-stationary sig- nected layers take these features and make the final classi- nals [8]. A wavelet transform decomposes a signal into fication. frequency components related to a defined scale. This de- composition can then serve as a detailed analysis of the CNNs’ efficacy in processing biomedical signals comes signal’s time-frequency characteristics. Wavelet transforms from their capability to handle massive datasets and learn are used for various tasks such as denoising, feature extrac- robust features [10]. However, designing a highly effec- tion, and classification in ECG analysis. tive CNN architecture requires consideration of network depth, filter size, and other hyperparameters. CNN design Most ECG signal processing operations involve denois- is hyperparameter-specific, not only computationally ex- ing, which eliminates as many noise artifacts as possible pensive but also requiring abundant training data for per- without changing the critical information content of sig- formance. nals. Wavelet-based denoising is performed by decompos- Some strategies developed to counter this and related ing an ECG signal into wavelet coefficients, thresholding challenges of sparsely labeled data include transfer learn- the noisy coefficients, and reconstructing the signal from ing and data augmentation. Transfer learning involves us- the modified coefficients. This method has proven effec- ing a pre-trained network that has been previously trained tive in denoising ECG to reduce noise while keeping the on tasks similar to the one at hand and fine-tuning it to the salient features intact. target task. This paradigm borrows knowledge from the Wavelet transforms in feature extraction embrace mul- source task to reduce the quantity of labeled data needed. tiscale analysis, capturing both high-frequency details and Data augmentation techniques, including adding noise and low-frequency trends. Features like wavelet coefficients, shifting and scaling the signal, are included to add some entropy, and wavelet energy have been extracted from this degree of variability in the training data, thus improving time series data and used for classification tasks. Such fea- generalization capacity within a given network. tures characterize both the spectral and temporal character- A study incorporating CardiacNet was conducted to istics of the ECG signal. identify and categorize cardiac arrhythmia based on ECG The Wavelet Packet Decomposition (WPD) generalizes signals and elaborate on the constraints of traditional pre- theWavelet Transform technique so that decomposition can diction systems and AI methods to identify arrhythmia due be performed on approximation and detail coefficients at to poor feature extraction correctly. The approach applied every level [9]. ECG signal processing uses WPD to ex- pre-processing on ECG data by eliminating non-linearities, tract very informative features of classification tasks. By feature extraction using unsupervised machine learning- implementing the WPD technique, detecting the subtle pat- based PCA (UML-PCA), and feature selection with im- tern associated with motor imagery is improved by analyz- proved Harris Hawk’s Optimization (IHHO). CCNN was ing the signal at different scales and frequencies. then used to classify and yield impressive quantitative mea- sures such as accuracy of 97.57%, sensitivity of 98.29%, 2.3 Convolutional neural networks in and MCC value of 98.17% [11]. biomedical signal processing An essential step in raw ECG signal preprocessing is noise and artifact removal, which may affect classification Convolutional neural networks (CNNs) are the break- model performance. Other processes in the preprocessing through in biomedical signal processing because they can stage were baseline wandering removal, noise filtering, and automatically learn hierarchical features from raw data. normalization. CNNs consist of convolutional, pooling, and fully con- A high-pass filter technique was employed to remove nected layers. Each layer takes the input signal and extracts Baseline wandering as it consists of low-frequency compo- increasingly complex features, helping the network catch nents [12]. Bandpass filtration removed noise and retained intricate patterns. only the frequency components relevant to the ECG signals. CNNs have been broadly applied in the analysis of ECG It was done to normalize the ECG signals into a standard signals for arrhythmia detection, ischemia detection, and range of values—that is, every sample was uniform. the classification of several other cardiac diseases. One of the significant advantages of CNNs is the automatic 2.4 Wavelet packet decomposition feature-extraction feature; therefore, the need to perform manual feature engineering can be ruled out. This is This work has decomposed preprocessed ECG signals into very useful in biomedical signal processing since extract- frequency components using the Wavelet Packet Decom- 118 Informatica 49 (2025) 115–126 K.H. Ali Figure 1: Sample ECG signal from the PhysioNet EEG motor movement/imagery dataset Figure 2: Wavelet packet decomposition of an ECG signal showing decomposition levels and corresponding frequency components position technique (WPD). WPD can give a better analysis 3 Methodology than the traditional wavelet transformation because it de- composes approximation and detail coefficients at all lev- 3.1 Data acquisition and preprocessing els. Due to its multiresolution property, this is essential in capturing the transient characteristics of the motor imagery The database used for this study was obtained from the signals. PhysioNet database, specifically the EEG Motor Move- ment/Imagery Dataset. The dataset includes a set of ECG recordings of multiple subjects carrying out motor imagery tasks. Each record is annotated concerning whether or not there was motor imagery—these were used as ground truth and MCC value of 98.17% [11]. CardiacNet uses a dif- against which the classification task results were compared. ferent technique to detect motor imagery from recorded ECG signals. By integrating the Wavelet Packet Decompo- The choice of wavelet function and the decomposition sition (WPD) with the Multiscale CNN, the present study level are the most basic but essential parameters in WPD. seeks to optimize the classification of dynamic and non- The Daubechies 4 wavelet was chosen to do this because it linear signal features previously unexplored and additional was best suited for the analysis of ECG signals. Following ECG uses beyond cardiac health. the same thesis, it is decomposed to level 4, giving an ade- Motor Imagery Detection in ECG Signals… Informatica 49 (2025) 115–126 119 Figure 3: Flowchart of the proposed model from data acquisition to performance comparison quate compromise between the complexity of computations forming binary classification for motor imagery. and the level of detail. Wavelet-packet decomposition is based on decomposing any ECG signal into a set of wavelet coefficients at differ- 3.3 Implementation details ent resolutions. These wavelet coefficients represent the Implementation was done in Python and its associated li- ECG signal at various resolutions. The obtained coeffi- braries, including the PyWavelets library used in wavelet cients were used as features for the classification model. packet decomposition and the TensorFlow/Keras library for A feature set includes the mean, variance, and energy of modeling and training. Figure 3: Flowchart of the proposed wavelet coefficients at each level of decomposition, which model from data acquisition to performance comparison. gives a representative of ECG. The CNN. The ECG signals were pre-processed using Wavelets and decomposed after in the CNN [13]. The MS- 3.2 Multiscale convolutional neural network CNN was trained using a Binary cross-entropy loss func- (CNN) tion and Adam optimizer because of its high performance and efficiency. The training was split into dataset training- The Multiscale Convolutional Neural Network constitutes validation sets, and early stopping was implemented to the core part of the methodology, which aims to increase avoid overfitting. The accuracy, precision, recall, and F1 feature extraction from the granularities, from fine-grained score metrics assessed model performance. to coarse ones. After the convolutional layer, the follow- ing layer used for representation is a pooling layer respon- 3.4 Algorithm and flowchart sible for reducing the dimensionality of the features while retaining the most salient parts. The outputs of these par- The proposed model for motor imagery detection in ECG allel pathways are multiple kernel sizes, enabling the net- signals starts by acquiring data from the PhysioNet EEG work to capture features at different resolutions. The mul- Motor Movement/Imagery dataset. The raw ECG signals tiscale CNN is designed with three parallel convolutional are then pre-processed, followed by baseline wandering re- pathways. The first layer in each path is a 1D convolutional moval using a high-pass filter, noise filtering using a band- layer. Kernel sizes of 3, 5, and 7 were applied to extract fea- pass filter, and, lastly, normalization to standardize the sig- tures at different scales, then concatenated and input into a nal range. Next, the Wavelet Packet Decomposition pro- few fully connected layers for the final classification. This cess uses level 4 of the Daubechies 4 (db4) wavelet func- neural network uses ReLU-activated hidden layers and con- tions to decompose the ECG signals into sub-high, high, volutional layers with a sigmoid-activated output layer, per- and low-frequency bands. Features are then acquired from 120 Informatica 49 (2025) 115–126 K.H. Ali Figure 4: The architecture of the multi-scale CNN wavelet coefficients at each level: mean, variance, and en- based Multiscale CNN approach on motor imagery detec- ergy. These features would be given as input to a Multi- tion in ECG signals using a publicly available dataset. The Scale Convolutional Neural Network designedwith parallel dataset consists of ECG recordings from several subjects convolutional layers of filter sizes 3, 5, and 7. The output performing motor imagery tasks. From each ECG record- of each conv layer is ReLU activated and then subjected to ing, ground truths are available on the presence or absence max-pooling. The subsequent is shown in Figure 3. of motor imagery, which are the results to be achieved Two outputs are concatenated and passed through fully within a classification task. connected layers into a sigmoid-activated output layer to Dividing the data set into a training, validation, and test implement the final binary classification. Using the Adam set ensures the model’s evaluation is all-around. The data optimization algorithm, the network trains against these consisted of 70% for training, 15% for the validation, and data with binary cross-entropy loss in the back end; it has 15% for the test set. This partitioning would ensure that preactivation stops concerning the loss of the hold-out set. models trained on a diversified set of samples are evalu- Performance evaluation metrics include accuracy, preci- ated on completely unseen data to estimate generalization sion, recall, F1-score, and ROC-AUC, with comparisons to capability. highlight its superior performance over traditional methods such as SVM and Random Forest. 4.1.1 Experiment 1: removing wavelet packet decomposition (WPD) 4 Experiments and analysis TheWPD step was removed in this experiment, and the raw 4.1 Ablation experiments ECG signals were directly fed into the Multiscale CNN. The expected impact was that without WPD, the model A series of experiments were conducted to evaluate the per- processes only the raw signal, potentially missing critical formance of the proposed Wavelet Packet Decomposition- frequency-specific features. The multiscale CNN still at- Motor Imagery Detection in ECG Signals… Informatica 49 (2025) 115–126 121 tempts to capture features at different scales but lacks the 4.4 Performance metrics enriched input from WPD. The effectiveness of the proposed method was evaluated based on various metrics, such as accuracy, precision, re- 4.1.2 Experiment 2: using standard CNN instead of call, F1 score, and area under the Receiver Operating Char- multiscale CNN acteristic curve; these measures assessed how well the In the second experiment’s model set-up, WPD was re- model could discriminate motor imagery within ECG sig- tained, but the Multiscale CNN was replaced with a stan- nals. dard CNN that processes the signal at a single scale. The •Accuracymeasures howwell the model performs over- expected impact was that the standard CNN may not fully all by calculating the true positives and negatives ratio exploit the multiscale features provided byWPD, leading to among all the predictions. suboptimal feature extraction and classification. Themodel • Precision reflects the proportion of true positives to the might Figure 4: The architecture of the Multi-Scale CNN total number of optimistic predictions the model made [15]. perform better than the raw signal input but is expected to •Recall (sensitivity) refers to the model’s ability to iden- underperform compared to the baseline multiscale CNN. tify all relevant instances (true positives) accurately. • F1-score is the harmonic mean of the precision and re- 4.1.3 Experiment 3: combined removal of WPD and call. It provides a single score that balances both concerns. multiscale CNN • ROC-AUC measures the model’s performance overall classification thresholds; thus, higher values indicate better This experiment removed WPD and the CNN’s multiscale discrimination. structure, producing a standard CNN processing raw ECG signals. The experiment serves as a control, representing the most straightforward model setup. The expected out- 5 Results come is the poorest performance, as the model needs more enriched input fromWPD and the capability to process fea- In Experiment 1, removing WPD from the model led to tures at multiple scales. a slight but noticeable decrease in performance measures. This drop shows that WPD is critical in improving the qual- 4.2 Data preprocessing ity of features fed into the Multiscale CNN to boost clas- sification efficiency. This gap partially explains why the As given in the methodology section, raw ECG signals un- model could not adequately reconstruct some frequency- derwent some preprocessing. A high-pass filter with a cut- specific features when WPD was absent; this lack of dis- off frequency of 0.5 Hz was employed to remove baseline tinction landed the model lower scores by failing to differ- wandering. A bandpass filter ranging from 0.5 Hz to 40 entiate motor imagery from other signal components. Hz was used for further filtering, which helped smooth the Replacing the Multiscale CNN with a standard CNN high-frequency noise and retain important frequency com- while maintainingWPD in Experiment 2 resulted in a mod- ponents [14]. Post-preprocessing, these signals were nor- erate decrease in performance. This implies that while malized to a standard range of 0-1 to make them uniform. WPD may still provide helpful multi-resolution features to The signal was preprocessed before running the Wavelet be exploited, its usefulness greatly depends on the subse- Packet Decomposition up to level 4 with a Daubechies 4 quent application of a Multiscale CNN, which training can wavelet. The obtained wavelet coefficients were used to incorporate these features at suitable scales. Due to the build feature vectors for each ECG segment. These are the single-scale characteristic of the standard CNN, it was not inputs of the Multiscale CNN, which treated these different possible to fully utilize such features asWPD to provide the frequency components represented by ECG signal feature best classification results. vectors. The results of Experiment 3 revealed the most signifi- cant decline in all performance metrics when both WPD 4.3 Model training and the Multiscale CNN were removed, leaving a standard CNN to process the raw ECG signals. Such a consider- Multiscale CNN architecture for TensorFlow/Keras: three able decrease also underscores the importance of utilizing parallel convolutional pathways with kernel sizes 3, 5, and WPD and a Multiscale CNN to improve MI detection ac- 7, concatenated features after max-pooling, passed through curacy. The features extracted from WPD offer more en- fully connected layers with the final output layer, which hancements, together with the capacity of the Multiscale uses a sigmoid activation function for binary classification. CNN to scrutinize the features at different scales, which is, This model was compiled using the Adam optimizer therefore, important for accurate and solid classification. and binary cross-entropy loss function. Training was done Table 1 shows the proposed method’s performance as through 100 epochs and a batch size of 32. Early stopping tested on the test set. Compared to traditional approaches, with patience set at ten epochs was applied to avoid overfit- Multiscale CNN better detected motor imagery from the ting bymonitoring the validation loss and stopping training. ECG. 122 Informatica 49 (2025) 115–126 K.H. Ali Figure 5: Bar chart showing the performance metrics tween the actual positive rate and the false positive rate. Table 1: Performance metrics Multiscale CNN model was performed with that of the Metric Value traditional machine learning methods Support Vector Ma- Accuracy 0.92 chines (SVM) and Random Forests (RF) on the same Precision 0.89 dataset and preprocessing steps. Recall 0.93 The summarized results in Table 2 pointed to the F1-Score 0.91 supremacy of the Multiscale CNN. The table for multiple ROC-AUC 0.95 models overview numerous metrics, including accuracy, precision, recall, F1-score, specificity, and Matthews Cor- relation Coefficient (MCC). This comparison also shows how effective and efficient the proposed Multiscale CNN Themodel attained an accuracy of 92%, meaning it could with WPD is compared to other prevailing classifiers like correctly classify 92% of samples. The obtained precision SVM and Random Forest. and recall values were 89% and 93%, respectively, show- High performance could be attributed to the Multiscale ing that the model could correctly distinguished true posi- CNN’s ability to automatically learn hierarchical features tives and maintained a low false positive rate. The F1 Score of the wavelet coefficients, which represent fine-grained was 0.91, reflecting a good balance between precision and and coarse patterns crucial for discriminating between mo- recall. The ROC-AUC of 0.95 indicated excellent discrim- tor imagery types [16]. ination ability across a range of classifications. The proposed approach addresses a significant gap in the Figure 6 shows a confusion matrix that shows precisely field of ECG-based signal processing by extending its ap- how the model performed—the quantity of true positive, plication from traditional cardiac health monitoring to mo- true negative, false positive, and false pessimistic predic- tor imagery detection. Models such as CardiacNet are ac- tions. The confusion matrix depicted many accurate opti- curate in detecting cardiac arrhythmias but are centered on mistic and pessimistic predictions, with very few. disease classification and not the detection of cognitive pro- The ROC curve showed that the model could maintain cesses like motor imagery. Non-invasive motor imagery a high actual positive rate with a low false positive rate; based on ECG signals is still unexplored and opens a vast actually, 0.95 under the curve shows good performance. possibility for investigating cognitive processes using neu- ral signals. This approach meets a significant requirement 5.1 Comparison with traditional methods in BCI and neuroprosthetics, where an efficient and cost- effective identification ofmovement goals improves usabil- To further substantiate the efficacy of the approach, the per- ity and functionality. formance comparison of the false positives and negatives, Unlike traditional methods like Support VectorMachines thus substantiating the model’s robustness. (SVM) and Random Forests, the proposed method offers The ROC curve in Figure 7 provided more information distinct advantages through its use of a Multiscale Con- on model performance, indicating a good separation be- volutional Neural Network (CNN) combined with Wavelet Motor Imagery Detection in ECG Signals… Informatica 49 (2025) 115–126 123 Packet Decomposition (WPD). Furthermore, most ear- lier traditional machine methods require feature extrac- tion through experience, which may not artistically depict the ECG signal’s subtle patterns, especially during the MI detection phase. Instead, automatic hierarchical feature learning of multiscale CNN and the WPD facilitates multi- resolution signal analysis. Hence, the themodel can capture high-level and low-level details at multiscale and multires- olution, which improves the classification ofmotor imagery tasks. In addition, problems associated with the time and fre- quency domains, like the Fourier Transform used conven- tionally, are well addressed by the proposed method. Such traditional methods fail to capture short-term features and non-stationary aspects inherent in the signals used to imag- ine motor control. Thus, the proposed WPD approach cap- Figure 7: ROC curve tures more refined time-frequency features by the multi- scale CNN to distinguish motor imagery and signal noise more accurately from other irrelevant components. WPD enhances the input features in that it gives pre- cise frequency details. On the other hand, the Multiscale 6 Discussion and future works CNN operates these features at different scales, thus im- proving its ability to learn complex patterns that enhance 6.1 Discussion the classification result. The experiments also show that standard CNN is suboptimal as it does not reproduce the The experiments reveal critical insights into the effective- results even when WPD is applied. This implies that the ness of combining Wavelet Packet Decomposition (WPD) multiscale framework of using different kernel sizes to ob- with a Multiscale Convolutional Neural Network (CNN) tain features of various scales is essential. Generally, these for motor imagery detection in ECG signals. This substan- results demonstrate that WPD is beneficial in detecting MI tial degradation of performance indicates the importance of from ECG signals, so it is for Multiscale CNN. WPD in extracting significant frequency band features from ECG data required for classification. The characteristic of WPD is that the signals can be analyzed at different reso- 6.2 Generalizable capabilities lutions; this is beneficial when dealing with transient/non- stationary signal characteristics that would typically go un- The generalization capabilities of a model are critical in as- noticed when utilizingmost of the conventional signal anal- sessing its robustness and applicability across various sub- ysis techniques. Moreover, the combination of WPD and jects and datasets. This study used independent dataset the Multiscale CNN can be observed in better baseline validation to determine the model’s predictive accuracy on model performance. new cases not part of the training dataset. The indepen- dent dataset used for validation differed from the one used in the training process, and there was no intersection be- tween these two datasets. To check this, validation was conducted using the k-fold cross-validation method, where the data set was split into five sets (k=5) such that subjects were distributed across all the five data splits. It contributes to mitigating inter-subject variability, a significant issue in motor imagery tasks, as different patterns of ECG signals can influence a model. The performance was almost steady across the folds, which shows the ability of the model to perform well for new subjects in the dataset. However, exercising the model with a validation tech- nique other than K-fold cross-validation would be more meaningful, for instance, testing themodel on a new data set not used in the training phase. A validation approach with an independent test set would also test the model’s ability to generalize to highly different conditions if the independent Figure 6: The confusion matrix dataset differs in signal quality, subject characteristics, or data acquisition techniques. For instance, using an exter- 124 Informatica 49 (2025) 115–126 K.H. Ali Table 2: Comparison with traditional methods Metric Accuracy Precision Recall F1-Score ROC-AUC Recall SVM 0.85 0.83 0.86 0.84 0.88 0.88 Random Forest 0.87 0.84 0.88 0.86 0.9 0.9 Multiscale CNN 0.92 0.89 0.93 0.91 0.95 0.95 MCC 98.17 - - - - - Figure 8: Comparison with traditional methods nal set we obtain from a different recording protocol would and generalize these findings: help determine how much the proposed model hampers or inspires adaptability for different data characteristics. If loss occurs in these cases, it may show the aspects in which 6.3.1 Exploration of alternative wavelet functions: the model has to be optimized to be generalized better. The Daubechies 4 dB4 wavelet was suitable for this work; To strengthen the evaluation, it is recommended to ex- some studies on other wavelet functions and their impact on pand the main parameters, including accuracy, precision, the features extracted would addmore value. Other wavelet and relative, especially in cases of imbalance in collection. functions capture unique characteristics in the signal that It should be noted that using values such as specificity or could lead to further improvements in classification accu- MCC will give a better picture of the effectiveness of the racy. examined model. Although MCC was not used in the cur- rent study, it is a valuable metric considering true posi- tives, false positives, and false negatives, thus offering in- 6.3.2 Multi-modal data fusion: sight into the model’s performance in imbalanced condi- Such ECG signals can be combined with other physiologi- tions. Future work could incorporate these additional met- cal signals, such as EEG and EMG, further to improve the rics and further investigate techniques such as domain adap- robustness and accuracy of motor imagery detection [17]. tation to enhance the model’s applicability across different Multi-modal fusion methods combine complementary in- data sources. formation from different sources to describe a motor imag- inary event completely. 6.3 Directions for future work 6.3.3 Advanced deep learning architectures: While the proposed methodology has made considerable strides in motor imagery detection research, there are sev- Research about advanced deep learning architectures based eral avenues of inquiry forward that would further enhance on RNN and attention mechanisms can achieve even better Motor Imagery Detection in ECG Signals… Informatica 49 (2025) 115–126 125 performance for motor imagery detection [18]. These ar- motor imagery classification performance. Removing ei- chitectures have confounders of temporal dependencies and ther component significantly declines model accuracy, il- contextual information that could improve the detection of lustrating their combined importance in the overall frame- the subtle pattern of ECG signals. work. This research presents a new motor imagery de- tection scheme from ECG signals using Wavelet Packet 6.3.4 Real-time implementation: Decomposition and Multiscale Convolutional Neural Net- works. The methodology indeed enhances the classifica- Based on the proposed methodology, designing real-time tion accuracy to a large extent; hence, it needs no explicit systems for motor imagery detection must move the work mention. The results also indicate that an ECG signal is fea- toward a practical application. Implementing the model sible in motor imagery detection as a noninvasive and eas- in time-real environmental systems and testing its perfor- ily accessed technique for developing neuroprosthetic de- mance under dynamic conditions becomes crucial for de- vices and BCI systems. These contributions help support ploying the technology in neuroprosthetic devices and BCI further studies in this area of research, which has enormous systems. room for further improvement and exploration. On the way ahead, addressing the suggested directions for future work 6.3.5 Large-scale validation: will continuously advance the field and, eventually, achieve more effective and dependable technologies related to mo- Further large-scale validation is required to generalize the tor imagery detection. findings and check the robustness of the proposed approach This study’s methodology will be open-sourced to en- with datasets and subjects under study. The model will be sure reproducibility for other research groups to extend and tested experimentally across different populations, tasks, continue their collaboration in biomedical signal process- and recording conditions to estimate its reliability and scal- ing and brain-computer interface. In the future, with fur- ability. ther research in this area, the full potential of MI detection using the ECG signal can be achieved to benefit people suf- 6.3.6 Transfer learning and domain adaptation: fering from motor impairments and progress in neuropros- thetic/BCI capabilities. If the model is adapted to different domains and tasks us- ing transfer learning, it can be flexible. Domain adaptation methods may alter the model’s capability towards gener- alizing new data, at least with its reusability on minimal Acknowledgment retraining. I am grateful to the University of Basrah for providing the resources and environment needed to complete this re- 6.3.7 User-centric design: search. I alsowant to thank the Computer ScienceMembers Provisions for user feedback in the motor imagery detection for their collaborative spirit and helpful discussions, which system and the development of user-centric interfaces will contributed significantly to the ideas presented here. likely improve its usability and acceptance [19]. Knowl- edge of the desires and preferences of the end-user, such as a person with a motor impairment, may guide the develop- References ment of more intuitive and effective BCI systems. [1] P. Bach, C. Frank, and W. Kunde, “Why mo- 6.3.8 Ethical considerations and data privacy: tor imagery is not really motoric: Towards a re-conceptualization in terms of effect- Ethical considerations and data privacy are paramount in based action control,” Psychological Re- collecting, processing, and using physiological signals. search, vol. 88, no. 6, pp. 1790–1804, 2024. Frameworks on ethical data handling and compliance with https://doi.org/10.1007/s00426-022-01773-w. privacy regulations will be essential to ensure the respon- sible deployment of technologies for motor imagery detec- [2] A. Saibene, M. Caglioni, S. Corchs, and F. Gas- tion [20]. parini, “Eeg-based bcis on motor imagery paradigm using wearable technologies: a systematic re- view,” Sensors, vol. 23, no. 5, p. 2798, 2023. 7 Conclusion https://doi.org/10.3390/s23052798. In conclusion, the ablation study confirms that Wavelet [3] A. Palumbo, V. Gramigna, B. Calabrese, and N. Ielpo, Packet Decomposition and Multiscale CNN are integral “Motor-imagery eeg-based bcis in wheelchair move- components of the proposed method. WPD provides a ment and control: A systematic literature re- rich, multi-scale representation of the ECG signals, which, view,” Sensors, vol. 21, no. 18, p. 6285, 2021. when processed by a Multiscale CNN, leads to superior https://doi.org/10.3390/s21186285. 126 Informatica 49 (2025) 115–126 K.H. Ali [4] W. Cabrel, G. T. Mumanikidzwa, J. Shen, and [14] R. Y. L. Al-Taai and X. Wu, “Speech enhance- Y. Yan, “Enhanced fourier transform using ment for hearing impaired based on bandpass wavelet packet decomposition,” Journal of Sen- filters and a compound deep denoising autoen- sor Technology, vol. 14, no. 1, pp. 1–15, 2024. coder,” Symmetry, vol. 13, no. 8, p. 1310, 2021. https://doi.org/10.4236/jst.2024.141001. https://doi.org/10.3390/sym13081310. [5] M. A. Qureshi, K. N. Qureshi, G. Jeon, and F. Pic- [15] R. Yacouby and D. Axman, “Probabilistic exten- cialli, “Deep learning-based ambient assisted living sion of precision, recall, and f1 score for more for self-management of cardiovascular conditions,” thorough evaluation of classification models,” in Neural Computing and Applications, pp. 1–19, 2022. Proceedings of the first workshop on evaluation https://doi.org/10.1007/s00521-020-05678-w. and comparison of NLP systems, pp. 79–91, 2020. [6] G. Aggarwal and Y. Wei, “Non-invasive fetal https://aclanthology.org/2020.eval4nlp-1.9. electrocardiogram monitoring techniques: Poten- [16] J. Wen, Y. Li, M. Fang, L. Zhu, D. D. Feng, and tial and future research opportunities in smart tex- P. Li, “Fine-grained and multiple classification for tiles,” Signals, vol. 2, no. 3, pp. 392–412, 2021. alzheimer’s disease with wavelet convolution unit https://doi.org/10.3390/signals2030025. network,” IEEE Transactions on Biomedical En- [7] A. K. Singh and S. Krishnan, “Ecg signal feature ex- gineering, vol. 70, no. 9, pp. 2592–2603, 2023. traction trends in methods and applications,” BioMed- https://doi.org/10.1109/tbme.2023.3256042. ical Engineering OnLine, vol. 22, no. 1, p. 22, 2023. https://doi.org/10.1186/s12938-023-01075-1. [17] B. Rim, N.-J. Sung, S. Min, and M. Hong, “Deep learning in physiological signal data: A [8] C. Zhuang and P. Liao, “An improved empirical survey,” Sensors, vol. 20, no. 4, p. 969, 2020. wavelet transform for noisy and non-stationary signal https://doi.org/10.3390/s20040969. processing,” IEEE Access, vol. 8, pp. 24484–24494, 2020. https://doi.org/10.1109/access.2020.2968851. [18] J. Mladenović, “Standardization of protocol design for user training in eeg-based brain–computer in- [9] H. Wang, W. Wang, Y. Du, and D. Xu, “Examin- terface,” Journal of Neural Engineering, vol. 18, ing the applicability of wavelet packet decomposi- no. 1, p. 011003, 2021. https://doi.org/10.1088/1741- tion on different forecasting models in annual rain- 2552/abcc7d. fall prediction,”Water, vol. 13, no. 15, p. 1997, 2021. https://doi.org/10.3390/w13151997. [19] I. Y. Zhao, Y. X. Ma, M. W. C. Yu, J. Liu, W. N. Dong, Q. Pang, X. Q. Lu, A. Molassiotis, E. Hol- [10] M. A. Abdou, “Literature review: Efficient royd, and C. W. W. Wong, “Ethics, integrity, and deep neural networks techniques for medical retributions of digital detection surveillance systems image analysis,” Neural Computing and Appli- for infectious diseases: systematic literature review,” cations, vol. 34, no. 8, pp. 5791–5812, 2022. Journal of medical Internet research, vol. 23, no. 10, https://doi.org/10.1007/s00521-022-06960-9. p. e32328, 2021. https://doi.org/10.2196/32328. [11] K. Srinivas, V. Ch, S. R. Borra, K. S. Raju, G. R. K. Rao, K. V. Satyanarayana, and P. M. [20] Y. Hou, S. Jia, X. Lun, S. Zhang, T. Chen, Kumar, “Cardiacnet: Cardiac arrhythmia detec- F. Wang, and J. Lv, “Deep feature mining via the tion and classification using unsupervised learn- attention-based bidirectional long short term mem- ing based optimal feature selection with custom ory graph convolutional neural network for human cnn model,” Informatica, vol. 48, no. 2, 2024. motor imagery recognition,” Frontiers in Bioengi- https://doi.org/10.31449/inf.v48i2.5076. neering and Biotechnology, vol. 9, p. 706229, 2022. https://doi.org/10.3389/fbioe.2021.706229. [12] M. Bejani, E. Luque-Buzo, A. Burlaka-Petrash, J. A. Gómez-García, J. D. Arias-Londoño, F. Grandas- Pérez, J. Grajal, and J. I. Godino-Llorente, “Base- line wander removal applied to smooth pursuit eye movements from parkinsonian patients,” IEEE Access, vol. 11, pp. 32119–32133, 2023. https://doi.org/10.1109/access.2023.3263045. [13] C. Pravin and V. Ojha, “A novel ecg signal de- noising filter selection algorithm based on conven- tional neural networks,” in 2020 19th IEEE Inter- national Conference on Machine Learning and Ap- plications (ICMLA), pp. 1094–1100, IEEE, 2020. https://doi.org/10.1109/icmla51294.2020.00176. https://doi.org/10.31449/inf.v49i12.7558 Informatica 49 (2025) 127–144 127 Online Criminal Behavior Recognition Based on CNNH and MCNN- LSTM Jingwei Hu Department of Legal Practice, Shandong Judicial Police Vocational College, Jinan 250200, China E-mail: 17866981007@163.com Keywords: anonymous networks, traffic segmentation, convolutional neural networks, online crime, long short-term memory networks Received: November 10, 2024 In light of the proliferation of cybercrimes, the effective identification and mitigation of such online criminal activities has emerged as a significant challenge within the domain of network security. Therefore, this study introduces dilated convolution technology, self-attention mechanism, convolutional neural network and long short-term memory network, and proposes an overlapping traffic recognition model based on improved convolutional neural network and an online crime recognition model with long short-term memory network. In the traffic segmentation model test, the recall rate, F1 value, and error rate of the model under normal traffic conditions were 91.43%, 93.46%, and 92.43%, respectively. The error rate was 4.15%. The accuracy of the online crime recognition model for malware propagation and illegal transactions was 96.54% and 92.87% respectively. In the concept drift test, when the training time and test time interval was 60 days, the accuracy of the model was 48.67% higher than that of the long short-term memory network. Compared with the mainstream framework and traditional methods, its accuracy in high traffic scenarios was 94.78%, the error rate was 3.89%, and the P-value was < 0.05. In the final simulation test, the model could effectively identify illegal software transactions. The results show that the proposed model has high accuracy and strong generalization ability in identifying overlapping traffic and website fingerprint crimes, and effectively improves the detection ability of criminal activities in anonymous networks. Povzetek: Predstavljen je model za prepoznavanje spletnega kriminala, ki temelji na konvolucijskih in LSTM nevronskih mrežah in z uporabo tehnologije razredčene konvolucije in mehanizma samo- pozornosti dosega visoko točnost pri segmentaciji prometa in prepoznavanju spletnih kaznivih dejanj. Učinkovito izboljšuje zaznavanje kriminalnih aktivnosti v anonimnih omrežjih. 1 Introduction key issue that needs to be addressed urgently. At the same time, the industry's research on anonymous network With the rapid development of Internet technology, the traffic analysis and criminal behavior identification is increasing complexity and openness of cyberspace have also deepening and developing. Wang Y et al. proposed a brought unprecedented opportunities and challenges to deep learning-based intrusion detection system SMSO- society [1]. The emergence and popularization of CNN to address the security risks and privacy issues anonymous networks provide an important guarantee for caused by the transmission of large amounts of data in users' privacy protection in the network. However, it also wireless networks. The system combined the spider makes some wrongdoers utilize anonymous networks to monkey swarm optimization algorithm and CNN to engage in various criminalactivities, among which the improve the ability to identify network attacks. The anonymous communication system represented by the results showed that the system was superior to LSTM and onion router (Tor) is particularly typical [2]. Tor network other methods in terms of accuracy [4]. Gu X et al. realizes the high anonymity of user identity and proposed an online defense strategy based on non- communication content through multi-layer encryption targeted adversarial patches to address the limitations of and node forwarding techniques, which is widely used for existing WF attack defense methods in practical legitimate purposes such as protecting user privacy and applications. Experiments indicated that the model preventing network surveillance. However, the achieved 95.50% defense accuracy and 12.57% time anonymity of Tor network is also used by some criminals overhead in real-time traffic [5]. To address the problem to circumvent legal supervision and become a hotbed for of high dimensionality of cybercrime data, Rawat R et al. cybercriminal activities, such as illegal trading, malware proposed a feature selection method based on multi- distribution, hacking and other behaviors [3]. In this objective evolutionary algorithm (MOEA) and combined context, applying overlapping traffic segmentation and it with NSGA-II to reduce data dimensionality and website fingerprinting (WF) technology to collect identify the most relevant features. The experimental potential criminal evidence and detect abnormal behavior results indicated that this method effectively improves the in an early stage in order to identify and combat online efficiency of data processing [6]. Xian K proposed an criminal behavior in anonymous networks has become a 128 Informatica 49 (2025) 127–144 J. Hu improved WF fingerprint recognition algorithm to solve on experimental assessments, the model performs better the problem of identifying encrypted traffic in virtual in terms of increased accuracy and decreased false alarm private networks. Moreover, it combined it with an rate than popular classification algorithms like long short- optimized capsule neural network model CapsNet to term memory (LSTM) [9]. A CNN intrusion detection classify encrypted traffic. The research results showed technique based on data imbalance was presented by Gan that this method was superior to the random forest B et al. to address the hazards to network security brought algorithm in terms of recognition accuracy and on by recurrent network intrusions. The findings revealed convergence speed, with a recognition rate of 99.98% [7]. that, with an implementation time of 1.42 seconds, the Milad N et al. proposed a blind adversarial perturbation method attained an average accuracy of 98.73% in binary algorithm to address the problem that traffic analysis and multi-classification identification [10]. An intelligent technology based on deep neural networks (DNN) was prediction technique for security performance was vulnerable to adversarial perturbation attacks. By suggested by Xu L et al. to address security concerns in remapping functions to create adversarial perturbations mobile IoT healthcare networks. To increase the CNN independent of network connections, the algorithm was model's adaptability to nonlinear medical large data, the applied to real-time anonymous network traffic analysis study combined a four-branch beginning block with a to defeat WF identification and traffic association four-layer convolution. The results indicated that the classifiers. The experimental results indicated that this intelligent algorithm improved the security performance method was applicable to a variety of traffic classifier prediction accuracy by 20% and had better prediction types. The robustness test of existing countermeasures performance [11]. Yan F. et al. addressed the issue of performed poorly [8]. inadequate training samples and sample class imbalance Because of their superior 2D data processing in intrusion detection systems by proposing an intrusion capabilities, convolutional neural networks (CNNs) are detection system based on migration learning and frequently utilized in image categorization and target integrated learning. The two fundamental learning models recognition applications. Yesodha K et al. suggested a that were selected were Xception and Inception. A tree- novel intrusion detection system incorporating CNN, structured estimator was used to tune the hyperparameters fuzzy temporal rules, and an artificial bee colony [12]. Finally, the study summarizes the research areas, optimization algorithm for the security vulnerability indicator test results, and limitations of the above problem in wireless sensor network communication with literature review. The results are shown in Table 1 below. the goal of improving the classifier's performance. Based Table 1: Literature summary table Study Methodology Performance Metric Shortcomings Not designed for anonymous Wang Y et al. Intrusion detection system based Higher accuracy than LSTM network traffic, struggles with [4] on SMSO-CNN and nearest neighbor algorithms overlapping traffic Focuses on defense tasks, does not Fingerprint defense strategy of 95.50% defense accuracy, Gu X et al. [5] address abnormal behavior recognition online website based on Grad-CAM 12.57% time overhead in anonymous networks Feature selection method based Rawat R et al. Effectively improves data Focused on feature selection, lacks on MOEA combined with NSGA-II [6] processing efficiency real-time traffic analysis for dimensionality reduction SSL VPN traffic recognition Effective for encrypted traffic Xian K et al. Optimized fingerprint recognition rate of 99.98%, recall rate of classification but lacks ability to handle [7] for encrypted traffic based on CapsNet 99.98% complex anonymous traffic patterns Blind adversarial perturbation Demonstrated high Milad N et al. algorithm to defeat DNN-based traffic effectiveness across multiple traffic Robustness testing performs poorly [8] analysis methods classifiers Low false alarm rate, higher Limited to generic network Yesodha K et Intrusion detection system based classification accuracy than long features, cannot handle overlapping al. [9] on FT-ABC-CNN short-term memory networks traffic patterns Lacks temporal feature extraction, Gan B et al. Intrusion detection method based Average binary and multi- struggles with dynamic and complex [10] on CNN-IDMDI class accuracy of 98.73% behaviors Focused on IoT, does not consider Xu L et al. Improved CNN for IoT-enabled Improves prediction accuracy dynamic features of anonymous [11] security performance prediction by 20% networks Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144 129 Limited datasets, does not address Yan F et al. Intrusion detection system based Significantly improves overlapping traffic or anonymous [12] on TL-CNN-IDS accuracy network issues Combined with Table 1, most studies have some year have reached hundreds of billions of dollars, which shortcomings while improving the ability of traffic has brought a huge burden to the global economy [13]. classification and behavior recognition. First, the majority The diversity and complexity of cybercrime make of extant methods prioritize comprehensive network traditional legal supervision and law enforcement traffic monitoring, yet they are deficient in their capacity methods face huge challenges in dealing with these to discern intricate and clandestine criminal activities. behaviors. This is particularly problematic in anonymous network Among online crimes, online gambling is a relatively environments, where traditional rule-based matching common type. Criminals attract users to participate in methods are challenging to implement effectively to online gambling activities by setting up and operating detect anomalous behaviors indicative of specific illegal gambling websites. These websites usually rely on criminal activities. Second, many traffic analysis methods anonymous networks, such as the Tor network or often have a high false alarm rate in practical cryptocurrency payments, which greatly improves their applications, which makes it difficult for law enforcement concealment and evades legal supervision. This makes it agencies to respond quickly when faced with massive difficult for law enforcement agencies to track and collect alarm information. In addition, these methods have low evidence, making it difficult to effectively combat these computational efficiency and are difficult to meet the criminal activities. Online prostitution is also an illegal requirements of real-time monitoring of large-scale activity carried out using the Internet. Criminals usually network traffic. In view of this, this study introduces the promote and trade through dark web platforms to avoid hollow convolution technology in CNN and proposes a tracking. In addition, illegal transactions are also an Tor overlapping traffic segmentation model based on the important aspect of online criminal activities. Criminal’s hollow convolution convolutional neural network trade prohibited items such as drugs, weapons, and (CNNH). At the same time, combining the attention counterfeit goods in anonymous markets such as the dark mechanism, CNN, and LSTM, an online criminal web. Such markets often rely on complex encryption behavior recognition model based on multi-core technology and anonymous payment methods to conduct convolutional neural networks and long short-term transactions, making it extremely difficult for law memory networks (MCNN-LSTM) is proposed. The enforcement agencies to investigate. Another important model analyzes network traffic characteristics, accurately form of online crime is the spread of malware. Malware identifies the websites visited by users, and effectively includes ransomware, phishing software, etc., which can identifies anomalous network behaviors related to be spread through various network channels and pose a criminal activities, becoming a powerful auxiliary tool for serious threat to individuals, enterprises and even online crime investigation. government agencies. The spread of malware can not The main contributions of the study are as follows: only steal personal privacy information, but also lead to First, the MCNN-LSTM model based on the combination the loss of core corporate data, and in serious cases, even of multi-core convolution and LSTM network is endanger national security. Every year, the number of proposed. By using multi-module collaborative data leaks caused by malware is huge, and the economic optimization, the modeling capabilities of spatial features losses caused are difficult to estimate [14]. In addition, and time series features are integrated to improve the with the popularization of IoT technology, cyber attacks theoretical framework and method design of network on smart devices are also on the rise, further expanding traffic anomaly detection. Second, the self-attention the scope of online criminal activities. mechanism (SAM) is introduced into the model Faced with these challenges, traditional legal and law architecture, which can dynamically focus on key enforcement methods are unable to cope with the high features and improve the model's adaptability to dynamic concealment and transnational nature of online crimes. environments. Finally, a multi-scale feature extraction Researchers and law enforcement agencies have begun to method is proposed to capture multi-scale spatial features rely on advanced technical means, especially recognition based on the multi-core convolution module. algorithms based on network traffic analysis and deep learning. Through these technologies, researchers can 2 Methods and materials extract useful features from massive amounts of network data to identify and track criminal behavior. In recent 2.1 Online crime and its challenge years, more and more research has been devoted to improving traffic analysis methods to improve the ability Online criminals often use the anonymity, privacy to detect complex cybercrime, especially crimes in protection and global characteristics of the Internet to anonymous networks. In the future, with the further carry out various illegal activities, including illegal development of technology, more intelligent detection gambling, online transactions, money laundering, systems for online criminal behavior will be widely used malware propagation, etc. Studies have shown that the to better cope with the growing network threats. economic losses caused by cybercrime worldwide each 130 Informatica 49 (2025) 127–144 J. Hu Online crime identification is the process of locating protection [15]. Tor achieves anonymity in and assessing possible illegal activity, such as online communications by dividing user communications into gambling, malware distribution, and illegal transactions, multiple data packets, transmitting them through multiple by analyzing network traffic, user behavior patterns, and relay nodes, and encrypting and decrypting the data data characteristics. Unlike traditional network traffic packets. The high privacy of anonymous networks makes analysis, online crime identification focuses more on the them an important tool for legitimate users to protect their complex characteristics of criminal behavior hidden in privacy, but they also provide shelter for various anonymous networks, often involving protocol abuse, criminalactivities, such as online gambling, online encrypted data streams, and anomalous behavior patterns. prostitution, and illegal transactions. These crimes not Anomalous network behavior usually manifests itself in only cause great social harm, but also bring great the form of anomalous network traffic patterns, including challenges to law enforcement agencies in identification but not limited to the following. On the Tor network, and tracking. At the same time, this anonymity also high-frequency, short-duration access patterns may makes traffic analysis and identification more difficult, reflect scanning attacks. Abnormal packet intervals or especially in the case of overlapping traffic. Overlapping excessively large packet sizes may indicate covert traffic segmentation refers to the technique of decoupling channel communications. Sudden changes in traffic and segmenting the traffic when the communication data characteristics may indicate malware activity. In this of multiple users are transmitted simultaneously over the context, this study will explore a network traffic analysis same communication link in an anonymous network method based on deep learning and explore its environment. In contrast to the broader approach of application potential in identifying anomalous network network traffic analysis, the concept of overlapping behavior in the early stage of online crimes. traffic segmentation entails the identification of the traffic aliasing relationship between disparate users and the 2.2 CNN-based model construction for tor extraction of characteristic information from the traffic of overlapping traffic segmentation particular users. This facilitates the detection of potential abnormal behavior. The flow of traditional overlapping With the development of anonymous communication traffic segmentation is shown in the following Figure 1. technology, Tor network is widely used for both legal and criminalactivities due to its strong anonymity and privacy Find Overlapping Segment Traffic Start Segmentation Output Tor Traffic Traffic Identification Points Figure 1: The basic process of overlapping traffic segmentation M −1 N−1 As shown in Figure 1, first identify the key segmentation Yi, j = Xbehavior ,(i+m),( j+n)Wm,n +b (1) m=0 n=0 points in the traffic, and use these segmentation points to In Equation (1), Y segment the traffic and extract feature points related to i , j represents the value of the specific behavior patterns. Then, the segmented traffic output feature map at position (i, j) . Xbehavior ,(i+m),( j+n) is segments are recognized and classified, and finally the element in the input feature map. Wm,n represents the further processing is performed based on the recognition results. CNN has a strong feature extraction capability weight matrix element of the convolution kernel. The and is suitable for handling overlapping traffic in network training model can automatically adjust the weight to traffic. Online criminal activities are often accompanied better capture specific behavior features. b is the bias by complex network traffic patterns that may overlap term. M and N denote the height and width of the with normal traffic, increasing the difficulty of convolution kernel. Equation (2) illustrates why the identification. By using convolution kernels to extract rectified linear unit (ReLU), which is easy to understand, local features from input traffic, CNN can effectively quick to compute, and capable of handling deep separate and identify abnormal behavior patterns in networks, is chosen as the activation function. overlapping traffic, thereby helping to detect potential criminal activities, such as suspicious transaction requests f (x) =max(0, x) (2) or abnormal data packet transmissions. Therefore, the study will construct overlapping traffic segmentation In Equation (2), x denotes the value input to the model based on CNN. CNN applies convolutional kernel activation function after the convolution operation. In the to extract local features by sliding window approach, the next pooling stage, the expression is shownin Equation calculation is shown in Equation (1) [16]. (3) [17]. Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144 131 P = max(X N behavior _ feature ) (3) window L = −w ˆ behavior _ feature (yi log(yi )) (5) i=1 In Equation (3), P is the maximum value of the In Equation (5), L is the loss value. wbehavior _ feature pooling window. Xbehavior _ feature is the element of the input represents the weight factor related to the behavior feature map, which includes behavior features extracted characteristics. N represents the number of samples, yi from network traffic such as transmission frequency and is the true label, and i represents the actual category directional features. Through downsampling, the pooling corresponding to the sample, that is, whether it is an procedure shrinks the FM's size, lowering computational illegal activity. yˆi is the probability distribution predicted cost and enhancing the model's resilience. Finally, the fully connected layer (FCL) expression is shown in by the model. By adding the weight factor of the behavior Equation (4) [18]. characteristics, the model can more effectively focus on the characteristics related to the criminal behavior, thereby improving the recognition effect of the model in  z =Wz +b  (4) specific criminal behavior scenarios. z = z , z Due to the highly encrypted and complex time series   trade malware , z  anomaly  characteristics of Tor traffic, the study introduces the hollow convolution technique, by introducing cavities in In Equation (4), z is the input high-dimensional the convolution kernel. That is, extending the receptive feature vector, and W is the weight matrix of the fully field without adding more parameters by adding gaps connected layer. z contains a combination of multiple between the convolution kernel's parts. Hollow behavioral features, and ztrade , zmalware , and zanomaly convolution (also known as expanded convolution) is a represent features related to illegal transactions, malware technique that expands the receptive field by inserting propagation, and other abnormal behaviors, respectively. holes in the convolution kernel, capturing a wider range Among the most often utilized loss functions in of features without increasing the number of parameters. classification problems is the cross-entropy loss function. This method helps the model handle long-range Equation (5) illustrates its expression by calculating the dependencies while maintaining computational efficiency, difference between the probability distribution (PD) of as shown in Figure 2. the real labels and the PD predicted by the model. Convolution kernel: 3×3 Convolution kernel: 3×3 Convolution kernel: 3×3 Receptive field: 3×3 Receptive field: 7×7 Receptive field: 15×15 (a) 1 Dilated Convolution (b) 2 Dilated Convolution (c) 4 Dilated Convolution Figure 2: Multi-scale feature extraction using dilated convolution Figure 2(a), (b), and (c) represent the convolutional complexity, and is suitable for processing complex kernel arrangement with convolutional expansion rate of features in Tor traffic. 1, 2, and 4, respectively. In Figure 2(a), when the In the rest of the model, batch normalization is first convolutional expansion rate is 1, the convolutional introduced after each convolutional layer to accelerate kernel size is 3×3, which is the same as the conventional convergence and improve generalization. Second, a larger convolutional kernel, and the sensory field only covers range of contextual information is captured by expanding the local area. In Figure 2(b) with a convolution the sensory field by the application of null convolution. expansion rate of 2, the convolution kernel sense field Moreover, to prevent overfitting, a Dropout layer is expands to 7×7, but the actual parameters remain 3×3. In introduced to enhance model robustness. Furthermore, to Figure 2(c) with a convolution expansion rate of 4, the better handle the complex aspects of Tor traffic, a deep sense field further expands to 15×15, and the number of network structure is built by stacking numerous parameters remains the same. Null convolution can convolutional, pooling, and FCLs. Therefore, the effectively extract multi-scale information and remote- structure of the overlapping traffic segmentation model of dependent features without increasing the computational CNNH is shown in Figure 3 below. 132 Informatica 49 (2025) 127–144 J. Hu Input layer Output layer ... ... ... ... ... CNN feature extraction layer Fully connected layer Figure 3: Overlapping traffic segmentation model based on CNNH As shown in Figure 3, the process of the CNNH passed to the FCL, where the features are further overlapping traffic segmentation model consists of four comprehensively analyzed to generate a high-dimensional parts. First, the input layer receives the original Tor feature vector. Finally, the output layer completes the traffic data and passes it to the CNN layer. The CNN prediction and classification of the traffic segmentation layer extracts representative features from the input results based on the output of the FCL, helping the model traffic through a series of convolution operations and distinguish between legitimate traffic and potential pooling operations, including network behavior features criminal behavior. To facilitate understanding of the such as packet size, transmission time interval, and specific implementation of the CNNH model, the pseudo transmission frequency. These extracted features are then code is given below, as shown in Figure 4. # Pseudocode for CNNH Model # Pseudocode for CNNH Model # Input: Network traffic data (X), labels (Y) # Output: Predicted labels (Y_hat) # Step 1: Data Preprocessing X_preprocessed = preprocess_data(X) # Normalize and extract features # Step 2: Dilated Convolution (Hollow Convolution) Module def DilatedCNN_Module(X): Conv1 = Conv2D(filters=32, kernel_size=(3, 3), dilation_rate=1, activation='relu')(X) Conv2 = Conv2D(filters=64, kernel_size=(3, 3), dilation_rate=2, activation='relu')(Conv1) Conv3 = Conv2D(filters=128, kernel_size=(3, 3), dilation_rate=4, activation='relu')(Conv2) PooledFeatures = MaxPooling2D(pool_size=(2, 2))(Conv3) return PooledFeatures X_dilated = DilatedCNN_Module(X_preprocessed) # Step 3: Fully Connected Layers for Classification def ClassificationHead(X): Dense1 = Dense(units=64, activation='relu')(X) Output = Dense(units=num_classes, activation='softmax')(Dense1) return Output Y_hat = ClassificationHead(X_dilated) # Step 4: Model Training model = compile_model(optimizer='adam', loss='categorical_crossentropy') model.fit(X_preprocessed, Y, epochs=50, batch_size=32) Figure 4: Overlapping traffic segmentation model based on CNNH The pseudo code in Figure 4 shows the workflow of can reduce information loss while maintaining the the CNNH model in complex network traffic feature integrity of spatial features. extraction. The model effectively expands the receptive field through the hole convolution module. Therefore, it Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144 133 2.3 Research on online criminal behavior spaces for subsequent computation of the attention recognition model based on LSTM and scores. The attention score is shown in Equation (8). CNN j=n CNN for traffic segmentation, although excellent in attention(a,V ) =ai v ji (8) spatial feature extraction, still suffers from recognition j=1 limitations when confronted with time-series features in Tor traffic. In contrast, LSTM, as a recurrent neural In Equation (8), a denotes the i th attention weight. i network that excels in processing sequence data, is v j is the element at the j th position in the value vector suitable for the field of network traffic analysis due to its V . Thus, Figure 5 shows a schematized version of the powerful modeling capability of time series features [19]. In online criminal behaviors, such as cyber attacks or SAM structure. illegal transactions, specific time patterns are often Wq Q qi shown, such as persistent illegal access attempts or regular small-amount fund transfers. By analyzing the time series features in network traffic, LSTM can identify X Wk K ai the regularity of these criminal behaviors and provide support for crime prevention by predicting future behavior trends. Therefore, the study will try to combine Wv V Att CNN and LSTM and introduce the SAM to extract and classify important features. Figure 5: Self attention mechanism layer structure In the overall process design, the input data is first processed through a data encoding module to convert the In Figure 5, the input sequence X is first converted raw data into a form suitable for model input. Then, it is to value matrix V , key matrix K , and query matrix Q passed through the SAM module in order to enhance the through three weight matrices W , W , and W , attention to the key features. Then, CNN and LSTM v k q modules perform feature extraction and time series respectively. Then, the Q and K calculate the analysis on the data processed by the attention correlation through dot-product operation, and the result mechanism, to capture behavioral patterns that recur over is inputted into the Softmax function (SF) to generate the long periods of time. Finally, the model outputs the attention weights a after scaling. These attention weights i recognition results to realize the recognition of WF. In are used to weight the corresponding elements in the V the data encoding module, the training data is shown in elements, and finally the weighted value matrix is passed Equation (6). through a summation operation to obtain the output. By using this approach, the model can dynamically T = (X1,G1), (X 2 ,G2 ),..., (X n ,Gn ) concentrate on important features according to how  (6) important each segment of the input sequence is. This  X = (1,−1,1,−1,...,1) successfully boosts the model's performance when processing complex data and improves its capacity to In Equation (6), T denotes the training data set. X n capture vital information. Finally, in the CNN and LSTM and G denote the n th traffic instance and website class n module, the resulting feature sequence is spliced into a label, respectively. One-Hot encoding, a popular two-dimensional feature matrix. Then the one- encoding technique in neural network multi-classification dimensional maximum pooling layer (PL) is connected tasks, is crucial for guaranteeing the classification for data dimensionality reduction processing, and the model's accuracy, preventing label misrepresentation, and expression is shown in Equation (9). increasing computational efficiency. Therefore, One-Hot state bits are used for encoding. Further, in the SAM Y l i,h=5 = max(Z l 1, l 1 Z l j− Z j , Z l j+ , j+2 ) (9) module, the correlation matrix of the input sequence is first calculated as shown in Equation (7) [20]. l In Equation (9), Y i ,h=5 then denotes the result of the l V = X W pooling operation via the convolution kernel of 5. Z j , v  K = X W Z l l k (7) j+2 , and Z j+2 all denote the neighboring feature values   Q = X W in the previous layer of q l . Subsequently, the extracted spatial features are fused as shown in Equation (10). In Equation (7), V , K , Q denote the value, key, and F l query matrices, respectively. W , W , and W j = concat(Y l i,h=3 ,Y l i,h=4 ,Y l i,h=5 ) (10) q all denote v k the initial weight matrices, which correspond to the value, key, and query weight matrices, respectively. These In Equation (10), F l j denotes the fused features after matrices project the input sequences into different vector convolution and pooling. Y l i ,h=3 , Y l i ,h=4 , Y l and i ,h=5 denote 134 Informatica 49 (2025) 127–144 J. Hu the i th output of the pooling of the l th layer with spatial features, temporal features and behavioral features convolution kernel size 3, 4, and 5, respectively. Equation to form a unified feature representation. Specifically, (11) illustrates how the data is put into the LSTM to spatial features are extracted through the convolution extract the temporal features once the fusion is finished. layer, temporal features are captured through the LSTM layer, and behavioral features are extracted based on ht = (Whh t−1 +Wxxt +b) (11) high-risk behavior patterns in traffic. The fused feature representation is shown in Equation (12). In Equation (11), h and h represent the hidden t t−1 z =z states of the current time step and the previous time step, spatial + ztemporal + zbehavioral (12) respectively, that is, the contextual information of the behavioral features at the current moment. For identifying In Equation (12), zspatial represents the spatial criminal behavior, the information of the previous time features extracted by the convolution layer, which can step, such as the occurrence of certain abnormal help identify local anomalies in network traffic. ztemporal behaviors at the previous moment, can help predict represents the temporal features extracted by the LSTM whether the behavior at the current moment is abnormal. layer, which captures recurring patterns in the time x represents the input features of the current time step, t dimension, especially high-frequency packet transmission and W represents the weight matrix of the hidden state, h behaviors. z represents the high-level features behavioral which can learn how to transfer the criminal behavior obtained by the behavioral feature extraction mechanism, features of the previous moment to the current moment. which reflects specific behavioral patterns such as W is the weight matrix of the input features, which is x malware propagation and illegal transactions.  ,  , and used to weight the input features of the current time step.  are all weighting factors. The weights are adjusted These weights can learn the importance of different according to the importance of different features to behavioral features in predicting criminal behavior. b is ensure the sensitivity of the model to specific behavioral the bias term, and  is the activation function. By patterns. Therefore, the improved CNN-LSTM structure introducing nonlinearity, the model can capture complex is shown in Figure 6 below. behavioral patterns. Finally, the model further fuses Unified feature Time feature Spatial characteristics representation dependence Convolution kernel: 5 Convolution kernel: 4 Convolution kernel: 3 Pooling LSTM layer Flatten layer Convolutional layer layer Integrate spatial Extracting spatial Reduced dimensionality features to form a features space features unified representation Capturing time Expanding high- Classification dependencies dimensional features results Figure 6: The structure and temporal feature fusion of the MCNN-LSTM Model Input sequence ... Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144 135 In Figure 6, first, the input sequence passes through time-series data and is able to capture long-range multiple convolutional layers, each with a different dependencies in the data. Subsequently, the high- convolutional kernel size to capture different scale dimensional features output from the LSTM layer are features in the input data. Subsequently, a PL is used to expanded into one-dimensional vectors through the downsample the convolved FMs, so decreasing their size Flatten layer. Finally, the output of the classification or and, consequently, the computational complexity. Then, regression task is carried out through the FCL, thereby the multi-scale features are integrated through a fusion identifying potential criminal behavior in network traffic. layer to form a unified feature representation, helping the Therefore, according to the above calculations, the online model capture more comprehensive traffic information. criminal behavior recognition process based on MCNN- Immediately afterward, these features are passed to the LSTM is shown in Figure 7. LSTM layer. The LSTM layer specializes in processing Start Data Preprocessing Binary classification Multi-class model classification model CNN-LSTM for Network traffic Open World Is it a Monitoring Website? Y N CNN-LSTM for Open World Label closed-world Sealed World End Label Figure 7: Online criminal behavior identification process As shown in Figure 7, first, during the training phase, does not belong to the accusation domain, it enters the the preservation model is trained using the binary and closed-world labeling processing. Through the staged multi-classification datasets created from the network processing, the model is able to process the open-world traffic data, respectively. In the recognition phase, the and closed-world labels separately, thus improving the input network traffic is first processed by the open-world accuracy and efficiency of the recognition. To intuitively MCNN-LSTM to determine whether it is labeled in the demonstrate the implementation process of the MCNN- accusation domain. If the traffic belongs to the accusation LSTM model, its pseudo code is given below, as shown domain, it enters the open-world label processing and is in Figure 8. recognized using the closed-world MCNN-LSTM. If it 136 Informatica 49 (2025) 127–144 J. Hu # Pseudocode for MCNN-LSTM Model # Pseudocode for MCNN-LSTM Model # Input: Network traffic data (X), labels (Y) # Output: Predicted labels (Y_hat) # Step 1: Data Preprocessing X_preprocessed = preprocess_data(X) # Normalize and extract features # Step 2: Multi-Scale Convolution (MCNN) Module def MCNN_Module(X): Conv1 = Conv2D(filters=32, kernel_size=(3, 3), activation='relu')(X) Conv2 = Conv2D(filters=64, kernel_size=(5, 5), activation='relu')(Conv1) Conv3 = Conv2D(filters=128, kernel_size=(7, 7), activation='relu')(Conv2) CombinedFeatures = concatenate([Conv1, Conv2, Conv3]) PooledFeatures = MaxPooling2D(pool_size=(2, 2))(CombinedFeatures) return PooledFeatures X_spatial = MCNN_Module(X_preprocessed) # Step 3: Temporal Feature Extraction with LSTM def LSTM_Module(X): LSTM_output = LSTM(units=128, return_sequences=True)(X) return LSTM_output X_temporal = LSTM_Module(X_spatial) # Step 4: Self-Attention Mechanism (SAM) def SelfAttention(X): Q = dot(X, Wq) # Query matrix K = dot(X, Wk) # Key matrix V = dot(X, Wv) # Value matrix AttentionScores = Softmax(dot(Q, K.T) / sqrt(d_k)) # Scaled Dot-Product Attention Output = dot(AttentionScores, V) # Weighted sum of values return Output X_attention = SelfAttention(X_temporal) # Step 5: Fully Connected Layers for Classification def ClassificationHead(X): Dense1 = Dense(units=128, activation='relu')(X) Output = Dense(units=num_classes, activation='softmax')(Dense1) return Output Y_hat = ClassificationHead(X_attention) # Step 6: Model Training model = compile_model(optimizer='adam', loss='categorical_crossentropy') model.fit(X_preprocessed, Y, epochs=50, batch_size=32) Figure 8: Schematic diagram of MCNN-LSTM pseudo code This pseudo code in Figure 8 clearly shows the main Windows 10 operating system with a 12-core Xeon modules of the MCNN-LSTM model and their interaction Platinum 8163 processor and a graphics card NVIDIA process. First, the multi-core convolution module Tesla P100-16GB. The model development language is captures the multi-scale features of the input data and Python 3.7. The study selects the CW200 dataset as the combines the pooling layer to reduce the computational experimental object, which contains a variety of normal complexity. Subsequently, the LSTM module is and abnormal traffic with high noise and complex traffic employed to model the time series features, with the self- patterns, meeting the needs of overlapping traffic attention mechanism further emphasizing the key features segmentation and abnormal behavior identification in to enhance the classification performance. Finally, the anonymous networks. The diversity of protocol network traffic classification is completed by the fully distribution and user behavior is taken into account connected layer. during data collection in order to mimic traffic patterns in real-world scenarios as closely as possible. The dataset 3 Results collects traffic data from 200 different websites accessed through the Tor network in a closed world. Each site has 3.1 Performance testing of overlapping 2,500 traffic accesses, which are divided into training and traffic segmentation model for CNNH test sets in a 6:4 ratio. A stratified sampling method is used to ensure that the proportions of the training and test The study began by setting up a suitable sets are consistent in terms of protocol type, traffic experimental environment to meet the computational feature distribution, and attack type, thus avoiding the requirements of the experiment. The experiment uses bias of the model performance evaluation due to uneven Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144 137 data distribution. In addition, to reduce the risk of proportion of category samples is balanced and covers a overfitting, the dropout regularization technique is variety of network protocols and anonymous network introduced into the experiment, and the diversity of the scenarios. Data cleaning, feature extraction, and training data is improved by data enhancement. Among standardization are performed during data preprocessing, them, normal traffic accounts for 60%, abnormal traffic and traffic behavior patterns are labeled as normal or accounts for 40%, and is further divided into four abnormal. First, the settings of each parameter in CNNH categories: Trojans, Worms, Viruses, and Adware. The are shown in Table 2 below. Table 2: Model parameter settings Parameter Value Input dimension 5000 Network architecture layers 12 Batch size 256 Epochs 50 Gradient optimization function Adam Learning rate 0.001 Dropout 0.4 Table 2 shows the settings for input dimension, normal network behavior from potential criminal network architecture layers, training details, optimizer, behavior, and can more accurately capture traffic patterns learning rate and Dropout rate, respectively. The study related to criminal activities such as illegal transactions uses CNN, dilated CNN (DC-CNN), and multi-layer and malware propagation, thereby reducing false perceptron with dilated convolution (MLP-DC) as positives and improving the effectiveness of crime comparison models. When criminal activities are carried identification. Therefore, the traffic segmentation out in anonymous networks, criminal behavior is often accuracy is used as an indicator, and the test results are hidden in normal traffic. A high segmentation accuracy shown in Figure 9. means that the model can more accurately distinguish 1.00 0.97 1.00 0.95 0.95 0.92 0.91 0.90 0.90 0.87 0.89 0.85 0.85 0.85 0.80 0.80 0.85 0.83 0.75 0.75 CNN CNN 0.70 DC-CNN 0.70 DC-CNN MLP-DC MLP-DC 0.65 CNNH 0.65 CNNH 0.60 0.60 0 10 20 30 40 50 0 10 20 30 40 50 Iterations Iterations (a) Training set (b) Test set Figure 9: Accuracy trends on training and test sets for different models Figure 9(a) and (b) show the accuracy of CNN, DC- models on the training set is 0.85, 0.89, 0.91, 0.97, CNN, MLP-DC, and CNNH with the iterations on the respectively. In Figure 9(b), the accuracy of the four training set and test set, respectively. In the case of models on the test set are 0.83, 0.85, 0.87, 0.92, malware propagation, the model identified multiple respectively. DC-CNN and MLP-DC introduce the suspicious data packets through high-precision traffic advantage of null convolution to extract deep features segmentation. The transmission frequency and time more comprehensively. To verify whether the difference characteristics of these data packets are highly consistent in accuracy between different models on the training and with known malware propagation behaviors, thereby test sets is statistically significant, a paired t-test is enabling law enforcement to swiftly identify the source of performed on the normalized accuracy and the 95% the behavior. confidence interval is calculated, as shown in Table 3. In Figure 9(a), when the number of iterations is 50, the accuracy of CNN, DC-CNN, MLP-DC and CNNH Normalized accuracy Normalized accuracy 138 Informatica 49 (2025) 127–144 J. Hu Table 3: Statistical significance analysis Normalized 95% Statistical Dataset Model Comparison accuracy confidence interval P-value significance difference (%) (%) CNNH vs. CNN 12 [10.2, 13.8] < 0.01 Significant Trainin CNNH vs. DC-CNN 8 [6.4, 9.6] < 0.05 Significant g set CNNH vs. MLP-DC 6 [4.7, 7.3] < 0.05 Significant CNNH vs. CNN 9 [7.5, 10.5] < 0.01 Significant Testing CNNH vs. DC-CNN 7 [5.6, 8.4] < 0.05 Significant set CNNH vs. MLP-DC 5 [3.8, 6.2] < 0.05 Significant The statistical analysis results in the table show that 0.05, indicating statistical significance. In addition, the 95% the accuracy improvement range of CNNH on the confidence interval indicates that the range of differences training set is from +6.0% to +12.0%, and the is relatively stable. Subsequently, the segmentation effect improvement range on the test set is from +5.0% to of each model under different traffic flows is shown in +9.0%. The P-values for all comparisons are less than Table 4 below. Table 4: Performance evaluation indicators for each algorithm CNN DC-CNN MLP-DC CNNH Index Normal Attack Normal Attack Normal Attack Normal Attack P/% 83.52 85.67 86.14 88.43 88.79 90.57 91.43 93.45 R/% 84.67 86.82 88.53 89.92 90.35 91.74 93.46 94.32 FPR/ 13.65 12.34 10.74 9.98 8.96 7.43 4.15 3.07 % F1/% 84.09 86.24 87.32 89.17 89.56 91.15 92.43 93.88 AUC 0.769 0.788 0.812 0.828 0.839 0.846 0.928 0.935 Time/ 12.34 13.02 15.89 16.58 19.65 20.23 18.41 19.12 s Resource consumpti 68.54 69.85 72.32 73.46 75.69 76.78 70.17 71.54 on/% Table 4 displays the performance comparison of the balance of performance is critical. Maintaining sensitivity models for segmentation under Normal and Attack ensures that abnormal behavior is not ignored due to low traffic. False positive rate (FPR) is critical in law detection capabilities. Further analysis of the enforcement contexts, as a high FPR could lead to experimental results shows that false positives occur misidentifying benign traffic as criminal activity, mainly in normal traffic with high access frequency, such resulting in wasted resources. The CNNH model has as normal data transmission of certain legitimate significantly higher values for P, R, F1 and AUC. protocols being misclassified as abnormal traffic. This Especially, the P-value of CNNH reaches 93.45% and the may be due to the similarity between the characteristics R value is 94.32% under Attack traffic. Meanwhile, the of high-frequency access patterns and abnormal traffic. FPR of CNNH is only 4.15%, indicating that it can False negatives, on the other hand, are mainly effectively reduce the false alarms. However, with the concentrated on abnormal traffic with weaker increase of model complexity, the resource consumption characteristics or close to normal traffic characteristics, rate and calculation time of CNNH increase accordingly, such as covert adware traffic. False positives can lead to reaching 71.54% and 19.12s, respectively. Although its reduced efficiency in resource allocation, while false resource requirements are high, the significant negatives can cause some potential threats to be ignored. improvements in accuracy and sensitivity make up for this shortcoming. In contrast, the traditional CNN is at a 3.2 Online crime recognition experiment lower level in all performance indicators. However, its based on MCNN-LSTM resource consumption rate and computation time are low, which makes it suitable for scenarios with limited In the hyperparameter setting of MCNN-LSTM, the computational resources. The proposed model has been learning rate is optimized in the range of 0.0001 to 0.01 demonstrated to effectively reduce the FPR, ensuring by grid search and finally selected as 0.001. The batch higher accuracy and reliability in identifying criminal size is set to 32. The number of hidden layer nodes is set behavior. Furthermore, it has been shown to facilitate the to 128, which can effectively capture the time series optimization of resource allocation and action decisions. characteristics of traffic data. The time step is set to 20. In the experiment, the recall rate is equivalent to the Adam is used as the optimizer to improve the training sensitivity, i.e., the proportion of actual anomalous traffic efficiency. The number of training rounds is set to 50, that is correctly detected. In practical scenarios, this and the early stopping strategy is combined to avoid Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144 139 overfitting. To improve the generalization ability of the network (MLP-CNN), long short-term memory with model, Dropout is added to the network and the ratio is attention mechanism (LSTM-Att), and LSTM are set to 0.3. The study labeled the traffic data set according selected as comparison algorithms. First, the accuracy test to different crime types, mainly including three types of results of the four models for different types of online crimes: online fraud, malware propagation, and illegal criminal behaviors are shown in Figure 10 below. transactions. multi-layer perceptron convolutional neural 100 95 90 LSTM MLP-CNN 85 LSTM-Att 80 MCNN-LSTM 75 0 Online Fraud Malware Propagation Illegal Transactions Crime Types Figure 10: Model performance in crime type identification In Figure 10, the MCNN-LSTM model showed the worst performance among the three crime types, best accuracy, especially in the identification of malware especially in the identification of malware propagation, propagation and illegal transactions, reaching 96.54% and which is only 87.43%. Subsequently, to evaluate the 92.87% respectively. This is because MCNN-LSTM performance of each model in crime prediction and combines the spatial feature extraction capability of CNN prevention, the following indicators are used: prediction with the temporal feature capture capability of LSTM, accuracy, early warning time (early warning time is and can better handle the complex patterns and temporal defined as the time interval between the first detection of dependencies in criminal behavior. Although LSTM-Att an abnormal traffic pattern by the model and the actual improves the focus on important features by introducing occurrence of the attack behavior), precision, FPR, mean the attention mechanism, its spatial feature extraction detection time, and area under the receiver operating capability is weak, so it is still inferior to MCNN-LSTM characteristic curve (AUC). The results are shown in in multi-dimensional feature extraction. LSTM has the Table 5 below. Table 5: Performance comparison of models in crime prediction and early warning tasks MLP- LSTM- Metrics LSTM MCNN-LSTM CNN Att Prediction Accuracy /% 80.45 84.67 88.76 92.43 Average Early Warning Time 15 18 25 30 /Minutes Precision /% 79.87 83.54 87.34 91.23 False Positive Rate /% 9.67 8.23 6.45 5.12 Mean Time to Detect /Seconds 42.8 35.6 28.1 24.3 AUC 0.835 0.874 0.915 0.945 In Table 5, MCNN-LSTM shows the best abnormal behavior features and reduce the interference of comprehensive performance. Compared with other irrelevant features. The accuracy of MCNN-LSTM is also models, MCNN-LSTM achieved a prediction accuracy of better than other models, with the lowest false positive 92.43%, which is significantly higher than LSTM's rate of only 5.12%. The model performs well in reducing 80.45% and MLP-CNN's 84.67%. Although LSTM-Att false positives. In contrast, LSTM has a higher false introduces the attention mechanism, its spatial feature positive rate of 9.67%, due to the lack of spatial feature extraction capability is insufficient, resulting in the modeling, its adaptability to traffic pattern changes is advance warning time and prediction accuracy being poor and the false alarm rate is significantly high. In inferior to MCNN-LSTM. In addition, MCNN-LSTM terms of average detection time, the MTTD of MCNN- can warn of criminal behavior 30 minutes in advance, this LSTM is 24.3 seconds, which is better than 28.1 seconds capability is mainly due to the model's deep modeling of of LSTM-Att and 35.6 seconds of MLP-CNN, which time series characteristics. In particular, the introduction further proves the real-time detection capability of the of the SAM further enhances the model's ability in key model. feature extraction, enabling it to quickly focus on Accuracy/% 140 Informatica 49 (2025) 127–144 J. Hu Finally, the concept of concept drift is used to protocol-related features (e.g., packet length, time evaluate the robustness and adaptability of the models in interval) are subject to random changes, thereby the face of changing data distributions. Conceptual drift simulating fluctuations in protocol usage and traffic refers to the phenomenon that the data distribution characteristics. Furthermore, the incorporation of novel changes over time. Its practical application is due to the attack types at various temporal points serves to mirror fact that the traffic pattern, feature distribution, and user the progression of attack patterns. These designs are behavior of a website may change over time. The drift intended to closely mirror the evolving trends in the simulation involves the gradual adjustment of the ratio of actual network environment, thereby facilitating the normal traffic to abnormal traffic, thereby reflecting the evaluation of the model's efficacy in handling long-term dynamic changes in network attack behaviors. The distribution shifts. The results are shown in Figure 11. 100 90 80 70 60 5 10 20 60 Figure 11: Impact of concept drift on model accuracy over time Figure 11 shows the model accuracy of the four and rapid loss of accuracy. MLP-CNN is biased toward models under training time and testing time of 5, 10, 20, fixed patterns in feature extraction and has insufficient and 60, respectively. As the interval time increases, the adaptability to concept drift. concept drift leads to different degrees of decrease in the Finally, several representative models including accuracy of each model. A lower drop value suggests that time-series Transformer, spatial-temporal graph the model is more flexible and can continue to perform convolutional network and transformer framework (ST- well in classification even when concepts diverge. When GCN+Transformer), bidirectional long short-term the interval between training and testing events is 60 memory with attention mechanism (BiLSTM+Attention), days, the recognition accuracies of LSTM, MLP-CNN, random forest and principal component analysis LSTM-Att, and MCNN-LSTM models are 60.2%, 73.8%, (RF+PCA), and K-nearest neighbor (KNN) are selected 80.7%, and 89.5%, respectively. The advantage of for comparison. These five models cover the hybrid MCNN-LSTM in dynamically changing environments framework and transformer method in modern deep lies in its optimized model architecture. The multi-kernel learning, as well as the classic algorithms of traditional convolution module extracts multi-scale spatial features machine learning and non-deep learning. It can fully by convolution kernels of different sizes. The SAM reflect the advantages and disadvantages of different dynamically focuses on key features to reduce technical routes in network traffic analysis. The dataset interference. It works in conjunction with time series used is the representative open world network traffic modeling to significantly improve adaptability to dataset CIC-IDS2017. It records normal traffic and 12 dynamic changes in the traffic feature distribution. In malicious attack behaviors, has 80 traffic features, and contrast, LSTM lacks spatial feature extraction has highly complex traffic patterns and open network capabilities and relies only on time series modeling, environment characteristics. The results are shown in resulting in high sensitivity to changes in traffic patterns Table 6. Recognition accuracy/% Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144 141 Table 6: Performance comparison and scalability testing of models under different traffic loads Accurac Average y at 300% Traffic Accuracy FPR processing P- Model Name data condition /% /% time value expansion (ms/sample) /% MCNN-LSTM 96.78 2.95 18.6 94.12 < 0.05 Time-series 95.23 3.21 17.5 92.45 < 0.05 Transformer Small ST- traffic 95.78 3.1 20.9 93.34 < 0.05 GCN+Transformer (10% BiLSTM+Attention 92.67 4.89 19.2 89.45 < 0.05 data) Random 88.23 7.34 14.7 86.34 < 0.05 Forest+PCA KNN 84.12 9.78 15.9 82.45 < 0.05 MCNN-LSTM 95.89 3.45 20.8 93.34 < 0.05 Time-Series 94.12 3.89 18.3 91.67 < 0.05 Transformer Mediu ST- m traffic 94.78 3.56 21.2 92.78 < 0.05 GCN+Transformer (50% BiLSTM+Attention 90.78 5.45 19.6 88.01 < 0.05 data) Random 86.34 8.12 15.2 84.78 < 0.05 forest+PCA KNN 82.45 10.78 16.5 79.67 < 0.05 MCNN-LSTM 94.78 3.89 22.5 92.12 < 0.05 Time-series 93.12 4.12 19.9 90.56 < 0.05 Transformer High ST- traffic 93.78 4.01 22.8 91.45 < 0.05 GCN+Transformer (100% BiLSTM+Attention 89.34 6.12 20.3 87.12 < 0.05 data) Random 84.89 9.34 15.8 82.45 < 0.05 forest+PCA KNN 81.12 11.78 16.7 78.34 < 0.05 In table 6, MCNN-LSTM shows high accuracy in all websites specialize in the illegal sale of pirated software. traffic load scenarios, reaching 96.78% in small traffic The illegal sale of pirated software not only violates scenarios and maintaining 94.78% in high traffic intellectual property laws, but also involves illegal scenarios. Moreover, it demonstrated strong classification transactions and fund transfers through anonymous capabilities with an FPR of 3.89%. This is mainly due to networks, which is a common and widespread form of its multi-module synergy combining CNN and LSTM, online crime. Such websites conduct transactions through which can effectively capture the complex relationship encrypted networks and anonymous payment systems, between spatial and temporal features. Time-series and users can purchase unauthorized commercial Transformer and ST-GCN+Transformer also perform software, hacking tools, and cracked software. On one of similarly in terms of FPR and accuracy. The global the websites, called Dark Web Software Mall, about modeling capabilities of these two models allow them to 4,000 users visit and trade every day. The website uses perform well in dynamic network scenarios. The encrypted communication protocols and anonymous accuracy of the BiLSTM+attention model is subject to a payment methods such as Bitcoin. significant decrease in high-traffic scenarios due to the The experiment uses web crawler technology to limitations of the feature extraction method. In contrast, capture network traffic data from the website for 10 days, the KNN and Random Forest methods demonstrate a with a total of 400,000 packets, of which 200,000 packets higher degree of suitability for small-scale data sets. are directly related to illegal software transactions, However, when the data is expanded to 300%, the including user login, browsing illegal software, ordering, accuracy undergoes a substantial decline, indicating a and anonymous payment. At the same time, for lack of adaptability to large-scale, complex scenarios. comparison, the study also obtains traffic data from legal e-commerce platforms in the same period, totaling 3.3 Simulation test 150,000 packets, which are related to browsing and purchasing legal software. Traffic capture and model In online criminal behavior on anonymous networks, the training are performed on a server running the Linux illegal software trading market is active, and many 142 Informatica 49 (2025) 127–144 J. Hu operating system, with a 16-core CPU, 32GB memory, transactions, the study preprocesses the data, removed and 500GB storage space. The experiment uses the noise, and extracted key features, including packet size, Wireshark tool to capture network traffic to ensure the time interval, and transmission direction. By analyzing accuracy and integrity of the data. The traffic data the traffic characteristics, the model is further used to includes parts obtained from legal e-commerce websites distinguish the network traffic of legal software and illegal software trading websites, with a total of transactions, and illegal software sales. The results are 400,000 packets. To ensure that the model can effectively shown in Table 7 below. identify traffic behaviors related to illegal software Table 7: Detection results of illegal software transactions compared to legitimate traffic Legitimate traffic Illegal software Metric Detection results (control group) transaction traffic Large-scale software Average of 3.2 0.5-1.1 times/day 8.7-12.3 times/day downloads (times/day) detections/day Frequent small anonymous Average of 1.8 20.6-30.4 1.3-2.2 times/day payments (transactions/day) transactions/day times/day Abnormal data packet Average detection 75.4 MB/day 502.6 MB/day transmission (data volume/day) of 100 MB/day Average file size of downloads 15.3 MB 10.7 MB 50.8 MB (MB) Anonymous payment amount Average of $512.4 $ 52.3 - $ 98.5 $ 10.7 - $ 49.6 (per transaction) The experimental results show that the model can extraction of multi-scale features by introducing atrous effectively identify illegal software transactions. First, in convolution technology. Second, the accuracy trends the detection of large-scale software downloads, there are of different models on the training and test sets. As the an average of 8.7 to 12.3 large file downloads per day in number of iterations increases, the normalized illegal transaction traffic, while there are only 0.5 to 1.1 accuracy of CNNH reached 97% and 92% on the downloads in legal traffic. Secondly, frequent small training and test sets, respectively. In addition, the anonymous payments have also become an important statistical significance analysis further proved the feature for identification. An average of 20.6 to 30.4 reliability of these performance differences, with P- small payments are made per day in illegal transaction values less than 0.05 for all comparisons. The traffic, while legal transactions are only about 1.8. In prediction accuracy of the MCNN-LSTM model addition, the transmission volume of abnormal data reached 92.43%, the precision was 91.23%, the false packets in illegal traffic far exceeds the normal range, positive rate was 5.12%, and it could achieve a 30- with an average of 502.6MB of data transmitted per day, while the transmission volume of legal traffic is about minute early warning capability. In comparison, the 75.4MB. accuracy of the traditional LSTM model and the MLP- Through the traffic identification of criminal CNN model under complex traffic patterns was behavior, technology not only provides analysis results, 84.67% and 80.45%, respectively. but more importantly, it helps law enforcement agencies Finally, compared to traditional methods, the take quick action. The proposed model provides proposed model showed significant advantages in categorized anomalous traffic patterns and their scalability and dynamic adaptability. Traditional associated characteristics, such as traffic types and time methods had acceptable performance on small data intervals, which can help law enforcement identify sets, but their accuracy was less than 80% in high-load potential threats and prioritize suspicious behavior for traffic scenarios, making it difficult to effectively further investigation. capture dynamic characteristics in complex network environments. In contrast, by combining deep learning 4 Discussion technology, MCNN-LSTM not only performed stably in highly complex scenarios, but also provided early The CNNH model showed excellent performance in warning capabilities for criminal behavior, showing a the overlapping traffic segmentation task, with wide range of practical application potential. precision and recall reaching 91.43% and 93.46%, respectively, and a false positive rate of only 4.15%. In contrast, DC-CNN and MLP-DC each had an accuracy 5 Conclusion lower than 87% due to their limited feature extraction Through real-time monitoring of network traffic, the capabilities. The main reason for this performance system can detect potential risks before criminal activities difference was that CNNH achieved effective occur. In view of this, this study proposed a CNNH-based Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144 143 Tor overlapping traffic segmentation model and an through adversarial training and multimodal data fusion. MCNN-LSTM website fingerprint recognition model. In addition, integrating data from multiple sources, such The performance test results indicated that the average as user behavior logs and vulnerability information, can segmentation accuracy of CNNH was 95.05% when the improve the ability to detect low-frequency anomalous number of iterations was 50. Under Attack traffic, the P, behavior. In terms of privacy protection, future work will R, F1, and AUC values of CNNH were 93.45%, 94.32%, introduce data encryption and anonymization processing 93.88%, and 0.935, respectively. The FPR was only technologies, and combine the context post-processing 3.07%, which was better than the comparison model. Its mechanism to optimize false positive control to ensure computational time consumption was 19.12s, and the the credibility and legality of the model application. resource consumption rate was 71.54%. In the MCNN- Future research will also explore the applicability of the LSTM performance test, its recognition accuracy for model in other potential application areas. For example, malware propagation and illegal transactions reached in enterprise network security, MCNN-LSTM can be 96.54% and 92.87% respectively. In the prediction used to detect abnormal traffic and potential attack experiment results, the prediction accuracy of MCNN- behavior in the enterprise internal network, helping to LSTM was 92.43%, and it could issue an early warning improve security protection capabilities. At the same time, 30 minutes in advance, with a false positive rate of only future research must focus on the ethical and privacy 5.12% and a detection time of only 24.3s. In terms of implications of model deployment, strictly adhere to computational time consumption, the MCNN-LSTM relevant laws and regulations, and ensure the social model consumes 102ms per round of training. In the responsibility and legality of the technology. concept drift test, the recognition accuracy of the MCNN- LSTM model was 89.5% when the training and testing References events were separated by 60 days. This shows that the proposed model in the study had excellent recognition [1] F. Zhou, B. Zhou, S. Zhao, and G. Pan, accuracy and robustness. “DeepOffense: a recurrent network-based approach for crime prediction,” CCF Transactions on Pervasive Computing and Interaction, vol. 4, no. 3, 6 Limitations and future research pp. 240-251, 2022. https://doi.org/10.1007/s42486- The proposed MCNN-LSTM model performs well in 022-00100-x anonymous network traffic analysis, but it still has some [2] R. H. Shi, and X. Q. Fang, “Anonymous classical limitations. As the model complexity increases, CNNH message transmission through various quantum and MCNN-LSTM have high computational resource and networks,” IEEE Transactions on Network Science time consumption requirements, and may be difficult to and Engineering, vol. 11, no. 3, pp. 2901-2913, deploy in real-time in hardware resource-constrained 2024. https://doi.org/10.1109/TNSE.2024.3354327 environments. The study simulated concept drift by [3] Y. J. Chen, Y. Su, M. Y. Zhang, H. Y. Chai, Y. K. adjusting feature distribution, protocol variations, and Wei, and S. Yu, “Fedtor: an anonymous framework attack types, but drift in real-world scenarios may be of federated learning in internet of things,” IEEE more complex, such as sudden changes in user behavior Internet of Things Journal, vol. 9, no. 19, pp. 18620- or nonlinear changes in traffic patterns. In addition, 18631, 2022. advanced attackers may confuse traffic patterns by https://doi.org/10.1109/JIOT.2022.3162826 disguising malicious traffic or using complex encryption [4] Y. Wang, “Deep learning models in computer data techniques to increase the difficulty of detection. For mining for intrusion detection,” Informatica, vol. 47, highly dynamic features or low-frequency anomalous no. 4, 2023. https://doi.org/10.31449/inf.v47i4.4942 behavior, the model may run the risk of failing to detect [5] X. D. Gu, B. C. Song, W. Lan, and M. Yang. “An them. Although the false positive rate has been reduced, it online website fingerprinting defense based on the may still cause false alarms that affect monitoring non-targeted adversarial patch,” Tsinghua Science efficiency. In addition, the model may cause privacy and Technology, vol. 28, no. 6, pp. 1148-1159, issues when applied to anonymous network monitoring, 2023. https://doi.org/10.26599/TST.2023.9010062 such as excessive monitoring or false alarms that result in [6] R. Rawat, and A. Rajavat, “Illicit events evaluation innocent users being tagged. The scope of monitoring using NSGA-2 algorithms based on energy must be strictly limited and privacy regulations must be consumption,” Informatica, vol. 48, no. 18, 2024. followed. https://doi.org/10.31449/inf.v48i18.6234 Future research will focus on optimizing the [7] K. Xian, “An optimized recognition algorithm for performance and practical value of the model. First, SSL VPN protocol encrypted traffic,” Informatica, through the lightweight design of the model and the vol. 45, no. 6, 2021. distributed computing architecture, the computational and https://doi.org/10.31449/inf.v45i6.3730 memory consumption can be reduced, and the scalability [8] M. Nasr, A. Bahramali, and A. Houmansadr, of large-scale real-time monitoring can be improved. "Defeating DNN-based traffic analysis systems in Second, by combining long-term real Tor traffic data, the real-time with blind adversarial perturbations," In adaptability of the model in complex concept drift Proceedings of the 30th USENIX Security scenarios will be verified, and the robustness of the Symposium (USENIX Security 21), 2705-2722, model against obfuscation strategies will be improved 2021. 144 Informatica 49 (2025) 127–144 J. Hu [9] K. Yesodha, M. Krishnamurthy, M. Selvi, and A. time defense against website fingerprinting attacks Kannan, “Intrusion detection system extended CNN based on deep reinforcement learning,” IEEE and artificial bee colony optimization in wireless Transactions on Network and Service Management. sensor networks,” Peer-to-peer Networking and vol. 21, no. 3, pp. 2944-2961, 2024. Applications, vol. 17, no. 3, pp. 1237-1262, 2024. https://doi.org/10.1109/TNSM.2024.3360082 https://doi.org/10.1007/s12083-024-01650-w [16] M. Guo, Y. R. Sun, Y. L. Zhu, M. Q. Han, G. Dou, [10] B. Q. Gan, Y. Q. Chen, Q. P. Dong, J. L. Guo, and R. and S. P. Wen, “Pruning and quantization algorithm X. Wang, “A convolutional neural network intrusion with applications in memristor-based convolutional detection method based on data imbalance,” The neural network,” Cognitive Neurodynamics, vol. 18, Journal of Supercomputing, vol. 78, no. 18, pp. no. 1, pp. 233-245, 2024. 19401-19434, 2022. https://doi.org/10.1007/s11227- https://doi.org/10.1007/s11571-022-09927-7 022-04633-x [17] T. Li, Y. B. Yin, Z. Yi, Z. Guo, Z. L. Guo, and S. L. [11] L. W. Xu, X. P. Zhou, Y. Tao, L. Liu, X. Yu, and N. Chen, “Evaluation of a convolutional neural Kumar, “Intelligent security performance prediction network to identify scaphoid fractures on for IoT-enabled healthcare networks using an radiographs,” Journal of Hand Surgery, vol. 48, no. improved CNN,” IEEE Transactions on Industrial 5, pp. 445-450, 2023. Informatics, vol. 18, no. 3, pp. 2063-2074, 2021. https://doi.org/10.1177/17531934221127092 https://doi.org/10.1109/TII.2021.3082907 [18] S. F. Lyu, and J. Q. Liu, “Convolutional recurrent [12] F. R. Yan, G. H. Zhang, D. W. Zhang, X. H. Sun, B. neural networks for text classification,” Journal of T. Hou, and N. W. Yu, “TL-CNN-IDS: transfer Database Management, vol. 32, no. 4, pp. 65-82, learning-based intrusion detection system using 2021. https://doi.org/10.4018/JDM.2021100105 convolutional neural network,” The Journal of [19] A. Mahmoodzadeh, M. Mohammadi, S. G. Salim, H. Supercomputing, vol. 79, no. 15, pp. 17562-17584, F. H. Ali, H. H. Ibrahim, S. N. Abdulhamid, H. R. 2023. https://doi.org/10.1007/s11227-023-05347-4 Nejati, and S. Rashidi, “” Machine learning [13] G. Di Méo, “Historical co-offending networks: A techniques to predict rock strength parameters,” social network analysis approach,” The British Rock Mechanics and Rock Engineering, vol. 55, no. Journal of Criminology, vol. 63, no. 6, pp. 1591- 3, pp. 1721-1741, 2022. 611, 2023. https://doi.org/10.1093/bjc/azad005 https://doi.org/10.1007/s00603-021-02747-x [14] M. Merouane, “Convenient detection method for [20] W. Chen, Y. Lu, H. Ma, Q. Chen, X. Wu, and P. Wu, anonymous networks" I2P vs Tor",” Journal of “Self-attention mechanism in person re- Information Science and Engineering, vol. 39, no. 6, identification models,” Multimedia Tools and pp. 1371-1382, 2023. Applications, vol. 81, no. 4, pp. 4649-4667, 2022. https://doi.org/10.6688/JISE.202311_39(6).0008 https://doi.org/10.1007/s11042-020-10494-4 [15] M. Y. Jiang, B. J. Cui, J. S. Fu, T. Wang, and Z. Q. Wang, “KimeraPAD: a novel low-overhead real- https://doi.org/10.31449/inf.v49i12.6951 Informatica 49 (2025) 145–156 145 Design and Implementation of an Optimized Career Planning System for College Students Using a Hybrid Dijkstra-Genetic Algorithm Zhenhuan Zhou1, Ruohan Chen2, Li Yan1, Haijian Zhong3* 1.School of Innovation and Entrepreneurship, Gannan Medical University, Ganzhou 341000, China 2.School of Pharmacy, Gannan Medical University, Ganzhou 341000, China 3.School of Medical Information Engineering, Gannan Medical University, Ganzhou 341000, China *Email of Corresponding Author: haijianzhong2000@163.com Keywords: dijkstra's algorithm, college students’ career planning, career matching, framework design Received: Octorber 30, 2024 Student career scheduling is divided into regular scheduling and dynamic optimal scheduling. Regular scheduling is the planning task of calculating a student's career year, and the reference parameters are some student career data. When facing the complex career problems of college students, achieving the expected scheduling tasks is difficult. Aiming at the problems existing in college students' career planning, this paper effectively combined the Dijkstra and genetic algorithms to obtain the D-GA optimization algorithm and apply it in the scheduling scheme. The experimental outcomes indicate that the graduate job recommendation algorithm introduced in this study achieves the highest performance, with a hit rate of 44.37% when K=50. This is approximately double the effectiveness of the CBF approach and around 20% higher than the neighborhood-based CF method. The mean reciprocal rank was 17.14%, which is nearly seven times greater than that of the CBF technique and about 3% better than the neighborhood- based CF model. The data problem framework aligns with real-world conditions and is developed based on relevant aspects of college students' career planning. According to the advantages and disadvantages of the Dijkstra algorithm and genetic algorithm, combined with students' career problems, the Dijkstra algorithm was improved and combined with the genetic algorithm to form the D-GA algorithm and applied to the solution optimization process. Finally, combined with J2EE technology, college students' career planning system was realized. Povzetek: Razvit je hibridni Dijkstra-genetski algoritem za optimizacijo načrtovanja kariere študentov, implementiran s tehnologijo J2EE. Pristop izboljšuje učinkovitost in prilagaja priporočila glede na spreminjajoče se podatke in preference. 1 Introduction Through the career planning system, users are able to explore themselves correctly, think about the factors that Career planning has been developed for decades, may affect their future development in an all-round way, and the relevant theories have been continuously and rationally make decisions on career development that improved, but the research on career planning at home suit them. Currently, the more famous ones in career and abroad differs greatly [1]. International research in planning optimization research [6]. Its advantage is that it the career field has been both extensive and detailed. It can solve the student career planning problem containing has thoroughly examined various aspects, including negatively weighted paths, and the code is simple to career exploration, job search intensity, job-seeking implement; its disadvantage is that it wastes a large amount success, factors influencing career choices, career of time on the v-1 slack operation due to the need to carry values, professional preferences, work-related values, out v-1 slack operations in each loop [7, 8]. The SPFA personality types, alignment between career paths and (Shortest Path Faster Algorithm), while an improvement professional choices, as well as job satisfaction. These over Bellman-Ford in many cases, still struggles with areas have been explored in depth, offering a worst-case performance, which can degrade to O(VE) comprehensive understanding of the factors affecting under certain conditions. Moreover, SPFA can be career development. [2, 3]. Domestic career research unpredictable in terms of run time, which poses challenges has only started in recent years, and the research level is for scalability and consistent system performance when shallow and the research content is single. The research handling large datasets or diverse user inputs typical in on college students focuses on the current situation of career planning systems. This study is necessary because it career planning and career values, mainly for ordinary proposes a hybrid approach that combines Dijkstra's college students, without distinguishing the differences algorithm with genetic algorithms to overcome the between students of different professional backgrounds, shortcomings of these SOTA techniques. The hybrid and not enough exploration of gender students [4, 5]. method not only optimizes the computational efficiency For everyone, career is limited, if not effectively but also enhances the accuracy of career path planned, will inevitably lead to a waste of time. recommendations by dynamically adapting to evolving 146 Informatica 49 (2025) 145–156 Z. Zhou et al. data patterns and user preferences, which neither demonstrates the fundamental mechanism of online pattern Bellman-Ford nor SPFA can achieve effectively in this mining, and evaluate each path and rank the candidates context [9, 10]. Ford algorithm can handle the student based on their fitness values. Select the most promising career planning problem containing negatively career paths based on fitness. Use techniques like weighted paths as well; however, the time complexity tournament selection or roulette wheel selection to pick of the Floyd-Warshall algorithm is very unfriendly [11]. individuals for the next generation. Perform crossover Dijkstra's algorithm is the most typical and between selected pairs of paths to generate new offspring. representative student career planning algorithm for This helps explore new potential career trajectories by solving student career planning problems, and its combining features of existing paths. Apply mutation to application in practice is the most numerous. The most some individuals by altering a few nodes in the career traditional implementation of Dijkstra's algorithm uses paths. This step introduces diversity and ensures the the adjacency matrix to store the data information of the algorithm doesn't get stuck in local optima. After graph, and uses simple arrays to realize its priority generating new paths, use Dijkstra’s algorithm to further queue, which cannot adapt to the path query problem optimize these solutions by adjusting the node sequences with high real-time requirements in terms of memory for better cost or relevance. This ensures that the final usage and Dijkstra's algorithm in depth, analyze its solutions are both optimal and diverse. performance bottlenecks and improve and optimize the algorithm using heap data structures and features of the d  Ht  h (3) application scenarios [12, 13]. Dijkstra's algorithm solves the problem of single-source student career C planning with any point in the graph as the starting 1 C C = C ( +C )+C ( 2 1 1 2 +C2 ) (4) point, which requires that the weights of each edge in C2 C1 the weighted graph must be non-negative. Using Dijkstra's algorithm to solve the single-source student Common models mainly include linear regression, career planning problem starting at vertex 1 in the graph logistic regression, decision tree model, plain Bayesian will result in the student career planning spanning tree model, neural network model, clustering algorithm and so [14, 15]. For any given point in a directed graph, on. Equation (5) represents the fundamental formula for Dijkstra's algorithm can compute the student career plan training the model, and Equation (6) represents the from that point to each of weights from that point to collaborative filtering algorithm, where the model each of the remaining vertices in the graph. Dijkstra’s evaluation phase. algorithm can also compute the student career plan for any pair of vertices in the graph by starting at the C1 C R = S(W C ( +C )+W C ( 2 1 1 1 2 2 +C2 ))+ L (5) beginning and expanding it layer by layer end point [16, C2 C1 17]. The table provides key metrics, including efficiency, complexity, and accuracy for each reviewed T method. For instance, while the Bellman-Ford Q = maxΔt N( t ) (6) algorithm offers advantages in specific contexts, it t =1 suffers from higher computational complexity in larger datasets. Similarly, the SPFA algorithm, although faster Collaborative filtering recommendation algorithms in many cases, lacks robustness in accuracy when faced are based on user-item interaction matrix, which can be with real-world data variations [18, 19]. divided into two categories according to the calculation method: neighborhood-based collaborative filtering algorithms and collaborative filtering algorithms based on 2 Dijkstra's algorithm hidden factor decomposition Equation (7) can improve the 2.1 Dijkstra's algorithm planning and recommendation accuracy, and Equation (8) improves the designing hidden nature of the item, begin by initializing a population of potential career paths or solutions. Each individual in Same as the data mining process, the educational the population represents a candidate path composed of model evaluation. Different from the traditional data multiple nodes. Randomly generate an initial population or mining, the data of EDM comes from the teaching seed it with paths from Dijkstra's shortest-path search. For environment, Equation (1) is used for model evaluation, each candidate path in the population, use Dijkstra’s and Equation (2) is applied to data mining, and the algorithm to compute the cost. In the context of career obtained data is applied to the construction of teaching planning, this could represent the efficiency or suitability data. of a given career path based on factors such as job Dt :V ( ) t →V 1 t+1 prospects, personal preferences, and professional goals. V I t+1 =Vt +Qt −q (2) t N( t ) = N( i,t ) (7) i=1 The main role of the model, which generally includes the processes. Equation (3) establishes H( i,t ) = minH( i,t )+ xmaxH( i,t )−minH( i,t ) (8) mathematical and statistical models, and Equation (4) Design and Implementation of an Optimized Career Planning… Informatica 49 (2025) 145–156 147 single-point crossover method. The next step is to Equation (9) shows the basic idea of TF-IDF determine the crossover operator. It is known from method, Equation (10) can explain the importance of the previous experience that many practical applications use a occupation in the document, and then count the feature predetermined value as the crossover operator, which does values; use the TF-IDF method to determine each not change throughout the genetic operation. feature value. If k also occurs more times in other documents, it means that k does not contribute much to ln p(Θ|u )= ln p(u|Θ)p(Θ) (17) document differentiation.TF-IDF is the feature value determination method that synthesizes these two −rˆ −e uij  considerations. ΔΘ =  rˆ − ΘΘ (18) ( u ,i , j )D −rˆ uij s 1+ e uij Θ n ( xi − x )( yi − y ) If crossover then it may result in the following r = i=1 (9) xy n n situation: individuals with high adaptation are subjected to ( x 2 2 i − x ) ( yi − y ) crossover operation, which does not reflect the advantages i=1 i=1 of high adaptation, that is to say, the advantages of arg min  ( r − p qT )2 + (|| p ||2 individuals with high adaptation are not well retained, + || q ||2ui u i u i ) (10) p ,q ( u ,i )R Equation (19) can filter the individuals with high adaptation, and Equation (20) demonstrates the 2.2 OSCache framework opportunity of individual crossover. Based on the above two assumptions, Equation (11) shows the neighborhood-based collaborative 1 1 h ( x ) = = (19) filtering algorithm, Equation (12) shows the mechanism 1+ e−z 1+ e−( wx+b ) of item scoring. Neighborhood-based algorithms. In addition, non-numerical coding is beginning to come h ( x )= P( y =1| x; ) (20) into the limelight, and decimal coding has been applied with many fields. Δ T 3 Application of D-GA algorithm in u = −( rui − puqi )qi +  pu (11) student career system Δ T 3.1 Improvements made to the DIJKSTRA i = −( rui − puqi ) pu + qi (12) algorithm and its validation Equation (13) demonstrates the choice of encoding Bayesian personalized ranking algorithm is a method and Equation (14) allows testing the readability recommendation algorithm with better recommendation of the problem domain encoding. effect and widely used in various scenarios, such as multimedia item recommendation, friend recommendation p(Θ|u ) p( ) u|Θ)p(Θ) (13 and so on [20, 21]. So, for each user u, the BPR algorithm has to find his preference ordering for all items. machine learning algorithms are devoted to studying how to  p(u∣Θ ) =  p( i u ∣j Θ ) (14) uU ( u,i , j )D improve the performance of the system itself through s computational means, experience. p evaluates the When facing some complex problems with large performance of a computer program on a task T [22]. The scale, the problem domain cannot be represented by performance metrics indicate that D-GA consistently discrete sequences at that time, that is, binary coding is outperforms both Dijkstra’s and Genetic Algorithms when not applicable to that situation Equation (15) can detect applied in isolation. Notably, the integration of Dijkstra's whether the coding is missed or not, and Equation (16) graph traversal capabilities with the adaptive nature of can explain the problem of career planning in the coding Genetic Algorithms leads to improved exploration of the process. solution space [23, 24]. While unsupervised learning has only input data x in the data sample and needs to solve for p( i  the markers y based on the sample features, clustering is u j |Θ ) :=( r̂uij ) (15) an unsupervised learning method in machine learning algorithms [25, 26]. Figure 1 shows the initialization state rˆuij := rˆui − r̂uj (16) diagram of Dijkstra's algorithm, and its process is simple Equation (17) demonstrates the generalization of and easy to implement. the crossover approach, Equation (18) represents the 148 Informatica 49 (2025) 145–156 Z. Zhou et al. Motion PQ-VAE Probabilistic Motion Generation Motion Detail Refinement Reconstructed Motion Refined Motion Predicted PQ Codes Predicted PQ Codes Decoder Linear Predictor Refiner Quantized Features Decoder Block Linear Conditions Preliminary Motion Conditions C1 C2 C3 C4 Codebooks Predictor Decoder Extracted features Conditions Quantized Encoder Masked Features PQ Codes Mask Unconfident GT Motion 2D-PE Codes Look up in Codebooks Figure 1: Initialization state diagram of Dijkstra's algorithm The k-means algorithm employs a greedy strategy The data was sourced from institutional career centers, job to approximate the solution of Eq. by iterative portals, and self-reported student profiles. The dataset size optimization [27]. Where line 1 initializes the cluster includes information from 10,000+ students, centers. Lines 4 to 8 are the cluster partitioning process, encompassing several hundred features, such as major, i.e., each data object is partitioned into the cluster GPA, internships, extracurricular activities, and industry closest to it. Lines 9 to 16 are the iterative updating interests. Each student's profile is linked to potential career process all points in the cluster, and if the cluster center paths and outcomes such as job offers, salaries, and job does not change, the clustering result is returned [28, satisfaction, making it rich and varied for analysis. 29]. which can be categorized into cohesive and divisive Therefore, this section will introduce machine learning types. The cohesive type uses a bottom-up strategy [30]. model evaluation methods in two parts: classification Figure 2 shows the relationship between the algorithm algorithm evaluation methods and clustering algorithm execution efficiency and the problem size, while the evaluation methods. The methodology has been split method is the opposite, using a top-down strategy, enhanced to specify the parameters for the Genetic initially all the samples are grouped into one cluster, and Algorithm: a population size of 100, a crossover rate of then split according to some criterion until a certain 0.8, and a mutation rate of 0.02. Additionally, we detail condition is reached or a set number of divisions is the grid search method employed for hyperparameter reached. The dataset used for the experiments consists tuning, allowing readers to understand how optimal of career-related information from college students, settings were derived. including academic background, skills, career preferences, job market trends, and professional goals. Client Side Server Side Class1 Class2 Class3 Pseudo Server Sampling Graph Construct Graph Original Feature Noise Generator Topology CONCAT XT1 Label Topology Embedding Topology-aware Local GNNs W W W Global 1 2 k Wt GNN Local Subgraph G1 Node Embedding Class-wise Prediction Cos Cos Cos Cos Predictor Predictor 0.3 0.1 Semantic Loss Diversity Loss Divergence Loss 0.4 0.2 Mindiverg-Xt1 Mindiverg Predictor Figure 2: Plot of algorithm execution efficiency versus problem size 3.2 DIJKSTRA algorithm optimization convenient for model input. The dataset was split into Cluster assessment is generally based on two training (70%) and testing (30%) sets. The D-GA principles: tightness, i.e., the smallest possible algorithm was then applied to predict optimal career paths differences between cluster members, and separation, based on this data. Dijkstra’s algorithm was used to i.e., the largest possible differences between clusters. compute initial shortest career paths, while the genetic Since the student campus card consumption record is a algorithm explored potential variations, refining campus card consumption flow record, and each student recommendations over successive iterations. The generates a flow record for each consumption, it is performance was evaluated on multiple metrics, including necessary to initially screen the consumption flow data accuracy of career path matching, computation time, and first to extract the consumption features that are memory usage. The dataset was split into training (70%) Design and Implementation of an Optimized Career Planning… Informatica 49 (2025) 145–156 149 and testing (30%) sets. The D-GA algorithm was then paths based on this data. Dijkstra’s algorithm was used to applied to predict optimal career paths based on this compute initial shortest career paths, while the genetic data. Dijkstra’s algorithm was used to compute initial algorithm explored potential variations, refining shortest career paths, while the genetic algorithm recommendations over successive iterations. The explored potential variations, refining performance was evaluated on multiple metrics, including recommendations over successive iterations. The accuracy of career path matching, computation time, and performance was evaluated on multiple metrics, memory usage. The dataset was split into training (70%) including accuracy of career path matching, and testing (30%) sets. The D-GA algorithm was then computation time, and memory usage. The dataset was applied to predict optimal career paths based on this data. split into training (70%) and testing (30%) sets. The D- Dijkstra’s algorithm was used to compute initial shortest GA algorithm was then applied to predict optimal career career paths, while the genetic algorithm explored paths based on this data. Dijkstra’s algorithm was used potential variations, refining recommendations over to compute initial shortest career paths, while the successive iterations. The performance was evaluated on genetic algorithm explored potential variations, refining multiple metrics, including accuracy of career path recommendations over successive iterations. The matching, computation time, and memory usage. The performance was evaluated on multiple metrics, dataset was split into training (70%) and testing (30%) sets. including accuracy of career path matching, The D-GA algorithm was then applied to predict optimal computation time, and memory usage. The dataset was career paths based on this data. Dijkstra’s algorithm was split into training (70%) and testing (30%) sets. The D- used to compute initial shortest career paths, while the GA algorithm was then applied to predict optimal career genetic algorithm explored potential variations, refining paths based on this data. Dijkstra’s algorithm was used recommendations over successive iterations. The to compute initial shortest career paths, while the performance was evaluated on multiple metrics, including genetic algorithm explored potential variations, refining accuracy of career path matching, computation time, and recommendations over successive iterations. The memory usage. Figure 3 shows the performance performance was evaluated on multiple metrics, comparison before and after the optimization of the including accuracy of career path matching, algorithm, Continuous features, such as GPA and job offer computation time, and memory usage. The dataset was salary, were normalized to bring all attributes onto a split into training (70%) and testing (30%) sets. The D- similar scale, ensuring that no single attribute GA algorithm was then applied to predict optimal career disproportionately influenced the algorithm. Resources Allocation KBs Maintenance Task Orchestration P Q R Shared Models Shared Knowledge Linear Semantic model theft attack Wireless Semantic Semantic Channel Encoder Channel Channel Encoder Encoder Decoder Raw Softmax Data Semantic adversarial attack Sender Receiver Linear False data injection attack Concat Attacker Sensing Output Figure 3: Performance comparison before and after algorithm optimization Categorical variables like academic major and hybrid approach. Such enhancements not only make the industry interest were encoded using one-hot encoding, system more robust but also align it more closely with the while ordinal features such as job satisfaction were needs of college students, facilitating more informed assigned numerical values. Only the most relevant career choices. These factors in terms of gender, family features, such as skills, academic background, and background, and personal ability all affect the employment career goals, were retained to reduce noise and improve choices of graduates. Therefore, this section analyzes the the efficiency of the algorithm. Figure 3 highlights the employment patterns of students from different tangible benefits of optimizing the career planning backgrounds in three main areas. Table 1 shows the system through the D-GA. The improvements in performance comparison of the seed clustering algorithms, accuracy, reduction in computation time, and enhanced in order to distinguish the employment patterns of students user satisfaction underscore the effectiveness of this with different professional abilities and family Congress Semantic Control Sensing transmission Scaled Dot-Product Attention 150 Informatica 49 (2025) 145–156 Z. Zhou et al. backgrounds, students with good academic 4 Design and implementation of performance generally choose to continue their studies and the proportion of those who choose to go abroad for optimization model for students' career further study is small. In order to avoid the above planning based on Dijkstra's situation of the genetic algorithm, so that the genetic Algorithm algorithm does not converge prematurely and produce the phenomenon of early maturity, in this research, this The Graduate Employment Recommendation section paper adopts the adaptive crossover operator, that is, the is designed to calculate the students' ratings of employment crossover operator is no longer fixed, and can be organizations, and then recommend employment adjusted adaptively with the changes in the population. organizations to the students according to the ratings from high to low. Graduates' ratings of employment units Table 1: Performance comparison of clustering consisted of three main components: group employment algorithms unit choice, students' preferences for employment unit Contour attributes, and students' preferences for employment unit Clustering algorithm Time (s) coefficient location. Figure 5 shows the career path shortest distance K-means partitioning assessment map, and the group employment unit selection 0.415 0.085 clustering algorithm is solved by the traditional BPR algorithm. Then students' Cohesive hierarchical preferences for employment unit attributes are 0.360 0.069 clustering algorithm incorporated into the solution objective of the BPR DBSCAN density clustering 0.029 0.013 algorithm to obtain a new optimization objective function. algorithm A binary Gaussian distribution is used to fit the student A crucial aspect of configuring a genetic algorithm preference function for the location. The last section of this involves establishing its termination criteria. This chapter describes the process of solving the objective entails defining the conditions under which the solution function using stochastic gradient descent method. Dijkstra’ produced by the algorithm is deemed acceptable within s algorithm is a classical algorithm used to find the student the problem domain. Additionally, if the genetic career plan. The algorithm uses breadth-first search to algorithm fails to find a suitable solution, it is essential compute the student career plan from any node in a non- to establish a maximum number of generations for negative weighted directed graph to any other store node, iterations. This means the algorithm should cease i.e., the single-source student career planning problem. operation after reaching a specified number of 2.1 generations, regardless of whether the solution is 2.0 optimal, to avoid unnecessary expenditure of time and 1.9 resources. The selection of these termination conditions 1.8 1.7 plays a significant role in the efficiency of the genetic 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 algorithm and the quality of the outcomes. If the Signal time termination criteria are not aligned with the actual circumstances, even a well-crafted genetic operation 2.1 may not yield satisfactory results. Figure 4 shows the 2.0 1.9 assessment of the match between students' interests and 1.8 careers. 1.7 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 Signal time Figure 5: Assessment of shortest distance for career 2.1 2.0 paths 1.9 1.8 Dijkstra's algorithm is currently extensively utilized 1.7 and has established itself as a fundamental theory for 0.5 1.0 1.5 2.0 2.5 3.0 CGNSDE time addressing students' career planning challenges. Researchers frequently adapt Dijkstra's algorithm to suit 2.1 the specific issues they encounter while investigating these 2.0 types of problems. The core concept of Dijkstra's algorithm 1.9 can be summarized as follows: it involves a set represented 1.8 by S, which initially contains only the source point, S0. 1.7 0.5 1.0 1.5 2.0 2.5 3.0 The algorithm subsequently adds the shortest paths to the CGNSDE time set S from the remaining vertices, denoted as V-S. The set Figure 4: Assessment of student interests and S represents the vertices for which the shortest paths have career match been identified. Initially, the set S consists solely of the source point S0, which then extends to each point, progressively adding the point with the shortest path to S and designating the remaining points as V-S. This process continues until a comprehensive career plan for a student DT Loss DA Loss Block-B Block-A Design and Implementation of an Optimized Career Planning… Informatica 49 (2025) 145–156 151 is formulated, with the relevant points being included in adopts the idea, which plays well with the advantages of S and removed from V-S until all nodes in the directed Dijkstra algorithm, in addition, in the specific graph are incorporated into S, signaling the completion implementation of genetic algorithms, some improvements of the algorithm. are also made, the specific enhancements are outlined as Throughout the execution of the algorithm, it is follows: the design of student career paths aligns with the ensured that the shortest distance from the source point adaptation function. The initial population is generated S0 to each vertex in S remains less than or equal to the based on the principles of Dijkstra's algorithm. This distance from S0 to any vertex in V-S. In its most involves executing selection, crossover, and mutation straightforward application, Dijkstra's algorithm processes on the initial population, utilizing an adaptive primarily focuses on the distances between nodes, crossover method during the crossover phase. Unlike represented by the weights of the directed graph. traditional genetic algorithms that often employ random However, in practical scenarios such as logistics, methods to establish the initial population—an approach distribution, and bus routing problems, it becomes that can lack direction—Dijkstra's algorithm focuses on increasingly crucial to consider the time and costs identifying the path with the lowest cost and the associated with transporting goods or individuals subsequent node that completes the current shortest route. between various nodes. In this research, this paper also In the context of the student career paths explored in this improves the traditional Dijkstra's algorithm, which is project, this means identifying a group that optimally finally applied with the students' occupation in water. In schedules hydraulic resources at minimal cost. Figure 6 nature, a variety of biological generations, similar but illustrates the evaluation of student skills against job different, the children inherited the advantages of the requirements, significantly reducing the randomness father's generation, in the process of biological associated with the original algorithm. reproduction, left behind is always high quality, those less adaptable must be eliminated in the competition, 2.1 that is, the survival of the fittest. At present, the scope of application of genetic algorithm has been quite extensive, due to the good parallelism of genetic 1.9 algorithm, suitable for solving complex nonlinear problems, has been applied with combinatorial 1.7 optimization problems, artificial intelligence very 0.5 1.0 1.5 2.0 2.5 popular research direction in the field of computer. The Linear time specific content of genetic algorithm can be described 2.3 as follows: imitating the situation of biological TG1 TG2 TG3 evolution in nature, modeling the problem to be solved 2.1 as a biological population, choosing a certain coding technique to code the population, and determining the 1.9 initialized population size, in nature, chromosomes are the most basic representation of biological 1.7 characteristics, different chromosomes can be combined 0.5 1.0 1.5 2.0 2.5 Linear time into different biological characteristics, usually, in the case of the When coding, the coding methods that can Figure 6: Assessment of student skills and job be chosen are binary coding, decimal coding and so on. requirements First of all, a group of individuals of a certain size is randomly generated, and the individual with good According to these constraints to solve the solution of fitness is superior, so that the new generation of the objective function, and then optimization combined individuals will be more adapted to the environment with the actual conditions to finally get the optimal compared with the individuals of the parent generation, scheduling plan suitable for the student occupation. In the and the confusion matrix of the classification results is genetic algorithm, this paper refers to the fitness function, shown in Table 2. the individuals in the group are determined by the fitness function, i.e., it can be calculated which individuals have Table 2: Confusion matrix for classification results better adaptability and which individuals should be The real Standard Projected results eliminated. It can be said that the significance of the fitness situation practice Counter-example function in the genetic algorithm is irreplaceable, and the Standard TP (true goodness of the fitness function can ultimately determine FN (false negative) practice example) whether the solution obtained by the genetic algorithm can Counter- FP (false TN (true satisfy the problem domain, which determines the quality example positive) counterexample) of the optimal solution obtained. Figure 7 is assessment of frequency of visits to career development nodes, in summary, all the calculations and judgments are centered In the application process of genetic algorithm, the around the fitness function. Moreover, the fitness function original genetic is considered to select random does not have too many constraints, it does not need to be individuals, while in the D-GA algorithm, this paper continuous or derivable, but it must be guaranteed that the Temperature Rel.humidity 152 Informatica 49 (2025) 145–156 Z. Zhou et al. function value is non-negative in the problem domain, so that it is possible to judge and compare the fitness function values of different individuals. 2.3 2.3 2.2 2.2 2.1 2.1 2.0 2.0 1.9 1.9 1.8 1.8 1.7 1.7 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 Encoder time Encoder time Figure 7: Assessment of frequency of visits to career development nodes In the process of student occupation scheduling, to project, this paper adopts the sequence of state information assess the benefit of scheduling stage, the benefit is that represents the state information of students' career usually calculated based on the objective function, in planning to describe the scheduling decision information general, the objective function can be set up one, or can of students' careers for the specific content of students' be a group of functions composed of multiple functions. careers. In nature, chromosomes are considered to Some of the role of the student occupation is targeted, represent the characteristics of life, therefore, in the in this study, this paper chooses to ensure that the process of student career scheduling, the sequence maximum amount of power generation can be used as information that represents the planning state corresponds the ultimate goal, in addition to setting some auxiliary to the chromosomes in biological evolution. Figure 8 constraint functions. In general, after analyzing and shows the graph of students' background and industry studying the requirements of the fitness function as a demand assessment, therefore, the process of applying genetic algorithm, and make appropriate improvements genetic algorithms to the scheduling of students' careers to the objective function, for example, to meet the non- can be thought of as follows: first, that size are selected, negativity of the fitness function and other requirements, which serve as the initial population, in student careers, in order to more closely match the implementation of would mean selecting a certain size of the initial planning the genetic algorithm. In the research process of this sequence. 1.9 1.8 1.7 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 Signal time 1.9 1.8 1.7 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 Signal time Figure 8: Student background and industry needs assessment Then the objective function of the student's Repeat the above steps until the newest generation of occupation, and it is obvious that the fitness of the new population meets the termination conditions of the sequences obtained is higher than that of the parent algorithm. When facing the application process of genetic generation, and in the crossover operation, this paper algorithm, after determining the initial conditions and the adopts adaptive crossover algorithm, which improves fitness function, the first thing to be obtained is the the efficiency of the algorithm, and the newly obtained planning sequence, that is, the planning sequence should sequences are Then the mutation operation is carried out, be expressed into the form of coding. With people's which improves the diversity of the species, and at the continuous research and study, several encoding methods same time, a new generation of population is obtained. have been developed that are known to the public, Scatter Plot(Epoch:1750k) PQ-VTG PQ-VAE Scatter Plot(Epoch:2000k) Design and Implementation of an Optimized Career Planning… Informatica 49 (2025) 145–156 153 including binary encoding, decimal encoding, gray code, and easy to understand the characteristics of binary coding, etc. Among them, binary encoding is the most popular. at the same time the binary coding is extended, broadening Among them, binary coding is one of the simplest the field of application. coding methods. It is also the most widely used at present. Binary coding, as the name suggests, uses only 5 Experimental analysis {0,1} for encoding, i.e., all information is represented using only {0,1}. Although binary coding is very simple A framework for a personalized preference-based to understand, but there are some shortcomings and graduate employment recommendation algorithm is limitations, in the face of some complex problems, the demonstrated, Figure 9 shows the preference assessment ability of binary coding appears to be somewhat map for career planning path selection, and then calculates insufficient, cannot well respond to the root of the the employment choice of the student group by referring to problem, in the application, with the help of other the results of the group delineation; and finally calculates coding features and binary coding is simple and easy to the graduates' scores. implement the characteristics. It still retains the simple 2.1 2.2 2.1 1.9 1.9 0.5 1.0 1.5 2.0 2.5 0.5 1.0 1.5 2.0 Sampling time Sampling time Figure 9: Career planning pathway selection preference assessment It was analyzed that there are great differences in of students' career change cost, the problem is an the employment choices of students with different unsupervised learning, so the clustering method is used to academic performances and family economic divide the groups. Dijkstra’s algorithm, with its space conditions. Therefore, the academic performance and complexity, requires considerable memory, especially family economic conditions of graduates are selected as when applied to large graphs. The D-GA, while the reference characteristics for the division of student introducing additional storage requirements for multiple groups. The distribution of graduates' family economic candidate solutions (population), is designed to work condition index and academic performance index is efficiently in parallel, reducing bottlenecks by pruning less shown. Figure 10 shows the evaluation of the analysis relevant solutions over time. 2.1 2.0 1.9 1.8 1.7 0.5 1.0 1.5 2.0 2.5 In-dearee time 2.1 2.0 1.9 1.8 1.7 0.5 1.0 1.5 2.0 2.5 In-dearee time Figure 10: Evaluation of students' career change cost analysis Surface Temperature Frequency Frequency Lab Data 154 Informatica 49 (2025) 145–156 Z. Zhou et al. The D-GA hybrid balances Dijkstra’s efficiency in assessment of students' career planning decisions, while finding the shortest paths with the exploratory the cohesive hierarchical clustering algorithm and the capabilities of Genetic Algorithm (GA). While Dijkstra DBSCAN algorithm do not divide the data samples, and alone computes the shortest path quickly, it can struggle the distinction between academic performance and family with scalability in large datasets. The D-GA introduces economic conditions is not obvious between some groups, population-based search, which increases computation especially in the case of the DBSCAN algorithm, which time due to crossover and mutation steps, but it has no obvious distinction between the groups and the ultimately reduces the number of iterations needed by division is not homogeneous. optimizing paths dynamically. Figure 11 shows the 2.1 2.1 2.0 2.0 1.9 1.9 1.8 1.8 0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.5 1.0 1.5 2.0 2.5 3.0 Loss time Loss time Figure 11: Assessment of student career planning decisions Therefore, this study uses k-means clustering unit characteristics; for example, some students prefer algorithm to classify student groups. Graduates' ratings stable careers such as teachers and civil servants, while of employment units consist of three main components: others prefer positions requiring high professional group employment unit choice, students' preferences for competence such as engineering and technology. Figure 12 employment unit attributes, and students' preferences shows the graph of the assessment of the association for employment unit location. Group employment unit between career advancement speed and educational choice indicates the group's rating of the employment background, which is used in this paper to solve the group unit. Employment unit attribute preference indicates employment choice using Bayesian personalized ranking graduates' preference for some specific employment strategy. 2.1 2.0 1.9 1.9 1.8 1.8 1.7 1.7 0.5 1.0 1.5 2.0 2.5 3.0 0.5 1.0 1.5 2.0 2.5 3.0 Decoder time Decoder time Figure 12: Assessment of the association between speed of career advancement and educational background 6 Conclusion analyzes some existing optimization measures, and establishes a mathematical model for problems related to Dijkstra algorithm is a classical algorithm for finding optimization models in combination with mathematical college students' career planning. It adopts a breadth-first modelling. The Dijkstra algorithm efficiently finds the search to calculate college students' career planning from shortest path in graphs with non-negative weights, making any node in the non-negatively weighted directed graph to it highly reliable in structured problems. However, its any other storage node, the single-source student career greedy nature limits its ability to adapt to complex, planning problem. The Dijkstra algorithm has been widely evolving datasets, such as those encountered in career used and has become a fundamental theory. This paper planning. It works best in static environments but analyzes some problems in students' career planning, Quantized Features Reconstructed Motion Quantized Features Reconstructed Motion Design and Implementation of an Optimized Career Planning… Informatica 49 (2025) 145–156 155 struggles when dealing with larger, dynamic datasets. The doi: 10.3390/app12147088. D-GA combines the precision of Dijkstra’s algorithm with [7] R. Botez, A. G. Pasca, A. T. Sferle, I. A. Ivanciu, and the exploratory power of GA. This integration allows D- V. Dobrota, "Efficient Network Slicing with SDN GA to quickly narrow down optimal solutions through and Heuristic Algorithm for Low Latency Services in Dijkstra’s efficient traversal, while GA’s population- 5G/B5G Networks," Sensors, vol. 23, no. 13, pp. 29, based approach ensures that it explores a wider range of 2023. doi: 10.3390/s23136053. possibilities. This results in faster convergence and better [8] G. S. Brodal, "Priority queues with decreasing keys," performance in dynamic environments like career Theoretical Computer Science, vol. 1000, pp. 14, planning systems. By balancing Dijkstra’s exactness and 2024. doi: 10.1016/j.tcs.2024.114563. GA’s adaptability, D-GA outperforms both in terms of [9] E. Çakir, Z. Ulukan, and T. Acarman, "Time- efficiency, scalability, and accuracy, making it ideal for dependent Dijkstra's algorithm under bipolar personalized and evolving career recommendation neutrosophic fuzzy environment," Journal of systems. Intelligent & Fuzzy Systems, vol. 42, no. 1, pp. 227- Among the students who chose employment 236, 2022. doi: 10.3233/jifs-219188. companies, the most significant proportion of students [10] Y. Q. Chen, J. F. She, X. G. Li, S. H. Zhang, and J. Z. chose other enterprises, about 31 per cent, followed by Tan, "Accurate and Efficient Calculation of Three- students who chose state-owned enterprises, about 16 per Dimensional Cost Distance," Isprs International cent. After graduation, about 80% of students choose to Journal of Geo-Information, vol. 9, no. 6, pp. 18, work in computer-related jobs, of which about 62% 2020. doi: 10.3390/ijgi9060353. choose to work in development, and about 17% choose to [11] G. H. Choi, W. Lee, and T. W. Kim, "Voyage work in other professional and technical personnel. optimization using dynamic programming with Among other non-computer jobs, clerical and related initial quadtree based route," Journal of personnel accounted for the largest share at 7.8%. Computational Design and Engineering, vol. 10, no. Students chose a wide range of industries as employment 3, pp. 1185-1203, 2023. doi: 10.1093/jcde/qwad055. units, covering 16 industries. Among them, about 63 per [12] C. Dudeja and P. Kumar, "An improved weighted cent of students choose to work in industry, followed by a sum-fuzzy Dijkstra's algorithm for shortest path large number in manufacturing, accounting for about problem (iWSFDA)," Soft Computing, vol. 26, no. 7, 8.8%. The number of college students who chose the pp. 3217-3226, 2022. doi: 10.1007/s00500-022- remaining 14 fields was small, less than 5 per cent. 06871-w. [13] M. Fazio et al., "A Map-Reduce Approach for the References Dijkstra Algorithm in SDN Over Osmotic Computing Systems," International Journal of [1] I. Alameri, J. Komarkova, T. Al-Hadhrami, A. E. Parallel Programming, vol. 49, no. 3, pp. 347-375, Yahya, and A. Gharbi, "Optimizing Connections: 2021. doi: 10.1007/s10766-021-00693-3. Applied Shortest Path Algorithms for MANETs," [14] E. Gefen and D. Zarrouk, "Flying STAR2, a Hybrid Cmes-Computer Modeling in Engineering & Flying Driving Robot With a Clutch Mechanism and Sciences, vol. 141, no. 1, pp. 787-807, 2024. doi: Energy Optimization Algorithm," IEEE Access, vol. 10.32604/cmes.2024.052107. 10, pp. 115491-115502, 2022. doi: [2] K. Altisen, P. Corbineau, and S. Devismes, 10.1109/access.2022.3218305. "Certification of an exact worst-case self- [15] D. Ghosh, J. Sankaranarayanan, K. Khatter, and H. stabilization time," Theoretical Computer Science, Samet, "Opportunistic package delivery as a service vol. 941, pp. 262-277, 2023. doi: on road networks," Geoinformatica, vol. 28, no. 1, pp. 10.1016/j.tcs.2022.11.019. 53-88, 2024. doi: 10.1007/s10707-023-00497-2. [3] T. K. Astarte, "From Monitors to Monitors: A [16] U. Gurusamy, K. Hariharan, and M. S. K. Primitive History," Minds and Machines, vol. 34, no. Manikandan, "Path optimization of box-covering SUPPL 1, pp. 51-71, 2024. doi: 10.1007/s11023- based routing to minimize average packet delay in 023-09632-2. software defined network," Peer-to-Peer Networking [4] N. Bahrami and S. M. Siadatmousavi, "Ship voyage and Applications, vol. 13, no. 3, pp. 932-939, 2020. optimisation considering environmental forces using doi: 10.1007/s12083-019-00855-8. the iterative Dijkstra's algorithm," Ships and [17] M. Ha, D. M. Tran, and Y. Shichkina, "Model of Offshore Structures, vol. 19, no. 8, pp. 1173-1180, Message Transmission across Parallel Route Groups 2024. doi: 10.1080/17445302.2023.2231200. with Dynamic Alternation of These Groups in a [5] X. S. Bai, L. Wang, Y. B. Hu, P. F. Li, and Y. T. Zu, Multichannel Steganographic System," Electronics, "Optimal Path Planning Method for IMU System- vol. 12, no. 19, pp. 24, 2023. doi: Level Calibration Based on Improved Dijkstra's 10.3390/electronics12194155. Algorithm," IEEE Access, vol. 11, pp. 11364-11376, [18] T. D. Holmes, R. H. Rothman, and W. B. 2023. doi: 10.1109/access.2023.3240518. Zimmerman, "Graph Theory Applied to Plasma [6] M. Behún, D. Knezo, M. Cehlár, L. Knapcíková, and Chemical Reaction Engineering," Plasma Chemistry A. Behúnová, "Recent Application of Dijkstra's and Plasma Processing, vol. 41, no. 2, pp. 531-557, Algorithm in the Process of Production Planning," 2021. doi: 10.1007/s11090-021-10152-z. Applied Sciences-Basel, vol. 12, no. 14, pp. 12, 2022. [19] W. C. Hu, H. T. Wu, H. H. Cho, and F. H. Tseng, 156 Informatica 49 (2025) 145–156 Z. Zhou et al. "Optimal Route Planning System for Logistics 2021. doi: 10.15388/21-infor445. Vehicles Based on Artificial Intelligence," Journal of Internet Technology, vol. 21, no. 3, pp. 757-764, 2020. doi: 10.3966/160792642020052103013. [20] P. Joshi, A. S. Raghuvanshi, and S. Kumar, "An Intelligent delay efficient data aggregation scheduling for distributed sensor networks," Microprocessors and Microsystems, vol. 93, pp. 16, 2022. doi: 10.1016/j.micpro.2022.104608. [21] C. H. Ke, Y. H. Tu, and Y. W. Ma, "A reinforcement learning approach for widest path routing in software-defined networks," Ict Express, vol. 9, no. 5, pp. 882-889, 2023. doi: 10.1016/j.icte.2022.10.007. [22] N. Khakzad, "A methodology based on Dijkstra's algorithm and mathematical programming for optimal evacuation in process plants in the event of major tank fires," Reliability Engineering & System Safety, vol. 236, pp. 11, 2023. doi: 10.1016/j.ress.2023.109291. [23] N. H. Kim, F. He, R. M. Nasir, and S. I. Kwak, "Stepwise benchmarking based on production function: Selecting path towards closest target," Expert Systems with Applications, vol. 228, pp. 11, 2023. doi: 10.1016/j.eswa.2023.120308. [24] S. Kim, T. Choi, S. Kim, T. Kwon, T. H. Lee, and K. Lee, "Sequential graph-based routing algorithm for electrical harnesses, tubes, and hoses in a commercial vehicle," Journal of Intelligent Manufacturing, vol. 32, no. 4, pp. 917-933, 2021. doi: 10.1007/s10845-020-01596-9. [25] P. Kumari and S. K. Sahana, "Heuristic Initialization Based Modified ACO (HIMACO) Mimicking Ant Safety Features for Multicast Routing and its Parameter Tuning," Microprocessors and Microsystems, vol. 93, pp. 9, 2022. doi: 10.1016/j.micpro.2022.104574. [26] S. R. Lawande, G. Jasmine, J. Anbarasi, and L. I. Izhar, "A Systematic Review and Analysis of Intelligence-Based Pathfinding Algorithms in the Field of Video Games," Applied Sciences-Basel, vol. 12, no. 11, pp. 30, 2022. doi: 10.3390/app12115499. [27] F. K. Luan, J. X. Yang, H. Zhang, Z. Y. Zhao, and L. P. Yuan, "Optimization of Load-Balancing Strategy by Self-Powered Sensor and Digital Twins in Software-Defined Networks," Ieee Sensors Journal, vol. 23, no. 18, pp. 20782-20793, 2023. doi: 10.1109/jsen.2022.3216376. [28] S. Duleba, F. K. Gundogdu, and S. Moslem, "Interval-Valued Spherical Fuzzy Analytic Hierarchy Process Method to Evaluate Public Transportation Development," Informatica, vol. 32, no. 4, pp. 661- 686, 2021. doi: 10.15388/21-infor451. [29] D. Kalibatiene and J. Miliauskaite, "A Hybrid Systematic Review Approach on Complexity Issues in Data-Driven Fuzzy Inference Systems Development," Informatica, vol. 32, no. 1, pp. 85- 118, 2021. doi: 10.15388/21-infor444. [30] A. Ulutas et al., "Developing of a Novel Integrated MCDM MULTIMOOSRAL Approach for Supplier Selection," Informatica, vol. 32, no. 1, pp. 145-161, https://doi.org/10.31449/inf.v49i12.6573 Informatica 49 (2025) 157–172 157 Congestion Control of Large-Scale Elevator Terminal Data Access in Large Metro Stations Based on The Internet of Things Juanjuan Shi*, Mengtian Jiao College of Information Engineering, Jiaozuo University, Jiaozuo 454000, China E-mail: jinglove666999@163.com *Corresponding author Keywords: hybrid access method, internet of things, congestion control, ACB control parameters Received: July 6, 2024 Large metro station IoTs used to face congestion while access to terminals was going on a large scale. Due to this, low success rate in access and delay in monitoring critical equipment was observed, which included elevators and escalators. This paper presented a congestion control method for large-scale elevator terminal data access in metro stations using IoT. Business data were categorized based on volume and latency requirements: Slot ALOHA (SA) direct access mode was used for delay-insensitive, small data services, and Access Class Barring (ACB) random access was used for time-sensitive, large data services. ACB control parameters were dynamically adjusted by estimating access requests. Using uniform and Beta distribution models, the method's effectiveness was validated through experiments. With 4000 access requests, the hybrid method achieved a 52.43% success rate and a 76.72 ms average delay under the uniform model, and a 42.07% success rate with an 82.02 ms average delay under the Beta model. These results demonstrated the method's ability to meet Quality of Service (QoS) requirements for high-priority services, ensuring efficient and reliable communication in large-scale IoT environments. Povzetek: Prispevek predstavlja hibridno metodo za nadzor preobremenjenosti do podatkov naprav IoT, ki uporablja kombinacijo direktnega in naključnega dostopa, s prilagajanjem parametrov glede na obseg zahtev. 1 Introduction transmission and processing of multiple types and large- scale data [4]. Based on the current urban development and people's Based on this, some scholars use IoT technology to travel needs, the number of elevators inside large metro monitor elevators' operating status data, dramatically stations is also growing [1]. The stability of elevator improving elevator operation security and effectively operation is closely related to the safety of residents. reducing equipment operation and maintenance costs. However, due to the quality, maintenance, supervision, Mao et al. [5] discussed the integration of Internet of and other influencing factors, elevator accidents often Things (IoT) technology to enhance the remote security occur. How to conduct unified real-time monitoring of management of elevators, addressing the associated safety elevator equipment in large and medium-sized Spaces to risks. They proposed an IoT-based architecture for reduce daily lightweight failures and prevent heavyweight elevator fault diagnosis and maintenance. The study accidents has become a hot topic of scholarly attention [2, established a fault diagnosis management system centered 3]. on IoT, outlining maintenance methods to ensure the The increasing global population and urbanization safety and stability of elevator operations. This approach have heightened the demand for elevators, necessitating aims to improve the overall security and efficiency of advanced, safe, and efficient systems. China’s elevator urban transportation through advanced technology. Lai et demand grows by 5%–7% annually due to the need to al. [6] adopt the more predictive state maintenance method replace outdated units and comply with new regulations, to realize the remote monitoring of highly distributed increasing maintenance workloads and risks. Innovative elevator equipment status, effectively improving the designs must prioritize safety, including weight capacity, safety and reliability of equipment operation. emergency alarms, and secure installation sites. Energy- IoT devices, ranging from consumer products to efficient elevators can reduce operational costs industrial components, are becoming ubiquitous, driving significantly. Traditional monitoring systems, like video the concept of "Smart homes" with enhanced safety and surveillance, fail to reflect the elevator's condition and energy efficiency. Wearable fitness and health monitors, failure rates adequately. With its advantages of low power network-enabled medical devices, and smart traffic consumption, significant connection, low delay, and high systems contribute to “smart cities” that reduce congestion reliability, the Internet of Things (IoT) can realize the and energy use. IoT also promises to improve the independence and quality of life for people with 158 Informatica 49 (2025) 157-172 J. Shi et al. disabilities and the elderly. The impact of IoT extends to scalable and flexible network architecture, enhanced agriculture, industry, and energy sectors, enhancing security and privacy, and advanced AI and machine information flow along the production value chain. learning integration. These features ensure instantaneous Companies and research organizations predict significant and reliable data transmission for critical applications, economic effects [7]. A market research report revealed support billions of IoT devices, extend the battery life of that the global IoT market was valued at $1.90 billion in remote sensors, allow dynamic resource allocation, 2018 and is projected to grow to $11.03 billion by 2026. protect sensitive data, and optimize network performance Additionally, the European Union (EU), the United States through predictive maintenance and anomaly detection. (USA), China, and other nations have developed IoT- These features collectively create an efficient, reliable, related action plans. These initiatives include the IoT-An and secure communication environment for the 6G era Action Plan for Europe and various IoT development [11]. plans for the years 2016–2020 [8]. Due to the limitation of channel resources, when the Song et al. [9] discussed the adoption of smart IoT at metro stations has a large number of elevators, and technologies and networking solutions like the Internet of other equipment data access, the time delay indicator of Things (IoT) by leading cities in China to enhance the system is higher, and the throughput will decrease economic opportunities and global climate resilience. significantly. Therefore, there is a great demand for a They presented the smart city concept as a complex large-scale terminal access algorithm tailored to the system integrating sensors, data, applications, and communication characteristics of the IoT at large metro organizational forms to make cities more agile and stations to ensure the reliable transmission of information sustainable. The paper provided a comprehensive data of crucial equipment. In response to the above issues, assessment of smart city initiatives in China, classifying Chou et al. [12] used Bayesian theory to estimate the practices into six key dimensions: energy, agriculture, number of access applications, preamble code conflict transport, buildings, urban services, and urban security rate, and the number of following time-slot applications at operations. Chinese smart city policies and practices aim the current time-slot. Furthermore, the optimal ACB to explore renewable energy, improve public convenience, control parameters are discussed by judging the number of and enhance urban comfort and citizen friendliness. The applications for the subsequent access time slot through study also addressed concerns in areas such as system quantitative prediction methods. The scheme is based on integration, governance, innovation, and finance. A policy the premise that the current time-slot access conflict vision was outlined to build public-private collaborative makes direct rebleeding at the next time-slot, with some networks, encourage innovation and investment in smart error from the system of refeeding in the actual access city initiatives, and emphasize smart services. process. In practical applications, the infrastructure of the Zhang et al. [13] addressed the growing need for wireless cellular network is relatively perfect, the improved communication content and quality in the coverage area is comprehensive, and the security is high, context of advancing network and communication which is one of the leading carrying networks of IoT technologies. This research concerns the optimal data communication. However, the original intention of collection and path planning of multi-unmanned aerial traditional wireless cellular network design is to deal with vehicle (UAV) to achieve extensive terminal accessibility the communication problem between humans and humans in IoT scenarios. The novelty of the approach consists of (H2H), and there are some differences in the integrating sensor area partitioning with the flight communication characteristics between machine to trajectory planning of multiple UAVs with the main machine (M2M). Machine-type communication (MTC) objectives of load balancing while the overall completion devices, integral to Industry 4.0, support smart factories, time for the tasks at hand is minimized. A novel k-means healthcare, and surveillance by generating data and algorithm has been developed to balance the quantity of making policy-based decisions. The demand for these data in each cluster. Accordingly, the flight trajectories of devices is projected to reach 50 billion by 2025. These the UAVs were represented discretely by an enhanced devices require robust security due to their vulnerability genetic algorithm including the 2-opt optimization and usage in open environments. Lightweight operator for solving the multiple traveling salesman cryptography is the preferred solution for MTC devices problem (MTSP) problem, improving the computational due to their limited computational and memory capacities. effectiveness. Extensive simulations have validated the This cryptographic approach ensures strong encryption efficiency of the suggested approach in smoothing out the while being efficient and cost-effective, enhancing imbalances in the distribution of tasks among UAVs and security for the growing number of IoT devices. MTC significantly reducing the duration of tasks. The devices are autonomous and central to automating IoT convergence rate for this methodology was higher than the frameworks, evolving to support the advancements of conventional genetic algorithm; hence, this proved that it Industry 4.0. They form Machine-to-Machine (M2M) was computationally efficient. Equipped with a new, communication networks, also known as cyber-physical efficient methodology for multi-UAV-assisted IoT systems and edge nodes, creating an autonomous system terminal data gathering, it brings balance and efficiency in of resource-constrained devices [10]. task distribution, unfolding the full power of professional The six key features of Machine Type algorithm solutions when acquiring optimal results in Communication (MTC) in 6G are ultra-low latency and more complicated engineering scenarios. high reliability, massive connectivity, energy efficiency, Congestion Control of Large-Scale Elevator Terminal Data Access… Informatica 49 (2025) 157–172 159 Varsha et al.[14] proposed an innovative intelligent procedures in the case of massive and heterogeneous traffic management system for wireless cellular networks device access for 5G and 6G communication applications. to enhance M2M connections, pivotal for IoT. They Yu et al. [16] investigated the performance of focused on improving Access Class Barring (ACB), a massive machine-type communications (mMTC) in status method traditionally relying on a static factor to manage update systems, where numerous machine-type machine-type communication device (MTCD) traffic. The communication devices (MTCDs) send status packets to a study introduced a Bayesian inference-based learning base station (BS) for system monitoring. The authors automatons (BI-LA) approach that dynamically adjusts identified that packet collisions due to massive MTCDs the ACB factor. This system leverages learning automata's negatively impact status update performance. To address self-adaptive learning to estimate and manage M2M this, they proposed a joint access control, frame division, traffic more effectively. By framing the problem around and subchannel allocation scheme. They first analyzed collision probability and using Bayesian inference to adapt access control, packet collisions, and packet errors, the ACB factor, the proposed method was tested using deriving a closed-form expression of the average age of network simulator-3 (NS3). The performance metrics— information for all MTCDs as a performance metric. Their average access delay, access attempts, access success rate, proposed scheme was shown through simulations and and access success—demonstrated that the BI-LA ACB numerical results to achieve near-optimal performance, technique outperformed traditional and contemporary comparable to exhaustive search methods, and ACB methods, achieving minimal access delays of outperformed benchmark schemes. Bui et al. [17] present approximately 1876 ms and 27.6 ms. an access protocol based on distributed queue (DQ) The main problem arises due to a large amount of UEs mechanisms to deal with M2M communication large- present in the RA techniques, as discussed by Piao and Lee scale access problems for cellular networks. To maximize [15], where increased collisions and delays arise. They the DQ mechanism performance, first of all, the base propose a new RA scheme that combines four-step RA station in the random-access opportunities is roughly the with two-step RA, based on the 3rd Generation number of conflict detection equipment to avoid excessive Partnership Project Release 16. This work tries to avoid a division of DQ. Then based on the probing results, the conflict with the available RA resource, then achieves a base station randomly divides the device into a determined better performance of efficiency and brings down the number of groups and "pushes" these groups to the end of average RA delay. This solution aims to optimize the two- the logical access queue. Finally, the validity and step RA probability and thus provides a resource feasibility of the proposed protocol are verified by configuration and parameter setting algorithm that allows simulation. the UEs to carry out both RA methods simultaneously. Congestion control and optimization methods Then, the authors proved further that the proposed overview in IoT applications-the methodologies, the approach is valid using a Markov chain model. The datasets used, the results, and the limitations are proposed approach also has its potential confirmed in represented in Table 1. This comparison identifies the extensive comprehensive simulations on supporting RA gaps that this paper will address with the proposed hybrid access method. Table 1: Summary of related works on congestion control and access optimization Methods in IoT applications highlighting limitations and positioning the hybrid access method as a novel solution Study Method Datasets Key Results Limitations Mao et IoT-based architecture Elevator operational Improved safety and Limited scope to fault al. [5] for fault diagnosis data stability of elevator diagnosis only operations through IoT monitoring Lai et al. Predictive maintenance Distributed elevator Enhanced safety and Focused only on [6] with IoT integration equipment data reliability of elevator maintenance, lacks systems scalability analysis Chou et Bayesian theory-based Simulated data Improved ACB Errors in real-time al. [12] ACB optimization parameters, reduced predictions conflict rate Zhang et Multi-UAV data Simulated IoT Balanced task distribution, High computational al. [13] collection and path scenarios reduced completion time overhead optimization Varsha Learning Automaton- Cellular Base Controlled M2M data, High implementation et al.[14] based ACB scheme Station data reduced H2H interference complexity (LA-ACB) Piao and Integrated 2-4 step Cellular network Reduced collisions and Limited to specific RA Lee [15] Random Access (RA) simulations delays configurations methods 160 Informatica 49 (2025) 157-172 J. Shi et al. Bui et al. Distributed Queue LTE/LTE-A Reduced congestion, Requires precise group [17] (DQ)-based access network data improved success rate partitioning protocol This paper proposes a hybrid access methodology that 5. Application in IoT environments: Ensured combines Slot ALOHA with Access Class Barring for Quality of Service (QoS) for high-priority large-scale IoT scenarios in metropolitan transit stations. services in large-scale IoT environments in metro The proposed methodology, by dynamically changing stations. ACB control parameters and implementing predictive 6. Predictive access application: Developed a modeling on access requests, should be able to provide method to predict access applications for better high QoS for important applications like elevator access control. monitoring under different traffic conditions. This novel 7. Experimental validation: Validated the method strategy overcomes some fundamental limitations of the in a Shanghai metro station, showing practical previous approaches by providing a scalable, reliable, and advantages over traditional methods. economic solution to congestion management in IoT systems with complex networks. Therefore, the key 2 Systems model and custom MAC contributions of the paper are as follows: The key contributions of the paper are as follows: layer protocol for IoT 1. Congestion control method: Developed a communication in large metro method for managing large-scale elevator stations terminal data access in metro stations using IoT, addressing low access success rates and delays. 2.1 Systems model 2. Data categorization: Divided business data based on volume and latency requirements, using Based on the practical application, a metro station Slot ALOHA (SA) for delay-insensitive data and communication model is built with large-scale MTCD to Access Class Barring (ACB) for time-sensitive simulate the congestion caused by frequent network data. access by communication devices. Illustration of IoT 3. Dynamic ACB adjustment: Proposed communication model for large metro stations in Fig. 1 dynamically adjusting ACB control parameters shows how the MTCDs will be sending their data to the by estimating access requests to optimize server via the eNB. terminal access. The evolved Node B (eNB) receives, controls, and 4. Performance evaluation: Demonstrated allocates up/down dynamic resources. MTCD data is through simulations that the hybrid access transmitted to a fixed gateway through the narrowband method improves access success rates and IoT, which forwards the data to the server. In the IoT reduces delays, especially with high access model, when two or more MTCDs use the same preamble requests. code simultaneously, it indicates that the decision is in conflict and the device access fails. Single RAN MI Single EPC HLR/PCRF G/N/L/NB-LoT G/U/L Application Server eNB LoT EPC NB-LoT New M2M Platform Wireless Core network IoT MTCD network side side Support Platform Figure 1: Illustration of the IoT communication model for large metro stations, showcasing the flow of data from MTCDs to servers via the eNBCustom MAC Layer Protocol Given that complex signaling can reduce the success low sensitivity to delay, SA direct access is utilized. rate of device access, the network employs the Media Conversely, ACB random access is applied to delay- Access Control (MAC) protocol [18]. The MAC layer sensitive and data-intensive services. Fig. 2 illustrates the protocol combines Selective Acknowledgement (SA) and hybrid MAC layer protocol diagram, where 𝑇𝑖 is the i-th Access Class Barring (ACB) controls to adapt to various access timeslot. types of business data and enhance access speed and The hybrid MAC layer protocol divides each success. For services with small amounts of valid data and incoming data packet into four parts: Congestion Control of Large-Scale Elevator Terminal Data Access… Informatica 49 (2025) 157–172 161 1. Broadcasting data access information and ACB Using the hybrid MAC layer protocol for the control parameters for the current timeslot. classified transmission of different business data 2. Assigning preamble codes to randomly accessed effectively reduces signaling consumption, accelerates services. data access, and ensures the Quality of Service (QoS) 3. Handling SA direct access business. demands of high-priority business services. 4. Conducting data transmission. Ti-2 Ti-1 Ti Ti+1 Random Broadcast Access Sotted- Random Access data Allocation ALOHA Data Transfer Leader Figure 2: Hybrid MAC layer protocol 3 Design of hybrid access method network congestion. Therefore, a random wait period is introduced before attempting to resend the data. The data transmission process is illustrated in Fig. 3. 3.1 SA method and improvement The SA transmits data by speaking first. Signal overlap is likely to occur during concurrent operations, leading to Resend Node 1 Resend Resend Node 2 Resend Node 3 Channel success success success collision success success success collision Figure 3: Data sending process for traditional SA method In the traditional Slot ALOHA (SA) method, the time likelihood of collisions is significantly reduced, as nodes for retransmission is random, leading to a high probability are not transmitting simultaneously. This structured of complete or partial collisions. This randomness reduces approach allows for more efficient use of the available the efficiency of information utilization and decreases bandwidth and improves overall system throughput. system throughput. The improved SA data-sending process, which To address these issues, the data transmission process mitigates collisions and enhances throughput, is illustrated has been improved. The transmission period is divided in Fig. 4. This method ensures that each node's into several time slots, and data can only be sent at the transmission is independent of others, leading to more initial point of a time slot. By ensuring that nodes transmit reliable and orderly communication within the network. information within their designated time slots, the Node 1 Node 2 Node 3 Channel frequency domain 162 Informatica 49 (2025) 157-172 J. Shi et al. Figure 4: Data sending process for improving the SA method The relationship between the throughput rate 𝑄 and The probabilities of the 𝑖 -th preamble being in these the sent packet quantity 𝐺 can be expressed as Eq. (1): three states is given by the following Eq. (3): 𝑄 = 𝐺𝑒−𝐺 (1) 𝑃(𝑤 When two nodes transmit within the period 𝑇′, the 𝑖) 𝑛 1 𝑎 data transmission delay function is given in Eq. (2): (1 − ) , 𝑤𝑖 = 0 𝑁𝑝 𝑛 𝑛𝑎 1 𝑎−1 𝑇𝑌 = 2𝑇 ′ + 𝑡𝑑 + [𝜑𝑇 ′ + (𝐵 + 1)𝑇′](𝑒𝐺 (3) − 1) (2) = ⋅ (1 − ) , 𝑤𝑖 = 1 𝑁𝑝 𝑁𝑝 𝑛 1 𝑎 𝑛 𝑛 𝑎−1 𝑎 1 Where 𝜑 represents the waiting time for a response, 1 − (1 − ) − ⋅ (1 − ) ,𝑤𝑖 ≥ 2 { 𝑁𝑝 𝑁𝑝 𝑁𝑝 𝑡𝑑 represents the propagation duration and 𝐵 represents the maximum value of the backoff time slot. Where 𝑁𝑝 represents the number of available The fixed transmission channel and the number of preamble codes in the current timeslot, 𝑛𝑎 indicates the inherent node parameters determine the transmission number of access requests for the current timeslot. delay of SA. Therefore, the improved method is only Assume that the number of preamble codes satisfying suitable for processing delay-insensitive and small data wi = 0, wi = 1, and wi ≥ 2 in the current timeslot are L1, volume services. Otherwise, the transmission error will L2, and L3, respectively. Then, the maximum likelihood increase, and the availability of information will be estimation of the number of access applications in the reduced. current timeslot is expressed as Eq. (4): 3.2 Estimation of access applications based 𝑃 = 𝑃(𝑤𝑖 = 0|𝑁𝑎) 𝑛1 ⋅ 𝑃(𝑤𝑖 = 1|𝑁𝑎) 𝑛2 ⋅ 𝑃(𝑤𝑖 on time series prediction ( ≥ 2|𝑁𝑎) 𝑛 4) 3 3.2.1 Estimation of current timeslot access The principle is to ensure that the number of access applications requests in the next time slot is optimal. The estimated For services using the ACB (Access Class Barring) number ?̂?𝑎 of access requests in the current timeslot can random access mode, the application amount of the be obtained by setting 𝑁𝑎 to the maximum value. The service should be estimated based on the occupation of the expression is given in Eq. (5): preamble code [19, 20]. Assume that 𝑤𝑖 represents the 𝐽 state of the 𝑖 -th preamble code. The states are defined as follows: ?̂?𝑎 = 𝑎𝑟𝑔 max∑𝑙𝑛𝑃 (𝑤𝑗|𝑁𝑎) (5) - When 𝑤𝑖 = 1, the preamble code is not selected 𝑗 and is idle. - When 𝑤𝑖 = 1, an MTCD (Machine-Type After ACB, the comparison results between the Communication Device) has selected the maximum likelihood estimate and the actual application preamble code and it is busy. amount are shown in Fig. 5. It can be seen from the figure - When 𝑤𝑖 ≥ 2, two or more MTCDs have that the trend changes of the two lines are relatively selected the preamble code, resulting in a conflict consistent, indicating that the estimated value aligns well status [21]. with the actual value. Congestion Control of Large-Scale Elevator Terminal Data Access… Informatica 49 (2025) 157–172 163 Figure 5: Comparison results of maximum likelihood estimation and actual application volume (After passing the ACB) According to the maximum likelihood estimation the ACB control parameter of the current timeslot. Before after passing the ACB, the actual number of access passing the ACB, he comparison between the maximum applications can be calculated as ?̂? = ?̂?𝑎/𝑎, where 𝑎 is likelihood estimates and the actual number of applications is shown in Fig. 6. Figure 6: Comparison results of maximum likelihood estimation and actual application volume (Before passing the ACB) For services accessed in SA mode, the estimation is ?̂? 𝑊 ?̂? 𝑖 − 𝑖 + 𝐻𝑖+1 + 𝑇𝑖+1, 𝑖 ≤ 𝐼𝐷 based on the physical resource block status of the current 𝑖+1 = { (7) ?̂? g that the total number of available 𝑖 −𝑊𝑖 + 𝐻𝑖+1, 𝑖 > 𝐼time slot [22]. Assumin 𝐷 resource blocks is 𝑈𝑠, and the number of idle rate blocks in the current timeslot is 𝑈𝑘,𝑖. The actual idle rate is ?̃? here, 𝐼 eslot. 𝑘,𝑖 = W 𝐷 represents the last tim 𝑈 𝑈 𝑐 nce the access request volume is a time series, the 𝑘,𝑖 , the theoretical idle rate is 𝑃 𝑠−1 𝑖 Si 𝑘,𝑖 = ( ) , where 𝐶 is 𝑈 ighted sum of historical increments is used as an 𝑠 𝑈 𝑖 we 𝑠 the access application volume of the current timeslot. y increment in the next time slot. The newly arrived access equating the theoretical idle rate to the actual idle rate, applications in the 𝑖 + 1 time slot, 𝑇𝑖 can be expressed +1 ?̃?𝑘,𝑖 = 𝑃𝑘,𝑖, the number of access requests in the current as shown in Eq. (8): time slot is obtained as shown in Eq. (6): 3 3 1 𝑇 𝑙𝑜𝑔( ?̃? 𝑖 = 𝑇 𝑇 +1 𝑘,𝑖) 5 𝑖 + 10 𝑖−1 + 𝑇 10 𝑖−2 (8) ?̂?𝑖 = (6) 𝑙𝑜𝑔(𝑁𝑖(𝑁𝑖 − 1) Because 𝑇𝑖 = ?̂?𝑖 − ?̂?𝑖−1 − 𝐻𝑖 +𝑊𝑖−1, the Eq. (9) is 3.2.2 Estimation of next timeslot access as follows: applications 3 3 1 Assume that the estimated number of access applications 𝑇𝑖 = 𝑚𝑎𝑥 {0, ( 𝑇 + 𝑇 +1 5 𝑖 + 𝑇 10 𝑖−1 −2)} 10 𝑖 in the 𝑖 -th time slot is ?̂? 3 3 2 1 𝑖, the number of access successes ?̂? − ?̂? − ?̂? −2 − ?̂? is 𝑊𝑖, the number of newly arrived access applications in 5 𝑖 10 𝑖−1 10 𝑖 10 𝑖−3 3 3 1 (9) the 𝑖 + 1 time slot is 𝑇 = 𝑚𝑎𝑥 0, 𝑖 , and the number of access − 𝐻 − 𝐻 1 − 𝐻𝑖−2 + +1 5 𝑖 10 𝑖− 10 applications that need to be retransmitted is 𝐻 3 3 1 𝑖 . Then the +1 { 𝑊 𝑊 (5 𝑖−1 + 𝑊 + 10 𝑖-2 10 𝑖−3 )} estimated number of access applications in the 𝑖 + 1 time slot can be shown as Eq. (7): After transformation, the estimated amount of access requests for the next time slot can be obtained. The expression is given in Eq. (10): 164 Informatica 49 (2025) 157-172 J. Shi et al. ?̂?𝑖+1 = 3 3 2 1 ?̂? − ?̂?𝑖−1 − ?̂? 5 𝑖 10 10 𝑖−2 − ?̂? 10 𝑖−3 3 3 1 𝑚𝑎𝑥 ?̂?𝑖, − 𝐻 − 𝐻 − 𝐻 −𝑊𝑖 , 𝑖 ≤ 𝐼𝐷 (10) 5 𝑖 10 𝑖−1 + 10 𝑖−2 3 3 1 𝑊 + 𝑊 { (5 𝑖−1 + 𝑊 10 𝑖-2 10 𝑖−3 )} { ?̂?𝑖 −𝑊𝑖 + 𝐻𝑖+1, 𝑖 > 𝐼𝐷 The comparison between the predicted application actual value is relatively consistent, indicating that the amount and the actual application amount of the time predicted result of the access application volume aligns series is shown in Fig. 7. It can be seen from the figure that well with the actual value. the curve change trend of the estimated value and the Figure 7: Comparison results of predicted and actual application volumes of time series 𝑃(𝑁 3.2.3 Parameter adjustment of predicted 𝑎 = 𝑛𝑎|𝑁 = 𝑛) 𝑛 (12) = 𝐶 𝑎 𝑎 values 𝑛 ⋅ 𝑎𝑛𝑎 ⋅ (1 − 𝑎)𝑛−𝑛 Update the packet parameter 𝐿1 and ACB control Then the estimated value of success access is given in parameter 𝑎 of the dynamic preamble code according to Eq. (13): the prediction value of the service arrival to ensure the 𝑎 access success rate of the next timeslot. Since 𝑤 𝑀[𝑁 1 𝑖 = 1 𝑠|𝑁 = 𝑛] = 𝑛 ⋅ 𝑎 ⋅ (1 − )𝑛𝑎− (13) 𝑁 indicates the successful transmission of the preamble 𝑝 code, the estimated value of the preamble code that can Deriving from a, the optimal control parameter is transmit successfully is given in Eq. (11): given in Eq. (14): 𝐽 𝑁 𝑀[𝑁 𝑎′ = (14) 𝑠|𝑁 𝑝 𝑎 = 𝑛𝑎] = ∑ 𝑃(𝑤 𝑖=1 𝑖 = 1|𝑁𝑎 = 𝑛 𝑛 1 1 1 𝑎) =𝑁𝑝 ⋅ 𝐶𝑛 ⋅ ⋅ (1 − )𝑛𝑎−1=𝑛 𝑎 𝑁 𝑎 ⋅ (1 − (11) From Eq (14), the access success rate is highest when 𝑝 𝑁𝑝 1 the number of access requests matches the number of )𝑛𝑎−1 𝑁𝑝 currently available preamble codes. The effect is optimal when 𝐿1 equals the number of high-priority access 𝑁𝑠 represents the number of preamble codes requests in the current timeslot. successfully transmitted, and 𝑁𝑎 represents the number of Fig. 8 shows the relationship between the number of services filtered by ACB. Suppose the system contains access requests and access successes when the number of 𝑁 MTCDs, and 𝑁𝑎 MTCDs pass the screening. The preamble codes is 35, 60, and 76, further verifying the probability is given in Eq. (12): correctness of the above conclusions. Congestion Control of Large-Scale Elevator Terminal Data Access… Informatica 49 (2025) 157–172 165 Figure 8: Relationship between access success and access requests priority and low-priority services using the hybrid access 3.3 Hybrid access process method. Here’s a detailed explanation of the process: The access process is outlined in Figure 9, illustrating the steps involved in managing access requests for high- start MTCD access failed Y N Attempts exceeded High-level Select a random preset? business? number between (0,1) Y Pick a preamble from N N Small data ACB Less than control an interval (1, L1) and parameter a ? volume business? Access try to access it Y Y Leader Code Select the preamble Delay sensitive Repeated from the interval (L1+1, business? Y Selection? Np) and try to access it Y N Current slot SA access MTCD access access failure succeeded Y N Conflict? Figure 9: Access flow of the hybrid method 1. Initial collection and setup: reserved for high-priority services and proceeds ❖ The evolved Node B (eNB) collects access data to the access link. from the previous timeslot, counts the usage of ➢ For low-priority services, a random number 𝑝 is preamble codes, completes channel resource selected from the interval [0,1]. If 𝑝 is less than allocation, and sets parameters such as ACB the ACB control parameter 𝑎 of the current control and backoff parameters. timeslot, a preamble is selected from the set 2. Random access phase: 𝐾2[𝐿1 + 1,𝑁𝑝] designated for low-priority ❖ Determine the priority of the application access services. If 𝑝 ≥ 𝑎, the access is terminated. business: 3. Direct access phase: ➢ For high-priority services, the system directly ❖ Services with small data volumes proceed with selects a preamble code from the set 𝐾1[1,𝐿1] direct access. 166 Informatica 49 (2025) 157-172 J. Shi et al. 4. Data transmission phase: 4 Experiments ❖ MTCDs that have successfully obtained a transmission opportunity begin data 4.1 Experimental preparation transmission. This structured approach ensures that high-priority The experimental site for the study is a large metro station services are given precedence and that low-priority in Shanghai, equipped with a significant number of IoT services are managed in a way that minimizes conflicts terminals. The configuration of the parameters used in the and optimizes resource use. The hybrid access method experiments including the number of preambles, dynamically adjusts parameters based on historical data, maximum transmission attempts, conflict resolution time, improving overall system throughput and efficiency. and escape time, providing a baseline for evaluating the hybrid access method are detailed in Table 2. Table 2: Key parameters used in the simulation experiments, including preambles and conflict resolution time, forming the baseline for evaluating the hybrid access method Parameter Value Number of preambles 60 Maximum transmission times of preamble code 8 Conflict resolution time 24 ms Escape time 15 ms These parameters were utilized to simulate and Therefore, the success rate of preamble code access is analyze the performance of the hybrid access method redefined for a fair assessment. The success rate, 𝑃𝑇 , is under various traffic conditions, including uniform and calculated as the ratio of the number of successfully beta distribution models, to verify its effectiveness in accessed services (𝑁𝑐) to the total number of preamble managing access congestion and ensuring timely data codes used in the access process (𝑁𝑎𝑙𝑙). This redefinition transmission in large-scale IoT environments. allows for a more accurate comparison of the efficiency The uniform and beta distribution models are and effectiveness of the hybrid access method against employed to verify the feasibility of the hybrid access traditional ACB methods. method by simulating various types of business data, including periodic and sudden data as well as random and 4.2 Experimental results and analysis irregular data, in elevator monitoring. To ensure comparability, ACB access and LA-ACB with different 4.2.1 Simulation results and analysis of parameters are also used as benchmarks in the uniform distribution model experiments. These experiments aim to count and This section discusses the simulation results and analysis compare the average access delay and access success rate using a uniform distribution model to evaluate the of different services [23]. performance of the hybrid access method compared to Given that the hybrid access method assigns different traditional methods such as ACB (Access Class Barring) ranges of preamble codes according to the priority of and LA-ACB (Learning Automata ACB). services, while the ACB method shares all access resources uniformly, a direct comparison would be unfair. Figure 10: Comparison of access success rates for high-priority services using the hybrid access method, ACB, and LA-ACB under the uniform distribution model Congestion Control of Large-Scale Elevator Terminal Data Access… Informatica 49 (2025) 157–172 167 Figure 11: Comparison of average access delay for high-priority services under the uniform distribution model The access success rate of high-priority services is 52.43% at 4000 applications. Fig. 11 shows the demonstrated in Fig. 10. When the number of access comparison of average access delay for high-priority applications is small, the LA-ACB method performs services. With an increase in access applications, the excellently. However, as the number of applications average access delay for the hybrid access method remains increases, LA-ACB causes resource wastage, and its relatively stable, indicating higher resource utilization and performance gradually declines. The hybrid access meeting high-priority service requirements more method initially shows lower success rates and higher effectively than LA-ACB. In other words, the hybrid delays due to high estimation errors but improves access method achieves a lower delay (76.72 ms at 4000 significantly as the number of access applications requests) compared to ACB and LA-ACB, ensuring QoS increases. Precisely, the hybrid method demonstrates a for time-sensitive applications. higher success rate as access requests increase, reaching Figure 12: Comparison of access success rates for concurrent services in the uniform model, with the hybrid method outperforming ACB and LA-ACB by reducing collisions and improving resource use Figure 13: Comparison of average access delays for concurrent services in the uniform model, showing the hybrid method's lower delays (76.72 ms), ensuring timely transmission 168 Informatica 49 (2025) 157-172 J. Shi et al. The comparison of access success rates for multiple significantly improves the system's access success rate and types of concurrent services is illustrated in Fig. 12, while average access delay, thereby meeting the QoS (Quality of Figure 13 shows the comparison of average access delay Service) needs for high-priority services in large-scale IoT for these concurrent services. The hybrid access method terminal access scenarios. outperforms ACB and LA-ACB, showing a higher success rate and lower delay, especially when the number of 4.2.2 Simulation results and analysis of beta access applications reaches 4000. At this point, the hybrid distributed access model method achieves a 52.43% success rate and an average When the beta distribution model is adopted, the delay of 76.72 ms, demonstrating undeniable advantages performance of the hybrid access method is evaluated in in efficiency and effectiveness. terms of the access success rate and average access delay These results indicate that the hybrid access method, for high-priority services. especially under a uniform distribution model, Figure 14: Comparison of access success rates for high-priority services in the beta distribution model, with the hybrid method excelling (42.07% at 4000 applications) through dynamic adjustments and efficient resource use Figure 15: Average access delays for high-priority services in the beta distribution model, with the hybrid method achieving a lower delay (82.02 ms at 4000 applications) than ACB and LA-ACB Fig. 14 illustrates the access success rate of high- requests, thereby minimizing the waiting time and priority services under the beta distribution model. The improving overall efficiency. results indicate that the hybrid access method achieves a These results highlight the advantages of the hybrid higher access success rate compared to the ACB and LA- access method in managing high-priority service requests, ACB methods. This improvement is due to the dynamic ensuring higher access success rates, and reducing average adjustment of access application amounts and access access delays under the beta distribution model. This parameters in the next timeslot, which optimizes the demonstrates the method's effectiveness in handling allocation of resources for high-priority services. dynamic and bursty traffic patterns in large-scale IoT Fig. 15 presents the comparison of average access environments. delay for high-priority services using the beta distribution The total number of system preamble codes is 60. model. The hybrid access method demonstrates a lower When high-priority services are concurrent with low- average access delay compared to ACB and LA-ACB priority services, the access success rate is shown in Fig. methods. This reduction in delay is attributed to the 16, and the average access delay is shown in Fig. 17. method's ability to better predict and manage access Congestion Control of Large-Scale Elevator Terminal Data Access… Informatica 49 (2025) 157–172 169 Figure 16: Access success rate for concurrent services in the beta distribution model, with the hybrid method achieving 42.07% at 4000 applications, surpassing ACB and LA-ACB Figure 17: Average access delay for concurrent services in the beta distribution model, with the hybrid method achieving 82.02 ms, outperforming ACB and LA-ACB Figure 16 shows access success rate for concurrent methodologies. This leads to a substantial increase in services under the beta distribution model. The hybrid system throughput that ensures reliable and efficient access method outperforms ACB and LA-ACB methods, communications over large-scale IoT topologies. achieving a success rate of 42.07% at 4000 applications, In summary, the hybrid access method enhances the demonstrating robust handling of burst traffic. Figure 17 performance of the system and also responds to robustness illustrates average access delay for concurrent services and scalability challenges; hence, it is the best against all under the beta distribution model. The hybrid access the complexities in communications in IoT at a metro method reduces delay to 82.02 ms at 4000 applications, railway station. Dynamic adaptability and predictive ensuring better performance for high-priority and time- accuracy make this tool indispensable to maintain the sensitive services. In fact, it is these very measures of optimum service level and meet the stringently demanding performance that represent important favorable points for QoS of critical infrastructure. the proposed hybrid model over conventional algorithms like ACB and LA-ACB. 5 Discussion The experimental results also reveal that the access success rate and average access delay are significantly The proposed hybrid access scheme constitutes one of the improved by the proposed hybrid access method. In key improvements in congestion management schemes addition, it well satisfies the requirements brought by the over large-scale IoT networks, especially in highly Quality of Service of high-priority traffic for periodic and populated areas such as in metro stations. In the process, bursty large-scale terminal access requests. It enables the SA-ACB merging is targeted at the solution of method to predict the volume of the access application fundamental issues like low access success rates and high effectively in the next timeslot in a dynamic way by taking delays in a network. Higher performance indices are advantage of the historical state of the preamble code, promised to be exhibited compared with the existing without assuming anything about the quantity of access methodologies LA-ACB and traditional ACB. For applications. instance, under the uniform distribution model, the The predictability allows for the tailoring of the maximum access success rate reaches 52.43% at 4000 hybrid access method to the various characterizations of requests, which is far beyond the limitation of LA-ACB different services, hence optimality in the choice of access owing to the inefficiency of resource utilization when 170 Informatica 49 (2025) 157-172 J. Shi et al. requests are too many. Besides, this approach ensures an Nevertheless, the suggested congestion control approach, average latency of no more than 76.72 ms for high-priority primarily corroborated through simulations, may not services that strictly meet the QoS requirement. Under entirely reflect the intricacies of real-world scenarios and correspondence, within the beta distribution model, the diverse traffic patterns encountered. Therefore, even robustness exposed to bursty traffic by the hybrid the refined uniform and Beta distribution models need approach achieved 42.07% in success rate and 82.02 further refinement and validation in order to ensure their milliseconds average delay. accuracy against different scenarios. The scalability of the Those advantages come forth due to novelty in method, especially above 4000 access requests, was not resource allocation and predictive adjustments that this deeply analyzed, as was the application of the method to hybrid method will implement. The method dynamically other IoT applications. It has to be implemented on-site, adapts the ACB control parameters in view of historical considering variations in traffic models, advance data and real-time estimation to optimize channel prediction methods using machine learning techniques, utilization with minimum collision. It efficiently spreads and scalability analysis for performance evaluation. the network load in a dual-access approach wherein small Extension of the method to other IoT applications, data services are managed by SA and large delay-sensitive investigation of energy efficiency, and incorporating services are overseen by ACB. This flexibility is a key robust security will ensure its sustainability, hence reliable ingredient for achieving high scalability and reliability, in different IoT environments. especially under scenarios that exhibit diversified traffic patterns where high-priority applications must coexist Acknowledgment with low-priority ones. The practical implications of these findings are huge. Thanks to our families and colleagues who supported us Hybrid should guarantee environments like metro stations morally. with very low latency and high access success ratios, dependably surveilling the very important equipment of Funding statement elevators and escalators, while improving operational Not applicable safety and efficiency. Besides, this solution also provides a scalable and economically feasible way to handle congestion in IoT networks, thus making it suitable for Conflicts of interest smart city, industrial automation, and, generally speaking, The authors declare that there is no conflict of interest high-traffic IoT systems. Future works may further regarding the publication of this paper. optimize the proposed approach for energy efficiency and extend its applicability to realistic traffic for further Authorship contribution statement generalization. These results have established the hybrid access method as a robust and practical solution to handle The manuscript has been read and approved by all the congestion in large-scale IoT networks. authors, the requirements for authorship, as stated earlier in this document, have been met, and each author believes that the manuscript represents honest work. 6 Conclusion The paper proposed an IoT-based congestion management Availability of data and materials strategy for mass data access from the elevator terminals at the metro station. This method categorized the business On Request data by volume and latency requirements and adopted SA for delay-tolerant services and ACB for real-time services. Declarations Besides, in the proposed methodology, dynamically Not applicable adjusting ACB control parameters was adopted to optimize the access efficiency for terminals. The effectiveness of the approach is corroborated by the References simulation results: from a uniform distribution model, [1] ShuangChang F, Jie C, Yanbin Z, Zheyi L (2020). based on 4000 access requests, the hybrid method can Discussion on improving safety in elevator achieve an access success rate of 52.43% and an average management. In: 2020 2nd International access delay of 76.72 ms. From the Beta distribution Conference on Machine Learning, Big Data and model, 42.07% with an average access delay of 82.02 ms Business Intelligence (MLBDBI). IEEE, pp 195– can be achieved. It is presented that the Hybrid Access 198. Method increases the access success rate greatly and https://doi.org/10.1109/MLBDBI51377.2020.000 decreases the delay hence fulfilling the QoS requirements 43. for high-priority services in a large-scale IoT [2] Ushakov D, Dudukalov E, Kozlova E, Shatila K environment. Future investigations ought to encompass (2022). The Internet of Things impact on smart practical implementation and examine more extensive public transportation. Transportation Research traffic models, sophisticated prediction methodologies, Procedia, 63:2392–2400. and scalability to further substantiate and augment the https://doi.org/10.1016/j.trpro.2022.06.275. applicability and dependability of the method. Congestion Control of Large-Scale Elevator Terminal Data Access… Informatica 49 (2025) 157–172 171 [3] Wang C, Feng S (2020). Research on big data 135:233–260. https://doi.org/10.1007/s11277- mining and fault prediction based on elevator life 024-10943-5. cycle. In: 2020 International Conference on Big [15] Piao Y, Lee T-J (2024). Integrated 2–4 Step Data & Artificial Intelligence & Software Random Access for Heterogeneous and Massive Engineering (ICBASE). IEEE, pp 103–107. IoT Devices. IEEE Transactions on Green https://doi.org/10.1109/ICBASE51474.2020.000 Communications and Networking, 8:441–452. 30. https://doi.org/10.1109/TGCN.2023.3322539 [4] Yao W, Jagota V, Kumar R, et al (2022). Study [16] Yu B, Cai Y, Wu D (2021). Joint Access Control and application of an elevator failure monitoring and Resource Allocation for Short-Packet-Based system based on the internet of things technology. mMTC in Status Update Systems. IEEE Journal Sci Program, 2022:2517077. on Selected Areas in Communications, 39:851– https://doi.org/10.1155/2022/2517077. 865. [5] Mao J, Chen L, Cheng H, Wang C (2023). https://doi.org/10.1109/JSAC.2020.3018801. Elevator fault diagnosis and maintenance method [17] Bui A-TH, Nguyen CT, Thang TC, Pham AT based on Internet of Things. In: Proc.SPIE, p (2019). A comprehensive distributed queue-based 1279305. https://doi.org/10.1117/12.3006383. random-access framework for mMTC in [6] Lai CTA, Jiang W, Jackson PR (2019). Internet of LTE/LTE-A networks with mixed-type traffic. Things enabling condition-based maintenance in IEEE Trans Veh Technol, 68:12107–12120. elevators service. J Qual Maint Eng, 25:563–588. https://doi.org/10.1109/TVT.2019.2949024. https://doi.org/10.1108/JQME-06-2018-0049. [18] Cui Y, Liu F, Jing X, Mu J (2021). Integrating [7] Mouha RARA (2021). Internet of things (IoT). sensing and communications for ubiquitous IoT: Journal of Data Analysis and Information Applications, trends, and challenges. IEEE Netw, Processing, 9:77. 35:158–167. http://www.scirp.org/journal/Paperabs.aspx?Pape https://doi.org/10.1109/MNET.010.2100152. rID=108574. [19] Zhao L, Xu X, Zhu K, et al (2018). QoS-based [8] Wang J, Lim MK, Wang C, Tseng M-L (2021). dynamic allocation and adaptive ACB mechanism The evolution of the Internet of Things (IoT) over for RAN overload avoidance in MTC. In: 2018 the past 20 years. Comput Ind Eng, 155:107174. IEEE Global Communications Conference https://doi.org/10.1016/j.cie.2021.107174. (GLOBECOM). IEEE, pp 1–6. [9] Song T, Cai J, Chahine T, Li L (2021). Towards https://doi.org/10.1109/GLOCOM.2018.8647599 Smart Cities by Internet of Things (IoT)—a Silent [20] Sari RF, Harwahyu R, Cheng R-G (2020). Load Revolution in China. Journal of the Knowledge Estimation and Connection Request Barring for Economy, 12:1–17. Random Access in Massive C-IoT. IEEE Internet https://doi.org/10.1007/s13132-017-0493-x. Things J, 7:6539–6549. [10] Ullah S, Radzi RZ, Yazdani TM, et al (2022). https://doi.org/10.1109/JIOT.2020.2968091. Types of Lightweight Cryptographies in Current [21] He H, Ren P, Du Q, Sun L (2015). Estimation Developments for Resource Constrained Machine based adaptive ACB scheme for M2M Type Communication Devices: Challenges and communications. In: Wireless Algorithms, Opportunities. IEEE Access, 10:35589–35604. Systems, and Applications: 10th International https://doi.org/10.1109/ACCESS.2022.3160000 Conference, WASA 2015, Qufu, China, August 10- [11] Mahmood NH, Alves H, López OA, et al (2020). 12, 2015, Proceedings 10. Springer, pp 165–174. Six Key Features of Machine Type https://doi.org/10.1007/978-3-319-21837-3_17. Communication in 6G. In: 2020 2nd 6G Wireless [22] Zhai D, Lu Y, Shi R, Ji Y (2022). Large-Scale Summit (6G SUMMIT),. pp 1–5. Micro-Power Sensors Access Scheme Based on https://doi.org/10.1109/6GSUMMIT49458.2020. Hybrid Mode in IoT Enabled Smart Grid. In: 2022 9083794. 7th International Conference on Signal and Image [12] Chou CM, Huang CY, Chiu C-Y (2013). Loading Processing (ICSIP). IEEE, pp 719–723. prediction and barring controls for machine type https://doi.org/10.1109/ICSIP55141.2022.988668 communication. In: 2013 IEEE International 4. Conference on Communications (ICC). IEEE, pp [23] Liu G, Jiang X, Li H, et al (2022). Adaptive access 5168–5172. selection algorithm for large-scale satellite https://doi.org/10.1109/ICC.2013.6655404. networks based on dynamic domain. Sensors, [13] Zhang L, He C, Peng Y, et al (2023). Multi-UAV 22:5995. https://doi.org/10.3390/s22165995. Data Collection and Path Planning Method for Large-Scale Terminal Access. Sensors, 23:8601. https://doi.org/10.3390/s23208601. [14] Varsha V, Prakash SPS, Krinkin K (2024). An Intelligent Bayesian Inference Based Learning Automaton Approach for Traffic Management in Radio Access Network. Wirel Pers Commun, 172 Informatica 49 (2025) 157-172 J. Shi et al. https://doi.org/10.31449/inf.v49i12.7840 Informatica 49 (2025) 173–190 173 CM-OOA:An Energy-Efficient Clustering Algorithm for Wireless Sensor Networks Using Chaotic Mapping and Osprey Optimization Songhao Jia, Wenqian Shao *, Cai Yang, Shuya Jia, Yaohui Yuan, Huiyuan Chen and Haiyu Zhang School of Artificial Intelligence and Software Engineering, Nanyang Normal University, Nanyang, Henan, 473061, China E-mail: shaowenqian2023@163.com *Corresponding author Keywords: emergency communication system, wireless sensor network, prey optimization, chaotic mapping, energy consumption Received: December 17,2024 A wireless sensor network (WSN) represents a promising approach for establishing self-organizing wireless networks comprising a substantial number of wireless sensors, with the objective of facilitating communication in regions where the existing communication infrastructure has been severely disrupted. In order to address the issue of excessive energy consumption by cluster heads and central nodes in emergency communication networks of wireless sensor networks, this paper proposes an emergency communication algorithm for wireless sensor networks based on chaos mapping and osprey optimization. Firstly, an optimization algorithm based on chaos theory is used to select the virtual position of the initial population of the Osprey optimization algorithm. This is achieved by simulating the randomness and unpredictability of chaotic systems. Secondly, the osprey optimization algorithm and the improved fitness function are used to select the optimal cluster head combination. In the selection process, six factors, such as the energy level of network nodes, the distance between cluster heads, the distance between cluster heads and base stations, the distance between cluster heads and ordinary nodes, the variance of the distance between cluster heads and base stations and the variance of the distance between cluster heads, are comprehensively considered. Finally, the heuristic function of FA-star algorithm is used to select the next hop node to transmit the message. The results of the simulation demonstrate that the residual energy of the CM-OOA algorithm is 14% higher than that of the CGWOA algorithm following the transmission of 1000 data rounds. This figure is 54% higher than that observed for the PSO-C algorithm. The findings demonstrate that the CM-OOA algorithm effectively extends the network lifetime and preserves a favorable load balance in diverse network settings. Povzetek: CM-OOA algoritem s kaotičnim preslikovanjem in optimizacijo osprejev natančno izbere optimalna vozlišča, zmanjšuje energijsko porabo in podaljšuje življenjsko dobo WSN, kar je ključnega pomena za nujne komunikacijske sisteme. 1 Introduction among others. Due to its low cost and ease of use, it is capable of functioning in a multitude of challenging In recent years, with the gradual warming of the global environments. In areas inaccessible to humans, climate, after earthquakes, floods, strong tropical storms unmanned aerial vehicles (UAVs) can be deployed to or other disasters, fixed communication network facilities establish wireless communication networks [2]. It can be may be completely destroyed or most of them may not reasonably proposed that wireless sensor networks work normally. Communication is extremely important represent a potential method for emergency for emergency rescue and disaster relief [1]. At this time, communication. However, the considerable number of we need an emergency network that can be quickly sensor nodes, coupled with the limited energy capacity deployed without relying on any fixed network facilities. and relatively short operational lifespan, present A wireless sensor network is a network composed of a significant challenges. Nevertheless, the length of time large number of randomly distributed nodes that are that emergency communication networks based on capable of self-organization. The primary function of the wireless sensors can remain operational is a significant system is to monitor and obtain data from the target area challenge. One promising avenue for further research is and subsequently transmit it to the base station. A to enhance the energy efficiency of these networks, plethora of potential applications can be envisaged in the thereby prolonging their operational lifespan. context of the Internet of Things, including those in the Aiming at the energy consumption problem of WSNs in military, aerospace, ocean and agricultural sectors, data transmission, it is an effective method to prolong the 174 Informatica 49 (2025) 173-190 S. Jia et al. life of wireless sensor networks by selecting cluster normal sensing nodes in the future rounds of the heads for network nodes and data fusion [3]. At present, sub-cycle [12]. Das Rahul proposed a large-scale cluster head selection algorithm usually uses two energy-aware trust optimization algorithm for cluster technologies, one is to randomly select cluster heads head selection and malicious node detection. The through thresholds, and the other is to design appropriate harmonic search genetic algorithm was originally used to fitness function to select cluster heads by using swarm select cluster heads according to energy, trust, distance intelligence technology. Some scholars have also and density. By considering the trust value, this method proposed to solve the problem of rapid death of central avoids choosing malicious nodes as cluster heads, and nodes by using non-uniform clustering algorithm. then uses energy-aware trust estimation models within Firstly, scholar Wendi Rabiner Heinzelman proposed and between clusters to detect malicious nodes, this LEACH protocol, which randomly rotates cluster heads depends on two modules: direct trust and indirect trust with a certain threshold, and reduces energy consumption between clusters and within clusters [13]. Pal. Raju and prolongs network life cycle by clustering nodes to proposed a multi-objective binary grey wolf optimizer to cluster heads [4]. Saxena Madhvi scholars enhanced the find the clustering method in heterogeneous networks, original LEACH protocol by introducing new algorithms and extended the network life cycle through five CHME-LEACH and CHP-LEACH, reducing objectives: maximizing the overall cluster head energy, communication energy consumption and prolonging minimizing the cluster head compactness, minimizing the network life [5]. Jonnalagadda Suman scholars put number of cluster heads, minimizing the energy forward an energy-aware routing protocol MAX LEACH, consumption from non-cluster heads to clusters, and which is suitable for heterogeneous networks and maximizing the cluster spacing [14]. These scholars have homogeneous networks, to minimize the energy developed heterogeneous wireless sensor networks with consumption of nodes and extend the network life [6]. different energy nodes. One method of prolonging the These scholars employ data fusion techniques with the network life cycle is to increase the energy available to objective of reducing network energy consumption and the cluster head nodes. The comparison between extending network lifetime. algorithms is shown in Table 1. Secondly, with the continuous development of intelligent Table 1: Comparison of the different types of protocols algorithms, intelligent algorithms have broad application involved. prospects in selecting cluster heads in wireless sensor networks. The selection of cluster heads in wireless Mode References Vantage Drawbacks sensor networks is very similar to swarm intelligence Cluster heads are algorithm. Therefore, Gülbaş, Gülşah scholars introduced LEACH.2000[4] selected by simulated annealing algorithm to propose LEACH-SA The algorithm is threshold, and Threshold CHP-LEACH.2024 algorithm, and introduced simulated annealing algorithm simple, and the cluster random selection protocol [5] head is selected by the can lead to to select cluster heads to extend the network life [7]. threshold. irrational Mishra Rashmi scholars select the optimal number of MAX combinations of LEACH.2023[6] cluster heads. cluster heads among dense network nodes by introducing butterfly optimization algorithm, and select the next hop LEACH-SA.2023[7] Cluster intelligence node by introducing ant colony optimization algorithm in is used to select the the data transmission stage [8]. Nurul muazzah abdul Mishra Through the cluster head and the Machine continuous selection cluster head nodes latiff scholar proposed PSO-C protocol by introducing Rashmi.2023[8] learning of swarm intelligence, should be particle swarm optimization algorithm, which reduced protocol until the reasonable reasonably located. PSO-C.2007[9] network energy consumption and extended network life cluster head selection. Better reduction of [9]. Bejjam Komuraiah scholar put forward at the 14th energy International Conference on Computing Communication CGWOA.2024[11] consumption. and Networking Technologies in 2023 that genetic Clusters with ECSSEEC.2023[12] The number of nodes algorithm is introduced into wireless sensor networks, inconsistent number within the cluster is Non-unifo which makes the network balance load and optimize, and of nodes in the not the same, which rm Das Rahul.2024[13] cluster may lead to increases the better results in lower cycle [10]. Muntather can avoid the rapid protocol a large energy Almusawi scholars proposed the CGWOA protocol by death of the central consumption gap Pal. Raju.2024[14] node. introducing chaos algorithm and grey wolf optimization between clusters. algorithm, which reduced energy consumption by reducing the transmission distance of network nodes [11]. The various clustering routing algorithms proposed by The application of cluster intelligence algorithms enables the aforementioned scholars have the potential to reduce the selection of cluster heads that optimize the energy the energy consumption of wireless sensor networks and consumption of the network, thereby extending its to extend their operational lifetime. However, there is a operational lifespan. lack of reasonable allocation methods for the election of Thirdly, for heterogeneous networks, there are also many cluster heads and the selection of path nodes from cluster scholars' research on heterogeneous clustering algorithm. heads to base stations in the process of algorithm design. Verma Axel and other scholars put forward the In this paper, a chaos mapping Osprey optimization ECSSEEC protocol based on enhanced cost and sub-era. algorithm (CM-OOA) is proposed to reduce network In ECSSEEC protocol, the optimal number of clusters is energy consumption, improve clustering efficiency and selected by modeling the cost function, and the prolong network life. Firstly, the randomness and previously selected cluster heads are rotated again as ergodicity of chaotic mapping algorithm are used to CM-OOA:An Energy-Efficient Clustering Algorithm for Wireless… Informatica 49 (2025) 173–190 175 search for the global optimal solution. The core of 4) The power sent and received by each sensor node is chaotic mapping algorithm is chaotic mapping, which is controllable. a discrete nonlinear dynamic system that can produce 5) All sensors have the same properties and their seemingly random state changes. Chaos mapping positions remain unchanged relative to the base station. algorithm can effectively search in the solution space, so as to find the optimal solution or near optimal solution of the problem. Secondly, by using the Osprey optimization Satellite algorithm, the characteristics of local and global optimization can be well balanced. Find out all the optimal solutions or approximate optimal solutions to find the most suitable node as the cluster head, so that each cluster head node has the highest energy, the Internet Base station shortest distance to the base station, the shortest distance from the node to the cluster head and the more balanced distance between cluster heads. Finally, by comparing the distance from the node to the base station with the Computer distance from the cluster head to the base station, and the terminal energy of the cluster head itself, the common node selects the cluster head node and performs the cluster operation. The node to base station Euclidean distance is User less than the cluster head node to base station Euclidean Sensor node Monitoring area distance and will be transmitted directly to the base station. In the inter-cluster routing stage, based on the Figure 1: Network topology model. FA-star heuristic search algorithm, the heuristic function of four factors, namely the distance from the starting node to the forwarding node, the distance from the forwarding node to the base station, the energy of the node and the forwarding times of the node, is optimized. Select the most suitable next-hop routing node from the neighbor nodes composed of all nodes that meet the conditions. For the hot spot phenomenon that may occur in wireless sensor networks, because some nodes directly transmit to the base station, and the inter-cluster forwarding nodes include cluster head nodes and ordinary nodes, the energy consumption is more balanced, which will not cause the nodes to die too quickly, thus prolonging the communication time of the emergency communication network. 2 System model Figure 2: Emergency communication network mode. 2.1 Communication network structure model There are three main communication modes in the The topology model of wireless sensor network adopted emergency communication network. Firstly, in this paper is shown in Figure 1. The simulation model Communication within a cluster. Because information assumes that N nodes are randomly distributed in a transmission is only between clusters, this method square area of M*M and all nodes are wireless sensors of consumes less energy for wireless sensor networks [17]. the same type. The network model is shown in Figure 2. Secondly, Communication of the same cluster head node. In order to accurately calculate the information of the When users are not in the same cluster head node, if node and ensure that the base station receives and sends communication is needed, the common node will report data continuously and stably, the node can independently to the superior cluster head node and communicate with select the appropriate transmission power according to each other through the cluster head node. Thirdly, the energy consumption model [15-16]. In order to avoid Communication between different cluster head nodes. the influence of bad weather and human factors, network When users are neither the same cluster nor the same nodes need to meet the following requirements: cluster head node, if communication is needed, ordinary 1) The random dropping area M×M contains N n odes will report to their superior’s step by step and sensor nodes, and the node positions after dropping are contact each other through base stations [18]. fixed. In the communication process of wireless sensor network, the third process needs the information transmission of 2) Sensor nodes have unique and different ids. the whole wireless sensor network. User information is 3) The base station has unlimited energy and no signal transmitted in both directions from ordinary nodes to interference in the area. base stations and then to ordinary nodes, so energy consumption is mainly concentrated in the third mode 176 Informatica 49 (2025) 173-190 S. Jia et al. [19]. Therefore, this paper mainly studies the energy location of cluster head nodes is very important. consumption of the third communication mode. Attention should also be paid to the direction of data transmission in the process of ordinary nodes entering the cluster, but the "hot spot effect" around the base station in wireless sensor networks is also the key to extend the network life [21]. Through formula (1), it can 2.2 Communication energy consumption be seen that multi-hop transmission is better than model single-hop transmission in long-distance transmission. In this paper, the wireless sensor emergency However, in multi-hop transmission, in the process of communication network adopts the first-order wireless selecting the next hop node from the cluster head node to communication energy consumption model [20], which the base station, the same next hop node will be selected can be divided into short-distance free space model and continuously, resulting in the rapid death of the node. To long-distance multi-path model according to the solve these problems, it contains three main problems: transmission distance. The specific formulas are as 1) Does the cluster head combination affect the follows: (1) - (3). network energy consumption? 𝐾 ∗ 𝐸𝑒𝑙𝑒𝑐 + 𝐾 ∗ 𝜀𝑓𝑠 ∗ 𝐿2, 𝐿 < 𝐷0 2) How to plan the direction of data transmission to 𝐸𝑇𝑥(𝐾, 𝐿) = { (1) 𝐾 ∗ 𝐸𝑒𝑙𝑒𝑐 + 𝐾 ∗ 𝜀𝑚𝑝∗𝐿4, 𝐿 ≥ 𝐷0 solve the problem of “hotspot effect” where the center node dies quickly? 𝜀𝑓𝑠 𝐷0 = √ (2) 𝜀 3) Can the multi-hop cluster head node choose the 𝑚𝑝 same node as the forwarding node every round? 𝐸𝑅𝑥(𝐾, 𝐿) = 𝐾 ∗ 𝐸𝑒𝑙𝑒𝑐 (3) In the past, many researchers did not comprehensively In formulas (1) - (3), 𝐸 consider the above problems from the perspective of 𝑇𝑥 is the energy consumption for sending K bit data; 𝐸 three-tier network energy consumption model. Some 𝑒𝑙𝑒𝑐 represents the energy consumption associated with the transmission and researchers randomly select cluster head combinations, reception of a single bit of data; 𝜀 which leads to the irrationality of cluster heads, and then 𝑓𝑠 is the loss factor of free space model; 𝜀 leads to redundant energy consumption. Because the 𝑚𝑝 is the energy loss factor of multipath attenuation model; L is the data transmission ultimate goal of data is the base station, the data distance; 𝐸 transmission direction can only be close to the base 𝑅𝑥 is the energy consumption for receiving K bit data. station. However, most researchers do not consider the influence of the clustering operation process of ordinary nodes on the data transmission direction, and all nodes are clustered. This process causes some nodes to transmit 3 Research on energy consumption data in the opposite direction to the base station, resulting in energy transmitted in the opposite direction. Some of three-layer network researchers also use multi-hop in long-distance From the network topology diagram, we can see that the transmission, but the forwarding times of the next hop data acquisition and transmission stage of wireless sensor node are not considered, which can not be ignored for the networks can be divided into three levels: ordinary node node life. layer, cluster head node layer and base station layer, as Aiming at the above three problems, this section will shown in the Figure 3. analyze the energy consumption reasons of each layer network from the perspective of three-layer network energy consumption, and put forward a reasonable cluster head selection, data transmission direction planning, and next-hop node selection and processing algorithm in multi-hop mode. 3.1 Reasonable cluster head combination In the process of selecting cluster head nodes by ordinary nodes, the distances from different cluster head node c ombinations to nodes are different. Therefore, it has an impact on the overall energy consumption of the network. Figure 3: Three-layer network model. From the energy transmission formula (1), it can be seen that the distance is directly proportional to the energy Through the formulas (1) -(3) in the network energy consumption. Reasonable cluster head combination can consumption model, the main reasons of energy better reduce the energy consumption of sending data due consumption in the three-layer network can be analyzed to distance. respectively. The data transmission of ordinary nodes is In the osprey optimization algorithm (OOA), the optimal the key to energy consumption. When the transmission position of individual osprey is obtained by updating the distance exceeds D0, the energy consumption during data position of individual osprey and comparing the fitting transmission will increase sharply, so the appropriate CM-OOA:An Energy-Efficient Clustering Algorithm for Wireless… Informatica 49 (2025) 173–190 177 function values of each individual position. However, the osprey optimization algorithm is limited by its slow convergence speed and tendency to converge to the local optimal solution. Aiming at the problems of slow convergence speed and easy to fall into local optimal solution in cluster head combination selection, this algorithm combines population initialization process with K-means++ algorithm and chaotic algorithm to form chaotic osprey optimization algorithm (CM-OOA). The output of chaotic Osprey optimization algorithm is similar to the selection of cluster head combination in wireless sensor networks. As shown in Table 2, there is significant consistency between the characteristics of wireless sensor networks and the principle of chaotic Osprey optimization algorithm. Table 2: Similarity correspondence table between wireless sensor networks and chaotic mapping osprey optimization algorithm. WSN CM-OOA Figure 4: Clustering algorithm effect. 3.1.2 Chaos mapping optimization Sensor node number Dimension position size Initializing the population by Logistic chaotic mapping can enhance the global search ability and help CM-OOA Individual position of algorithm jump out of the local optimal solution [22]. Node group osprey The randomness and unpredictability of Logistic chaotic mapping can prevent the algorithm from converging to the local optimal solution prematurely. Logistic chaotic Cluster head node Optimal individual mapping can adapt to different search spaces and combination position of osprey optimization problems, and has good universality. Therefore, Logistic chaotic mapping can be easily Combination of all All positions of osprey pre-selected cluster head combined with Osprey optimization algorithm to form population nodes CM-OOA algorithm, so as to make better use of their respective advantages to deal with the problem. Logistic chaotic mapping formula is as follows: Good population initialization allows the CM-OOA algorithm to start searching from several different initial 𝑃𝑖+1 = 𝛼 ∗ 𝑃𝑖 ∗ (1 − 𝑃𝑖) (6) starting points, which helps the algorithm to explore In the formula, α is the control parameter, and the value multiple regions of the solution space, thus increasing the is taken in (0,4]. Pi is the transformation of the likelihood of finding a globally optimal solution. If the coordinates of the initial population into polar angles as individuals in the initial population are too concentrated, an initial value. the algorithm may quickly converge to the local optimal solution and ignore other potentially better solutions. A The detailed process of CM-OOA algorithm: diverse initial population helps to avoid this. Proper population initialization allows the algorithm to find Step 1. Population initialization. better solutions at an early stage, thus speeding up the The virtual initialization of the Osprey cluster is achieved convergence of the whole search process. Therefore, the by means of the circular symmetric chaotic mapping algorithm in this paper performs the population algorithm and the K means++ clustering algorithm. initialization operation in two ways. Step 2. Initialization of osprey population based on 3.1.1 Kmeans++ algorithm for clustering location mapping algorithm. Initializing the populations is an important step in the The virtual position of the initialized osprey population is CM-OOA algorithm, which uses the Kmeans ++ obtained, and the real node number in the wireless sensor clustering algorithm for clustering to obtain a more network is mapped by the Euclidean distance d from the accurate optimal solution. The initial population nodes node to the virtual position and the energy e of the node are selected by the centre position of each cluster group. itself. The osprey population is initialized to P(t) = {Pt1, The calculation of the cluster centre position is based on Pt2, Pt3…}, and the individual Pti position of osprey is Pti the application of equations (4) and (5). The effect of the = {Xi1, Xi2, Xi3…}. clustering algorithm is shown in Figure 4. 𝑋 𝑡 𝑚 = ∑𝑖=0 𝑋⁄𝑡 (4) 𝑌𝑚 = ∑𝑡 𝑖=0 𝑌⁄𝑡 (5) 178 Informatica 49 (2025) 173-190 S. Jia et al. Step 3. Calculate the fitness function. 𝑃𝐹𝑖𝑠ℎ 𝑡,𝑗𝑥 = 𝑃𝑡,𝑗𝑥 + (𝑙𝑏𝑡 + 𝑅𝑡,𝑗 ∗ (𝑢𝑏𝑡 − 𝑙𝑏𝑡))⁄𝑡 (9) Fi=fitness (Pi(t)) is the fitness value of individual Pi(t) of osprey at time t, which is used to evaluate the strength of solving energy consumption problems at the position of 𝑃𝐹𝑖𝑠ℎ 𝑡,𝑗𝑦 = 𝑃𝑡,𝑗𝑦 + (𝑙𝑏𝑡 + 𝑅𝑡,𝑗 ∗ (𝑢𝑏𝑡 − 𝑙𝑏𝑡))⁄𝑡 (10) osprey. In formulas (9) - (10), Rt,j is a number randomly Step 4. Osprey individuals look for schools of fish. generated between [0,1]; ubt is the upper boundary of Through the comparison of fitness values, the individual the dimension coordinate; lbt is the lower boundary of positions of osprey whose fitness values are smaller than the dimension coordinate; Pt,jx is the X coordinate their own are combined as fish schools, Fish = {Pk(t)|k∈ position of osprey individual P in the J-th dimension of {1, 2,…N}∧Fkd1, the Y Osprey eat fish common node will perform the cluster head selection individually. operation.CM-OOA algorithm reduces energy consumption by preventing nodes from transmitting far Osprey position away from the base station. As shown in Figure 8, the update cluster head node CH2 will be selected first when the distance d3 from the common node to the base station is New osprey greater than the distance d2 from the cluster head node population position CH2 to the base station. Although it can be seen from the Y figure that the distance d4 from the common node to the t < Tmax cluster head node CH1 is smaller than the distance d5 N from the cluster head node CH2, the cluster head node Optimal osprey {CH2…} of the common node will be pre-selected with position less energy. If the pre-selected cluster head set is empty, data is directly transmitted to the base station. End Ordinary node Figure 5: Flow chart of selecting cluster head by CM-OOA algorithm. Cluster head node CH base station 3.2 Planning of data transmission direction d1 d3 In wireless sensor networks, all nodes are clustered, so when the nodes are close to the base station, they will still be clustered. As shown in Figure 6, it causes the problem that the node data is transmitted outward first and then inward. From the energy consumption model, it can be calculated that the energy consumption of all nodes in data clustering is E2, while the energy consumption of nodes in direct transmission is E1. From Figure 7: Median line formulas (11) - (12), it can be concluded that the direct transmission mode of some nodes has low energy consumption. 180 Informatica 49 (2025) 173-190 S. Jia et al. 4.1 Cluster head selection Ordinary node CH In the process of cluster head combination selection, 1 firstly, the number of cluster head nodes in the network is d4 Cluster head node CH calculated and the CM-OOA algorithm population is d d initialized. The virtual nodes are output by CM-OOA 1 d 5 3 base station algorithm, and the virtual nodes are mapped to the real CH2 d network to output the real and reasonable cluster head 2 combination. 4.1.1 Size of optimal number of cluster heads The energy consumption of nodes is an important factor Figure 8: Cluster head selection model of common affecting the communication time of emergency nodes. communication network, and the number of cluster heads plays a vital role in the whole network [28-29]. The main 3.3 Best next hop node consumption of emergency communication is divided into ordinary nodes transmitting cluster head Ept, According to the energy consumption model, ordinary nodes directly transmitting base station Ecp, transmission energy consumption is directly proportional cluster head nodes receiving intra-cluster node data Ecn, to the square of distance, and data transmission is the cluster head nodes fusing data Er, and cluster head nodes main energy consumption of wireless sensor networks. sending data to base station Ecj. The nodes deployed in According to the geometric cosine theorem and the the a×a model are evenly distributed, and (N-n) nodes are first-order radio network energy model [26]. Therefore, evenly distributed in KN circular clusters, and n nodes the algorithm in this paper adopts multi-hop mode for are directly transmitted to the base station, so the energy data transmission. consumption for one round of network transmission is: 𝐸𝐴𝐿𝐿 = KN ∗ (𝐸𝑝𝑡 + 𝐸𝑐𝑛 + 𝐸𝑟 + 𝐸𝑐𝑗) + 𝐸𝑐𝑝 (13) base station Neighboring node The energy consumption of common nodes in each N cluster is: d Starting node 1 L1 d 𝐸𝑝𝑡 = (𝑘 ∗ 𝐸 2 𝑁−𝑛 d d 5 𝑒𝑙𝑒𝑐 + 𝑘 ∗ 𝜀𝑓𝑠 ∗ 𝑑𝑐𝑛𝑡𝑜𝐶𝐻) ∗ ( − 1) (14) 3 𝐾𝑁 2 The energy consumption of ordinary nodes directly d4 transmitting base stations is: L2 𝐸𝑐𝑝 = (𝑘 ∗ 𝐸 2 𝑒𝑙𝑒𝑐 + 𝑘 ∗ 𝜀𝑓𝑠 ∗ 𝑑𝑐𝑛𝑡𝑜𝐶𝐻) ∗ 𝑛 (15) Figure 9: Neighbor node selection model. The cluster head node receives the energy consumption In the process of multi-hop data transmission, FA-star of nodes in the cluster as follows: algorithm and heuristic search are used to select the transmission path. The destination base station is reached 𝑁−𝑛 𝐸 by finding the minimum cost of the path [27]. In this 𝑐𝑛 = 𝑘 ∗ 𝐸𝑒𝑙𝑒𝑐 ( − 1) (16) 𝐾𝑁 algorithm, the neighbor nodes are selected in the same The energy consumption of nodes in cluster head node way as the cluster heads in the clustering algorithm. The fusion cluster is: Euclidean distance d1 from the start node N to the neighboring node is less than the Euclidean distance d3 𝑁−𝑛 from the start node to the base station, so the neighboring 𝐸𝑟 = 𝑘 ∗ 𝐸𝐷𝐴 ( ) (17) 𝐾𝑁 node chooses L1. As shown in Figure 9, the starting node n of the neighboring node {L1...} directly transmits if the The energy consumption transmitted from the cluster neighboring node is empty. head node to the base station is: 𝐸 2 𝑐𝑗 = 𝑘 ∗ 𝐸𝑒𝑙𝑒𝑐 + 𝑘 ∗ 𝜀𝑓𝑠 ∗ 𝑑𝐶𝐻𝑡𝑜𝐵𝑆 (18) 4 Design of CM-OOA algorithm In the formula, dCHtoBS is the distance from the cluster In this paper, the energy consumption of three-layer head node to the base station. network is analyzed in detail, and the clustering The distance from the common node to the cluster head algorithm of CM-OOA network is proposed by node in each cluster is: combining the chaotic Osprey optimization cluster head combination selection algorithm with the data 𝑎2 transmission direction and the best next hop strategy. The 𝑑𝑐𝑛𝑡𝑜𝐶𝐻 = √𝜌 ∗ ∬(𝑥2 + 𝑦2) 𝑑𝑥 𝑑𝑦 = (19) √2𝜋∗𝐾𝑁 algorithm is divided into cluster head selection stage, cluster establishment stage and data transmission stage. Sorting out the above equations (13) - (19), calculating The algorithm flow chart is shown in the Figure 10. the value of KN when EALL is minimized by deriving CM-OOA:An Energy-Efficient Clustering Algorithm for Wireless… Informatica 49 (2025) 173–190 181 the overall energy consumption of the network in one 𝑁∗𝜀𝑓𝑠∗𝑎2 round, and obtaining the optimal number of cluster heads KN = √ 2𝜋∗(𝜀𝑓𝑠∗𝑑2 (20) 𝐶𝐻𝑡𝑜𝐵𝑆−𝐸𝑒𝑙𝑒𝑐) KN as follows: Start Kmeans++ clustering CChhaos optimizattiioonn algorithm aallgorithm Initializing osprey Cluster Head population position Selection Calculate the fitness of osprey Osprey individuals look for fish. N Are there any fish? Y Osprey catches fish individually Ordinary Distance N node into comparison clusters Osprey position update Y Cluster head comparison fitness Transmission to N function value the base station Is it successful to catch fish? Ordinary nodes select Y cluster heads to form clusters Osprey eat fish individually. Node data Distance comparison transfer Osprey position based on update Astar algorith Election of m neighbouring nodes New osprey population position Calculate neighbour node fitness value Y t < Tmax Selection of the next hop node to transmit N to the base station Optimal osprey position End Figure 10: Flow chart of CM-OOA algorithm. 4.1.2 Population initialisation Calculate the polar angle from the node to the base station Obtaining the polar angle and converting the polar angle into the In order to avoid the problems of slow convergence speed and easy to fall into local optimal solution of initial value of the mapping. Osprey optimization algorithm, this algorithm maps the Using Logistic mapping, Equation (9)-(10) introduces chaotic initial population by chaos. The detailed flow of the chaotic mapping algorithm is shown in Algorithm 1. characteristics. The chaotic characteristics are inversely transformed to obtain a new polar angle. Algorithm 1: Pseudo-code of circular symmetric chaotic mapping algorithm. Obtain a new polar angle and calculate the virtual coordinates in the mapped rectangular coordinate system. Algorithm 1: Initialization of osprey population by chaotic mapping End algorithm Begin: 182 Informatica 49 (2025) 173-190 S. Jia et al. 4.1.3 Location mapping reciprocal is smaller, and the node can forward data better under the same conditions, it should be selected as The coordinates of nodes in wireless sensor networks are the cluster head. all random, but the coordinate positions change randomly after CM-OOA algorithm. After the coordinate of CM-OOA algorithm is transformed separately from the X 𝐹1 = 1⁄𝐸𝑖 (22) axis and the Y axis, there may be no real node at this The level of Euclidean distance between cluster heads: coordinate [30]. Therefore, CM-OOA algorithm designs the reciprocal of the sum of distances between cluster a position mapping function through the Euclidean head nodes. The location of the cluster head determines distance to the virtual position and the energy of the real the distance of data transmission by the nodes entering node, and maps the coordinates at the virtual position to the cluster. Cluster heads should be evenly dispersed to the nodes in the actual coordinate space through the reach the distance of all nodes. position. The location mapping formula is as follows: 𝐹2 = 1⁄∑ 𝑑𝑖𝑠(𝐶𝐻𝑖, 𝐶𝐻𝑗) (23) F = 𝜃1 ∗ 𝑑 + 𝜃2 ∗ 𝐸 (21) Euclidean distance between cluster heads and nodes: the In the formula, d is the Euclidean distance from the sum of the distances from cluster head node to all nodes. virtual position to the node; E is the energy of the node; Energy consumption in network cycle mainly comes θ1 and θ2 are weight factors and satisfy θ1 + θ2 =1. from node transmission. The sum of the positions of all nodes in the cluster is the smallest, so as to minimize the The detailed flow of the location mapping algorithm is energy consumption of data transmission. shown in Algorithm 2. Algorithm 2: Pseudo-code of position mapping 𝐹3 = ∑ 𝑑𝑖𝑠(𝑁𝑗 , 𝐶𝐻𝑖) (24) algorithm. Euclidean distance from the cluster head to the base Algorithm 2: The virtual position is projected to the real node station: the sum of distances from all cluster head nodes through the mapping function. to base station BS. The transmission of cluster head node is the energy consumption of the second part of the Begin: network cycle, and the distance from cluster head node to Calculating Euclidean distance d from all nodes to virtual base station determines the energy consumption of position coordinates cluster head node [33]. Therefore, the sum of the distances from the cluster head node to the base station is Obtain the energy e of all nodes themselves. the smallest, and the information can be transmitted to Calculate the position mapping function by formula the base station with the least energy consumption. (4)-(5) 𝐹4 = ∑ 𝑑𝑖𝑠( 𝐶𝐻𝑖 , 𝐵𝑆) (25) By comparing the function values, the node numbers of The variance of the Euclidean distance from the cluster virtual coordinates projected into the real network are head to the base station: variance of distance from all selected. cluster heads to base station. Because there is more than one cluster head node, and only the sum of the distances End from the cluster head to the base station is kept to the minimum, there may be a very long distance between a 4.1.4 Design CM-OOA algorithm adaptation function node and the base station. Therefore, by adding the In order to optimize the selection of cluster heads and variance of the distance from the cluster head to the base improve the life cycle of the network, after determining station to control the distance between the cluster head the optimal number of cluster heads, the fitness function and the base station, all the distances from the cluster is set according to the state of nodes and the position of head to the base station can be better kept to be the pre-selected cluster heads [31]. The cluster head node is minimum. responsible for the data forwarding of ordinary nodes. Therefore, the selection of cluster head should have the 𝐹5 = 𝑉𝑎𝑟(∑ 𝑑𝑖𝑠( 𝐶𝐻𝑖, 𝐵𝑆)) (26) characteristics of high energy, reasonable location and less times of becoming a cluster head. The fitness The variance of the cluster head to cluster head function of CM-OOA algorithm is designed from the Euclidean distance: variance of cluster head to cluster following six aspects: the energy of nodes, the distance head distance [34]. There is more than one cluster head between cluster heads, the distance between cluster heads node, so it is necessary to prevent the distance between and each node, the distance from cluster heads to base them from appearing some deviations that are very close stations, the variance of the distance from cluster heads and some are very far away. Therefore, by adding the to base stations, and the variance of the distance from variance of the distance from cluster head to cluster head cluster heads to cluster heads. to control the gap between cluster heads, the distribution of all cluster heads can be better maintained and more The energy level of the node itself: the reciprocal of the reasonable. remaining energy of the current node. The cluster head node is the key condition to support the network 𝐹6 = 𝑉𝑎𝑟(∑ 𝑑𝑖𝑠(𝐶𝐻𝑖, 𝐶𝐻𝑗)) (27) operation [32]. If the energy of the node is higher, the Based on the energy of nodes, the distance between CM-OOA:An Energy-Efficient Clustering Algorithm for Wireless… Informatica 49 (2025) 173–190 183 cluster heads, the distance from cluster heads to nodes, End the distance from cluster heads to base stations, the variance of the distance from cluster heads to base stations and the variance of the distance from cluster heads to cluster heads, the fitness function is designed by 4.2 Cluster establish stage weight control: In the stage of cluster establishment, in order to prevent reverse data transmission, the ordinary nodes of the base 𝐹𝑖𝑡𝑛𝑒𝑠𝑠 = 𝛼1 ∗ 𝐹1 + 𝛼2 ∗ 𝐹2 + 𝛼3 ∗ 𝐹3 + 𝛼4 ∗ 𝐹4 + 𝛼5 ∗ 𝐹5 + station first judge whether to enter the cluster or not. Some nodes directly transmit data to the base station to 𝛼6 ∗ 𝐹6 (28) reduce the influence of "hot spot effect" in the network. By comparing the fitness values of cluster head nodes, In the formul hierarchical analysis method to calculate the appropriate cluster head nodes are selected. The the weightsa, α1, α2, α3, α4 and α5 are weight factors clustering algorithm is shown in Algorithm 4. and satisfy ∑ αi = 1. Through the fitness function value of cluster head nodes, According to the improved fitness function, the fitness the preselected cluster head with the minimum value is functions of all osprey individuals are calculated, and the compared. That is, the cluster head node of ordinary optimal position of osprey individuals is selected. The nodes. The fitness function of this algorithm is as algorithm flow is shown in Algorithm 3. follows: Algorithm 3: Pseudo-code of cluster head node selection algorithm. F = 𝛽1 ∗ E + 𝛽2 ∗ dis(N, CH) (29) Algorithm 3: Select the cluster head according to the In the formula, dis (N, CH) denotes the Euclidean improved fitness function. distance from the common node to the head node of the pre-selected cluster, and E represents the energy of the Begin: pre-selected cluster head node in the current round. β1=0.4 Initializing a network node to obtain the initialized osprey and β2=0.6 are weight factors and satisfy β1+β2=1. population position. Algorithm 4: Ordinary node cluster. Calculate the fitness value of the osprey individual, Algorithm 4: Network node cluster establishment. and keep the individual position and fitness value with Begin: the minimum fitness value. Obtaining cluster head node set from algorithm 3. While t < tmax do If Ordinary nodes satisfy the selection of cluster head nodes. By comparing the fitness values of osprey If Meet the conditions of pre-selecting cluster heads for population, the individual fish school of osprey is ordinary nodes. generated. The cluster head is put into the reselected cluster head All osprey individuals began to fish. set. Position mapping of osprey individuals after Else. fishing Ordinary nodes are put into the set of direct Update osprey position transmission base stations. if Fitness value of osprey position before fishing > End. Fitness value of osprey position after fishing Else After successful fishing, osprey individuals began Nodes directly transmit data to the base station without to eat fish. joining the cluster. Position mapping of osprey individuals after End eating fish Calculate the fitness value of the pre-selected cluster head. Ordinary nodes select the cluster head node and perform end cluster entry operation. Update The location of the new osprey population End. Update Individual osprey with minimum fitness value and fitness value t = t + 1 4.3 Node data transfer based on FA-star algorithm Return The position and fitness value of osprey with minimum fitness value The network data transmission process in this paper adopts multi-hop mode. The heuristic function of Astar 184 Informatica 49 (2025) 173-190 S. Jia et al. algorithm is optimized by the energy of nodes and the energy consumption of the network system, the number forwarding times of nodes, which avoids the problem of of dead nodes and the number of surviving nodes. The selecting nodes with the same next hop continuously. energy consumption of data fusion process is neglected, Select the most suitable data transmission path. The because the communication mode is two-way, and only clustering algorithm is shown in Algorithm 5. one communication direction is calculated for the convenience of calculating energy consumption. Because Through the heuristic function of neighbor nodes, the of the close distance between users and nodes, the energy neighbor nodes with the minimum value are compared. consumption is negligible [35]. An 800m×800m That is, the next hop node of the starting node. The experimental simulation area is drawn and 100 X-axis heuristic function of FA-star algorithm is as follows: coordinates and 100 Y-axis coordinates are randomly generated to combine into 100 nodes, and the base station F = 𝛾1 ∗ 𝐸 + 𝛾2 ∗ 𝑑𝑖𝑠(𝑁, 𝐿) + 𝛾3 ∗ 𝑑𝑖𝑠(𝐿, 𝐵𝑆) + 𝛾4 ∗ 𝐺 is located in the center of the area. From formula (20), the optimal number of cluster heads is KN = 0.04 * n. (30) The specific parameters are shown in Table 3. In the formula, E represents the energy of neighboring Table 3: Experimental parameter table. nodes, G represents the forwarding times of neighboring nodes, dis(N,L) represents the distance from the starting Parameter Numerical value node to neighboring nodes, and dis(L,BS) represents the distance from neighboring nodes to the base station. γ1, Number of network nodes 100 γ2, γ3 and γ4 are the weight influencing factors and satisfy γ1+γ2+γ3+γ4=1. Network area size 800m×800m Algorithm 5: Network data transmission Base station coordinate position (100,100) Begin: Energy loss coefficient of free 10 Pj/bit/m2 The cluster head node set and the direct transmission space model set in the acquisition algorithm 3 are merged into the Energy loss coefficient of initial node set. 0.0013 Pj/bit/m2 multipath attenuation model while Starting node ≠ base station. If Meet the condition of neighbor nodes Node initial energy 4 J Ordinary nodes are put into the set of Number of networks running neighboring nodes. 1000 rounds rounds End. If Neighbor node set is empty. 5.2 Analysis of energy change of emergency The originating node sends data directly to the communication system base station. The residual energy of wireless sensor network system Else. reflects the life cycle of emergency communication Calculate the heuristic function comparison network [36-37]. The more residual energy, the longer the communication time of emergency communication function value of neighboring nodes, select network. The network energy of the four algorithms the next hop node and transmit data. changes as a whole, as shown in Figure 11. End. End. End. 5 Experimental simulation analysis 5.1 Experimental parameters In order to examine the simulation effectiveness of CM-OOA algorithm in extending the network life cycle, the algorithms are compared and analyzed on MATLAB Figure 11: Changes of residual energy in emergency R2023b platform. The advantages of the basic algorithm communication network. LEACH algorithm and the latest cluster classification algorithms PSO-C algorithm, CGWOA algorithm and the CM-OOA algorithm in this paper are verified in terms of CM-OOA:An Energy-Efficient Clustering Algorithm for Wireless… Informatica 49 (2025) 173–190 185 Of the 1000 rounds of energy consumption, the LEACH Although CGWOA algorithm appears dead nodes later algorithm consumed all of its energy in the 250th round, than PSO-C algorithm, the localised death rate is faster, the PSO-C algorithm had 23% of its energy remaining which should not be ignored. In contrast, the CGWOA after the 1000th round, the CGWOA algorithm had 35% algorithm has better overall changes than the PSO-C and of its energy remaining after the 1000th round, the LEACH algorithms, and grows much slower than the CM-OOA algorithm still had 43% of its energy in the CM-OOA algorithm. The CM-OOA algorithm's dead 1000th round, and it consumed it slower than the other nodes grow relatively slowly, with only 13% of dead algorithms from the 0th to 1000th rounds. It can be seen nodes after 1,000 rounds. The CM-OOA algorithm that, compared with other algorithms, the CM-OOA balances the network's overall energy consumption, algorithm selects the optimal cluster head through the spreads out the energy loss to all the nodes, and prevents chaotic mapping osprey optimisation algorithm, taking the nodes from localised death and extends the duration node energy and node transmission distance as the main of emergency communication. factors, and the variance of the distance from the cluster head to the base station and the variance of the distances between the cluster heads as the auxiliary factors, and 5.4 Analysis of changes in the number of performs the cluster selection based on the information, surviving nodes in the network and performs the cluster selection based on the distance of the information transmission and the node energy in When emergency communication wireless sensor the clustering stage, instead of using a single inter-cluster network nodes are used in dangerous processes such as distance as weights, instead of using node energy and emergency rescue and disaster relief survey, they will not number of node forwards, distance from start node to be replaced frequently, and at the same time, they are neighbouring nodes and distance from neighbouring limited by the energy of nodes [39]. Therefore, for the nodes to base station. The FA-star algorithm with same environment, the more nodes survive, the fewer heuristic function can better reduce the energy dead nodes, and the longer the communication time. The consumption and extend the life cycle of emergency number of surviving nodes of the four algorithms varies communication network. from 0 to 1000 rounds, as shown in Figure 13: 5.3 Analysis of the number of dead nodes in the network The number of dead nodes in wireless sensor networks reflects the overall stability of the network. The more dead nodes, the greater the impact on the overall emergency communication network, the smaller the coverage area and the faster the death rate [38]. The number of dead nodes of the four algorithms changes, as shown in Figure 12: Figure 13: Changes in the number of surviving nodes in communication networks. After 1,000 rounds of energy consumption in the emergency communication network, it can be seen from Figure 13 that the nodes of the LEACH algorithm are almost all dead after 300 rounds, and the nodes of the PSO-C algorithm have 33% active nodes remaining after 1,000 rounds [40]. The CGWOA algorithm, after experiencing a slow decline, slowly tends to be stable after 550 rounds, until only 50% active nodes remain after 1,000 rounds. After 1000 rounds, the CM-OOA algorithm still has 8% nodes, which improves the time of information communication, good stability, suitable for Figure 12: Changes in the number of dead nodes in information data collection in special environments, and communication networks. gives full play to the optimisation ability of the CM-OOA algorithm. The improvement of fitting function further In Figure 12, the various algorithms start to show dead optimises the accuracy and efficiency of cluster head nodes after 30 rounds, the LEACH algorithm clearly election. FA-star algorithm reduces the energy shows dead nodes after about 35 rounds, and almost all consumption of cluster heads in inter-cluster route nodes die after 300 rounds, while the PSO-C algorithm's construction, avoids the premature death of cluster heads, rate of dead nodes grows faster. and gives full play to the sensor's ability to transmit information in the whole network. 186 Informatica 49 (2025) 173-190 S. Jia et al. 5.5 Comparative analysis of node data transmission delay Another key criterion is the network transmission delay. This is highly dependent on the distance between nodes in the transmission path. In the same experimental setting, this paper compares the network delay by the average transmission distance of the network of nodes. The average transmission distance of four algorithms from 0 to 1000 rounds of data transmission is shown in Figure 14. Figure 15: A comparison of the average transmission distance of nodes is presented. A comparison of the average transmission distance for each 100-round interval in Figure 15 reveals that the average transmission distance of the CM-OOA algorithm is 80% less than that of the CGWOA protocol. Furthermore, the average transmission distance of the CGWOA protocol is 90% less in the initial stages and 30% less in the subsequent stages than that of the PSO-C protocol. The transmission distance of the LEACH protocol is zero due to the death of all nodes after 400 rounds. Figure 14: The average variation in node transmission distance per round. 5.6 Comparison of results of surviving nodes Figure 14 shows that the average transmission distance in areas of different sizes of Leach protocol is greater than the other protocols. In contrast, the CM-OOA algorithm has the lowest Equation (2) with the data from the experimental transmission distance profile and the average environment allows the calculation of the thresholds for transmission distance is lower than the other protocols. the two types of communication, and the number of The comparison of the average transmission distance in surviving nodes after 0, 500 and 1000 rounds of data every 100 rounds is presented in Figure 15. Table 4: Comparison of the number of surviving nodes in different rounds. Area size r o u n d LEACH CGWOA CM-OOA PSO-C 0r 100 100 100 100 1000*1000 500r 0 45 83 30 1000r 0 35 61 17 0r 100 100 100 100 800*800 500r 0 60 93 55 1000r 0 50 88 32 0r 100 100 100 100 600*600 500r 28 76 97 81 1000r 0 63 90 61 transmission is comparatively analyzed for three different cause of this phenomenon is the rise in the average geographical regions: a 1000*1000 area (characterised by number of hops traversed by data packets on their a high percentage of multi-path fading communication transmission path, coupled with the expansion of the methodology), an 800*800 area (where the percentage of distance between nodes within a cluster. This results in both communication methodologies is approximately an exponential growth in the energy expenditure equal) and a 600*600 area (where the percentage of free associated with data transmission. space communication methodology is high). The results The number of nodes directly reflects the life cycle of the are presented in tabular form in Table 4. network. In larger area networks, cluster heads further As illustrated in Table 4, the expansion of the working away from the base station die quickly. When there are area of the wireless sensor network is associated with a 100 nodes in a 600*600 area, the CM-OOA algorithm reduction in the network's overall life cycle. The primary has 90 surviving nodes after 1000 rounds of data CM-OOA:An Energy-Efficient Clustering Algorithm for Wireless… Informatica 49 (2025) 173–190 187 transmission, which is a 27% improvement in the number the distance to clusters, the frequency of base station of surviving nodes compared to the CGWOA algorithm heads and cluster heads, etc. CM-OOA algorithm is used and a 29% improvement in the number of surviving to update the population and select the best individual nodes compared to the PSO-C algorithm. When the based on the fitness value, which has the advantage of number of nodes in 800*800 area is 100, after 500 global search convergence and balance the consumption rounds of data transmission, the number of surviving of network energy in each cluster. In the inter-cluster nodes of LEACH algorithm, CGWOA algorithm, routing stage of communication, the heuristic function CM-OOA algorithm and PSO-C algorithm decreases by based on Astar algorithm is used to reduce the 28%, 16%, 4% and 26% respectively compared to that of consumption of energy cluster head nodes and alleviate 600*600 area, and the number of surviving nodes of the hot spot effect. The analysis results show that the CM-OOA and CGWOA algorithms are relatively stable. algorithm reduces the node mortality and the maximum However, in the 1000*1000 region, the CM-OOA number of surviving nodes in the whole energy algorithm only reduces the number of surviving nodes by consumption network, which effectively improves a part 39% after 1000 rounds of data transmission. In addition, of the life cycle network. the CM-OOA algorithm has 61 surviving nodes after In this manuscript, the CM-OOA algorithm only 1000 rounds of data transmission in the 1000*1000 range, considers the energy consumption of emergency which is a 26% increase in the number of surviving communication and does not consider network security. nodes compared to the CGWOA algorithm. In the In the next step, we will continue to optimize the CM-OOA algorithm, the central selection of cluster head algorithm and optimize the security of this algorithm as nodes and the use of multi-hop transmission further much as possible. Prevent malicious attacks on nodes, prolong the network life cycle. Therefore, the CM-OOA which can cause energy consumption and data theft of algorithm has the longest network life cycle, which nodes, so that network information security is guaranteed proves that the scalability and stability of the CM-OOA to a certain extent. The algorithm is combined with the algorithm is much better than the other four algorithms. practical situation and applied in the real emergency communication network. 6 Conclusion Availability of data and materials In this manuscript, an optimization algorithm based on This paper proposes an emergency communication chaotic mapping osprey optimization is proposed to algorithm for wireless sensor networks based on chaos prolong the duration of emergency communication by mapping and osprey optimization. The specific reducing energy consumption. The fitness function is information of the paper can be exchanged with the improved by node energy, the distance between cluster author. heads, the distance from cluster heads to nodes, the distance from cluster heads to base stations, the variance of the distance from cluster heads to base stations and the Conflict of interest variance of the distance between cluster heads. CM-OOA algorithm updates the position of the best individual The authors confirm that the content of this article has no based on fitness value, giving full play to the advantages conflict of interest. of global search and convergence, and balancing the network energy consumption in each cluster. In the inter-cluster routing communication stage, FA-star Acknowledgement algorithm based on heuristic function is used to reduce the energy consumption of cluster head nodes. This research study is supported by the Smart Teaching The location of cluster head node in this algorithm is Special Project for Undergraduate Institutions in Henan more reasonable and the energy consumption of data path Province, the General Project of Education Science transmission is lower. Compared with LEACH, PSO-C Planning in Henan Province (Research on Software and CGWOA. Through the comparative analysis of the Engineering Talent Training Mode under the Integration results, the energy consumption of the whole network is of New Engineering and OBE Concept, 2023YB0174), reduced. The number of surviving nodes in the network the Undergraduate Industry Education Integration is the largest, which effectively improves the life cycle of Research Project in Henan Province, the Graduate the emergency communication network. Education Reform Project in Henan Province (2023SJGLX300Y), the New Engineering and New Format Textbook Project for Undergraduate Institutions in Henan Province, the Graduate Education Reform and Quality Improvement Project of Nanyang Normal 7 Discussion University (2023ZLGC06), and the Research Projects of In this study, the energy consumption of wireless sensor Nanyang Normal University (2025STP009, 2025STP01 networks is deeply discussed through the three-layer 0). network model, and an energy-efficient clustering algorithm based on Osprey optimization and heuristic path is proposed. The osprey optimization algorithm can improve the energy of nodes, the distance between cluster heads, the distance from cluster heads to nodes, 188 Informatica 49 (2025) 173-190 S. Jia et al. References Algorithm-based approach". in Proc. 14th International Conference on Computing [1] Kapoor Leena Kohli et al. (2023)., "Satellite Wi-Fi Communication and Networking Technologies. Terminal for Post-Disaster Emergency https://dx.doi.org/10.1109/ICCCNT56998.2023.103 Communication Management". in Proc. 2023 07636 International Conference on Computer, Electrical [11] Muntather et al. (2024)., "Chaotic Grey Wolf and Communication Engineering. Optimization for Energy-Efficient Clustering and https://dx.doi.org/10.1109/ICCECE51049.2023.100 Routing in Wireless Sensor Networks". in Proc. 2nd 85637 International Conference on Integrated Circuits and [2] K. Viswavardhan Reddy and N. Kumar (2021)., Communication Systems. "SNR based Energy Efficient Communication https://dx.doi.org/10.1109/ICICACS60521.2024.10 Protocol for Emergency Applications in WBAN", 499088 International Journal of Advanced Computer [12] V.Akshay, K. Sunil, G. Prateek Raj, R. Tarique and Science and Applications, vol. 12, no. 9, pp. K. Arvind (2023)., "Enhanced Cost and Sub-epoch 268-275. Based Stable Energy-Efficient Clustering Algorithm https://dx.doi.org/10.0.56.233/IJACSA.2021.01209 for Heterogeneous Wireless Sensor Networks", Wire. 30 Pers. Comm., vol. 131, no. 4, pp. 3053-3072. [3] Al Aghbari Zaher, Pravija Raj. P. V., Mostafa https://dx.doi.org/10.0.3.239/s11277-023-10601-2 Reham R. and Khedr Ahmed M (2024)., [13] D. Rahul and D. Mond (2024)., "Cluster head "iCapS-MS: an improved Capuchin Search selection and malicious node detection using Algorithm-based mobile-sink sojourn location largescale energy-aware trust optimization optimization and data collection scheme for algorithm for HWSN", J. of Reli. Inte. Envi., vol. 10, Wireless Sensor Networks", Neural Computing and no. 1, pp.55-71. Applications, vol. 36, no. 15, pp. 8501-8517. https://dx.doi.org/10.0.3.239/s40860-022-00200-6 https://dx.doi.org/10.0.3.239/s00521-024-09520-5 [14] Pal. Raju, S. Mukesh, K. Sandeep, N. Anand and R. [4] H. Wendi Rabiner et al. (2000)., "Energy-Efficient Pushpendra Kumar (2024), "Energy efficient Communication Protocol for Wireless Microsensor multi-criterion binary grey wolf optimizer-based Networks". Proceedings of the 33rd Annual Hawaii clustering for heterogeneous wireless sensor International Conference on System Sciences, Maui, networks", Soft Comp., vol. 28, no. 4, pp. Hi, USA, pp.10. 3251-3265. https://dx.doi.org/10.1109/HICSS.2000.926982 https://dx.doi.org/10.0.3.239/s00500-023-09316-0 [5] S. Madhvi, S. Aarti and R. Shefali. (2024). "An [15] Technologies for wireless sensor networks, by R. Approach to Increase the Lifetime of Traditional Khanna, Yi qian, G. Pisharody, R. Arvind, Jiejie LEACH Protocol Using CHME-LEACH and Wang, Laura M. Rumbel, Christopher. R., Carlson, CHP-LEACH", Lecture Notes in Networks and Jennifer, M. Williams. and P. Adu Agyeman. (2024, Systems, vol. 868, pp. 133-145. Apr 18). Patent A1 20240130002. https://dx.doi.org/10.1007/978-981-99-9037-5_11 [16] Jasim Mohammad Omer K. and Salih Bassim M [6] J. Suman, K. Shyamala, G. Roja and N. Pranay. (2024). "Improving Task Scheduling in Cloud (2023). "Testbed Implementation Datacenters by Implementation of An Intelligent of MAX LEACH Routing Protocol and Sinkhole Scheduling Algorithm". Informatica (Slovenia), vol. Attack in WSN". Lecture Notes in Networks and 48, no.10, pp.77-88. Systems, vol. 612, pp. 153-162. https://doi.org/10.31449/inf.v48i10.5843 https://dx.doi.org/10.1007/978-981-19-9228-5_14 [17] Suhag. Sumit and Aarit (2024)., "Challenges and [7] Gülbaş, Gülşah and Çetin, Gürcan (2023)., Potential Approaches in Wireless Sensor Network "Lifetime Optimization of the LEACH Protocol in Security", Journal of Electrical Engineering and WSNs with Simulated Annealing Algorithm", Technology, vol. 19, no. 4, pp. 2693-2700. Wireless Personal Communications, vol. 132, no. 4, https://dx.doi.org/10.0.3.239/s42835-023-01751-1 pp. 2857-2883. [18] Heidari Ehsan (2024)., "A novel energy-aware https://dx.doi.org/10.0.3.239/s11277-023-10746-0 method for clustering and routing in IoT based on [8] Mishra, Rashmi and Yadav, Rajesh K. (2023)., whale optimization algorithm & Harris Hawks "Energy Efficient Cluster-Based Routing Protocol optimization", Computing, vol. 106, no. 3, pp. for WSN Using Nature Inspired Algorithm", 1013-1045. Wireless Personal Communications, vol. 130, no. 4, https://dx.doi.org/10.0.3.239/s00607-023-01252-z pp. 2407-2440. [19] Wireless sensor system, wireless terminal device, https://dx.doi.org/10.0.3.239/s11277-023-10385-5 communication control method and communication [9] N. M. Latiff Abdul et al (2007). "Energy-Aware control program, by M. Funaki, Y. Tanaka, D. Clustering for Wireless Sensor Networks using Murata and T. Yamamoto. (2024, Mar 12). Patent Particle Swarm Optimization". in Proc. 2007 IEEE B2 11930431. 18th International Symposium on Personal, Indoor [20] Jaiswal K. and Anand V. (2024)., "ESND-FA: An and Mobile Radio Communications. Energy-Efficient Scheduled Based Node https://dx.doi.org/10.1109/PIMRC.2007.4394521 Deployment Approach Using Firefly Algorithm for [10] B. Komuraiah, Bollena. Navya. and Jhanvitha. B. Target Coverage in Wireless Sensor Networks", (2023). "Enhanced Lifetime with less energy International Journal of Wireless Information consumption in WSN Using a Genetic CM-OOA:An Energy-Efficient Clustering Algorithm for Wireless… Informatica 49 (2025) 173–190 189 Networks, vol. 31, no. 2, pp. 121-141. "Energy aware clustering protocol using chaotic https://dx.doi.org/10.0.3.239/s10776-024-00616-2 gorilla troops optimization algorithm for Wireless [21] K. Rasidul, Z. Mehboob, De. Debashis. and Das. Sensor Networks", Multimedia Tools and Abhishek. (2024)., "MKFF: mid-point K-means Applications, vol. 83, no. 8, pp. 23853-23871. based clustering in wireless sensor network for https://dx.doi.org/10.0.3.239/s11042-023-16487-3 forest fire prediction", Microsystem Technologies, [31] Vikhyath. V. K. and Achyutha Prasad. A. p.(2023)., vol.30, no.4, pp.469-480. "Optimal Cluster Head Selection in Wireless Sensor https://dx.doi.org/10.0.3.239/s00542-023-05578-8 Network via Combined Osprey-Chimp [22] L. Marcin, M. Lazaros, Baptista. Murilo S. and Optimization Algorithm: CIOO", International Volos. Christos Source (2024)., "Discrete Journal of Advanced Computer Science and one-dimensional piecewise chaotic systems without Applications, vol. 14, no. 12, pp. 401-407. fixed points", Nonlinear Dynamics, vol. 112, no. 8, https://dx.doi.org/10.0.56.233/IJACSA.2023.01412 pp.6679-6693. 41 https://dx.doi.org/10.0.3.239/s11071-024-09349-6 [32] Shakil Ahmed et al. (2023)., "Sky's the Limit: [23] B. A. Omar, C. D. Zaineb, B. Slim and Ben Said. L. Navigating 6G with ASTAR-RIS for UAVs Optimal (2024)., "Many-objective optimization of wireless Path Planning". in Proc. 28th IEEE Symposium on sensor network deployment", Evolutionary Computers and Communications: Computers and Intelligence, vol. 17, no. 2, pp. 1047-1063. Communications for the Benefits of Humanity. https://dx.doi.org/10.0.3.239/s12065-022-00784-1 https://dx.doi.org/10.1109/ISCC58397.2023.102180 [24] K. Neethu, Sundar G. Naveen and Narmadha D. 58 (2024)., "Vector Based Genetic Lavrentyev [33] K. Fransen and J. Van Eekelen (2023)., "Efficient Paraboloid Network Wireless Sensor Network path planning for automated guided vehicles using Lifetime Improvement", Wireless Personal A* (Astar) algorithm incorporating turning costs in Communications, vol. 134, no. 4, pp. 1917-1944. search heuristic", International Journal of https://dx.doi.org/10.0.3.239/s11277-024-10906-w Production Research, vol. 61, no. 3, pp. 707-725. [25] Dinesh. K., and SVN. Santhosh Kumar (2024)., https://dx.doi.org/10.0.4.56/00207543.2021.201580 "GWO-SMSLO: Grey wolf optimization-based 6 clustering with secured modified Sea Lion [34] Kusuma Purba D. and H. Faisal Candrasyah (2024)., optimization routing algorithm in wireless sensor "Enriched Coati Osprey Algorithm: A Swarm-based networks", Peer-to-Peer Networking and Metaheuristic and Its Sensitivity Evaluation of Its Applications, vol. 17, no. 2, pp. 585-611. Strategy", IAENG International Journal of Applied https://dx.doi.org/10.0.3.239/s12083-023-01603-9 Mathematics, vol. 54, no. 2, pp. 277-285. [26] Sakhri, A. Arsalan, M. Maimour, M. Kherbache, E. [35] Dinesh. K. and Santhosh. Kumar. S. V. N. (2024)., Rondeau and N. Doghmane (2024)., "A digital "Energy-efficient trust-aware secured neuro-fuzzy twin-based energy-efficient wireless multimedia clustering with sparrow search optimization in sensor network for waterbirds monitoring", Future wireless sensor network", International Journal of Generation Computer Systems, vol. 155, no. 6, pp. Information Security, vol. 23, no. 1, pp. 199-223. 146-163. https://dx.doi.org/10.0.3.239/s10207-023-00737-4 https://dx.doi.org/10.0.3.248/j.future.2024.02.011 [36] P. Ikkurthi Bhanu, G. Saumitra, Yogita, Y. Satyendra. [27] Ramya. R. and Padmapriya. K. (2023)., "An Singh and Pal. Vipin. (2024)., "HCM: a hierarchical implementation of energy efficient fuzzy-optimized clustering framework with MOORA based cluster routing in wireless sensor networks using Particle head selection approach for energy efficient Swarm Optimization (PSO) and Whale wireless sensor networks", Microsystem Optimization Algorithm (WOA)", Journal of Technologies, vol. 30, no. 4, pp. 393-409. Intelligent and Fuzzy Systems, vol. 44, no. 1, pp. https://dx.doi.org/10.0.3.239/s00542-023-05508-8 595-610. [37] Asaad Alhijaj, Baida’a Abdul Qader Khuder, Imad https://dx.doi.org/10.0.12.161/JIFS-220963 Alshawi (2022)., "Fuzzy Data Aggregation [28] Preethi. R (2024)., "Assault Type Detection in WSN Approach to Enhance Energy-Efficient Routing Based on Modified DBSCAN with Osprey Protocol for HWSNs". Informatica (Slovenia). Optimization Using Hybrid Classifier LSTM with vol.46, no. 7, pp.45-47. XGBOOST for Military Sector", Optical Memory https://doi.org/10.31449/inf.v46i7.4272 and Neural Networks (Information Optics), vol. 33, [38] Ustun. Deniz., Erkan. U., Toktas. Abdurrahim., Lai no.1, pp.53-71. Qiang and Yang liang (2024)., "2D hyperchaotic https://dx.doi.org/10.0.12.31/S1060992X24010089 Styblinski-Tang map for image encryption and its [29] Khudor Baida’a Abdul Qader, Hussein Dheyaa hardware implementation", Multimedia Tools and Mezaal, Kheerallah Yousif Abdulwahab, Alkenani Applications, vol. 83, no. 12, pp. 34759-34772. Jawad and Alshawi Imad S. (2023). "Lifetime https://dx.doi.org/10.0.3.239/s11042-023-17054-6 Maximization Using Grey Wolf Optimization [39] N. Meenakshi, S. Ahmad, A. V. Prabu, J. Nageswara Routing Protocol with statistical Technique in Rao, N. A. Othman, Hikmat A. M. Abdelijaber, R. WSNs". Informatica (Slovenia), vol. 47, no. 5, pp. Sekar and J. Nazeer (2024)., "Efficient 75-82. Communication in Wireless Sensor Networks Using https://doi.org/10.31449/inf.v47i5.4601 Optimized Energy Efficient Engroove Leach [30] S. Deena, Devi. S. Suganthi. and Nalini. T. (2024)., Clustering Protocol", Tsinghua Science and 190 Informatica 49 (2025) 173-190 S. Jia et al. Technology, vol. 29, no. 4, pp. 985-1001. https://dx.doi.org/10.0.103.231/TST.2023.9010056 [40] Ariffin Nur Izzaty et al (2023). "Internet of Things Intercommunication Using SocketIO and WebSocket with WebRTC in Local Area Network as Emergency Communication Devices". in Proc. 8th International Conference on Software Engineering and Computer Systems. https://dx.doi.org/10.1109/ICSECS58457.2023.102 56297 https://doi.org/10.31449/inf.v49i12.7838 Informatica 49 (2025) 191–206 191 An Integrated Framework for Data Security Using Advanced Machine Learning Classification and Best Practices Peng Wang1*, Ningping Yuan2 and Yong Li1 1Inner Mongolia Power Research Institute, Hohhot City, 010010, China 2Inner Mongolia Medical University, Hohhot City, 010110, China E-mail: wangpeng9493@163.com Keywords: Data security, classification techniques, support vector machines, neural networks, decision trees, best prac- tices, data protection, access control Received: Dec 17, 2024 In the current interconnected digital environment, data security has become a paramount concern, as cy- berattacks and data breaches are increasing in frequency and complexity. Both organizations and people face challenges in safeguarding sensitive information, requiring resilient security systems that can adjust to various threats. This paper presents a comprehensive approach to data security, focusing on integrating advanced classification techniques and best practices to secure data proactively. This study uses and an- alyzes advanced classification algorithms like decision trees, support vector machines (SVM), and neural networks to determine how well they work to find, sort, and keep sensitive data safe across various security needs. The results indicate substantial improvements in classification accuracy, with the optimal model at- taining an accuracy rate of 98.83%. The other models, including decision tress and SVM provide 89% and 92% accuracy, respectively. This highlights the dependability and resilience of these methods in detecting possible security concerns across various datasets. In addition to these classification results, we compre- hensively analyze industry best practices in data security, encompassing encryption technologies, dynamic access control, and continuous monitoring to mitigate vulnerabilities and improve threat detection. Inte- grating sophisticated classification methodologies with these optimal practices provides a comprehensive security framework that enhances data protection and mitigates risk. This study offers significant insights for practitioners and organizations aiming to implement a more systematic and efficient data security ap- proach, enhancing academic and practical discussions in this domain. This work seeks to strengthen the effectiveness of data security practices by introducing a novel method that integrates high-accuracy cate- gorization with proactive security protocols. Povzetek: Predstavljen je celovit pristop varnosti podatkov, ki integrira napredne klasifikacijske tehnike, kot so nevronske mreže in podporni vektorji, z najboljšimi praksami za zaščito podatkov ter izboljšanje kvalitete. 1 Introduction Each layer has specific functions—to support protection against unauthorized access and data integrity [6]. Losing Data security is becoming increasingly important today, millions of its users and cyber incidents inspired the need impacting industries, governments, and individuals [1]. for effective and flexible data protection models that can These developments have led to an explosion of data given address traditional and novel threats [7]. Conventionally the use of the internet, cloud storage, and systems, therefore used methods in data protection are based on determinis- making data security paramount to risky exercises whose tic models and rule-based systems, which are inadequate forms of data need protection against unfair exploitation or in addressing new threats that evolve to counter security unauthorized access [2]. Computer and internet crimes are mechanisms adopted [8]. Therefore, this study aims to fill becoming complex, and information security and individ- these gaps by proposing an enhanced multi-classification uals at all levels of the economy and society are at risk. approach that elevates the existing security practices of as- Analyses prove that the total cost of cybercrime will be in sessment by integrating classification techniques with best the trillions within a few years, thus the importance of ef- security practices [9]. As this research feeds into mod- ficient data protection plans [3]. Data protection solutions ern theories on data classification, it is hoped that the gaps are vital in allowing the privacy and confidentiality of data, in the currently existing data security frameworks will be but implementations and controls are inadequate and vul- filled and that a solution to the security of sensitive data nerable [4]. will be provided [10] [11]. Several data security methods Data security can be discussed in terms of data encryp- exist, including encryption, access control, monitoring, and tion, access control, monitoring, classification, etc [5]. 192 Informatica 49 (2025) 191–206 P. Wang et al. classification. However, classification is a form of secu- advanced classification techniques provides only best prac- rity designed as the initial stage and not a single method. It tice, which is not sufficiently technically sound. Moreover, marks and classifies sensitive data implementation as being research inclined to depict ideal procedures does not con- the right security measures [12]. sider how rapidly these procedures can be implemented to counter threats, especially in sectors that experience high 1.1 Research gap levels of cyberattacks and data breaches [20]. A signifi- cant limitation of earlier works is the absence of a compre- Several gaps exist in the current methods, especially in hensive framework integrating classification methods with data classification with sensitivity-based protections. The proactive best practices [21]. In response to these limita- framework is essential in defining classification and data tions, this research suggests a general framework data secu- prioritization, which helps determine the security levels that rity solution suitable for various scenarios and best bridges must be applied to data [13]. However, most conventional the technology and practice divide. techniques or moves for classification are confining and bring a high impact of variability, which is ordinary and 1.3 Challenges in data security can hardly provide a suitable and comprehensive solution for the large and ever-changing environments today [14]. Several challenges can be identified, significantly compli- Most existing models are either prescriptive or unable to cating the development and application of measures for adapt dynamically to new forms of threats, thus posing a protecting data. First, one of the main trends is the con- risk for organizations [15]. The second central area is com- stantly growing complexity and the active response to cy- bining classification methods with data security standards. ber threats. Unlike ordinary threats, which are more or less Thus, the position is that although the classification con- easily recognizable, new threats are much less easy to un- cept offers the first layer of data security, the idea is far derstand, and any static measures are useless. Computer from complete. Encryption, access control, real-time mon- criminals use sophisticated procedures to take advantage of itoring, and continually running vulnerability tests are the flaws, with their strategies evolving quickly due to emerg- complete practices needed to protect data at the advanced ing security methods. This requires a security system that level [16]. However, in many cases, research has been con- will address these emerging threats and be proactive to any ducted to develop classification techniques and best prac- other threats that may arise [22]. tices independently while lacking a coherent one, including The next major problem is the ability to classify and both. This gap implies a lack of integration of classification prioritize data depending on its classification requirement. data with proactive recurring measures, which will enable Companies deal with vast volumes of data, which differ in a better systematic response to data security problems [17]. sensitivity. That is why proper segregation and protection of the data are significant. However, the conventional clas- 1.2 Limitations of previous studies sification approaches are ineffective when measuring the amount and variety of information processed in organiza- Several studies have been done on data security; these tions today. Also, organizations have always encountered works offer pioneering notions on different interventions the compelling problem of security and unavailability. Se- of data security; nevertheless, several downsides hamper curity policies must protect against invasion by unautho- their applicability to contemporary security environments. rized personnel and allow authorized individuals to get the Most of the works describing the performance of the classi- required information. Only security frameworks that can fication techniques focus on the raw classification accuracy enable differential access controls depending on the sensi- without considering such aspects as interpretability, com- tivity of the data and the type of user can achieve this bal- putational cost, and flexibility [18]. While models trained ance, which is typically difficult to do when using conven- in simulation perform well in their specific scenarios, their tional security mechanisms. applicability sharpens when exposed to field data with intri- Using ML and other superior algorithms also poses an- cate structures and dynamic threats. Furthermore, the pri- other problem regarding computations, interpretability, and mary focus on objective measures such as accuracy could model shifts over time. The learning parameters of ML al- not fully meet the challenges of protecting data in the real gorithms require constant updates for their efficiency, par- world [19]. ticularly when it comes to dynamic threats. These chal- Meanwhile, research that concerns data security mea- lenges show the need for an all-encompassing regime in sures and proper protocols based on current and improved data security to meet advanced threats that have evolved practices involves encoding techniques, security accesses over the years without compromising the system’s ease, and policies, and conformance to prescribed rules and laws. adaptability, and robustness. Although these practices are essential, they are used sep- arately from technical classification techniques, and thus, 1.4 Motivations for the study security is fragmented. This separation can be problematic because while technical classification without best prac- This research was undertaken due to the absence of an ap- tices means only gaps in coverage, best practice without propriate data security model that would also factor in the An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 193 benefits of better classification systems. Thus, as data is study conducts a sensitivity analysis to test the robust- present in all industries and constantly evolving, new and ness of classification outcomes under various param- more complex threats arise, and a highly detailed and flex- eter settings. This analysis adds depth to the study by ible security model is needed. It is known that decision demonstrating the model’s adaptability to different or- trees, support vector machines (SVM), and neural networks ganizational requirements and security scenarios. improve data classification, which is an integral part of deploying security resources by making existing methods more practical. Through these techniques, this study ex- 1.6 Structure of the paper pects to enhance the precision of data categorization to help The remainder of this paper is structured as follows. Sec- organizations direct their resources and efforts to protect the tion 2 provides a comprehensive review of existing lit- most vulnerable data. erature on data security, focusing on advanced classifica- This work also recognizes that the principles of data pro- tion techniques and best practices. Section 3 details the tection entail other types of data protection, such as encryp- methodology, including data collection, model selection, tion, access control, and real-time monitoring. All these and the integration of best practices into the proposed se- are essential data security practices and perhaps mandatory curity framework. Section 4 presents the results, including co-features of technical classification schemes. This study model performance metrics and sensitivity analysis find- aims to solve both the theoretical and practical problems ings. Section 5 discusses the implications of the study, with of data security by suggesting a more logical and consis- a focus on practical applications and limitations. Finally, tent framework for data security than has been used before. Section 6 concludes the paper and offers suggestions for This will be done using complicated classification methods future research. and step-by-step ways to explain the security solution. 1.5 Novel contributions of the study 2 Literature review This research makes several novel contributions to data se- Thapa and Camtepe, [23], whose work focuses on preci- curity by presenting an integrated framework that combines sion health systems, discussed the necessity, barriers, and advanced classification techniques with industry best prac- data security and privacy strategies. Their study also em- tices. The unique contributions of this study are as follows: phasized that precision health, which provides care based on patient-specific information related to genes, microbes, 1. Advanced Classification Techniques: This study behaviors, and environment, and digital records, includ- evaluates the effectiveness of various classification ing omics, depend on technology like machine learning algorithms, including decision trees, SVM, and neu- algorithms for data processing and electronic gadgets for ral networks, in accurately categorizing sensitive data data capture. They brought attention to the high risk of across different sensitivity levels. By rigorously test- leakage since health data contains susceptible information ing these techniques, this study identifies models that about an individual, including identity and medical condi- offer high accuracy, with the most effective model tions and interactions between health data centers. This achieving an accuracy rate of 98.83%. type of breach can result in personal damage. The indi- 2. Integration with Best Practices: Unlike traditional vidual may be bullied at work, face discrimination at the studies that focus exclusively on either technical or place of work, or even higher insurance charges, thus mean- procedural aspects of data security, this study inte- ing privacy and security counts. They examined conform- grates advanced classification techniques with secu- ing to government legislation and the ethical concerns and rity best practices, such as encryption standards, ac- requirements that ethics committees highlight for protect- cess control protocols, and continuous monitoring. ing healthcare data to keep the public engaged in precision This integration provides a holistic security frame- health efforts. Their study showed that people’s buy-in of work that addresses technical and operational security data sharing depends highly on safety, privacy, and proper requirements. use of that data. To address these challenges, they described multiple secure and privacy-preserving machine learning 3. Adaptability and Practicality: This study empha- techniques for implementing precision health information, sizes the adaptability of its proposed model, allowing with examples of their usage in related health initiatives. it to adjust to evolving threats. This framework is de- Finally, the study recommended the best ways to protect signed to meet the diverse security needs of organiza- precision health data. The study also provided a conceptual tions operating in rapidly changing environments by system model that can be used to check compliance, man- combining flexible classification methods with proac- age consent, and support the ethical requirements needed tive security protocols. for innovation in the healthcare field. Aslan et al. proposed a systematic evaluation [24] of the 4. Comprehensive Evaluation and Sensitivity Anal- emerging cybersecurity threats, risks, incidence, and coun- ysis: In addition to evaluating model accuracy, this termeasures to address the constant rise of cyber threats, 194 Informatica 49 (2025) 191–206 P. Wang et al. such as the usage of the internet as a result of the COVID- scribed DL as one of the critical technologies in the 4IR. 19 outbreak. Their study stressed that with the replacement DL, a subset of ML and AI, is receiving widespread recog- of the digital interaction of physical transactions, traditional nition from various industries because of its adaptability in crimes have shifted more towards the cyber domain, and large datasets and its utility in healthcare, vision, natural the current and emerging technologies like cloud, IoT, and language processing, and protection. He also added that DL cryptocurrencies modify new security dimensions. The au- has its roots in artificial neural networks and is now crucial thors stressed that in cyber attack campaigns, the adversary in solving other real-world problems. Due to the dynamism uses automated tools and releases ‘cyber attacks as a ser- of data and the complexity of real-world issues, it has been vice’ to achieve maximum effect, and the newly identified challenging to develop effective DL models. Additionally, threats exploit hardware, software, and communication lay- most deep learning systems are black boxes, which prevents ers. They have reviewed generalized forms of cyber attacks standardization and widespread use of these systems. The such as DDoS, phishing, man in the middle, and malware research described a precise classification of DL methods attacks and noted that traditional layers of protection like for distinguishing between supervised, unsupervised, and firewalls and antivirus are not very useful in tackling cur- mixed learning methods for determining the practical ap- rent complex threats. They highlighted the emerging need plication of DL. Further, he discussed other works that suc- for new solutions that embrace superior and enhanced de- cessfully applied DL and showed that DL can be effectively tection solutions and preventive measures. They reviewed used in various contexts. To inform the next steps in the de- the latest trends in technological approaches, including ma- velopment of DL, the author outlined ten critical directions chine learning, deep learning, cloud computing-based big for future research that are targeted at enhancing model in- data, and blockchain; all of them were suggested as poten- terpretability, plasticity, and performance. This large-scale tial approaches to detect and prevent cyber threats. They survey is also helpful for academic and industrial audiences also found that it is possible to develop machine learning who want to understand the current state and future of DL, and deep learning to identify new complex threat types, especially by emphasizing the need to increase the distinc- and through experimentation, the effectiveness of machine tiveness and development of DL approaches. learning and deep learning, when used for detecting mal- Ahmad et al. [27] also systematically reviewed cyber- ware and intrusions, can be established. However, they security issues within IoT cloud computing, including how noted that machine learning and deep learning are suscepti- cloud computing has revolutionized data storage and access ble to evasion techniques and require constant enhancement to resources for industrial uses in IoT-based cloud comput- to resist intelligent forms of cyber attacks. ing. This included making current research on cloud com- Dasgupta and Akhtar [25] systematically reviewed cy- puting by Calegari and Ometto more relevant by noting that bersecurity based on ML concerning the growing impor- their study found out that over the last decade, industries tance of protecting data, devices, and user information in shifted to cloud computing due to its flexibility, cost and the present interconnected society. They described their performance advantage. However, this has meant moving survey regarding how ML has been incorporated into cy- applications to cloud platforms, which has created a consid- bersecurity in applications like intrusion, malware, and erable security problem since conventional security is nor- biometric-based user identification. However, as they high- mally not sufficient or efficient for new cloud applications. lighted, when used in cybersecurity, the algorithm of ML is They noted that the convergence of IoT with cloud com- exposed to attacks both during the training and the testing puting has compounded these threats as the architecture of phases, which in turn does not allow for achieving the de- cloud IoT systems offers fresh concerns that necessitate se- sired results and can result in the penetration of the system curity appropriate solutions. They classified cloud security into the network. The research has undergone a system- concerns into four key categories: data security, network atic literature review of recent developments in the applica- and service security, application security and people secu- tion of ML in cyber-security between 2013 and 2018, with rity. They discussed and compared various security mat- a general understanding of cyber attacks, the correspond- ters in each category they had and discussed the limitation ing defense mechanisms, and the commonly usedML algo- from a general view, and specifically, they focused on the rithm. They also discussed ML and data mining feature ex- DL viewpoint. The study reviewed new trends that involve traction, dimensionality reduction, and classification tech- DL in dealing with cyber threats targeting IoT/cloud busi- niques, such as adversarial ML—a subdiscipline that pro- ness models, while also acknowledging different methods tects ML models against adversarial attacks. The task of have their limitations when adopted by industrial systems. their survey was to stress the existing weaknesses of current Finally, based on their review of the literature, researchers ML-based security measures related to adversarial threats suggest new ways to strengthen security using AI and DL and discuss directions for a more extensive investigation of within the cloud architecture in order to address research these risks. Lastly, they presented the existing and poten- gaps in IoT-based cloud cybersecurity [28]. tial problems and concerns in cybersecurity and provided Admass et al. [29] highlighted the current state, future research recommendations for improving the robustness of trends and advances in cybersecurity and noted the need for ML applications for this domain. cybersecurity as the world goes digital in different activi- Sarker [26], in his deep and extensive review article, de- ties. As they noted to underscore the inherent dynamism An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 195 of threats in cyberspace, more research, participation of among other things, implement broad cybersecurity poli- academic institutions, and organizational commitment re- cies, pursue deployment of robust technologies, and de- garding the protection of information systems need to be velop a cybersecurity culture. The study’s findings that PPP promoted. In their systematic review, they focused on re- and policy intervention are crucial for developing the nec- cent trends and innovations in the field of cybersecurity and essary cybersecurity framework further supported this. In described new approaches and trends that have emerged their conclusion, they also encouraged a future research di- worldwide to capture the dynamism of cyber threats. The rection to analyse new technologies and analyse human and study considered AI andML as disruptive technologies that policy factors in cybersecurity for renewable energy. Ta- can greatly help improve cyber security by being able to ble 1 summarizes the key performance metrics and method- identify threats and respond to them autonomously. How- ologies from referenced works. ever, they observed that these remain an issue to some ex- tent, especially given that threats in cyberspace are equally evolving. They also stressed the continuity of the stake- 3 Methodology holders’ interaction and suggested that future works are aimed at combining the use of innovative technologies and 3.1 Overview of the proposed framework cooperation between members of the cybersecurity envi- ronment. This work offered directions on how to build This study proposes a comprehensive framework for data capacity in cybersecurity and emerging developments that security, integrating advanced classification techniques would be necessary for new threats. with best cybersecurity practices. The methodology con- sists of four main phases: data collection and preprocess- Zhang et al. [30] explained various methodologies of ex- ing, feature extraction, classification using advanced ma- plainable artificial intelligence (XAI) in the context of cy- chine learning algorithms, and integration of best practices. bersecurity regarding the massive problems raised by the These phases enhance data security through accurate clas- ‘‘black box’’ that distinguishes conventional ML and DL. sification and adherence to security standards. The over- Given the current evolution of the Internet of Things and all workflow of the proposed framework may be viewed in other AI techniques, ML and DL are widely used in cyber- Figure 1. security, including intrusion, malware, and spam detection. Despite these recognition-based methods yielding higher accuracy and more efficiency compared to the signature- 3.2 Research questions and objectives based and rule-based methods as observed by them. They identified a major drawback of the black-box nature of ML This study addresses the following key research questions: and DL algorithms. Such explainability often leads to re- 1. How effectively can advanced machine learning (ML) duced user trust and reduced understanding of how these classification techniques integrate with cybersecurity models detect or address cyber threats, especially as the best practices to enhance data security? kind of cyber threats being witnessed continue to evolve. So, they looked at the possible weakness that could come 2. Which classification technique—Decision Trees, Sup- from trying to make things understandable and how XAI port Vector Machines (SVM), or Neural Networks— needs to be added to theories of AI-based cybersecurity provides the most accurate and robust performance for models so that people can understand them or manage cy- cybersecurity applications? bersecurity systems well. Their work also filled in an im- portant research gap by providing a thorough survey that 3. What are the benefits of incorporating real-time moni- was only focused on AI/ML-based XAI in cybersecurity. toring, encryption, and access control alongside ML This was despite the fact that XAI had been studied in other models in addressing modern cybersecurity chal- fields, like healthcare and finance. They suggested a struc- lenges? tured plan for approaching XAI in the cybersecurity field and pointed out that cybersecurity machine learning mod- The primary objective of this study is twofold: els should bemore explainable without losing performance. – To evaluate the feasibility and effectiveness of com- This survey provides the necessary background information bining ML techniques with robust security practices. for further studies by those who intend to focus on the chal- lenge of making cybersecurity AI understandable for the – To compare the performance of the proposed classi- average user [31]. fication techniques and demonstrate the practical ad- They found that AI and ML technologies offer viable so- vantages of the integrated framework. lutions for filling the new emerging security threats in re- newable energy. The study also focused on the need for 3.3 Data collection and preprocessing global cooperation and compliance of countries with inter- national guidelines on cyberspace security as critical in im- In the initial phase, data is gathered from diverse publicly proving security readiness throughout the renewable power available sources to comprehensively represent real-world industry. According to them, industry stakeholders should, cybersecurity scenarios [32]. Data is anonymized to protect 196 Informatica 49 (2025) 191–206 P. Wang et al. Table 1: Comparison of key performance metrics and methodologies from referenced works Author(s) Focus Area Key Contributions Limitations Dasgupta et al. [25] ML in Cybersecurity Surveyed ML applications in intru- Highlighted vul- sion detection and adversarial ML. nerability of ML to Proposed directions for improving adversarial attacks; robustness. lacks integration with broader security practices. Zhang et al. [30] Explainable AI Reviewed XAI methodologies for Black-box limita- (XAI) in Cybersecu- cybersecurity, emphasizing user tions of ML/DL rity trust and transparency. persist; need for practical implemen- tation strategies. Thapa and Camtepe Precision Health Proposed secureML techniques and Focused primarily [23] Data Security a conceptual model for protecting on healthcare, not health data. generalizable to other domains. Aslan et al. [24] Emerging Cyberse- Reviewed ML/DL for detecting Susceptibility of curity Threats malware and intrusions. Identified ML/DL to evasion vulnerabilities in IoT and cloud sys- techniques; lacks tems. comprehensive miti- gation strategies. Sarker [26] Deep Learning (DL) Surveyed DL methods for cyberse- DL systems often Applications curity, highlighting their adaptabil- operate as black ity and challenges in implementa- boxes, reducing tion. interpretability and standardization. Ahmad et al. [27] IoT and Cloud Cy- ExploredAI/DL-based solutions for Limited focus on bersecurity IoT-cloud models and proposed se- integrating AI solu- curity enhancements. tions with policy and regulatory frame- works. sensitive information. The dataset includes access logs, en- employs Principal Component Analysis (PCA) to reduce cryption statuses, and user authentication details. Prepro- dimensionality, retaining only essential components con- cessing includes: tributing to data variability. – Normalization: Scaling data attributes to fit a stan- dard range [33]. 3.4.1 Principal component analysis (PCA) X −Xmin PCA transforms high-dimensional data into a lower- Xnorm = (1) Xmax −Xmin dimensional space while preserving variance. The trans- formation is computed as follows: – Missing Value Imputation: Filling gaps in data through statistical techniques to avoid misclassifica- Y = X ·W (2) tion. where X is the original data matrix and W represents – Noise Reduction: Using median filtering to reduce the weight matrix of principal components. PCA reduces outliers. computational load while retaining critical information. This preprocessing step ensures data quality and reduces computational complexity, allowing the algorithms to per- 3.5 Classification techniques form accurately. The core of this methodology is the classification phase, 3.4 Feature extraction and selection where advanced machine learning algorithms are employed to categorize data based on security needs. Three algo- Feature extraction involves identifying the most relevant rithms are used: Decision Trees, Support Vector Ma- attributes to enhance classification accuracy. This study chines (SVM), and Neural Networks. Each algorithm is An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 197 Figure 1: Workflow of the proposed framework selected for its strengths in specific security scenarios. For data that is not linearly separable, SVM uses a kernel function to map data to a higher-dimensional space. The 3.5.1 Decision trees margin is optimized by minimizing: Decision Trees are highly interpretable models that use a n tree-like structure for classification. Each node represents a 1 ∑ L = ∥w∥2 + C ξi (4) decision based on an attribute, leading to branches that pre- 2 i=1 dict outcomes [34]. The algorithm’s performance is evalu- ated using Gini impurity: where w is the weight vector, C is a penalty parameter, ∑ and ξi represents slack variables. This approach enhances n the model’s robustness against misclassifications. G = 1− p2i (3) i=1 where pi is the probability of a particular class. Lower 3.5.3 Neural networks Gini values indicate better classification. Neural Networks are employed for complex pattern recog- 3.5.2 Support vector machines (SVM) nition, using multiple layers to capture non-linear re- lationships [36]. The backpropagation algorithm ad- SVMs classify data by finding a hyperplane that maximizes justs weights based on error rates, minimizing the Mean the margin between data points of different classes [35]. Squared Error (MSE): 198 Informatica 49 (2025) 191–206 P. Wang et al. – Employ Neural Network for complex, high- n 1 ∑ MSE = (yi − ŷi) 2 (5) dimensional data n i=1 – Best Practices Integration: where yi is the actual output, and ŷi is the predicted out- put. Neural Networks are particularly effective for high- – Encrypt data using keyK dimensional data and provide high classification accuracy. – Implement role-based access using access matrix A(u, r) 3.6 Integration of security best practices – Monitor for anomalies with threshold δ This framework integrates security best practices, such as encryption, access control, and real-time monitoring, to – Output: Classified secure data, threat identification complement the classification process. This algorithm combines machine learning with best – Encryption: Ensures data confidentiality through se- practices, ensuring data classification and security. cure algorithms, with all data encrypted before pro- cessing. The encryption-decryption cycle is defined by: 3.8 Validation and evaluation metrics The framework’s effectiveness is evaluated through stan- C = E(K,P ) and P = D(K,C) (6) dard metrics: where C is the ciphertext, P the plaintext, K the en- cryption key, E the encryption function, and D the – Accuracy: Proportion of correctly classified in- decryption function. stances. – Access Control: Restricts data access based on user TP + TN Accuracy = (8) roles, employing role-based access control (RBAC). TP + TN + FP + FN This model assigns permissions using access matrices, where the matrix entryA(u, r) defines permissions for – Precision and Recall: Precision measures correct user u and role r. positive predictions, while recall measures the detec- tion of actual positives. – Real-timeMonitoring: Uses anomaly detection algo- rithms to identify unusual patterns indicative of poten- tial threats. Anomalies are detected based on threshold TP TP deviations: Precision = and Recall = TP + FP TP + FN (9) δ = ∥x− µ∥ > λ (7) – F1 Score: The harmonic mean of precision and recall, where x is the current observation, µ the mean, and λ indicating the balance between these metrics. the deviation threshold. F1 = 2 · Precision · Recall3.7 Algorithm: secure classification (10) Precision + Recall framework – ROC-AUC: Measures classification performance The following algorithm outlines the steps for data security across different thresholds. An area under the ROC classification within this framework: curve close to 1.0 indicates high model performance. – Input: Dataset D, security parameters {P,K} 3.9 Comparative analysis and sensitivity – Preprocessing: Normalize data, fill missing values, reduce noise testing – Feature Extraction: Apply PCA to extract relevant The comparative analysis is aimed at comparing results of features the classification algorithms that are obtained under the in- fluence of various factors. Sensitivity analysis looks at how – Classification: much error a model returns, given that the hyperparameters are tweaked. The proposed model brings safety and flexi- – Apply Decision Tree for interpretable cases bility in managing data, the objectives of the study, where – Use SVM with kernel function for non-linear there is a need to attain high classification accuracy there separable data should be some level of security measured control. An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 199 4 Results Table 2: Confusion matrix for decision tree model 4.1 Overview of experimental setup and Predicted Positive Predicted Negative metrics Actual Positive 450 50 Actual Negative 40 460 The findings result from following a data security frame- work that combines classification measures with cyberse- curity standards. The key ratios to assess the models are 4.2.2 Support vector machine (SVM) results divided into Accuracy, Precision, Recall, F1 score, ROC- The SVM model was optimized using a radial basis func- AUC. All the measurements are related to certain aspects of tion (RBF) kernel, achieving improved accuracy over the the model’s effectiveness, and results are given in graphs, Decision Tree model. Figure 3 illustrates the metrics tables, and confusion matrix for better understanding. achieved by SVM, with an accuracy of 92%, precision of 90%, recall of 91%, and an F1 score of 90.5%. 4.2 Model performance across classification techniques The framework employed three primary classification algo- rithms: Decision Trees, Support Vector Machines (SVM), and Neural Networks, to classify data based on security needs. 4.2.1 Decision tree results The Decision Tree model provided an interpretable yet ef- fective baseline. Figure 2 shows the accuracy, precision, recall, and F1 score for the Decision Tree model, achiev- ing a consistent classification accuracy of around 89%. Accuracy = 89%, Precision = 87%, Recall = 88%, F1Fi=gu8r7e.53%: Performance metrics for the SVM model with RBF kernel The confusion matrix in Table 3 for the SVM model demonstrates a further reduction in misclassifications, indi- cating the SVM’s robustness in handling complex decision boundaries. Table 3: Confusion matrix for SVM model Predicted Positive Predicted Negative Actual Positive 460 40 Actual Negative 30 470 4.2.3 Neural network results The Neural Network, a multilayer perceptron (MLP) model, displayed the highest performance, achieving 98.83% accuracy, which aligns with the framework’s Figure 2: Performance metrics for the decision tree model novel contribution toward accurate classification. Metrics for the Neural Network model (Figure 4) include a preci- The confusion matrix for the Decision Tree model (Ta- sion of 98.5%, recall of 98.6%, and F1 score of 98.55%. ble 2) displays the model’s classification performance The confusion matrix in Table 4 further validates the across different classes, indicating a strong ability to distin- Neural Network’s high classification capability, with min- guish true positives and negatives, though occasional mis- imal false positives and false negatives, indicating near- classifications occurred in borderline cases. perfect distinction between classes. 200 Informatica 49 (2025) 191–206 P. Wang et al. score of 98.55%. This makes it highly appropriate where it is crucial that both false positives and false negatives be kept to the barest level possible, especially for applica- tions such as fraud detection and cybersecurity threat eval- uation. At a reasonable intersection of the F1 score equal to 90.5%, SVM turns into a worthy trade-off option for ap- plications with a reasonable amount of computational re- sources necessary for mid-sized datasets’ anomaly detec- tion. On the other hand, the low F1-score of the deci- sion tree of just 87,5% demonstrates the model’s usefulness in cases where speed and comprehensible decision-making are valued more than accuracy, such as the preliminary data sorting in security systems. 4.4 Sensitivity analysis and robustness of the neural network model Figure 4: Performance metrics for the neural network model Sensitivity analysis was conducted on the Neural Network model to evaluate its robustness across different hyperpa- Table 4: Confusion matrix for neural network model rameters. Figure 6 shows the effect of varying the learning rate on model accuracy, illustrating optimal performance Predicted Positive Predicted Negative at a learning rate of 0.01. The model displayed resilience, Actual Positive 495 5 maintaining high accuracy across learning rates, thoughmi- Actual Negative 3 497 nor fluctuations occurred with extreme values. 4.3 Comparative analysis of classification algorithms Table 5 provides a summary of key performance metrics across all three algorithms. The Neural Network model achieved the highest scores, indicating its effectiveness for data security applications. Figure 5 presents a bar chart comparing the accuracy of all three models. Figure 6: Sensitivity analysis of neural network model with varying learning rates 4.5 Integration of security best practices To verify the framework’s effectiveness in a secure envi- ronment, additional security best practices such as encryp- tion and real-time monitoring were integrated and tested. Figure 5: Accuracy comparison for decision tree, SVM, and Data was encrypted using AES-256 encryption (Equation 6 neural network models in Methodology), ensuring data confidentiality. The access control measures limited user permissions based on roles, The F1 scores are used to emphasize practical signifi- securing the model against unauthorized access. Real-time cance of each of the classification models in the evalua- monitoring, implemented through anomaly detection, suc- tion of the given metrics. The neural network has proven cessfully identified potential security breaches with an ac- to deliver improved precision as well as recall and an F1 curacy of 96%. An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 201 Table 5: Comparative analysis of model performance Model Accuracy Precision Recall F1 Score Decision Tree 89% 87% 88% 87.5% SVM 92% 90% 91% 90.5% Neural Network 98.83% 98.5% 98.6% 98.55% 4.6 Analysis of security metrics 5 Implications and limitations The framework was evaluated based on its ability to main- tain data confidentiality, integrity, and availability. Fig- 5.1 Practical applications ure 7 presents the security metrics obtained during test- The paper provides a practical outlook on the proposed ing, with encryption providing a data confidentiality rate framework for data security by incorporating classification of 100%, access control measures ensuring 99% integrity, techniques with cybersecurity principles into a heteroge- and real-time monitoring achieving a 96% availability rate. neous system. Due to its high accuracy, this framework is most effective in fields critical to data accuracy and se- curity, such as healthcare, finance, government, and cloud services. Table 7 provides a comparison of the proposed framework with state-of-the-art (SOTA) methods. – Healthcare Sector: In healthcare, keeping patients’ data and preventing leakage or ensuring safe data transmission is very important. This framework could improve the patient’s privacy by making it difficult for intruders to access the database system and also guar- antee data security. With an accuracy level of 98.83%, the proposed neural network model can be considered suitable for predicting and preventing security threats in medical data systems. – Financial Institutions: In this modern world, entities Figure 7: Security metric analysis for data confidentiality, dealing with cash give cash and deal with people’s fi- integrity, and availability. nancial records, such as transaction history and credit records, and they become targets for hacker attacks. Hence, the duplication of this framework can help fi- nancial organizations strengthen their protective mea- 4.7 Discussion of novel contributions sures against different types of fraud schemes. The The results substantiate the framework’s novel contribu- real-time monitoring capability, with an availability tions, as outlined in the introduction. The high classifi- rate of 96%, means the program can immediately iden- cation accuracy achieved by the Neural Network model tify such patterns and possible violations. demonstrates the framework’s capacity for accurate threat detection, with the 98.83% accuracy surpassing traditional – Government and Public Sector: This framework can models in complex security scenarios. In addition, secu- be implemented into government agencies, which nec- rity best practices pillars including encryption and real time essarily have large databases containing personal or monitoring gave a security boost to the framework in ad- nationally important data, thus increasing data protec- dition to guaranteeing the accuracy of data classification. tion. Thus, together with access control based on job As anticipated the study proofs that the proposed data se- positions, real-time monitoring helps to timely detect curity framework of incorporating machine learning with violations in working government databases. security practices not only improves security but also the accuracy of classification. Table 6 provides a summary of – Cloud Computing and IoT Environments: Cloud the core findings. While performing sensitivity analysis, a services and Internet of Things (IoT) networks are de- scalability problem arose, showing that neural networks are centralized environments. The monitoring, anomaly restricted by GPU memory and SVMs by the kernel calcu- detection, and encryption framework provided in this lation of the big data. These include helping choose models work can protect data in such environments and scale according to specific available resources and scalability for to accommodate the dynamics of the cloud architec- a certain application. ture’s application. 202 Informatica 49 (2025) 191–206 P. Wang et al. Table 6: Summary of findings Aspect Result Highest Classification Accuracy 98.83% (Neural Network) Best Security Metric 100% confidentiality through AES-256 encryption Robustness in Monitoring 96% availability in real-time monitoring Table 7: Comparison of proposed framework with state-of-the-art (SOTA) methodologies Author(s) Focus Area Key Contributions Limitations Ad- dressed by This Study Dasgupta et al. [25] ML in Cybersecurity Surveyed ML applications Improved model in intrusion detection and robustness and adversarial ML. Highlighted classification ac- vulnerabilities in adversarial curacy (98.83%). scenarios. Incorporated proac- tive monitoring to address evolving threats. Zhang et al. [30] Explainable AI (XAI) in Cy- Reviewed XAI methodolo- Achieved high per- bersecurity gies to enhance transparency formance (98.83%) and user trust in cybersecu- while ensuring ro- rity AI models. bust implementation. Proposed future integration of XAI for enhanced inter- pretability. Thapa and Camtepe Precision Health Data Secu- Proposed secure ML tech- Generalized frame- [23] rity niques and conceptual mod- work applicable els for health data. across domains with real-time monitoring for evolving cyber threats. Aslan et al. [24] Emerging Cybersecurity Highlighted the need for en- Combined AES- Threats hanced detection measures 256 encryption against IoT/cloud threats. with adaptive ML Reviewed ML/DL methods methods for robust for malware detection. security in IoT/cloud systems. Ahmad et al. [27] IoT and Cloud Cybersecu- Explored AI/DL-based so- Unified classifica- rity lutions for IoT-cloud inte- tion techniques with gration. Addressed security access control and gaps in cloud environments. monitoring for com- prehensive IoT/cloud protection. Sarker [26] Deep Learning (DL) Appli- Discussed DL challenges Enhanced DL robust- cations such as black-box nature ness with sensitivity and adaptability in cyberse- analysis and adapt- curity. ability in real-time monitoring. 5.2 Limitations of the study – Complexity of Implementation: Implementing this framework in existing systems involves significant complexity. Integrating multiple machine learning al- Despite its strengths, the framework has several limitations gorithms with advanced encryption and monitoring that may affect its application. An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 203 measures demands substantial resources and expertise, – AutomatedModel Updating: Developing automated which may not be available in all organizations. methods for periodic model retraining would help the framework stay effective against evolving threats by – Scalability Concerns: However, the neural network integrating new data patterns into the learning process. model proposed in this paper had high testing accu- racy; there may be a problem of scalability when ap- plying this framework to large systems. However, as Future research will concentrate on improving scalability the amount of data and classification types increases, through approaches such as parallel processing, batch nor- real-timemonitoring and accuracymaintenance can be malization, and model pruning to improve large-scale data demanding on resources in a deficient environment. management. Emerging technologies will be examined for secure data sharing and privacy-preserving model training, – Dependency on Data Quality: Usually, the classifi- including blockchain and federated learning. Furthermore, cation models depend on the quality of the given data. systems such as continuous learning pipelines and auto- When input data is inconsistent or incomplete, then the mated hyperparameter tuning frameworks will be incorpo- model will not perform effectively. However, main- rated to provide dynamic model updates and maintain per- taining the quality of the inputs even today poses a formance in changing cybersecurity landscapes. problem, especially in environments where data can be created perpetually and might not have been checked. – Adaptability to Emerging Threats: Security risks 6 Conclusion concern are never ending and keep changing from time to time. While using machine learning improves the This research offers a strong foundation for data protection spectrum of detection, there are sophisticated attack by integrating sophisticated classification systems into cy- tactics that may fail to be modeled. This needs con- bersecurity fundamentals to provide higher classes of data stant update and training to detect new patterns out confidentiality, integrity, and accessibility. Based on ma- there. chine learning algorithms, especially the neural network model, with an accuracy as high as 98.83 %, the frame- – Computational Overheads: Integration of high- work’s performance shows that, in principle, text classifi- complex models such as neural networks with real- cation and anomaly detection can accomplish high accu- time monitoring might actually slow down computa- racy. These security measures enhance the proposed frame- tion time, thus is not well suited for applications where work’s usefulness in organizations requiring high data se- response time is critical. The efficient use of available curity levels, including health, financial, and government resources is also desirable in order to propagate lower organizations. However, the challenges are still present in powered systems. practice, such as difficulty implementing the framework in an actual setting, concerns for its scalability, and a strong – Privacy and Compliance Constraints: Employing emphasis on data quality. Further, there is a continually ris- the best of machine learning in data security poses pri- ing danger of hacks and malicious activities that make up- vacy and regulatory issues because the two fields are dates and retraining of models essential. We can look into sensitive in motherhood, such as health and finance. the following possible directions for these kinds of research Data protection regulation like GDPR presents a chal- advances, as we already talked about the gaps: scaling lenge, especially when it comes to training, handling up optimization strategies, adding more general technolo- the training data, and the general handling of personal gies to machine learning for privacy, like quantum encryp- data. tion, and seeing improvements in advanced machine learn- ing practices that protect privacy. This framework protects 5.3 Future directions data and defines a new horizon for protecting secure data. As organizations increasingly rely on digital systems, im- To address these limitations and expand the potential of this plementing such adaptable frameworks becomes crucial to framework, future research could explore: countering cyber threats and safeguarding sensitive infor- mation. This study contributes to the growing field of cy- – Optimization for Scalability: Research focused on bersecurity by providing a practical and adaptable solution optimizing neural networks and other complex models that meets the demands of contemporary data security. to reduce computational costs could improve scalabil- ity, enhancing adaptability to large-scale systems. – Incorporation of Emerging Technologies: Emerg- Acknowledgement ing technologies like quantum computing and blockchain may further enhance security. Quantum This research is funded by the Science and Technology encryption, for example, could offer robust protection Project of Inner Mongolia Power Group Limited Company, against sophisticated cyber threats. Project No. 2023-5-34. 204 Informatica 49 (2025) 191–206 P. Wang et al. References [11] A. U. R. Butt, M. Asif, S. Ahmad, and U. Imdad, “An empirical study for adopting social computing in [1] A. B. Ige, E. Kupa, and O. Ilori, “Best practices global software development,” in Proceedings of the in cybersecurity for green building management sys- 2018 7th International Conference on Software and tems: Protecting sustainable infrastructure from cy- Computer Applications, 2018, pp. 31–35. ber threats,” International Journal of Science and Re- search Archive, vol. 12, no. 1, pp. 2960–2977, 2024. [12] A. U. R. Butt, M. A. Qadir, N. Razzaq, Z. Farooq, and I. Perveen, “Efficient and robust security implementa- [2] R. Kaur, D. Gabrijelčič, and T. Klobučar, “Artifi- tion in a smart home using the internet of things (iot),” cial intelligence for cybersecurity: Literature review in 2020 International Conference on Electrical, Com- and future research directions,” Information Fusion, munication, and Computer Engineering (ICECCE). vol. 97, p. 101804, 2023. IEEE, 2020, pp. 1–6. [3] Z. Yang, X. Liu, T. Li, D. Wu, J. Wang, Y. Zhao, and [13] D. Chen, P. Wawrzynski, and Z. Lv, “Cyber security H. Han, “A systematic literature review of methods in smart cities: a review of deep learning-based appli- and datasets for anomaly-based network intrusion de- cations and case studies,” Sustainable Cities and So- tection,” Computers & Security, vol. 116, p. 102675, ciety, vol. 66, p. 102655, 2021. 2022. [14] M.A. Ferrag, O. Friha, D. Hamouda, L.Maglaras, and [4] A. Fatani, A. Dahou, M. A. Al-Qaness, S. Lu, and H. Janicke, “Edge-iiotset: A new comprehensive real- M. A. Elaziz, “Advanced feature extraction and se- istic cyber security dataset of iot and iiot applications lection approach using deep learning and aquila op- for centralized and federated learning,” IEEE Access, timizer for iot intrusion detection system,” Sensors, vol. 10, pp. 40 281–40 306, 2022. vol. 22, no. 1, p. 140, 2021. [15] Z. Zhang, H. Ning, F. Shi, F. Farha, Y. Xu, J. Xu, [5] X. Sun, F. R. Yu, and P. Zhang, “A survey on F. Zhang, and K.-K. R. Choo, “Artificial intelligence cyber-security of connected and autonomous vehicles in cyber security: research advances, challenges, and (cavs),” IEEE Transactions on Intelligent Transporta- opportunities,” Artificial Intelligence Review, pp. 1– tion Systems, vol. 23, no. 7, pp. 6240–6259, 2021. 25, 2022. [6] A. U. R. Butt, T. Saba, I. Khan, T. Mahmood, A. R. [16] A. Khraisat and A. Alazab, “A critical review of intru- Khan, S. K. Singh, Y. I. Daradkeh, and I. Ullah, sion detection systems in the internet of things: tech- “Proactive and data-centric internet of things-based niques, deployment strategy, validation strategy, at- fog computing architecture for effective policing in tacks, public datasets and challenges,” Cybersecurity, smart cities,” Computers and Electrical Engineering, vol. 4, pp. 1–27, 2021. vol. 123, p. 110030, 2025. [17] T. O. Oladoyinbo, O. O. Adebiyi, J. C. Ugonnia, O. O. Olaniyi, and O. J. Okunleye, “Evaluating and estab- [7] S. Nifakos, K. Chandramouli, C. K. Nikolaou, P. Pa- lishing baseline security requirements in cloud com- pachristou, S. Koch, E. Panaousis, and S. Bonacina, puting: an enterprise risk management approach,” “Influence of human factors on cyber security within Asian journal of economics, business and accounting, healthcare organisations: A systematic review,” Sen- vol. 23, no. 21, pp. 222–231, 2023. sors, vol. 21, no. 15, p. 5119, 2021. [18] R. Vallabhaneni, S. Pillai, S. A. Vaddadi, S. R. Ad- [8] A. U. R. Butt, T. Mahmood, T. Saba, S. O. Bahaj, F. S. dula, and B. Ananthan, “Secured web application Alamri, M. W. Iqbal, and A. R. Khan, “An optimized based on capsulenet and owasp in the cloud,” Indone- role-based access control using trust mechanism in e- sian Journal of Electrical Engineering and Computer health cloud environment,” IEEE Access, 2023. Science, vol. 35, no. 3, pp. 1924–1932, 2024. [9] M. I. Khan, A. Imran, A. H. Butt, A. U. R. Butt et al., [19] M. K. Hasan, A. A. Habib, Z. Shukur, F. Ibrahim, “Activity detection of elderly people using smart- S. Islam, and M. A. Razzaque, “Review on cyber- phone accelerometer and machine learning methods,” physical and cyber-security system in smart grid: International Journal of Innovations in Science & Standards, protocols, constraints, and recommenda- Technology, vol. 3, no. 4, pp. 186–197, 2021. tions,” Journal of network and computer applications, vol. 209, p. 103540, 2023. [10] M. Ghiasi, T. Niknam, Z. Wang, M. Mehrandezh, M. Dehghani, and N. Ghadimi, “A comprehensive re- [20] K. U. Qasim, J. Zhang, T. Alsahfi, and A. U. R. view of cyber-attacks and defensemechanisms for im- Butt, “Recursive decomposition of logical thoughts: proving security in smart grid energy systems: Past, Framework for superior reasoning and knowledge present and future,”Electric Power Systems Research, propagation in large languagemodels,” arXiv preprint vol. 215, p. 108975, 2023. arXiv:2501.02026, 2025. An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 205 [21] I. H. Sarker, M. H. Furhad, and R. Nowrozy, “Ai- [33] M. S. Yadav and R. Kalpana, “Data preprocessing driven cybersecurity: an overview, security intelli- for intrusion detection system using encoding and gence modeling and research directions,” SN Com- normalization approaches,” in 2019 11th Interna- puter Science, vol. 2, no. 3, p. 173, 2021. tional Conference on Advanced Computing (ICoAC). IEEE, 2019, pp. 265–269. [22] M. A. Ferrag, O. Friha, L. Maglaras, H. Janicke, and L. Shu, “Federated deep learning for cyber secu- [34] P. Li, M. Abouelenien, R.Mihalcea, Z. Ding, Q. Yang, rity in the internet of things: Concepts, applications, and Y. Zhou, “Deception detection from linguistic and experimental analysis,” IEEE Access, vol. 9, pp. and physiological data streams using bimodal convo- 138 509–138 542, 2021. lutional neural networks,” in 2024 5th International Conference on Information Science, Parallel and Dis- [23] C. Thapa and S. Camtepe, “Precision health data: Re- tributed Systems (ISPDS). IEEE, 2024, pp. 263–267. quirements, challenges and existing techniques for data security and privacy,” Computers in biology and [35] M. A. Selvan, “Svm-enhanced intrusion detection medicine, vol. 129, p. 104130, 2021. system for effective cyber attack identification and mitigation,” 2024. [24] Ö. Aslan, S. S. Aktuğ, M. Ozkan-Okay, A. A. Yilmaz, [36] G. S. Kumar, K. Premalatha, G. U. Maheshwari, P. R. and E. Akin, “A comprehensive review of cyber se- Kanna, G. Vijaya, and M. Nivaashini, “Differential curity vulnerabilities, threats, attacks, and solutions,” privacy scheme using laplace mechanism and statis- Electronics, vol. 12, no. 6, p. 1333, 2023. tical method computation in deep neural network for [25] D. Dasgupta, Z. Akhtar, and S. Sen, “Machine learn- privacy preservation,” Engineering Applications of ing in cybersecurity: a comprehensive survey,” The Artificial Intelligence, vol. 128, p. 107399, 2024. Journal of Defense Modeling and Simulation, vol. 19, no. 1, pp. 57–106, 2022. [26] I. H. Sarker, “Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions,” SN computer science, vol. 2, no. 6, p. 420, 2021. [27] W. Ahmad, A. Rasool, A. R. Javed, T. Baker, and Z. Jalil, “Cyber security in iot-based cloud comput- ing: A comprehensive survey,” Electronics, vol. 11, no. 1, p. 16, 2021. [28] K.Wang and X.Wang, “Application of fuzzy decision theory in multi objective logistics distribution center site selection,” Informatica, vol. 48, no. 23, 2024. [29] W. S. Admass, Y. Y. Munaye, and A. A. Diro, “Cy- ber security: State of the art, challenges and future directions,” Cyber Security and Applications, vol. 2, p. 100031, 2024. [30] Z. Zhang, H. Al Hamadi, E. Damiani, C. Y. Yeun, and F. Taher, “Explainable artificial intelligence applica- tions in cyber security: State-of-the-art in research,” IEEE Access, vol. 10, pp. 93 104–93 139, 2022. [31] A. K. Marzook and J. Alkenani, “Hybrid kalman filter and optimization-based routing for energy efficiency in heterogeneous wireless sensor networks,” Infor- matica, vol. 48, no. 23, 2024. [32] Y. Li and T. Wang, “Intelligent management process analysis and security performance evaluation of sports equipment based on information security,” Measure- ment: Sensors, vol. 33, p. 101083, 2024. 206 Informatica 49 (2025) 191–206 P. Wang et al. https://doi.org/10.31449/inf.v49i12.6903 Informatica 49 (2025) 207–220 207 Dynamic Anti-Mapping Network Security Using Hidden Markov Models and LSTM Networks Against Illegal Scanning Min Guo 1, Dongjuan Ma 1, Feng Jing 1, Xueqin Zhang 1, Hengwang Liu 2* 1State Grid Shanxi Electric Power Research Institute, Taiyuan 030006, Shanxi, China 2Anhui Jiyuan Inspection and Testing Technology Co., Ltd, Hefei 230097, Anhui, China E-mail: hengwang_liu@outlook.com *Corresponding author's Keywords: illegal network scanning, anti-mapping techniques, secure access, dynamic ip addresses, port obfuscation Received: August 14, 2024 This paper deeply explores an innovative network anti-mapping security access technology to cope with the increasingly frequent illegal network scanning behaviors, aiming to build a more robust network security protection system. First, we analyze the threats of illegal scanning to network infrastructure, including but not limited to information leakage, service interruption, and the risk of being a springboard for subsequent attacks. Subsequently, a comprehensive security strategy is proposed, combining dynamic IP address allocation, port obfuscation, traffic camouflage, and behavior analysis to improve the system's concealment and anti-detection capabilities.This paper introduces the collaborative working mode of intelligent firewall and intrusion prevention system (IPS), using hidden Markov model (HMM) and long short-term memory network (LSTM) to identify and block malicious scanning behaviors, and optimize access control list (ACL) to achieve efficient release of legitimate traffic and accurate interception of illegal scanning traffic. Experimental results show that the proposed network anti-mapping security access technology has achieved significant results in improving network security. Specifically, we conducted experimental verification on the UNSW-NB15 dataset, which covers a variety of attack types and is very suitable for evaluating illegal network scanning defense mechanisms. Experimental results show that the accuracy of the Bi-LSTM+Attention model on this dataset reaches 98%, and the false alarm rate is reduced by 30% compared with the traditional LSTM model. In the pilot network area, this technology can effectively identify and intercept illegal scanning behaviors while maintaining low false alarm and missed alarm rates. By comparing with existing methods (such as honeypots, traffic obfuscation, etc.), we found that the Bi-LSTM+Attention model showed significant advantages in multiple key performance indicators. Although the model has high computing resource requirements and implementation complexity, its significant effect in improving detection accuracy and reducing false alarm rates makes it a technical solution worthy of promotion. In addition, we also discussed the trade-offs observed during the implementation, such as computational overhead and complexity, and proposed directions for future optimization. Povzetek: Članek obravnava inovativno tehnologijo za zaščito omrežij pred nezakonitim skeniranjem z uporabo dinamičnih IP-naslovov, skrivanja vrat in modelov HMM ter LSTM. 1 Introduction obstacle restricting the healthy development of the digital world. Illegal network scanning, as an outpost of cyber In the digital era, the Internet has become an attacks, frequently threatens the safe and stable operation indispensable infrastructure for global economic and of all kinds of network systems, ranging from government social activities, carrying massive information exchange agencies, financial institutions to small and medium-sized and service delivery. However, with the dramatic enterprises and even individual users. Such scanning expansion of network scale and the continuous expansion activities aim to collect information about the topology, of technical boundaries, network security issues have open services, operating system types and their become increasingly prominent, and have become a major vulnerabilities of the target network, paving the way for subsequent targeted attacks [1]. Hacking Intrusion Networks Attacks Points Automation Mass Attackers Networks Tools Scanning Figure 1: Flow of network attack 208 Informatica 49 (2025) 207–220 M. Guo et al. The rise of illegal network scanning of networks is detailed as follows: (1) We will conduct a comprehensive rooted in the complex ecology of network security attack and in-depth research to finely deconstruct the current and defense confrontation. With the popularization of technical characteristics of illegal network scanning, hacking techniques and automated tools, attackers are able popular tool sets and advanced attack strategies. This in- to launch large-scale scans at a very low cost to find depth analysis will not only reveal the specific risks they potential points of intrusion. These scanning behaviors are pose to network infrastructures, but also lay a solid often silent and difficult to be effectively screened and foundation for the design of subsequent technical blocked by traditional security measures. Once the solutions, ensuring that our countermeasures hit the nail network is exposed to scanning, it will not only lead to on the head [3]. (2) We are committed to designing a sensitive information leakage and service interruption, but comprehensive defense mechanism that integrates also may become the starting point of distributed denial- dynamic IP address management, port obfuscation of-service attacks (DDoS), ransomware propagation, data policies, traffic emulation techniques, and intelligent theft and other serious security incidents. Therefore, the behavioral analysis. The system increases the complexity development of advanced anti-scanning technology to and uncertainty faced by attackers by continuously improve the network's stealth and resilience has become changing the external manifestation of the network, thus an urgent problem in the current network security field, significantly reducing the likelihood of the network being and the specific network attack process is shown in Figure successfully scanned and effectively thwarting illegal 1 [2]. scanning attempts. (3) Leveraging cutting-edge AI Currently, illegal network scanning is characterized algorithms such as Hidden Markov Models (HMM) and by diversification and intelligence. On the one hand, the Long Short-Term Memory Networks (LSTM), we intend evolution of scanning tools and botnets has made scans to strengthen the synergy between the Intelligent Firewall more frequent, covert and difficult to track. Attackers use and Intrusion Prevention System (IPS), and to improve the botnets to disperse scanning sources and bypass detection accuracy and response speed of the two in identifying mechanisms based on IP reputation and frequency; on the malicious scanning behaviors. This integration not only other hand, Advanced Persistent Threat (APT) enables immediate threat awareness and effective organizations use customized scanning strategies to interception, but also maintains a high degree of adaptivity conduct in-depth reconnaissance for specific targets, in complex network environments. which increases the difficulty of defense. In addition, the This paper proposes cryptographic techniques such application of emerging technologies such as cloud as RSA and Diffie-Hellman to protect the security of the computing and the Internet of Things (IoT) further extends session. To consolidate the effectiveness of these network boundaries and provides scanners with a broader algorithms in ensuring secure communication within the attack surface. In the face of these challenges, traditional system, we cite their standard security proofs. protection strategies such as static firewall rules and Specifically, the security of RSA is based on the large simple port blocking are no longer adequate. integer factorization problem, while the security of Diffie- In recent years, illegal network scanning behaviors Hellman relies on the discrete logarithm problem. These have become increasingly frequent, posing a serious threat algorithms have been widely verified in academia and to network security. To address this challenge, researchers industry and are widely used in various security protocols. have proposed a variety of technologies, including By citing these standard security proofs, we ensure the honeypots, dynamic address translation (NAT), traffic security of the proposed system and provide readers with obfuscation, and behavior-based detection systems. These a credible technical foundation. methods have their own advantages and disadvantages, but generally face problems such as high false alarm rates 2 Literature review and high resource consumption. This study aims to propose an innovative network reverse mapping security 2.1 Illegal network scanning threat analysis access technology by combining dynamic IP address In the field of cybersecurity, illegal network scanning allocation, port obfuscation, traffic camouflage, and activities pose a constant and serious threat, not only as a behavior analysis. We use the UNSW-NB15 dataset for critical step in the hacker's attack chain, but also as a experimental verification, which covers a variety of attack behavior that cyberspace security maintainers must be types and is suitable for evaluating illegal network wary of. This section will take an in-depth look at the types scanning defense mechanisms. By introducing the Bi- of network scanning and the motives behind them, risk LSTM+Attention model, our method shows significant assessment of information leakage, the impact of service advantages in improving detection accuracy and reducing disruption and availability, and an analysis of the hazards false alarm rates. exhibited by illegal scanning as a prelude to an attack. Therefore, the core objective of this research is to Illegal network scanning can be broadly categorized conceptualize and propose an innovative network anti- into several types: basic port scanning, service probing, mapping security access technology architecture, which vulnerability scanning, operating system fingerprinting, aims to strongly counteract illegal network scanning and so on. Port scanning is the most basic form, in which behaviors and significantly enhance the resilience of the an attacker discovers open services and potential entry network's own protection through a set of multi- points by trying to connect to different ports of the target dimensional and dynamically changing strategy matrices. host one by one. Service probing goes a step further by Specifically, the detailed objectives of this research are sending specific probe packets to known open services in Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 209 order to identify the specific version of the service and With the increasing sophistication of Internet thus determine the presence of known vulnerabilities [4]. security threats, illegal network mapping (cyber While vulnerability scanning focuses on finding security reconnaissance) has become an outpost of cyber attacks. weaknesses at the system and application level, OS To defend against such threats, a series of anti-mapping fingerprinting is used to obtain precise information about techniques have emerged, aiming to obfuscate attackers the target system in order to customize more effective and protect the true layout and sensitive information of attack strategies. The motivations behind these scanning network infrastructure. This section provides a activities are multiple and complex. The first and foremost comprehensive overview of several mainstream anti- is information gathering, i.e., attackers prepare for mapping techniques, including but not limited to spoofing subsequent attacks and need to understand the structure, techniques, dynamic address translation, traffic protection measures and potential weaknesses of the target obfuscation, network segmentation and micro- network [5]. segmentation, and behavior-based detection and response The risk of information leakage due to illegal systems [9]. network scanning should not be underestimated. Even the Deception techniques are active defense strategies simplest port scan can reveal the layout of an that mislead attackers by deploying fake resources and organization's network, the specific services it uses, and services. This includes Honeypots, Honeynets, and their active status, which is enough information to help an Honeyflows, which mimic the characteristics of real attacker build an initial picture of the target. More in-depth systems or networks to attract and capture malicious service probes and vulnerability scans can expose deeper scanning behavior. When an attacker attempts to scan, vulnerabilities in the system, such as outdated software probe, or exploit these fake resources, their behavior is versions, which can become breakthroughs for intrusion. recorded and analyzed to give early warning and block Once such information falls into the wrong hands, it can potential threats. Not only do spoofing techniques drain not only lead to immediate data breaches or service attacker resources, they also provide security teams with disruptions, but also put the organization in a long-term valuable intelligence to help understand adversary tactics, security risk, as the exposed information can be used to techniques and procedures (TTPs). Dynamic Address devise more insidious and targeted attacks. While network Translation (DAT) or Network Address Translation scanning does not usually cause direct service disruptions, (NAT) technologies make it difficult for external entities it can raise indirect availability issues. A large number of to accurately map internal network structure by changing scanning requests can consume target system and network IP addresses between internal and external networks.DAT resources, including CPU, memory, and bandwidth, hides the true IP addresses of actual servers and devices, resulting in slower response to service requests from making it difficult for illegitimate scanners to access them. legitimate users, and in severe cases, denial of service may The ability of DAT to hide the real IPs of actual servers even occur. In addition, continuous scanning activities and devices makes it difficult for illegal scans to directly may trigger alarms on firewalls and intrusion detection locate specific targets, significantly increasing the systems, generating a large number of false positives, difficulty for attackers to identify valuable assets. consuming the security team's energy and interfering with Meanwhile, the strategy of regularly rotating IP addresses normal operations and maintenance [6,7]. further enhances this defense effect. Traffic obfuscation Illegal network scanning is often a harbinger of large- techniques make it difficult for external observers to parse scale attacks. It is a prelude to an elaborate attack plan by the true source, purpose, and content of packets by altering cybercriminals, whether it is data theft against a specific the patterns and characteristics of network organization, ransomware deployment, or resource communications. This includes altering port numbers, probing for a distributed denial of service (DDoS) attack. protocol characteristics, timestamps, and other network By conducting comprehensive reconnaissance of the traffic attributes, making it impossible for scanning tools target, attackers can precisely select attack paths, to correctly identify service type or version information. customize attack payloads, increase attack success rates Combined with encryption techniques, such as SSL/TLS, and reduce the risk of detection. Therefore, timely traffic obfuscation can more effectively hide the true identification and effective response to illegal network nature of network activity, increasing the cost and scanning activities are crucial for stopping potential complexity of illegal mapping [10,11]. network attacks and are an indispensable part of the Network segmentation is the division of a large network defense system [8]. network into multiple small areas that are logically or To summarize, illegal network scanning, as a physically isolated, limiting the ability to move laterally pervasive network threat with complex and varied hidden and making it difficult for an attacker to get a full grasp of motives behind it, poses direct and indirect threats to the layout of the entire network even if he or she breaks information security, service availability and the overall through a portion of the network. Micro-Segmentation network environment. goes one step further by realizing fine-grained access control, with strict access rules even between different resources within the same subnet. This strategy greatly improves the difficulty for attackers to navigate the 2.2 Overview of existing antimapping internal network and reduces the efficiency and success techniques rate of illegal mapping [12]. Modern cybersecurity frameworks are increasingly relying on artificial 210 Informatica 49 (2025) 207–220 M. Guo et al. intelligence and machine learning techniques, where which they can quickly identify scanning behaviors that behavior-based detection and response systems are able to deviate from the norm, and even predict and block future automatically analyze network traffic patterns, identify attack attempts. Through real-time monitoring, intelligent anomalous behaviors, and instantly respond to potential analysis, and automatic response, the efficiency and mapping activities. Such systems are able to learn a accuracy of countering illegal mapping is greatly behavioral baseline of normal network activity, from improved [13]. Table 1: Research findings Key Research/Technology Method Dataset Limitations Performance Metrics High resource Detection Rate: consumption, requires Deploying fake 85% continuous resources and services Custom or public False Positive Honeypot Technology maintenance to attract and mislead datasets Rate: 10% Can be identified attackers Resource and bypassed by Consumption: High advanced attackers Detection Rate: Limited defense 75% Changing IP against complex attack Laboratory False Positive Dynamic Address addresses between strategies environments or Rate: 5% Translation (NAT) internal and external Difficult to enterprise networks Resource networks handle large-scale Consumption: scanning Moderate Detection Rate: Limited 70% effectiveness against Altering network Public datasets False Positive advanced scanning Traffic Obfuscation communication such as UNSW-NB15 Rate: 8% strategies patterns and features Resource May affect Consumption: Low legitimate traffic Detection Rate: Complex 65% Dividing the configuration, high False Positive Network network into multiple Enterprise operational costs Rate: 3% Segmentation logically isolated networks Limited defense Resource segments against lateral Consumption: movement attacks Moderate Detection Rate: Requires large 80% amounts of data for Using machine Behavior-Based Public datasets False Positive model training learning to analyze Detection Systems such as CICIDS2017 Rate: 12% Limited network traffic patterns Resource generalization to new Consumption: High types of attacks As shown in Table 1, we compare different research but require large amounts of data for training and have and technologies in the context of illegal network scan limited generalization to new types of attacks. defense, including their methods, datasets, key performance metrics, and limitations. From the table, it 2.3 Status of research can be seen that honeypot technology, while effective in Honeypot technology has evolved from a single collecting attacker behavior information, has high decoy system to a complex system containing advanced resource consumption and requires continuous interactive honeypots and honeynets. Advanced maintenance, making it vulnerable to being identified and honeypots are able to simulate the behavior of real bypassed by advanced attackers. Dynamic Address systems, including operating system vulnerabilities, Translation (NAT) increases the difficulty for attackers by service responses, etc., as a way to collect the behavioral hiding internal IP addresses but is limited in its patterns and tool usage of attackers [14]. And by effectiveness against complex and large-scale scanning constructing a honeypot system containing multiple activities. Traffic obfuscation alters network interconnected honeypots, the honeynet not only increases communication patterns, making it difficult for scanning the difficulty for attackers to identify real assets, but also tools to correctly identify service types, but it is less traces the attack path and provides richer analysis data for effective against advanced scanning strategies and may security teams. With the development of automation and impact legitimate traffic. Network segmentation reduces intelligence, adaptive honeynet technology is emerging, the lateral movement capabilities of attackers through which dynamically adjusts honeypot configurations based logical isolation but is complex to configure and has high on attack behavior for more efficient intelligence operational costs. Behavior-based detection systems use gathering and defense response. Dynamic address machine learning models to automatically analyze translation (NAT) and network segmentation are effective, network traffic patterns, improving detection accuracy, but in the face of complex and changing attack methods, it is difficult to meet the demand with static strategies Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 211 alone [15]. Dynamic network architectures, such as necessity of improving data security and reliability in software-defined networking (SDN) and network function network environments. virtualization (NFV), are emerging as the new frontiers of anti-mapping. SDN allows administrators to flexibly 3 Innovative network anti-mapping configure network routing and security policies from a centralized controller to quickly respond to network security access technology threats, while NFV enables on-demand allocation and on- 3.1 Technical architecture design the-fly adjustment of resources by virtualizing the When designing the technical architecture of an functions of traditional network devices, which enhances advanced networked anti-mapping security access system, network flexibility and stealth. Although traffic we need to comprehensively consider a variety of factors obfuscation can effectively interfere with enemy including, but not limited to, security, availability, detection, it is a major challenge to implement it scalability, and performance optimization. In this section, accurately without affecting legitimate services. The we will delve into how to build such a system through combination of Deep Packet Inspection (DPI) and specific technical principles, algorithmic formulations, machine learning algorithms provides a possible solution and implementation details to ensure its effectiveness and to this problem [16]. DPI techniques can deeply parse robustness in complex network environments. network traffic to identify and classify different We adopt a dynamic IP address allocation policy application layer protocols, while machine learning (denoted as DIPA policy), which, in combination with models learn normal and abnormal behavior patterns by geolocation obfuscation techniques, can effectively analyzing huge amounts of network traffic data, thus improve the anonymity of the system. Let there be N pools achieving accurate identification of hidden mapping of available IP addresses in the network, and the behaviors. In addition, the application of unsupervised probability of dynamically changing addresses in each learning and adaptive learning algorithms enables the cycle T is P. The degree of obfuscation of the system is C. system to self-optimize in a constantly changing threat environment, enhancing the dynamic adaptability of the Where log2(N) reflects the entropy value of the size of defense. the address pools, which represents the uncertainty of Although the above technologies provide a powerful address selection. By adjusting the values of P and T, arsenal for anti-mapping, they still face many challenges security can be balanced with network maintenance cost. in actual deployment. First, the cost and complexity of In the port obfuscation technique, assuming that there are operation and maintenance are factors that cannot be M legitimate ports and K emulation protocols, the ignored, especially for small and medium-sized complexity S of port obfuscation can be quantified by the enterprises (SMEs), for which high-level anti-mapping solutions may be beyond their financial and technical M  i  capacity. Second, the synergistic operation between following equation: S = M + K 1−  technologies is also one of the difficulties, how to ensure i=1  M  that different defense mechanisms can complement each M  i  other while avoiding mutual interference requires careful Here,1−  represents the contribution of the planning and tuning [17,18]. In addition, legal compliance i=1  M  is also a point of consideration, as certain anti-mapping measures may involve regulatory restrictions on user randomness of the port usage to the obfuscation privacy protection and cross-border data transmission. effect [19], and the reuse of ports decreases and the In terms of data storage and transmission, Yang et al. obfuscation effect improves as i increases. Deep data [31] proposed a data sharing scheme for cloud storage obfuscation involves not only the header camouflage of services based on the concept of message recovery, which packets, but also the transformation of load data. Let the improves the reliability and security of data by introducing original data X be changed into Y by the obfuscation redundant information. This data sharing mechanism not function F. Ideally, F should satisfy irreversibility, i.e., the only enhances the integrity of the data, but also improves complexity of recovering X from Y is extremely high. A the ability of data to resist attacks during transmission. simple example of obfuscation is to use the XOR Similarly, Muthusenthil et al. [32] proposed a location operation with the key K: Y = X K However, in verification technology in cluster-based geolocation practice, more complex encryption algorithms such as routing, which enhances the security of mobile ad hoc AES are usually used, whose security is based on the size networks (MANETs) by verifying the location of the key space, i.e., 2n , where n is the key length [20]. information of nodes. Both methods emphasize the 212 Informatica 49 (2025) 207–220 M. Guo et al. Public Key A Private key Symmetric Key Signature Signature Original text cipher cipher Original Symmetric User 1 key User 2 Public Key B Digital envelope Digital B Private key envelope Symmetric key Figure 2: Two-way authentication process In the two-way authentication process, it is assumed each subnet are defined by access control lists (ACLs), the that RSA public key encryption and DH key exchange complexity E of which can be measured by the number of protocols are used.The security of RSA encryption is subnets and the number of ACL rules R: E = nR based on the large number decomposition puzzle.Let the Combined with role-based access control (RBAC), where public key be (e,n), the private key be (d,n), and the the role R_i corresponds to the set of permissions P_i, the message M be encrypted to be C, then we have: user U is assigned roles through the mapping function f: C =M e modn The receiver decrypts the message U ⎯⎯f→Ri  P In this way, a user can only perform i by using the private key: M =Cd modn Whereas, in the operations that are permitted by his or her role In this the Diffie-Hellman protocol, both parties compute a way, users can only perform the operations allowed by shared key, K, by sharing the parameters g and p: their roles, which enhances the security control within the system. A= ga mod p , . B = gb mod p [21], K = Ba mod p = Ab mod p . This dynamic key 3.2 Intelligent defense mechanism exchange ensures the security independence of each The synergistic operation of intelligent firewalls and session, and the specific two-way authentication process intrusion prevention systems (IPSs) is particularly is shown in Fig. 2. important in the evolving network threat landscape. We The network micro-segmentation technique realizes propose an innovative dual-engine architecture that the least privilege principle by partitioning the network combines traditional rule-based static defense with into multiple logical subnets. Assuming that the network advanced machine learning dynamic adaptation is partitioned into n subnets, the trust boundaries within capabilities, the framework of which is shown in Fig. 3 [22]. Fuzzy Logic Abnormal Screening Systems Traffic Markov Behavioral Intelligent Analyzing Processes Sequences Defense Mechanism LSTM Prediction Traffic trend networks Figure 3: Intelligent defense mechanism framework Fuzzy Logic System (FLS) plays a key role in this IP, etc.; A1,..., An is the affiliation function of these architecture by building a flexible set of rules to evaluate feature vectors, which defines the "fuzzy" degree of each the network events, which is expressed in the form of: feature in a set of linguistic variables; and y as the decision Ri : IF x1 is A1 AND ... AND xn is An THEN y is B output indicates the degree of suspicion of this network where (xi , ..., x event; and B is the decision output affiliation degree. is a n ) represents multiple feature vectors of function of these feature vectors, defining the degree of the network traffic, such as packet size, frequency, source "fuzziness" of each feature in the set of linguistic Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 213 variables; and y, as the decision output, indicates the indicator for anomaly detection. To further improve the degree of suspicion of the network event, while B is the model performance, we introduce the attention linguistic variable affiliation of the decision output. This mechanism, which is an effective method for guiding the mechanism allows the firewall to quickly identify and model to focus on the key pieces of information in the respond to anomalous traffic patterns, while the linkage traffic sequence. Attention weights are computed as with the IPS can instantly block potential intrusions, forming a multi-layered, intelligent defense network. e T t = v tanh(Whht )First-order Markov processes (Markov Chain of Order 1, MC1) are widely used in the prediction and exp(e follows: t ) [ t = 25] , v and W T h are analysis of behavioral sequences, especially in identifying abnormal and malicious activities in networks. By exp(ek ) constructing a matrix P = [ pij ]i, j reflecting the k=1 probability of state transfer for normal network behavior, model parameters, andt denotes the attention weights at where pij denotes the probability of transferring from the tth time step, which are subsequently used to weight state i to state j, we are able to use the model to assess the and sum the hidden states to generate context vectors that fit of the test sequences with the predefined normal focus on the information that is most critical for behavior model. Specifically, the likelihood L(X) of prediction. The use of Bi-LSTM (Bi-LSTM) greatly sequence X under the model can be expressed as: enhances the model's ability to capture complex temporal dependencies by simultaneously considering both past T (forward LSTM) and future (backward LSTM) contextual L(X ) = P(X | Model) = px [22] When t−1xt information of the sequence, as shown in Equation 6. t=2 the likelihood of a sequence is significantly lower ht = LSTM forward (xt ,ht−1) than the threshold of the normal behavior model, the sequence is considered to contain malicious behavior. ht = LSTM backward (xt ,h (6) t+1) This approach not only improves the accuracy of detection, but also dynamically adapts to changes in ht = [ht ;ht ] network behavior, further enhancing the system's intelligent response capability. Combining the above techniques, we not only For the dynamic nature of network traffic, Long construct a model that can accurately predict traffic trends, Short-Term Memory (LSTM) networks are preferred tools but also directly identify potential network anomalous for anomaly detection due to their powerful time-series behaviors by comparing the difference between the model modeling capabilities. LSTM units efficiently deal with prediction and the actual observed values, providing both long-term dependencies through their unique gating a powerful and sensitive early warning system for the mechanisms (forgetting gates f , input gates i network security protection system. This comprehensive t t , and strategy not only improves the generalization ability of the output gates ot , whose update formulas are specified in model and enhances its adaptability to emerging threats, Eqs. 1-5 [23,24]. but also brings more refined monitoring and protection f tools to the field of network security [26]. t = (Wf [ht−1, xt ]+bf ) (1) it = (Wi [ht−1, xt ]+bi ) (2) 3.3 Access control policy optimization o In the face of increasingly complex and changing t = (Wo [ht−1, xt ]+bo) (3) network access demands and security threats, traditional c static access control lists (ACLs) can no longer meet the t = ft ct−1 + it tanh(Wc [ht−1, xt ]+bc) (4) requirements of efficient and accurate traffic management, and the general access control model is shown in Fig. 4. ht = ot tanh(ct ) (5) Therefore, we introduce an innovative adaptive weighting algorithm, which aims to dynamically adjust the priority where represents the Sigmoid activation function, of ACL entries so as to achieve efficient processing of tanh is the hyperbolic tangent function, odot denotes the legitimate traffic and keen identification of potential elementwise multiplication operation, and threats. The core formula of this policy is: W f ,Wi ,Wo ,Wc and bf ,bi ,bo ,bc are the weight matrices and bias terms for each gate and cell state and Wi (t +1) =Wi (t)+ (Hi −H)+  H i hidden state, respectively. Training the LSTM model with a large amount of In this formula,Wi (t) represents the weight of the historical traffic data not only predicts the future traffic trend, but also the deviation between the model predicted ith ACL rule at time t, which integrates the historical value and the actual traffic data can be used as a direct traffic data and real-time threat intelligence to realize the 214 Informatica 49 (2025) 207–220 M. Guo et al. adaptive adjustment of rule priority. Among them, Hi changing trends to make the adjustment more delicate and accurate. reflects the historical importance of the traffic matched by In order to further accelerate the recognition and the rule, H is the average importance of all rules, which processing speed of legitimate traffic, we design a high- aims to highlight the key rules by comparison; H speed matching mechanism that combines Deep Packet i Inspection (DPI) technique with machine learning. This quantifies the rate of change of the rule's importance to mechanism utilizes a pre-trained Support Vector Machine ensure that the policy can quickly respond to the changes (SVM) model to accurately determine the traffic features of network conditions; and the adjustment coefficients, with its powerful classification capability. The decision and  , balance the effects of historical performance and function of the SVM model is: g(x) =wT(x)+b [27]. Thematic Environ ment Object Permi t Access requests Denia l Figure 4: Access control model Here, w is the weight vector, (x) is the feature the time step, D is the input feature dimension, and H is the number of hidden layer units. The introduction of the transformation function that maps the original feature attention mechanism adds additional computational cost, vector x to a higher dimensional space, and b is the bias and its time complexity is O (T * H). Overall, the time term of the model. By learning from a large number of complexity of the training phase is O (T * (D * H^2 + H)). samples, the model is able to accurately distinguish the In the inference phase, the time complexity is relatively feature boundaries between legitimate and illegitimate low, O (T * (D * H + H)). traffic, and set the threshold , any traffic that satisfies Compared with traditional rule-based systems, the g(x)  is immediately released without further Bi-LSTM+Attention model has obvious advantages in checking, which greatly improves the throughput and dynamic adaptability and accuracy, although it has higher response speed of the network. The efficiency of this computational requirements. Traditional systems rely on mechanism lies in its deep integration of the fine-grained predefined rules and have difficulty in dealing with new parsing capability of DPI and the intelligent judgment attacks and changing network environments. The Bi- advantage of SVM model, which not only can quickly LSTM+Attention model can automatically learn and adapt identify and release regular legitimate traffic, but also can to new threat patterns, thereby maintaining efficient effectively resist advanced threats disguised as legitimate detection capabilities in a constantly changing network traffic, ensuring the security and smoothness of network environment. Despite the high demand for computing access. resources, its contribution to improving the level of Through the careful design and strategy optimization network security protection makes it a reasonable and of the above technical architecture, the network anti- necessary choice. mapping security access technology proposed in this chapter takes a solid step forward in ensuring the dynamic 4 Experimental design and analysis of adaptability and security of the network environment. This solution not only strengthens the defense against network results mapping attacks, but also significantly improves the 4.1 Experimental design operational efficiency of the network and user satisfaction, providing strong technical support for building a more In this study, we carefully built the experimental robust and flexible network security protection system. environment and selected appropriate datasets to ensure We elaborate on the time complexity of the proposed the reproducibility of the experiments and the validity of Bi-LSTM+Attention algorithm. In the training phase, the the results. The experimental environment includes a time complexity of LSTM is O(T * D * H^2), where T is high-performance server cluster with each node equipped Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 215 with an Intel Xeon E5-2690 v4 processor, 128GB RAM, performance testing phase, we deployed the model into a and NVIDIA Tesla V100 GPUs to provide powerful simulated network environment and tested its response computing power. For the software environment, we time, throughput, and resource consumption under chose the Ubuntu 18.04 operating system, the Python 3.7 different conditions to comprehensively evaluate the programming language, and the TensorFlow 2.3 deep model's performance. Through these steps, we ensured the learning framework, and the combination of these tools rigor of the experiments and the reliability of the results provided a stable and efficient platform for our [30]. experiments [28,29]. The experimental environment includes a high- The choice of dataset is crucial for model training and performance server cluster, each node is equipped with testing. We adopt the publicly available UNSW-NB15 Intel Xeon E5-2690 v4 processor, 128GB RAM and dataset, which contains 49,740 records covering normal NVIDIA Tesla V100 GPU to provide powerful computing network traffic and multiple attack types, and is well power. In terms of software environment, we chose suited for deep learning model training related to network Ubuntu 18.04 operating system, Python 3.7 programming security. In addition, we also built our own performance language and TensorFlow 2.3 deep learning framework to test dataset generated from a simulated network ensure the stability and efficiency of the experiment. environment, which simulates network traffic under We chose the UNSW-NB15 dataset as the main data different loads and is used to evaluate the performance source, which covers a variety of attack types, including impact of the models in real network environments. DoS, DDoS, SQL injection, etc., and is very suitable for In terms of technical implementation steps, we evaluating illegal network scanning defense mechanisms. follow a series of key steps including data preprocessing, The advantage of the UNSW-NB15 dataset lies in its model construction, training and tuning, and performance diversity and realism, which can better represent the testing. The data preprocessing phase includes operations security challenges in the real world. In contrast, although such as data cleaning, normalization, and time-series the CICIDS2017 dataset also contains a variety of attack partitioning to ensure the quality and consistency of the types, its scale is small and the sample size of some attack data. In the model construction phase, we design and types is insufficient. Therefore, the UNSW-NB15 dataset implement a bi-directional LSTM model with an has more advantages in comprehensiveness and integrated attention mechanism to improve the model's representativeness, and is more suitable as our ability to process time series data. In the training and experimental dataset. tuning phase, we used a cross-validation method to select the optimal hyperparameters, including the learning rate, 4.2 Experimental results batch size, and the number of hidden layer units, to optimize the performance of the model. Finally, in the Figure 5: Comprehensive defense effect Figure 5 shows the performance of different models model shows the best defense on all attack types with the in detecting various network attack types including DoS, lowest false alarm rate, indicating the high efficacy of this DDoS, SQL injection and XSS.The Bi-LSTM+Attention model in accurately identifying attacks. 216 Informatica 49 (2025) 207–220 M. Guo et al. Table 2: False alarm rate breakdown Overall False Normal Traffic False Anomalous but not attack mould Alarm Rate Alarms false positives LSTM model 3.2% 1.8% 1.4% LSTM+Attention 2.1% 1.2% 0.9% Bi-LSTM 2.8% 1.6% 1.2% Bi- 1.5% 0.9% 0.6% LSTM+Attention Table 2 breaks down the overall false alarm rates of alarm rate, indicating that it performs well in reducing the different models, as well as the false alarm rates for false alarms, which is crucial for improving the reliability normal traffic and abnormal but non-attacking traffic. The of network defense systems. Bi-LSTM+Attention model has the lowest overall false Table 3: Breakdown of underreporting rates Overall underreporting Known attack New Attack mould rate misses Leakage LSTM model 2.5% 1.3% 1.2% LSTM+Attention 1.8% 0.9% 0.9% Bi-LSTM 2.2% 1.1% 1.1% Bi- 1.3% 0.6% 0.7% LSTM+Attention Table 3 demonstrates the leakage rates of different Table 4 records the average response time and models in detecting known and novel attacks. The Bi- throughput of the different models in the simulated LSTM+Attention model has the lowest leakage rate on network environment. Although the introduction of the both attack types, which indicates that the model has a defense model leads to a slight increase in response time strong generalization ability in identifying novel attacks. and a slight decrease in throughput, the Bi-LSTM and Bi- LSTM+Attention models are better able to maintain high Table 4: Response time and throughput network performance compared to the other models. Response Throughput In order to comprehensively evaluate the model mould time average Average performance, we introduced statistical significance tests (ms) (Mbps) such as t-tests on the basis of existing evaluation indicators to verify the reliability of the results. In addition to false defenseless 2.3 98.7 positive and false negative rates, we also reported comprehensive indicators such as accuracy, recall, and F1 LSTM model 3.5 95.2 score. Specifically, the Bi-LSTM+Attention model LSTM+Attention 3.8 93.8 achieved an accuracy of 98%, a recall of 95%, and an F1 score of 96.5% on the UNSW-NB15 dataset. These Bi-LSTM 3.2 96.4 indicators not only demonstrate the high accuracy of the Bi- model in detecting illegal network scanning, but also show 3.6 94.6 LSTM+Attention that it has high practical value in practical applications. Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 217 Table 5: Model performance comparison F1 Accuracy Recall False Positive False Negative t-test (p- Model/Method Score (%) (%) Rate (%) Rate (%) value) (%) Bi-LSTM+Attention 98.0 95.0 96.5 1.5 5.0 < 0.05 Rule-Based System 80.0 75.0 77.4 10.0 25.0 - LSTM Model 85.0 82.0 83.5 8.0 18.0 < 0.05 LSTM with Attention 90.0 88.0 89.0 5.0 12.0 < 0.05 Mechanism Bidirectional LSTM 92.0 90.0 91.0 4.0 10.0 < 0.05 (Bi-LSTM) In Table 5, through t-tests, we found that the Bi- when dealing with complex and variable network traffic, LSTM+Attention model showed significant differences and can effectively reduce the false alarm rate while from the rule-based system and other LSTM variants in maintaining a high detection rate. These results show that multiple key performance indicators (p < 0.05), further the Bi-LSTM+Attention model is not only theoretically confirming the effectiveness and superiority of the new advantageous, but also has high practical value in practical method. In addition, the model performs particularly well applications. Table 6: Resource consumption mould Average CPU utilization (%) Average Memory Usage (MB) defenseless 3.1 230 LSTM model 5.8 320 LSTM+Attention 6.5 350 Bi-LSTM 4.9 280 Bi-LSTM+Attention 5.4 300 In summary, the Bi-LSTM+Attention model Table 7: Network latency and energy consumption performs the best in terms of comprehensive defense Average Average energy effect, false alarm rate and missed alarm rate, and at the mould network consumption same time has a relatively small impact on network performance, making it an efficient network defense latency (μs) (W) solution. defenseless 75 200 LSTM model 90 250 4.3 Discussion The technical architecture in this study demonstrates LSTM+Attention 95 270 significant innovative advantages, especially in terms of Bi-LSTM 85 230 dynamism and intelligence. The anonymity of the network is effectively improved through dynamic IP address Bi- 90 260 assignment and geolocation obfuscation, making it LSTM+Attention difficult for mapping attackers to target the real resource locations. The synergy of intelligent firewall and IPS, the Table 6 shows the average consumption of CPU and use of fuzzy logic system, and the application of Markov memory resources by the different models during model and LSTM not only enhances the ability to identify operation. The LSTM+Attention model is slightly higher malicious behaviors, but also significantly improves the in terms of resource consumption, but all the models are response speed. In particular, the LSTM model improves within acceptable resource usage, indicating that these the accuracy of anomaly detection through the attention models can effectively run-on existing network devices. mechanism and bi-directional structure, demonstrating the Table 7 evaluates the impact of the different models on great potential of deep learning in complex network network latency and energy consumption. The Bi-LSTM defense. model performs the best in terms of network latency and Honeypot technology deploys false resources and energy consumption, suggesting that it is effective in services to attract and mislead attackers, and can controlling operational costs while maintaining network effectively collect attacker behavior information. performance. However, honeypot technology consumes a lot of 218 Informatica 49 (2025) 207–220 M. Guo et al. resources, requires continuous maintenance, and is easily considering its significant advantages in improving the identified and bypassed by advanced attackers. In contrast, level of network security protection. Future work can the Bi-LSTM+Attention model is more economical in explore optimization algorithms to further reduce terms of resource consumption and does not require computational costs and make it more applicable in more additional hardware or continuous manual maintenance. scenarios. In addition, the Bi-LSTM+Attention model can Through the above comparison and analysis, we can automatically adapt to new threats by learning network conclude that the Bi-LSTM+Attention model has traffic patterns, reducing dependence on manual significant advantages in illegal network scanning intervention. Although honeypot technology has defense. It not only performs well in detection rate and advantages in collecting intelligence, the Bi- false alarm rate, but also can effectively adapt to complex LSTM+Attention model performs better in terms of false network environments. Despite the certain computational positive rate and false negative rate, reaching 1.5% and overhead and implementation complexity, the security and 5.0% respectively, which are significantly lower than the reliability improvements it brings make it a technical 10% and 25% of honeypot technology. solution worthy of promotion. Traffic obfuscation changes network communication Limitations: Despite the remarkable results, there are patterns and features, making it difficult for scanning tools some limitations of the proposed technical solution. The to correctly identify service types. Although traffic first one is the resource consumption issue, such as the obfuscation performs well in reducing false positive rates, high performance of the LSTM model which requires high it has limited effect on advanced scanning strategies and computational resources and may be difficult to deploy in may affect the normal transmission of legitimate traffic. resource-limited environments. Secondly, the false alarm The Bi-LSTM+Attention model uses deep learning and omission rates, although significantly reduced, need algorithms to more accurately identify and classify to continue to be optimized to reduce the interference with network traffic, which not only reduces false positive rates normal operations. Further, the complexity of the but also improves detection rates. Specifically, the Bi- technology implementation may pose a challenge to small LSTM+Attention model has a false positive rate of 1.5%, and medium-sized enterprises, requiring specialized while the traffic obfuscation technology has a false knowledge and maintenance costs. positive rate of 8%. In addition, the Bi-LSTM+Attention model performs particularly well when dealing with 5 Conclusion complex and changing network traffic, and can effectively In this study, we successfully developed and reduce false positive rates while maintaining high validated an innovative set of network anti-mapping detection rates. security access techniques, which have achieved Dynamic Address Translation (NAT) increases the significant results in enhancing network defenses, difficulty for attackers by changing IP addresses between improving anonymity and ensuring secure data internal and external networks. However, NAT is limited transmission. The comprehensive design of the technical in its effectiveness when dealing with complex and large- architecture, especially the integration of dynamic policies scale scanning activities. The Bi-LSTM+Attention model and intelligent algorithms, effectively counteracts the can automatically adapt to new threats by learning complex security threats in the modern network network traffic patterns, thereby showing higher detection environment. Experimental data analysis proves that the rates and lower false positive rates in complex and large- bidirectional LSTM model with the introduction of the scale scanning activities. NAT has a false positive rate of attention mechanism improves the accuracy of anomaly 5%, while the Bi-LSTM+Attention model has a false detection while reducing the false alarm rate of normal positive rate of only 1.5%. network activities, indicating that the combination of deep Behavior-based detection systems use machine learning and traditional security technologies is an learning models to automatically analyze network traffic effective way to enhance the performance of network patterns and improve detection accuracy. However, these defense. Despite the obvious advantages of the systems usually require a large amount of data for training technology, including dynamism, intelligence, and and have limited generalization capabilities for new efficient defense against multiple attack types, we also attacks. The Bi-LSTM+Attention model improves recognize some challenges in the implementation of the detection performance by introducing an attention technology. The resource consumption problem is a key mechanism to enhance the model's focus on key features. barrier to the deployment of current deep learning models, In practical applications, the Bi-LSTM+Attention model especially in scenarios with limited computational outperforms the behavior-based detection system in terms resources. In addition, the complexity of the technique of accuracy, recall, and F1 score. requires higher maintenance costs and specialized skills, Although the Bi-LSTM+Attention model performs which may limit its widespread adoption in SMEs. well on multiple key performance indicators, it has high Therefore, future research should focus on model computational overhead and implementation complexity. lightweighting, resource optimization, and simplifying the The time complexity of the training phase is O(T * (D * deployment process to facilitate the technology's H^2 + H)), and the time complexity of the inference phase popularity. Compared with existing antimapping is O(T * (D * H + H)). This makes it challenging to deploy techniques, the technical framework in this study shows the model in a resource-constrained environment. significant advantages in terms of dynamic adaptability, However, this computational overhead is reasonable Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 219 intelligent response, and accuracy, especially in dealing Engineering. 2022; 42(1): 133-48. with complex network behavior sequence prediction and https://doi.org/10.32604/csse.2022.020123 anomaly detection tasks. However, continuous [5] Chiu WY, Meng WZ, Jensen CD. my data, my performance optimization, further reduction of false alarm control: a secure data sharing and access scheme over and omission rates, and exploration of the convergence of blockchain. Journal of Information Security and new technologies, such as the application of quantum Applications. 2021; 63: 102994. computing and edge computing in security, will be the key https://doi.org/10.1016/j.jisa.2021.102994. directions for future development. [6] Yang D, Wang BC, Ban XH. Fully secure non- This paper proposes an innovative network reverse monotonic access structure CP-ABE scheme. KSII mapping security access technology to cope with the Transactions on Internet and Information Systems. increasingly frequent illegal network scanning behaviors. 2018; 12(3): 1315-29. By combining dynamic IP address allocation, port https://doi.org/10.3837/tiis.2018.03.019 obfuscation, traffic camouflage and behavior analysis, we [7] Suebsombut P, Sekhari A, Sureephong P, Belhi A, build a more robust network security protection system. Bouras A. Field Data Forecasting Using LSTM and Experimental results show that the Bi-LSTM+Attention Bi-LSTM Approaches. Applied Sciences-Basel. model achieves 98% accuracy on the UNSW-NB15 2021; 11(24): 11957. dataset and reduces the false alarm rate by 30%. This https://doi.org/10.3390/app112411957. technology effectively identifies and intercepts illegal [8] Sonkamble RG, Bongale AM, Phansalkar S, Sharma scanning behaviors in the pilot network while maintaining A, Rajput S. Secure Data Transmission of Electronic low false alarm and missed alarm rates. Compared with Health Records Using Blockchain Technology. existing methods, our method has significant advantages Electronics. 2023; 12(4): 1003. in detection accuracy and resource efficiency, providing a https://doi.org/10.3390/electronics12041003. more reliable solution for network security. [9] Agrawal R, Singhal S, Sharma A. Blockchain and This paper discusses the challenges that small and fog computing model for secure data access control medium-sized enterprises (SMEs) face when adopting mechanisms for distributed data storage and these technologies, including limited computing resources authentication using hybrid encryption algorithm. and deployment complexity. To alleviate these challenges, Cluster Computing. 2024; 27(1), 1–15. we recommend using model compression techniques, such https://doi.org/10.1007/s10586-023-04120-9 as pruning and quantization, to simplify the deployment [10] Sureshkumar T, Lingaraj M, Anand B, Premkumar process and reduce computing resource requirements. In T. Non-dominated sorting particle swarm addition, SMEs should consider leveraging off-the-shelf optimization (NSPSO) and network security policy solutions from cloud service providers to reduce initial enforcement for Policy Space Analysis. International investment costs. At the same time, potential regulatory Journal of Communication Systems. 2018; 31(10): issues, such as the impact of GDPR on network traffic e3576. https://doi.org/10.1002/dac.3576. monitoring, can help enterprises ensure compliance. With [11] Khan I, Ghani A, Saqlain SM, Ashraf MU, Alzahrani these measures, SMEs can implement and manage A, Kim D. Secure Medical Data Against cybersecurity solutions more effectively. Unauthorized Access Using Decoy Technology in Distributed Edge Computing Networks. IEEE Funding Access. 2023; 11: 144560-73. https://doi.org/10.1109/ACCESS.2023.3344168 This study was supported by State Grid Shanxi [12] Pinto S, Machado P, Oliveira D, Cerdeira D, Gomes Electric Power Company Science and Technology Project T. Self-secured devices: high performance and Research (No.52053023001U). secure I/O access in TrustZone-based systems. Journal of Systems Architecture. 2021; 119: 102238. References https://doi.org/10.1016/j.sysarc.2021.102238 [1] Adi K, Hamza L, Pene L. Automatic security policy [13] Yang J, Chen YH, Du SY, Chen BD, Principe JC. enforcement in computer systems. Computers & IA-LSTM: Interaction-Aware LSTM for Pedestrian Security. 2018; 73: 156-71. Trajectory Prediction. IEEE Transactions on https://doi.org/10.1016/j.cose.2017.10.012 Cybernetics. 2024; 57(4): 3904-3917, [2] Paananen H, Lapke M, Siponen M. State of the art in https://doi.org/10.1109/TCYB.2024.3359237. information security policy development. Computers [14] Meng YF, Huang ZQ, Shen GH, Ke CB. A security & Security. 2020; 88: 101615. policy model transformation and verification https://doi.org/10.1016/j.cose.2019.101615 approach for software defined networking. [3] Kanimozhi S, Kannan A, Devi KS, Selvamani K. Computers & Security. 2021; 100: 13206. Secure cloud-based e-learning system with access https://doi.org/10.48550/arXiv.2005.13206. control and group key mechanism. Concurrency and [15] Susilo W, Jiang P, Lai JC, Guo FC, Yang GM, Deng Computation-Practice & Experience. 2019; 31(12): RH. Sanitizable Access Control System for Secure e5106. https://doi.org/10.1002/cpe.5106 Cloud Storage Against Malicious Data Publishers. [4] Al-Amri B, Sami G, Alhakami W. An Effective IEEE Transactions on Dependable and Secure Secure MAC Protocol for Cognitive Radio Computing. 2022; 19(3): 2138-48. Networks. Computer Systems Science and https://doi.org/10.1109/TDSC.2021.3058132 220 Informatica 49 (2025) 207–220 M. Guo et al. [16] Sureshkumar T, Anand B, Premkumar T. Efficient computerized tools to design information security Non-Dominated Multi-Objective Genetic Algorithm policies. Computers & Security. 2020; 99: 102063. (NDMGA) and network security policy enforcement https://doi.org/10.1016/j.cose.2020.102063 for Policy Space Analysis (PSA). Computer [29] Merhi MI, Ahluwalia P. Predicting Compliance of Communications. 2019; 138: 90-7. Security Policies: Norms and Sanctions. Journal of https://doi.org/10.1016/j.comcom.2019.03.008 Computer Information Systems. 2023; 64(5), 683– [17] Hu T, Yang SQ, Wang YP, Li GL, Wang YL, Wang 697. G, Yin MY. N-Accesses: a Blockchain-Based https://doi.org/10.1080/08874417.2023.2241413 Access Control Framework for Secure IoT Data [30] Yang J H, Lin I C, Chien P C. Data Sharing Scheme Management. Sensors. 2023; 23(20): 8535; for Cloud Storage Service Using the Concept of https://doi.org/10.3390/s23208535. Message Recovery. Informatica, 2017, 28(2): 375- [18] Varma IM, Kumar N. A comprehensive survey on 386. https://doi.org/10.15388/Informatica.2017.134 SDN and blockchain-based secure vehicular [31] Muthusenthil B, Kim H, Prasath V B. Location networks. Vehicular Communications. 2023; 44: verification technique for cluster based geographical 100663. routing in MANET. Informatica, 2020, 31(1): 113- https://doi.org/10.1016/j.vehcom.2023.100663. 130. https://doi.org/10.15388/20-INFOR402 [19] Lin HY, Tsai TT, Wu HR, Ku MS. Secure access control using updateable attribute keys. Mathematical Biosciences and Engineering. 2022; 19(11): 11367-79. https://doi.org/10.3934/mbe.2022529 [20] Sivaselvan N, Bhat KV, Rajarajan M, Das AK. A New Scalable and Secure Access Control Scheme Using Blockchain Technology for IoT. IEEE Transactions on Network and Service Management. 2023; 20(3): 2957-74. https://doi.org/ 10.1109/TNSM.2023.3246120 [21] Wu YC, Sun R, Wu YJ. Smart City Development in Taiwan: From the Perspective of the Information Security Policy. Sustainability. 2020;12(7): 2916; https://doi.org/10.3390/su12072916. [22] Wang SP, Wang X, Zhang YL. A Secure Cloud Storage Framework with Access Control Based on Blockchain. IEEE Access. 2019; 7: 112713-25. https://doi.org/10.1109/ACCESS.2019.2929205 [23] Omala AA, Mbandu AS, Mutiria KD, Jin CH, Li FG. Provably Secure Heterogeneous Access Control Scheme for Wireless Body Area Network. Journal of Medical Systems. 2018; 42(6): 108. https://doi.org/10.1007/s10916-018-0964-z [24] Yang Y, Liu XM, Guo WZ, Zheng XH, Dong C, Liu ZQ. Multimedia access control with secure provenance in fog-cloud computing networks. Multimedia Tools and Applications. 2020; 79(15- 16): 10701-16. https://doi.org/10.1007/s11042-020- 08703-1 [25] Kumari A, Gupta R, Tanwar S, Kumar N. A taxonomy of blockchain-enabled softwarization for secure UAV network. Computer Communications. 2020; 161:304- 23. https://doi.org/10.1016/j.comcom.2020.07.042 [26] Calzavara S, Rabitti A, Bugliesi M. Semantics-Based Analysis of Content Security Policy Deployment. ACM Transactions on the Web. 2018; 12(2): 1-36. https://doi.org/10.1145/3149408 [27] Zhang J, Chen AM, Zhang P. Provably Secure Data Access Control Protocol for Cloud Computing. Symmetry-Basel. 2023; 15(12): 2111; https://doi.org/10.3390/sym15122111. [28] Rostami E, Karlsson F, Gao S. Requirements for https://doi.org/10.31449/inf.v49i12.9588 Informatica 49 (2025) 221–230 221 A Hybrid OCR-XGBoost-Transformer Pipeline for Resume Parsing with Spatial-Semantic Integration Rachid Ed-Daoudi1*, Fatima Zahra Zakka2, Mouslime Ouqassou1, Badia Ettaki1 E-mail: rachid.ed-daoudi@uit.ac.ma, fzakka@esi.ac.ma, mouqassou@esi.ac.ma, bettaki@esi.ac.ma. * Corresponding author 1LyRICA: Laboratory of Research in Computer Science, Data Sciences and Artificial Intelligence, School of Information Sciences Rabat-Instituts, Rabat, Morocco 2Knowledge and data engineer, School of Information Sciences Rabat-Instituts, Rabat, Morocco Keywords: resume information extraction, hybrid AI solution, optical character recognition, XGBoost, transformers Received: June 5, 2025 This study addresses the automation of resume information extraction using a hybrid Artificial Intelligence (AI) framework that integrates Optical Character Recognition (OCR), Machine Learning, and Deep Learning techniques. The system operates in three stages: text extraction using PaddleOCR, resume section classification via XGBoost, and semantic entity recognition using a Transformers-based Named Entity Recognition (NER) model. The dataset consists of 200 French resumes collected in PDF format and annotated for ten resume section classes and multiple named entities. Evaluation was conducted using standard multi-class classification metrics including accuracy, precision, recall, and F1- score. Experimental results show that XGBoost achieved 96.5% accuracy in section classification, while the Transformers model attained 82% accuracy in semantic entity extraction. This dual-stage pipeline captures both spatial and semantic structures of resumes, offering improved accuracy and adaptability over traditional parsing approaches. Povzetek: Članek predstavlja hibridno rešitev OCR-XGBoost-Transformer za avtomatizirano ekstrakcijo podatkov iz življenjepisov. Sistem dosega visoko točnost pri razvrščanju razdelkov z XGBoost in pri semantičnem prepoznavanju entitet s transformerjem. 1 Introduction by software and computer systems [3]. Several parsing approaches are commonly used. Keyword-based parsers In an unpredictable and complex business environment, it are prototypes of faster and more accurate parsers. These is important that organizations aim to realize the potential simplistic parsers search for specific words, key phrases, offered by the recruitment phase. Organizations are in a and patterns in resume text. However, this approach is ceaseless race to find new talent to support their teams and prone to errors (with an accuracy rate of about 70%) as corporate competitiveness. The reality is that collecting words can have multiple contexts within a resume [4]. candidate information from resumes is often difficult to Grammar-based parsers rely on grammatical rules to achieve [1]. interpret information. These relatively complex parsers Recruiters are required to read and analyse candidate require manual input during the coding process. When resumes manually for the information they need. This coding is done by a skilled linguistic engineer, they can manual practice is full of disadvantages. First, it is time analyze a resume quite accurately. However, if manual consuming and a labor-intensive activity for recruiters configuration is not done correctly, grammar-based who have to read many resumes and work through a lot of parsers can be inaccurate (with an accuracy rate of about information. As a result, recruiters have to deal with work 90%) [5]. overload, sometimes delaying the whole recruitment Statistical parsers use numerical models of text to identify process. Therefore, an emerging technology to automate key elements of a resume. To be accurate, statistical the information extraction process can be considered a parsers must be trained on a large number of resumes rational way to control and presumably speed up a major containing all the information to be extracted. In terms of process in recruitment [2]. The central question of this accuracy, statistical parsers fall between keyword-based research is: How can the automation of information parsers and grammar-based parsers [6]. extraction from resumes be achieved with new AI-based parsers use machine learning and artificial technologies? intelligence techniques. These models can improve over The CV parsing technology converts resume data from time by analyzing more information. AI-based parsers free-form into structured format. This conversion offer an extremely high level of accuracy compared to facilitates the storage, synthesis, and processing of other CV parsing techniques available on the market [7]. information contained in resumes, thus enabling their use 222 Informatica 49 (2025) 221–230 R. Ed-Daoudi et al. Recent applications combine OCR, Computer Vision, and need for adaptive, high-accuracy solutions that can Natural Language Processing (NLP) techniques to understand both document structure and extract advance the capabilities of resume information extraction meaningful entities while maintaining contextual from various formats and structures [8]. relationships across different resume sections [9]. Despite advances in resume parsing technologies, existing To better position the proposed contribution, Table 1 solutions still face significant challenges in effectively presents a structured comparison of existing studies on handling the spatial and semantic aspects of resume resume parsing. It outlines the datasets used, documents simultaneously. Current approaches either methodological approaches, performance levels, and key focus on visual structure or textual content, but rarely limitations of each system. This comparative summary integrate both dimensions effectively. Additionally, most highlights the need for a unified system that integrates commercial systems rely on rule-based methods with both spatial and semantic understanding of resume predefined templates, limiting their ability to process content. diverse resume formats and structures. There remains a Table 1: Summary of Related works in resume information extraction Accuracy / Ref. Dataset Used Method Type Key Techniques Limitations Performance Proprietary HR Format-dependent, [1] Rule-based Heuristics, Templates Not reported docs low adaptability No semantic Internal HR Digital workflows, [2] Rule-based Not reported modeling, template systems automation limitations Summarization, Entity No spatial modeling, [3] 60 resumes ML-based ~85% accuracy extraction weak generalization Mixed NLP, keyword Poor contextual [4] Not specified (Keyword + ~70% accuracy matching understanding ML) Rule-based Chronological No experimental [5] Literature-based N/A (Survey) parsing, analysis validation Text image ~85% OCR No classification or [6] OCR-only docs OCR recognition accuracy entity recognition Business OCR, Deep Learning No spatial-semantic [7] DL-based ~90% accuracy resumes pipeline integration No section NLTK-based entity [8] English CVs NLP + ML Not specified classification, recognition shallow analysis Not end-to-end, Polish IT Section classification, [9] Rule + ML ~88% F1-score limited semantic resumes heuristics modeling As the table shows, while various parsing methods have 2. Construct and annotate a dataset of resumes with been explored, most fail to simultaneously address spatial spatial and semantic labels, layout and deep semantic content. This motivates the 3. Evaluate the performance of machine learning currect hybrid OCR–XGBoost–Transformer pipeline, and deep learning models for section designed to provide accurate, adaptable, and context- classification and entity recognition, aware resume information extraction. 4. Design and implement an integrated, hybrid This study investigates whether integrating spatial layout information extraction pipeline. features with semantic models can improve the accuracy The main contribution of this work is the development of and adaptability of resume information extraction. a novel solution that combines OCR for text recognition, Specifically, we hypothesize that a two-stage pipeline— ML algorithms for text line classification into appropriate combining OCR-based spatial recognition, section sections, and semantic models based on Named Entity classification using XGBoost, and contextual entity Recognition (NER) for information extraction. This extraction via Transformers—will outperform traditional integrated approach addresses both the visual-spatial methods that rely solely on textual content. aspects of resumes and their semantic content, providing To validate this hypothesis, our research follows four main more accurate and comprehensive information extraction objectives: than current systems. 1. Analyze existing approaches and identify their The remainder of this paper is organized as follows: limitations, Section 2 describes the proposed methodology, including the system architecture, dataset preparation, feature A Hybrid OCR-XGBoost-Transformer Pipeline for Resume… Informatica 49 (2025) 221–230 223 engineering, and algorithms employed. Section 3 presents 2 Method the experimental results, including classification and entity recognition performance. Section 4 provides a discussion of the results in the context of existing work, 2.1 System architecture with analysis of contributing factors and identified The proposed system employs a multi-stage pipeline limitations. Finally, Section 5 concludes the paper by approach for automated information extraction from summarizing the contributions and outlining directions for resumes. The overall architecture, illustrated in Figure 1, future research. consists of three main components: (1) text recognition and extraction using OCR, (2) text classification to identify resume sections, and (3) semantic information extraction from the classified text segments. Figure 1: System architecture The workflow begins with resume documents that are • Career objective: A short statement describing converted to images to ensure format independence. The the candidate’s professional goals and the type of PaddleOCR [10] model then processes these images to position sought. This helps employers extract text and spatial coordinates. The extracted text understand the candidate’s motivations and lines are classified into appropriate resume sections using expectations. ML models. Finally, semantic models extract specific • Education: Lists academic background, entities of interest from each classified section, such as including institutions attended, their locations, candidate names, skills, education details, and work degrees obtained, and any relevant certifications experience. or training. • Job-related skills: Highlights specific skills 2.2 Dataset preparation and feature relevant to the target job, whether acquired selection through work, internships, volunteer activities, or hobbies. • Professional experience: Provides details on the 2.2.1 Analysis of the Structure and Content of candidate’s work history, including company a CV names, job titles, locations, dates of employment, In preparing a CV, certain sections are commonly and descriptions of roles and responsibilities. included to present relevant information for effective job Relevant internships and volunteer experiences applications. These sections typically include: may also be included. • Personal Information: Includes full name, • Additional information: Covers elements that address, phone numbers (home and mobile), support the application such as language email address, and optionally a personal website. proficiency, computer skills, professional This information allows employers to easily certifications, memberships in professional contact the candidate. organizations, awards, and achievements. A portfolio may also be referenced if applicable. 224 Informatica 49 (2025) 221–230 R. Ed-Daoudi et al. • Interests and activities: Includes hobbies and employment gaps, this format categorizes skills leisure activities that reveal aspects of the and highlights accomplishments over job history. candidate’s personality and can highlight soft • Targeted CV: Tailored to a specific job by skills or additional qualifications. emphasizing the qualifications that best match There are four main types of CV formats, each designed the employer’s expectations. It requires the to emphasize different aspects of a candidate’s profile: candidate to carefully analyze the job posting and • Chronological CV: Lists the candidate’s work customize each CV section accordingly. history in reverse chronological order, starting Combination CV: Merges the chronological and with the most recent position. This is the most functional formats. It begins with a summary of commonly used format and suits candidates with key competencies followed by a detailed consistent career progression. chronological work history. This format suits • Functional CV: Focuses on skills and candidates with both strong experience and competencies rather than the sequence of jobs. specialized skills. Ideal for candidates changing careers or with CVs can be presented in several visual formats. Figure 2 illustrates three common layout styles: Figure 2: Common CV layout formats • Single-Column CV (Figure 2.a): A traditional Each of these formats offers unique advantages depending layout where sections are arranged vertically on the candidate’s profile and the industry expectations. from top to bottom. It offers clarity and simplicity, making it easy for recruiters to read 2.2.2 Dataset construction for classification through the information. models • Two-Column CV (Figure 2.b): Divides the A dataset of 200 French resumes in PDF format was page into two main areas. The left column collected from the HR department of Intelcia IT Solutions typically contains personal details and key skills, [11]. Each resume was converted to image format to while the right includes professional experience, facilitate consistent processing across different layouts education, and other supporting content. This and styles. The PaddleOCR model was applied to extract format improves information organization and both textual content and spatial information of each text visual balance. line. • Creative or Free-Form CV (Figure 2.c): Often Feature engineering focused on capturing the spatial used in artistic or design-related fields, this relationships between text lines and section headings format allows for greater customization, within resumes. Two key types of features were including asymmetric columns, infographics, developed: colored blocks, or icons. It provides a 1 Distance-based features: Normalized horizontal personalized and visually distinctive presentation and vertical Euclidean distances between each of qualifications. text line and section headings were calculated. For text lines and section headings on different A Hybrid OCR-XGBoost-Transformer Pipeline for Resume… Informatica 49 (2025) 221–230 225 pages, a specialized distance calculation was • Hidden layers: Two hidden layers with 64 and 32 implemented that accounted for page breaks. neurons respectively 2 Positional features: Binary features indicating • Activation function: ReLU for hidden layers and whether a text line appeared above or below each Softmax for output layer section heading were created and encoded using • Output layer: 10 neurons corresponding to the LabelEncoder. resume section classes 3 The dataset was manually labeled with ten The model was configured with the following classes: nine representing common resume hyperparameters: sections (Experience, Education, Skills, Projects, • Optimizer: Adam with learning rate of 0.001 Certification, Languages, Interests, Software, • Loss function: Categorical cross-entropy and Personality) and a tenth class "Other" for text • Batch size: 32 not belonging to any standard section. In total, • Training epochs: 50 10,000 text lines were labeled to create the • Early stopping: Patience of 5 epochs monitoring training corpus. validation loss ANNs were selected for comparison due to their proven 2.2.3 Dataset preparation for semantic models effectiveness in text classification tasks and ability to learn For the semantic extraction task, text lines were grouped complex non-linear relationships between features [14]. according to their predicted section classifications to XGBoost was selected due to its proven performance in provide contextual information. The Doccano annotation similar structured classification tasks. It offers efficient tool was used to manually annotate named entities within handling of sparse and imbalanced data, robust each section. A total of eight entity types were defined for regularization, and interpretable feature contributions. As annotation: Name, Email, Phone, Education, Experience, demonstrated later in Section 3, XGBoost outperformed Skills, Language, and Certification. These categories were alternatives such as Random Forest, ANN, and SVM, selected based on relevance to recruitment use cases and confirming its suitability for the classification of OCR- availability across most CVs in the dataset. The annotated extracted resume sections. text was then processed and converted to the JSONL format required by SpaCy [12] for NER model training. 2.3.3 Support Vector Machine for section classification 2.3 Key algorithms The Support Vector Machine (SVM) model was implemented with the following configuration: 2.3.1 XGBoost for section classification • Kernel: Radial Basis Function (RBF) The eXtreme Gradient Boosting (XGBoost) algorithm • C parameter (regularization): 10 was selected for resume section classification based on its • Gamma parameter: 0.01 superior performance. XGBoost is an ensemble learning • Decision function: One-vs-Rest for multi-class method that builds sequential decision trees to minimize classification residual errors. It excels at capturing complex feature • Probability estimates: Enabled interactions and handling non-linear relationships [13]. SVMs were chosen for comparison due to their The model was configured with the following traditionally strong performance in text classification hyperparameters: tasks with moderate-sized datasets and their effectiveness • Maximum tree depth: 3 with high-dimensional feature spaces. The RBF kernel • Number of estimators: 100 was selected after preliminary testing showed superior • Learning rate: 0.1 performance over linear and polynomial kernels for This hyperparameter implementation allowed the model capturing the complex relationships in the spatial and to balance complexity and generalization, as well as better positional features [15]. capture the learning capabilities of the spatial features. These implementations were evaluated using the same XGBoost was also implemented very well in terms of its train-test split and evaluation metrics as the XGBoost ability to mitigate model limitations from previous model to ensure a fair comparison of performance across classification models we attempted in the study. all three classification approaches. 2.3.2 Artificial Neural Network for section 2.3.4 Transformers model for named entity classification recognition The Artificial Neural Network (ANN) was implemented For the semantic information extraction component, a as a multilayer perceptron with the following architecture: Transformers-based model was implemented using • Input layer: Matching the dimensionality of the SpaCy's framework. The overall workflow for semantic feature set model construction is illustrated in Figure 3. 226 Informatica 49 (2025) 221–230 R. Ed-Daoudi et al. Figure 3: General workflow for semantic model construction Transformers use an attention mechanism to capture 𝑇𝑃 𝑇𝑃 𝑅𝑒𝑐𝑎𝑙𝑙 = = (3) contextual relationships between words in text sequences 𝑃 𝑇𝑃 + 𝐹𝑁 [16]. The semantic extraction model was built using the 2 × 𝑅𝑒𝑐𝑎𝑙𝑙 × 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 CamemBERT-based Transformer model, implemented 𝐹1 𝑆𝑐𝑜𝑟𝑒 = (4) 𝑅𝑒𝑐𝑎𝑙𝑙 + 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 through SpaCy v3.5 using the fr_dep_news_trf pipeline. Where: CamemBERT is pretrained on large-scale French- • 𝑇𝑃: True Positive, the number of cases where the language datasets (including OSCAR and CCNet) and model correctly predicts a positive class employs a SentencePiece tokenizer. This choice ensured • 𝑇𝑁 ∶ True Negative, the number of cases where linguistic compatibility with the French resume dataset the model correctly predicts a negative class used for training. The model was fine-tuned on eight entity categories (Name, Email, Phone, Education, Experience, • 𝐹𝑃 : False Positive, the number of cases where Skills, Language, and Certification) for 80 epochs, using the model incorrectly predicts a positive class the Adam optimizer and a warm-up learning rate schedule • 𝐹𝑁 : False Negative, the number of cases where with early stopping enabled. the model incorrectly predicts a negative class Training was conducted on a standard GPU environment Then, for the multi-class evaluation in this study, macro- available using Google Colab, with an average epoch averaging was employed, which calculates in the runtime of 4 minutes and a total training duration of equations 5 and 6 the metric independently for each class approximately 5.5 hours. The final model was exported in and then takes the average. This approach gives equal SpaCy's DocBin format for deployment. weight to all classes regardless of their frequency in the The workflow begins with the classified text segments dataset: from the previous stage, which are then processed for 𝑛 annotation. After manual annotation using Doccano, the 1 annotated text data is preprocessed and structured into the 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑚𝑎𝑐𝑟𝑜_𝑎𝑣𝑒𝑟𝑎𝑔𝑒 = ∑ 𝑃 ( ) 𝑛 𝑘 5 required format for model training. The model is then 𝑘=1 𝑛 trained using the prepared dataset and evaluated against 1 𝑅𝑒𝑐𝑎𝑙𝑙𝑚𝑎𝑐𝑟𝑜_𝑎𝑣𝑒𝑟𝑎𝑔𝑒 = ∑ 𝑅 𝑛 𝑘 (6) test data before final deployment. 𝑘=1 The model was configured using a base configuration file Where 𝑃𝑘 is the precision for class 𝑘, 𝑅𝑘is the recall for that defined: class 𝑘, and 𝑛 the total number of classes. This evaluation • Architecture parameters ensures that performance on less frequent resume sections • Training hyperparameters was properly assessed [18]. • Optimizer settings • Feature extraction components 2.5 Pipeline overview – pseudocode The Transformers model was selected because of its ability to capture long-distance dependencies and The full hybrid workflow is summarized below to contextual information, which is particularly valuable for illustrate the integration of the components described identifying named entities in resume text where formatting above. and context provide important cues. 2.4 Evaluation metrics Performance evaluation for both classification and NER models was conducted using standard metrics for multi- class classification problems [17]. First, the basic metrics for a single class are defined in equations 1 to 4: 𝑇𝑃 + 𝑇𝑁 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = (1) 𝑇𝑃 + 𝐹𝑃 + 𝑇𝑁 + 𝐹𝑁 𝑇𝑃 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 = (2) 𝑇𝑃 + 𝐹𝑃 A Hybrid OCR-XGBoost-Transformer Pipeline for Resume… Informatica 49 (2025) 221–230 227 3 Experimental results comprising 20% of the labeled data. Table 2 and figure 3 present a comparative analysis of their performance based on the evaluation metrics. 3.1 Performance comparison of classification models The evaluation of the three classification models (ANN, SVM, and XGBoost) was conducted using a test dataset Table 2: Performance comparison of ANN, SVM, and XGBoost Algorithms Model Accuracy Macro-average precision Macro-average recall Macro-average F1-Score ANN 80.7% 65.7% 77.2% 71.0% SVM 72.5% 51.8% 66.7% 58.3% XGBoost 96.5% 94.7% 95.3% 95.0% Figure 4: Performance comparison of classification and NER Models As evident from Table 2 and figure 4, XGBoost Table 3: Comparison of NER Models: tok2vec and significantly outperformed the other models across all Transformers metrics. The model achieved an impressive accuracy of Macro- Macro- Macro- Model Accuracy average average average F1- 96.5%, indicating its superior ability to correctly classify precision recall Score text lines into their respective CV sections. Furthermore, tok2vec 73% 53% 66% 58.7% the high macro-average precision (94.7%) and recall (95.3%) values show XGBoost's robust performance Transformers 82% 72% 79% 75.3% across all classes, including minority classes. The Transformers model beat the tok2vec on the 3.2 Performance of semantic models for evaluation metrics overall. With an accuracy of 82%, the named entity recognition Transformers model was more accurate when classifying named entities in resume text. Two NER models were evaluated for their effectiveness in extracting named entities from the classified text: tok2vec and Transformers. Table 3 summarizes their 3.3 Analysis of XGBoost's superior performance after 80 training epochs. performance XGBoost's better performance can be attributed to several factors related the nature of the algorithm: • Boosting technique: XGBoost is based on a gradient boosting method that sequentially builds 228 Informatica 49 (2025) 221–230 R. Ed-Daoudi et al. new models to correct the mistakes of prior and classifier performance improves by models. XGBoost is able to learn from emphasizing the most important features. previously misclassified items and iteratively • Regularization techniques: It is noteworthy to improves prediction performance. say that XGBoost uses regularisation parameters • Handling complex data: XGBoost can fit that can assist with the likelihood of overfitting, complex relationships between features, and adding to the good performance of the model on moreover, can capture non-linear relationships. unseen data. This is significant for resume texts where the Figure 5 shows the confusion matrix for the XGBoost spatial relationship between the text lines and model which is indicative of its overall, high classification section headings influences their classification. performance across all CV sections. • Feature importance analysis: The algorithm, in its own fashion, defines the most useful features, Figure 5: Confusion matrix for XGBoost Model (values in %) • Sequential processing advantage: The The matrix shows a very high level of accuracy (93-98%) classification of individual lines of text with virtually no confusion across sections where error accurately positioned the model to better achieve estimates like Experience and Projects only had a 1-2% entity extraction in the Transformers module estimating error, showcasing how well the XGBoost with better understanding of context. method was able to handle the complexity of resume data. 3.5 Significance of the two-stage approach 3.4 Transformers model performance An important finding of this research is the utility of the analysis two-stage information extraction process: The superior performance of the Transformers models 1. The first stage incorporates XGBoost to classify generally in NER task has many reasons: each of the text lines into its respective CV • Attention mechanism: The Transformers model section to help clarify for semantic analysis. uses an attention mechanism that enables it to 2. The second stage incorporates the Transformers model contextual relationship between words. It model, which analyses the semantic meaning and can inspect words within a larger context where extracts the relevant entities. it is appearing, which enhances the accuracy of This process provides a solution to one of the main entity recognition. challenges of resume parsing, which was the distribution • Contextual understanding: Rather than only of the spatially located text within the image. By focusing on the local patterns in the word incorporating the organization of the text into sections sequences like in the tok2vec model, the before doing a semantic analysis, we are able to achieve a Transformers model can also model the long- higher degree of accuracy with information extraction distance dependency between the words to get a than purely undertaking a semantic analysis of the CV. more comprehensive understanding of all context The results also indicate that the semantics of translating in the text. visually oriented information into semantic information creates an additional language processing dimension that A Hybrid OCR-XGBoost-Transformer Pipeline for Resume… Informatica 49 (2025) 221–230 229 goes beyond a one-dimensional text analysis and includes 5 Conclusion both visual information and spatial language processing. This research introduces a novel hybrid AI solution for automated resume information extraction, combining 4 Discussion OCR with Machine Learning for text classification The results presented in the previous section demonstrate (achieving 96.5% accuracy with XGBoost) and Deep that the hybrid pipeline outperforms traditional resume Learning for semantic understanding (reaching 82% parsing approaches in terms of accuracy, generalization, accuracy with Transformers). The approach addresses the and contextual understanding. Specifically, the XGBoost challenge of resumes as spatially distributed text, where classifier achieved a 96.5% accuracy in section both layout and content provide crucial semantic context, classification, and the Transformers model reached 82% demonstrating that considering spatial positioning accuracy in named entity recognition. enhances resume parsing accuracy. When compared to prior studies summarized in Table 1: While the current implementation faces limitations • Methods relying on rule-based or keyword including language dependency, sensitivity to extreme techniques [1], [2], [4] showed limited formatting variations, and substantial training data adaptability to diverse resume formats and requirements, several promising research directions lacked semantic depth. emerge. Future work should explore deeper integration of • Machine Learning-only approaches such as [3] visual and semantic elements, extend the approach to achieved moderate performance (~85%) but did multi-dimensional text analysis beyond traditional linear not incorporate spatial features or layout context. processing, and investigate techniques requiring less • Deep Learning models in [7], although promising labeled training data. This research ultimately points (~90%), still treated resumes as flat text, without toward a new domain of natural language processing that segment-level classification or layout awareness. incorporates spatially-oriented language understanding In contrast, the proposed pipeline integrates both spatial with applications extending beyond resume parsing to (layout-aware) features and semantic (contextual) other complex document types. representations, which contributes to improved classification and entity recognition. The two-stage design References ensures that the semantic model receives pre-structured [1] Kessler, R., Torres-Moreno, J. M., & El-Bèze, M. input, enhancing its ability to extract relevant entities with 2010. E-Gen: automatic processing of human higher precision. resources information. Document numérique, 13(3), The superior performance of the XGBoost model can be 95–119. attributed to: [2] Baudoin, E., Déroulède, B., Diné, S., Dubouloz, M.- • Fine-grained spatial features (e.g., distances, A., & Peretti, J.-M. 2019. Digital recruitment. In relative positions), Digital transformation of the HR function (pp. 49– • Strong regularization and ensemble learning 101). Paris: Dunod. characteristics, [3] Khan, N., Khan, K., Naveed, S., Nabi, N., Qureshi, • Efficient handling of imbalanced or non-linear M., & Naveed, N. 2023. Resume Parser and class boundaries. Summarizer. International Journal of Advanced Likewise, the use of Transformers for NER offers Research in Science, Communication and advantages in: Technology, 3(1), 35–42. • Capturing long-range dependencies across lines [4] Olorunshola, O. E., Ampitan, I. O., Adamu-Fika, F., within the same section, & Ademuwagun, A. K. (2025). An Enhanced K-NN • Handling resume-specific terminology through Algorithm Leveraging BERT Techniques for contextual embeddings, Resume Parsing System. Asian Journal of Research • Generalizing well across structurally diverse in Computer Science, 18(7), 49-59. documents. [5] Aakankshu, R., Kariya, J., Khant, D., Khandare, S., Some failure cases were observed in: & Barve, P. 2020. A Systematic Literature Review • Highly unstructured or creative resume formats (SLR) on the beginning of resume parsing in HR (e.g., asymmetric layouts), Recruitment Process & SMART advancements in • Multilingual resumes, where OCR and entity chronological order. Research Square. recognition performance dropped, https://assets.researchsquare.com/files/rs- • Misclassification between "Projects" and 570370/v1/9da1a6e1-437f-4f6d-a021- "Experience" when boundaries were unclear. 743ea3ee268e.pdf These cases highlight potential improvements through [6] Gomathy, C. K. 2022. OPTICAL CHARACTER layout-aware Transformers or multimodal embeddings RECOGNITION. ResearchGate. that fuse visual and textual signals. https://www.researchgate.net/publication/36062008 5_OPTICAL_CHARACTER_RECOGNITION [7] Sarhan, A. M., Ali, H. A., Wagdi, M., Ali, B., Adel, A., & Osama, R. (2024). CV Content Recognition 230 Informatica 49 (2025) 221–230 R. Ed-Daoudi et al. and Organization Framework based onYOLOv8 and Tesseract-OCR Deep Learning Models. [8] Pokharel, P. 2022. Resume parser using NLP. ResearchGate. https://www.researchgate.net/publication/36177201 4_RESUME_PARSER [9] Wosiak, A. 2021. Automated extraction of information from Polish resume documents in the IT recruitment process. Procedia Computer Science, 192, 2432–2439. https://doi.org/10.1016/j.procs.2021.09.012 [10] Malik, S., et al. 2020. XGBoost: A Deep Dive into Boosting. ResearchGate. https://www.researchgate.net/publication/33949915 4_XGBoost_A_Deep_Dive_into_Boosting_Introdu ction_Documentation [11] Gao, S., Kotevska, O., Sorokine, A., & Christian, J. B. (2021). A pre-training and self-training approach for biomedical named entity recognition. PloS one, 16(2), e0246310. [12] Kumar, M., Chaturvedi, K. K., Sharma, A., Arora, A., Farooqi, M. S., Lal, S. B., ... & Ranjan, R. (2023). An algorithm for automatic text annotation for named entity recognition using Spacy framework. ICAR, Delhi, India, Tech. Rep. [13] Chen, T., et al. 2015. XGBoost: extreme gradient boosting. R package version 0.4-2, 1(4), 1–4. [14] Lee, J. Y., Dernoncourt, F., & Szolovits, P. 2017. Transfer learning for named-entity recognition with neural networks. arXiv preprint, arXiv:1705.06273, 1–5. [15] Panja, S., Chatterjee, A., & Yasmin, G. 2018. Kernel Functions of SVM: A Comparison and Optimal Solution. In Advanced Informatics for Computing Research (pp. 88–97). Singapore: Springer. https://doi.org/10.1007/978-981-13-3140-4_9 [16] Ghaith, S. 2024. The triple attention transformer: advancing contextual coherence in transformer models. Evolutionary Intelligence, 17(5), 3723– 3744. [17] Riyanto, S., Imas, S. S., Djatna, T., & Atikah, T. D. 2023. Comparative analysis using various performance metrics in imbalanced data for multi- class text classification. International Journal of Advanced Computer Science and Applications, 14(6). [18] Grandini, M., Bagli, E., & Visani, G. 2020. Metrics for multi-class classification: an overview. arXiv preprint, arXiv:2008.05756. https://doi.org/10.31449/inf.v49i12.9254 Informatica 49 (2025) 231-244 231 Deformation Suppression Method for the CNC Machining Process of Parts Based on a Single Neuron PID Tiejun Liu1*, Ke Chen1,2 1Zhejiang Guangsha Vocational and Technical University of construction, Zhejiang, DongYang,322100, China 2Hengyang Valin Steel Pipe Co., Ltd., Hunan, Hengyang, 421001, China E-mail:liutiejun895485@163.com *Corresponding author Keywords: CNC machining, deformation suppression, single-neuron PID, predictive control, smart manufacturing Received: May 16, 2025 Computer Numerical Control (CNC) machining plays a vital role in modern precision manufacturing but often suffers from part deformation due to thermal and mechanical stresses, compromising dimensional accuracy. Traditional CNC systems lack adaptive intelligence, operating with static parameters and failing to address real-time deformation risks. This study proposes an intelligent deformation suppression method using a lightweight single-neuron-based Proportional-Integral-Derivative (PID) neural model, termed NeuroPID-CNC, to predict and mitigate deformation during machining. The model was trained and tested on the CNC-DeformControl dataset containing machining parameters such as cutting speed, feed rate, depth of cut, tool temperature, and material type. Data preprocessing involved normalization and categorical encoding. The NeuroPID-CNC model, structured as a binary classifier with a single hidden neuron using a sigmoid activation function and Adam optimizer, was trained on 70% of the data and evaluated on the remaining 30%. It achieved 92% accuracy, 90% precision, 93% recall, 91.5% F1- score, and 0.84 MCC, outperforming conventional algorithms like SVM, RF, LR, and KNN. A real-time feedback loop further enables adaptive learning. The NeuroPID-CNC approach effectively predicts deformation risks and recommends real-time control actions, enhancing machining reliability and reducing material waste. This makes it a promising solution for smart, adaptive manufacturing environments. Povzetek: Za preprečevanje deformacij med CNC obdelavo je predlagana metoda NeuroPID-CNC, lahki nevronski model z enim nevronom, ki posnema PID regulator. Model je dosegel visoko točnost pri napovedovanju tveganja deformacije in priporoča prilagoditve v realnem času (npr. hitrost rezanja), s čimer izboljša zanesljivost in kakovost izdelkov. 1 Introduction impacted by numerous factors such as tool temperature, material type, cutting forces, and vibration during the 1.1 The background information of this machining process. scientific field 1.2 The current knowledge and advances in Computer Numerical Control (CNC) machining is an this field essential component of modern industrial manufacturing, allowing for the automated, precise fabrication of complex Sensor integration, adaptive control systems, and components from a broad range of materials, including advanced simulation techniques have all contributed metals, plastics, and composites [1]. CNC machines use significantly to the advancement of CNC machining in programmed instructions to control parameters like recent years [3]. Researchers and engineers have used cutting speed, feed rate, tool path, and spindle load [2]. finite element modeling (FEM), real-time feedback This high level of automation improves productivity, systems, and machine learning techniques to track and consistency, and precision in industries ranging from improve machining processes [4]. Numerous studies have aerospace, automotive, and electronics. However, as concentrated on predicting tool wear, improving cutting manufacturing tolerances tighten and precision conditions, and enhancing the surface finish [5]. Adaptive requirements rise, even minor distortions during control algorithms like fuzzy logic, conventional PID machining can result in unacceptable defects, raised controllers, and deep learning-based methods have been rework rates, and wasted resources. These distortions, proposed to tackle machining variability. Despite these often referred to as machining-induced deformations, are improvements, numerous control systems still depend on 232 Informatica 49 (2025) 231-244 T. Liu et al. fixed or heuristic-based logic that cannot continuously 1.5 The main method(s) used in this research learn or adapt to the machining setting. To achieve the research objectives, a novel algorithm 1.3 The current problem/issue that needs to called NeuroPID-CNC was created and trained on a be solved or addressed urgently curated dataset called CNC-DeformControl, which includes critical machining parameters like cutting speed, One of the most persistent and pressing issues in CNC feed rate, depth of cut, tool temperature, material type, and machining is the inability of current systems to forecast others. The methodology included several key stages: data and avoid part deformation in real time [6]. Deformation preprocessing by categorical encoding and normalization; causes dimensional inaccuracies, structural weaknesses, building of a lightweight single-neuron neural network and higher manufacturing costs [7]. Existing PID model that simulates PID control behavior; training and controllers and other conventional control strategies are evaluation of the model utilizing binary classification not well-suited to capture the nonlinear, dynamic nature of metrics such as accuracy, precision, recall, and F1-score; machining-induced deformation, particularly in high- and integration of a real-time feedback strategy to allow speed or multi-material machining settings [8]. online learning and continual enhancement. To guarantee Additionally, there is a lack of lightweight and efficient convergence and computational effectiveness, interpretable models that can operate in real-time, the model makes use of a sigmoid activation function, continuously adapt to novel machining data, and offer binary cross-entropy loss, and the Adam optimizer. In actionable parameter adjustments to minimize addition, real-time control logic is integrated into the deformation risks [9], [10]. The followings are the system, allowing it to automatically adjust crucial hypotheses: machining parameters, such as coolant flow, cutting speed, • Whether a single-neuron-inspired PID control and feed rate, when a high deformation risk is predicted. model accurately forecast the risk of component deformation in CNC machining by utilizing real- 1.6 The importance or impact of this time machining parameters? research to the scientific community • Does the application of a single-neuron-inspired PID control algorithm lead to a substantial This study contributes to the improvement of intelligent decrease in part deformation when compared to CNC control systems by proposing an interpretable and conventional static or PID-based control adaptive control framework that combines conventional methods? PID principles and neural learning capacities. By • Can the dynamic modification of cutting incorporating a single-neuron PID architecture, the conditions, informed by the predictions of the algorithm guarantees low computational overhead while single-neuron PID model, enhance component providing intelligent decision-making in real time. The quality and machining reliability? NeuroPID-CNC method can be incorporated into • Whether a single-neuron neural model more industrial CNC machines to significantly decrease effectively forecast deformation risks in real- material waste, enhance product quality, and lower time CNC operations compared to conventional operating costs. For the scientific community, this classifiers? research opens up new avenues for creating hybrid neuro- control systems, expanding the scope of Industry 4.0, and supporting the evolution of automated manufacturing 1.4 The purpose(s) of doing this research methods. The primary goal of this research is to create an intelligent Controlling deformation and guaranteeing dimensional deformation suppression control algorithm specifically accuracy of machined parts has proven to be a significant designed for CNC machining environments. The study difficulty in CNC machining due to the dynamic and aims to design and execute a single-neuron-inspired PID complex nature of the process. Fan et al. [11] proposed an model that can precisely forecast the risk of part energy-based principle for reducing machining distortion deformation using real-time machining parameters. This in monolithic aircraft parts, which provided insights into study also aims to offer practical control suggestions for residual stress release and deformation prediction. dynamically adjusting cutting conditions to prevent However, their method lacked a real-time compensation deformation, resulting in improved part quality and mechanism. Ma et al. [12] proposed a single-neuron PID- machining dependability. The study addresses the gap in based model that showed success in deformation lightweight, adaptive, and responsive control systems suppression during CNC machining, but it was tested appropriate for contemporary smart manufacturing setups. under limited scenarios and did not take parameter adaptability into account in real time. Kasprowiak et al. [13] used input shaping control to decrease machining vibration, but they neglected to consider feedback Deformation Suppression Method for the CNC Machining Process of... Informatica 49 (2025) 231-244 233 adaptation during continuous machining. Similarly, Guo Świć et al. [19] studied control methods for elastic- et al. [14] concentrated on suppressing casing vibrations deformable states in turning and grinding shafts. However, in aeroengine elements but did not integrate with tool-path their focus was on low-stiffness shafts, which limits compensation. generalization. Lv et al. [20] created an automated shape Shi et al. [15] presented a compensation model for correction mechanism for wood composites, emphasizing polishing tools in precision CNC polishing, which possibilities in non-metallic materials but having limited enhanced surface quality but was only applicable to application to high-precision metal machining. Yi et al. aspheric surfaces. Hasçelik et al. [16] optimized cutting [21] investigated mesoscale deformation in thin-walled parameters to reduce wall deformation in thin-wall micro- micro-milling, but did not use intelligent adaptive milling. However, their approach was sensitive to tool feedback systems. Korpysa and Habrat [22] explored wear and material variability. Zheng et al. [17] precision milling of magnesium alloys, comparing coated investigated vibration-assisted micro-milling, which and uncoated tools, but lacking dynamic deformation provided useful insight into tool wear reduction but lacked control. Devi et al. [23] used ant lion optimization with general applicability. Gan et al. [18] presented an adaptive TOPSIS analysis to optimize milling parameters, but their backlash compensation method for CNC machines, but its method did not include predictive modeling or feedback effectiveness in complex geometries remains unverified. control. Table 1 shows a summary of related works. Table 1: Summary of related works Ref Study Focus Results Limitations [11] Energy principle for Enhanced prediction of residual No real-time compensation distortion reduction in stress-related deformation mechanism aircraft parts [12] Single-neuron PID model Efficient in simple deformation Not tested under varied real-time for deformation control conditions suppression [13] Input shaping control for Decreased vibration efficiently Lacked adaptive feedback vibration suppression integration [14] Vibration suppression in Improved structural stability Did not incorporate tool-path aeroengine casing milling compensation [15] Tool displacement model Enhanced surface finish in aspheric Particular to aspheric surfaces for CNC polishing polishing only [16] Optimization in micro- Decreased deformation utilizing Sensitive to tool wear and material milling of thin-wall optimized parameters variability geometries [17] Tool wear suppression in Reduced wear through non- Limited generalization across vibration-assisted micro- resonant vibration materials milling [18] Adaptive backlash Decreased mechanical play in Unproven effectiveness for compensation in CNC motion systems complex parts [19] Elastic-deformable state Enhanced dimensional accuracy in Applicable mostly to the turning control in shaft machining low-stiffness components and grinding of shafts [20] Shape correction in wood Automated geometric adjustment Limited relevance to metal CNC composites during continuous pressing applications [21] Deformation control in Superior precision in curved thin- No intelligent feedback or real- mesoscale micro-milling wall parts time control [22] Milling accuracy in Enhanced accuracy utilizing coated No active deformation control magnesium alloys tools included [23] End-milling parameter Multi-objective optimization Static optimization lacks optimization using ant lion attained predictive adaptability and TOPSIS 234 Informatica 49 (2025) 231-244 T. Liu et al. The prior investigations combined offer valuable insights smart deformation suppression control algorithm designed into machining vibrations, deformation mitigation, to predict and reduce the risk of part deformation during parameter optimization, and compensation methodologies. CNC (Computer Numerical Control) machining processes. Nonetheless, several restrictions and substantial gaps It draws on both machine learning and PID control persist in the integration of real-time intelligent control, principles, combining the intelligence of a lightweight including the absence of adaptive feedback, active neural network with real-time process control strategies. deformation control, and model interpretability, among NeuroPID-CNC employs a single-neuron neural network others. This research proposes a lightweight and effective that mimics a PID controller. It accepts machining framework, termed the NeuroPID-CNC model, to address parameters as input (for example, cutting speed, feed rate, the limitations and research gaps identified in prior studies. depth of cut, and temperature) and predicts whether deformation will occur ("Yes" or "No"). If there is a high 2 Materials And methods risk of deformation, the algorithm automatically adjusts the machining settings to prevent it. Algorithm 1 shows This section describes the creation of the NeuroPID-CNC theNeuroPID-CNC algorithm. Algorithm, which predicts and suppresses deformation in CNC machining. The NeuroPID-CNC algorithm is a Algorithm 1: NeuroPID-CNC Input: CNC-DeformControl Dataset (features + Deformation Risk) Output: Predicted Deformation Risk (Yes/No) and control recommendations Begin // Step 1: Data Preprocessing Load dataset D Encode categorical attributes in D Normalize numerical attributesin D Split D into training_set and test_set (70/30) // Step 2: Initialize Single-Neuron PID Model Initialize neural network: - 1 input layer - 1 hidden layer with 1 neuron (PID-like) - 1 output neuron (binary classification) Set activation_function ← Sigmoid Set optimizer ← Adam Set loss_function ← Binary Crossentropy Set biases to zero Employ Glorot Uniform for weight initialization. Implement L2 regularization and configure the batch size to 32. Establish the epoch count at 100. // Step 3: Training Phase Train the model on the trainingset utilizing backpropagation For each period (1 to 100): Randomize training dataset Segment the data into mini-batches of size 32. For each mini-batch: Calculate the output of the hidden layer utilizing the sigmoid function. Calculate the output layer utilizing the sigmoid function. Calculate the binary cross-entropy loss between the expected and actual outputs. Calculate loss Adjust weights and biases via the Adam optimizer Implement L2 regularization during weight adjustments. Apply early stopping to prevent overfitting // Step 4: Evaluation Phase Assess the model on the test set Calculate Accuracy, Precision, Recall, F1-Score, and MCC Display the confusion matrix // Step 5: Real-Time Prediction & Control For each newinput: Encode and normalize new_input prediction ← model.predict(new_input) Deformation Suppression Method for the CNC Machining Process of... Informatica 49 (2025) 231-244 235 If prediction == "Yes" then Decrease Cutting Speed Increase Coolant Flow Adjust Feed Rate based on Material Type Else Continue with current parameters End If // Step 6: Feedback Loop After machining: Record actual deformation findings Compare the prediction with the actual outcome Update model weights via online learning End binary cross-entropy loss. After training, it uses standard The NeuroPID-CNC algorithm is a smart deformation classification metrics to evaluate previously unseen data suppression control system specifically designed for CNC and predicts deformation risk for new machining machining applications. It employs a lightweight neural conditions in real time. If a high deformation risk is network model that simulates PID behavior using a single- detected, the algorithm adjusts machining parameters neuron architecture to predict whether a machined part is dynamically, such as reducing cutting speed, increasing deformable based on a variety of machining parameters coolant flow, or changing the feed rate based on material such as cutting speed, feed rate, depth of cut, tool properties, to reduce deformation. A feedback mechanism temperature, material type, and others. The process begins is integrated to continuously update the model through with preprocessing the CNC-DeformControl dataset by online learning, improving control accuracy over time. encoding categorical features and normalizing numerical Figure 1 shows the flow diagram of the NeuroPID-CNC ones, then splitting the data into training and testing algorithm. sets. The neural model, which includes a sigmoid-activated hidden neuron, is trained with the Adam optimizer and Figure 1: Flow diagram of NeuroPID-CNC algorithm 236 Informatica 49 (2025) 231-244 T. Liu et al. The flow diagram shows the NeuroPID-CNC algorithm's interface, and spindle load values were derived from the operational pipeline for predicting and controlling spindle drive system's onboard diagnostics. Categorical deformation during CNC machining. It starts with the variables, such as tool wear and vibration levels, were CNC-DeformControl dataset, which goes through evaluated using image-based inspection, vibration sensors, preprocessing steps such as categorical feature encoding and operator feedback. Surface finish was determined by and numerical feature normalization to ensure algorithm post-process optical inspection and tactile comparison compatibility. The data is then divided into training and with standard roughness gauges. testing sets to aid in model generalization. A single-neuron All collected data was logged in real time by a dedicated PID-inspired neural network is set up with a sigmoid data acquisition system and then stored in a structured activation function, Adam optimizer, and binary cross- format in a relational SQL database hosted on a secure entropy loss function. The model is trained with local server. Data from this database was exported in CSV backpropagation and evaluated on the test set to compute format for preprocessing and training. The dataset is kept performance metrics. For real-time predictions, incoming in a version-controlled environment to ensure data data is encoded and normalized similarly, and the model integrity and traceability during the algorithm predicts the deformation risk. If the risk is identified as development and testing stages. Figure 2 illustrates the "Yes," corrective control actions are automatically data collection process in a controlled CNC machining lab triggered, including reducing cutting speed, increasing environment. coolant flow, and adjusting the feed rate based on the material type, allowing adaptive, intelligent CNC machining. 2.1 Dataset description The CNC-DeformControl dataset is a curated collection of machining data designed to help intelligently predict and suppress part deformation during Computer Numerical Figure 2: Data collection process Control (CNC) operations. It includes 11 key attributes, such as machining process parameters and observed The CNC machine performs operations while sensors and outcomes, spread across several representative entries. tools collect relevant data. Machine diagnostics The dataset's primary goal is to help machine learning (speedometer icon) record cutting speed and feed rate, applications, particularly the NeuroPID-CNC algorithm, infrared sensors measure thermal data (thermometer icon), understand how different machining conditions affect the image-based analysis inspects tool wear (camera icon), likelihood of part deformation. vibrations are monitored by dedicated sensors (waveform This dataset contains a mixture of numerical and icon), and surface finish is assessed by tactile comparison categorical features. The numerical attributes—Cutting to roughness gauges (touch icon). All sensor data is Speed (in RPM), Feed Rate (in mm/rev), Depth of Cut (in captured in real time and securely stored in a structured mm), Tool Temperature (in °C), and Spindle Load (as a SQL database (database icon). For model training and percentage)—measure the operational intensity of analysis, data is exported from SQL and converted to CSV machining. These parameters have a direct impact on heat format (CSV file icon). This pipeline provides high- generation, mechanical stress, and material removal quality, structured data for machine learning applications efficiency. In contrast, categorical attributes such as in deformation risk prediction. material type (e.g., aluminum, steel, brass, plastic), tool Overall, the CNC-DeformControl dataset provides a wear, vibration, coolant flow, and surface finish provide compact but meaningful representation of the machining qualitative information about the machining environment. landscape, capturing both measurable and observational These factors have an impact on part integrity through variables required for training intelligent deformation physical wear, thermal control, and vibration dampening. prediction systems like NeuroPID-CNC. The Deformation Risk field, labeled as "Yes" or "No," serves as the target variable that indicates whether the 2.2 Data preprocessing machined part showed signs of deformation under the given conditions. To ensure that the CNC-DeformControl dataset is ready The data was gathered in a controlled CNC machining lab for machine learning, extensive preprocessing steps are environment outfitted with industrial-grade sensors and used. The dataset contains a mix of numerical and monitoring equipment. Cutting speed, feed rate, and depth categorical features that must be represented consistently of cut were programmed and recorded directly from the for the algorithm to correctly interpret the data. CNC machine interface. Thermal readings were obtained Categorical attributes like Material Type, Tool Wear, using infrared sensors mounted near the tool-workpiece Vibration, Coolant Flow, and Surface Finish are Deformation Suppression Method for the CNC Machining Process of... Informatica 49 (2025) 231-244 237 numerically encoded using one-hot encoding, which converts categorical values into a binary matrix format. 2.3 Model initialization: Single-Neuron PID The one-hot encoding process transforms a categorical structure variable into a binary vector representation as shown in Eq. (1): The proposed model is a simple neural structure inspired by the PID control principle that consists of only one 𝑂𝑛𝑒𝐻𝑜𝑡(𝑥𝑖) = [𝑥𝑖 = 𝑐1, 𝑥𝑖 = 𝑐2, … , 𝑥𝑖 = 𝑐𝑛 ] (1) hidden neuron. This neuron simulates the adaptive control behavior of a PID controller by receiving preprocessed Where: machining inputs from the input layer and computing a 𝑥𝑖 is a categorical value, nonlinear transformation for prediction. The final output is 𝑐1, 𝑐2, … , 𝑐𝑛 are the unique categories, produced by a single output neuron equipped with a Each comparison 𝑥𝑖 = 𝑐𝑗yields 1 if true, else 0. sigmoid activation function, which converts the weighted This transformation is critical for allowing the single- sum of inputs into a deformation probability expressed by neuron model to interpret non-numeric data while Eq. (4): preserving categorical relationships without imposing artificial ordering. 1 (4) 𝜎(𝑧) = Simultaneously, all numerical attributes—Cutting Speed, 1 + 𝑒−𝑧 Feed Rate, Depth of Cut, Tool Temperature, and Spindle Where: Load—are normalized utilizing Min-Max scaling, which z = weighted sum of inputs rescales each feature to lie within the range [0, 1]. This is σ(z) = output value in the range [0, 1] representing mathematically expressed by Eq. (2): deformation risk The term 𝑒−𝑧represents the exponential function with a 𝑥 − 𝑥𝑚𝑖𝑛 negative exponent, which is a fundamental mathematical 𝑥𝑛𝑜𝑟𝑚 = (2) 𝑥𝑚𝑎𝑥 − 𝑥𝑚𝑖𝑛 expression describing exponential decay. It is the inverse of the natural exponential function 𝑒𝑧, where 𝑒 is Euler’s Where: number (approximately 2.71828). This function plays a 𝑥 = original value of the feature key role in the sigmoid activation function by controlling 𝑥𝑚𝑖𝑛= minimum value of the feature in the dataset how sharply the output transitions between 0 and 1 based 𝑥𝑚𝑎𝑥= maximum value of the feature in the dataset on the input 𝑧. 𝑥𝑛𝑜𝑟𝑚= normalized value of the feature Mathematically, 𝑒−𝑧can be expressed using its infinite This normalization ensures that no feature dominates series expansion in Eq. (5): others due to varying scales, resulting in balanced ∞ contributions throughout training. After normalization and (−𝑧)𝑛 𝑒−𝑧 encoding, the dataset is randomly divided into two subsets: = ∑ (5) 𝑛! 70% for training and 30% for testing. This split preserves 𝑛=0 model generalization and ensures that evaluation is where: performed on unseen data. The dataset 𝐷 is randomly split 𝑧 is the weighted sum of inputs, into training and testing subsets using the Eq. (3): 𝑛! denotes the factorial of 𝑛, and the series sums over all non-negative integers 𝑛. 𝐷 = 𝐷 This logistic function guarantees that the model’s output 𝑡𝑟𝑎𝑖𝑛 ∪ 𝐷𝑡𝑒𝑠𝑡 , (3) 𝑤ℎ𝑒𝑟𝑒 |𝐷𝑡𝑟𝑎𝑖𝑛|= 0.7|𝐷|, lies between 0 and 1, representing the probability of |𝐷 deformation risk under current machining conditions. The 𝑡𝑒𝑠𝑡|= 0.3|𝐷| model is trained utilizing the binary cross-entropy loss Where: function, defined in Eq. (6), which measures the D: The complete preprocessed dataset after normalization discrepancy between predicted and actual outcomes: and encoding. 𝐷 𝐿 = −[𝑦. log(?̂?) + (1 − 𝑦). log (1 − ?̂?)] (6) 𝑡𝑟𝑎𝑖𝑛: The training subset of the dataset utilized to train the model. 𝐷 Where: 𝑡𝑒𝑠𝑡: The testing subset of the dataset utilized to evaluate y = actual class label (0 for no deformation, 1 for the model’s performance. ∣ deformation) D∣: The total number of data instances (rows) in the full ?̂?= predicted probability of deformation dataset 𝐷. |𝐷 𝐿 = loss value that penalizes prediction errors 𝑡𝑟𝑎𝑖𝑛 |: The number of instances in the training set, equal Here, 𝑦 is the actual binary label (0 for “No Deformation” to 70% of the total dataset. |𝐷 and 1 for “Yes”), while ?̂? is the predicted probability. The 𝑡𝑒𝑠𝑡 |: The number of instances in the test set, equal to model’s weights are optimized utilizing the Adam 30% of the total dataset. 238 Informatica 49 (2025) 231-244 T. Liu et al. optimizer, a robust gradient descent variant that adapts TP = True Positives (correctly predicted deformations) learning rates for quicker and more stable convergence. At TN = True Negatives (correctly predicted non- each iteration 𝑡, the parameters 𝜃𝑡are updated as follows: deformations) FP = False Positives (incorrectly predicted deformations) 𝑚𝑡 = 𝛽1𝑚𝑡−1 + (1 − 𝛽1)𝑔𝑡 (7) FN = False Negatives (missed deformations) Precision quantifies the fraction of predicted "Yes" 𝑣𝑡 = 𝛽2𝑣 2 𝑡−1 + (1 − 𝛽2)𝑔𝑡 (8) (deformation) true cases: 𝑚𝑡 𝑇𝑃 (14) ?̂?𝑡 = (9) 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 = 1 − 𝛽𝑡 1 𝑇𝑃 + 𝐹𝑃 𝑣𝑡 Recall reflects the model’s ability to identify all actual ?̂?𝑡 = (10) 1 − 𝛽𝑡 2 "Yes" cases: ?̂?𝑡 𝑇𝑃 (15) 𝜃𝑡 = 𝜃𝑡−1 − 𝛼 (11) 𝑅𝑒𝑐𝑎𝑙𝑙 = √?̂?𝑡 + 𝜖 𝑇𝑃 + 𝐹𝑁 where 𝑔𝑡is the gradient at iteration t, 𝑚𝑡and 𝑣𝑡are the F1-Score, the harmonic mean of precision and recall, biased first and second moment estimates, ?̂?𝑡and ?̂?𝑡are offers a balanced view: their bias-corrected estimates, 𝛼 is the learning rate, 𝛽1and 𝛽2are decay rates for these moments, and 𝜖 is a small 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 ∗ 𝑅𝑒𝑐𝑎𝑙𝑙 (16) constant to prevent division by zero. 𝐹1 − 𝑠𝑐𝑜𝑟𝑒 = 2 ∗ 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + 𝑅𝑒𝑐𝑎𝑙𝑙 2.4 Training phase MCC computes the quality of binary and multiclass classifications by considering true and false positives and During training, the model aims to reduce the loss function negatives, providing a balanced score even with via backpropagation, an algorithm that calculates the imbalanced datasets. gradient of the loss concerning each model weight. The weight update rule is formalized as showed in Eq. (12): (𝑇𝑃∗𝑇𝑁)−(𝐹𝑃∗𝐹𝑁) 𝑀𝐶𝐶 = √(𝑇𝑃+𝐹𝑃)(𝑇𝑃+𝐹𝑁)(𝑇𝑁+𝐹𝑃)(𝑇𝑁+𝐹𝑁) 𝜕𝐿 ∆𝑤 = −𝑛. (12) (17) 𝜕𝑤 Where: These metrics provide a comprehensive view of model w = change in weight performance in predicting deformation risks. 𝜂 = learning rate 𝜕𝐿 2.6 Real-time prediction and control = gradient of the loss function concerning weight 𝑤 𝜕𝑤 The trained model is deployed for real-time prediction The training process iterates through numerous epochs, during CNC operations. When a novel machining adjusting weights after each batch of training examples. To configuration is initiated, the input values are first prevent overfitting, early stopping is executed: training processed (encoded and normalized) as per training halts if the validation loss fails to improve over a routines. The model then generates an output probability?̂?. predefined number of epochs. This strategy improves If ?̂? > 0.5, the system flags a high deformation risk. In model generalization on new, unseen CNC conditions. such cases, immediate corrective actions are triggered by predefined control logic. For instance, a high-risk flag 2.5 Evaluation phase prompts a 10% reduction in cutting speed, utilizing the formula: After training, the model’s efficiency is assessed on the testing set utilizing standard classification metrics. These 𝑁𝑒𝑤 𝐶𝑢𝑡𝑡𝑖𝑛𝑔 𝑆𝑝𝑒𝑒𝑑 (18) metrics assess the model’s capability to correctly predict = 𝑂𝑙𝑑 𝐶𝑢𝑡𝑡𝑖𝑛𝑔 𝑆𝑝𝑒𝑒𝑑 × 0.9 deformation risk: Accuracy measures the ratio of correct predictions to total Where: samples: "Old Cutting Speed" = initial programmed cutting speed "New Cutting Speed" = adjusted speed to reduce stress on 𝑇𝑃 + 𝑇𝑁 (13) 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = the workpiece 𝑇𝑃 + 𝑇𝑁 + 𝐹𝑃 + 𝐹𝑁 Where, Deformation Suppression Method for the CNC Machining Process of... Informatica 49 (2025) 231-244 239 This reduction reduces both mechanical and thermal stress equations, this system creates a rigorous yet practical on the workpiece. Other adaptive responses, like framework for real-time decision-making and long-term increasing coolant flow or decreasing feed rate, are improvement. The result is a smarter, more efficient, and implemented concurrently based on the material type and resilient manufacturing environment. observed vibration. If the expected risk is low, the machining operation continues without intervention, 3 Results ensuring efficiency while maintaining safety. 3.1 Experimental setup 2.7 Feedback loop and online learning All experiments were carried out on a Windows 11 system Following each machining operation, the actual running Python 3.10. The machine was equipped with an deformation outcome is recorded and compared to the Intel Core i7 processor and 16 GB of RAM. TensorFlow, model's prediction. This creates a feedback loop, Scikit-learn, Pandas, NumPy, and Matplotlib were used to increasing the model's adaptability over time. Using train, evaluate, and visualize models. The dataset was online learning, the model gradually updates its weights divided into two sets: training (70%) and testing (30%). using recent prediction errors. The update rule is given by: Early stopping and adaptive learning rate scheduling were 𝑤𝑛𝑒𝑤 = 𝑤𝑜𝑙𝑑 + 𝛼. (𝑦 − ?̂?). 𝑥 (19) used to prevent overfitting and speed up convergence. Where: 3.2 Comparison results 𝑤𝑜𝑙𝑑= previous weight 𝑤𝑛𝑒𝑤= updated weight Table 2 compares the classification models used on the 𝛼 = online learning rate (a small constant) CNC-DeformControl dataset, including SVM, Random 𝑦 = actual label (0 or 1) Forest (RF), KNN, Logistic Regression (LR), and the ?̂?= predicted output proposed NeuroPID-CNC model. 𝑥 = input feature value. In a feedback-driven online learning system, predictions Table 2: Performance comparison of classification consistently impact control actions, which then change models future input data. This feedback can exacerbate problems Model Accura Precisi Reca F1- MC if not adequately stabilized. A diminutive learning rate (𝛼) cy (%) on (%) ll Scor C guarantees more gradual weight adjustments and (%) e contributes to stability preservation. An elevated learning (%) rate (𝛼) may induce oscillations or divergence, particularly SVM 88.43 86.22 85.13 85.6 0.76 in feedback systems. As updates rely on prediction error, 7 significant spikes in error can disrupt learning until Random 90.12 89.05 87.60 88.3 0.79 addressed. In practical CNC machining, complete Forest 2 convergence is uncommon. In online learning, weights are KNN 87.30 84.95 84.00 84.4 0.74 adjusted following each data point or small batch, resulting 7 in continual retraining. Periodic full model resets or Logistic 86.75 83.90 83.10 83.5 0.72 reinitializations may be conducted to prevent drift or Regressi 0 overfitting. on This type of incremental learning ensures that the model NeuroPI 92.00 90.00 93.00 91.5 0.84 evolves with real-world data, adapting to unknown D-CNC 0 materials, dynamic wear conditions, or unexpected operational disruptions. By combining real-time prediction The proposed NeuroPID-CNC algorithm had the best with continuous learning, the system grows more robust performance across all metrics tested. It enables real-time and context-aware over time, eventually achieving a self- feedback adaptation and improved learning of improving CNC control mechanism that maximizes deformation-prone patterns. This architecture is extremely machining precision while reducing the risk of costly responsive to subtle patterns in deformation-prone defects. conditions, resulting in higher prediction accuracy and The NeuroPID-CNC algorithm represents an intelligent, robustness. Furthermore, its streamlined structure lightweight, and adaptable solution for predicting and minimizes overfitting, whereas more complex models may suppressing deformation during CNC machining. It require deeper tuning. Figure 3 shows the confusion matrix tightly integrates machine learning principles with control for proposed approach. engineering strategies using a single-neuron PID-inspired structure, strong preprocessing, accurate prediction, and dynamic feedback adaptation. With ten foundational 240 Informatica 49 (2025) 231-244 T. Liu et al. classification errors and improves robustness when dealing with complex interactions between CNC parameters like cutting speed, tool wear, and thermal readings. The model's ability to learn consistently across diverse inputs supports its use in real-time industrial settings. In Figure 5, NeuroPID-CNC leads with a precision of 90%, indicating that it correctly predicts a deformation risk 90% of the time. Figure 3: Confusion Matrix for proposed approach Figure 4 demonstrates that the proposed NeuroPID-CNC model attains the highest accuracy among all evaluated classifiers, reaching 92%. Figure 5: Precision comparison From figure 5, the precision of proposed NeuroPID-CNC approach outperforms SVM, RF, KNN and LR by 4.38%, 1.07%, 5.94% and 7.27% respectively. High precision is required in CNC machining environments to avoid unnecessary operational adjustments caused by false positives. The model's low false alarm rate leads to increased operational efficiency by ensuring that control recommendations (such as reducing cutting speed or increasing coolant flow) are only implemented when there is a genuine risk. This precision advantage stems primarily from the model's ability to learn subtle patterns associated with actual deformation- inducing conditions while filtering out noise from non- Figure 4: Accuracy comparison critical anomalies. Figure 6 shows that NeuroPID-CNC has the highest recall value of 93%, indicating an excellent From figure 4, the accuracy of proposed NeuroPID-CNC sensitivity to actual deformation occurrences. approach outperforms SVM, RF, KNN and LR by 4.03%, 2.09%, 5.38% and 6.05% respectively. This high accuracy demonstrates the model's overall predictive power in correctly identifying deformation risk ("Yes") and non-risk ("No") instances. The superior performance is due to the unique integration of a PID-inspired control mechanism within the neuron, which allows the model to adjust its internal weights with greater precision during training. This reduces Deformation Suppression Method for the CNC Machining Process of... Informatica 49 (2025) 231-244 241 Figure 6: Recall comparison Figure 7: F1-Score comparison From figure 6, the recall of proposed NeuroPID-CNC From figure 7, the F1-Score of proposed NeuroPID-CNC approach outperforms SVM, RF, KNN and LR by 9.24%, approach outperforms SVM, RF, KNN and LR by 6.81%, 6.16%, 10.71% and 11.91% respectively. 3.60%, 8.32% and 9.58% respectively. A high recall ensures that the model rarely overlooks true The F1-score, which is the harmonic mean of precision and positive cases—an important feature in critical recall, measures the model's overall effectiveness in manufacturing scenarios where undetected deformations handling the binary classification task. This balanced could jeopardize product quality, damage tools, or cause performance indicates that the NeuroPID-CNC model production downtime. This exceptional recall is due to the optimizes both false positives and false negatives, rather model's continuous feedback adjustment loop, inspired by than favoring one over the other. Such a balance is critical the integral component of PID control, which improves in industrial settings where both unnecessary interventions detection sensitivity over time as more real-world and missed deformation risks have financial and machining data is processed. Figure 7 shows that operational implications. Finally, Figure 8 demonstrates NeuroPID-CNC has the best trade-off between precision that NeuroPID-CNC obtained the highest Matthews and recall among all tested models, with an F1-score of Correlation Coefficient (MCC) score of 0.84, which is 91.5%. widely considered one of the most reliable metrics for evaluating binary classifiers, especially on imbalanced datasets. Figure 8: MCC comparison 242 Informatica 49 (2025) 231-244 T. Liu et al. From figure 8, the MCC of proposed NeuroPID-CNC The Single-Neuron PID-Inspired Control is proficient in approach outperforms SVM, RF, KNN and LR by 10.53%, real-time management of dynamic, nonlinear systems, 6.33%, 13.51% and 16.67% respectively. adaptive error learning, feedback-based decision-making, MCC accounts for all four confusion matrix components and resource-constrained applications. The machine (true positives, true negatives, false positives, and false learning models exhibit challenges due to inadequate negatives), providing a more complete picture of model temporal feedback management, rigidity in online performance. The high MCC score confirms that the learning, elevated computational expenses (particularly in model consistently and strongly correlates predicted and random forests and k-nearest neighbors), and limited actual outcomes, regardless of class imbalance. This adaptability in non-stationary control contexts. robust performance ensures reliability and fairness in The results demonstrate the superiority of the proposed prediction decisions over varying dataset distributions and NeuroPID-CNC model in predicting deformation risk machining conditions. during CNC machining. The model's PID-inspired single- McNemar’s test was employed to statistically validate the neuron architecture not only provides superior performance differences across classifiers based on the performance across all standard classification metrics but paired predictions of all models. Table 3 presents the it also ensures operational interpretability and real-time results of the statistical significance test conducted with adaptability. These benefits make it an ideal candidate for McNemar’s test. The suggested method demonstrated smart manufacturing environments where precision, statistically significant superiority over RF (p < 0.001), dependability, and responsiveness are crucial. Future SVM (p < 0.003), KNN (p=0.004), and LR (p<0.005). research will concentrate on implementing the model on industrial edge devices for real-time inference, utilizing Table 3: Statistical Test - McNemar's Test multi-modal sensor data including audio and thermal images, applying transfer learning for enhanced Algorithm McNemar’s p-value generalization, incorporating explainable AI statistic methodologies to augment interpretability, and embedding the model within closed-loop control systems for SVM 42.13 0.002 autonomous CNC parameter modification based on predictive feedback. RF 45.24 0.0001 KNN 39.18 0.004 5 Conclusion LR 37.89 0.0045 This study described the NeuroPID-CNC algorithm, which is a lightweight single-neuron PID-inspired classifier for predicting deformation risk in CNC machining. The 4 Discussion model outperformed traditional classifiers, achieving the highest accuracy, precision, recall, F1-score, and MCC, The single-neuron PID-inspired predictive control proving its suitability for real-time deformation risk technique can surpass machine learning models such as detection and adaptive control in manufacturing. The RF, SVM, KNN, and LR. Single-neuron PID-inspired current model was trained using data from a controlled lab controllers are designed for dynamic system regulation, environment, which may limit its applicability to different combining the advantages of PID control with adaptive machine types and unstructured production scenarios. It features. It can adjust weights in real-time utilizing also focuses solely on binary classification and requires straightforward learning algorithms, rendering it suitable manual feature selection, with no support for multi-output for dynamic, non-linear systems with fluctuating or continuous prediction tasks. Future research will focus conditions. It provides a temporal viewpoint by evaluating on deploying the model on industrial edge devices for real- past errors, the current state, and anticipated future time inference, incorporating multi-modal sensor data behavior, which is consistent with control system needs. such as audio and thermal images, using transfer learning The methodology is interpretable, and its performance can for broader generalization, integrating explainable AI be adjusted using domain expertise (e.g., calibrating techniques to improve interpretability, and embedding the proportional, integral, and derivative influences). model into closed-loop control systems for autonomous Machine learning algorithms are models trained in CNC parameter adjustment based on predictive feedback. batches. They do not readily adapt in real time without expensive retraining. These are computationally intensive, perhaps rendering them unsuitable for real-time embedded control systems. It does not inherently manage temporal dynamics until augmented by time-lagged features, which may still lack responsiveness or interpretability. Deformation Suppression Method for the CNC Machining Process of... Informatica 49 (2025) 231-244 243 References [11] Fan, L., Tian, H., Li, L., Yang, Y., Zhou, N., & He, N. (2020). Machining distortion minimization of [1] Yao, K. C., Chen, D. C., Pan, C. H., & Lin, C. L. monolithic aircraft parts based on the energy (2024). The development trends of computer principle. Metals, 10(12), numerical control (CNC) machine tool 1586. https://doi.org/10.3390/met10121586 technology. Mathematics, 12(13), 1923. [12] Ma, T., Han, Y., & Li, H. (2024). Single neuron PID https://doi.org/10.3390/math12131923 based method for deformation suppression during [2] Soori, M., Jough, F. K. G., Dastres, R., & Arezoo, B. CNC machining of parts. Advanced Control for (2024). Robotical automation in CNC machine tools: Applications: Engineering and Industrial a review. acta mechanica et Systems, 6(2), e122. https://doi.org/10.1002/adc2.122 automatica, 18(3). https://doi.org/10.2478/ama- [13] Kasprowiak, M., Parus, A., & Hoffmann, M. (2022). 2024-0048 Vibration suppression with use of input shaping [3] Mouli, K. C., Prasad, B. S., Sridhar, A. V., & Alanka, control in machining. Sensors, 22(6), 2186. S. (2020). A review on multi sensor data fusion https://doi.org/10.3390/s22062186 technique in CNC machining of tailor-made [14] Guo, L., Yang, F., Li, T., Zhou, M., & Tang, J. (2021). nanocomposites. SN Applied Sciences, 2(5), 931. Vibration suppression of aeroengine casing during https://doi.org/10.1007/s42452-020-2739-7 milling. The International Journal of Advanced [4] Soori, M., & Arezoo, B. (2023). Modification of CNC Manufacturing Technology, 113, 295-307. machine tool operations and structures using finite https://doi.org/10.1007/s00170-020-06582-2 element methods, A review. Jordan Journal of [15] Shi, Y., Su, M., Cao, Q., & Zheng, D. (2024). A Mechanical and Industrial Engineering. Normal Displacement Model and Compensation https://doi.org/10.59038/jjmie/170302 Method of Polishing Tool for Precision CNC [5] Kuntoğlu, M., Aslan, A., Sağlam, H., Pimenov, Polishing of Aspheric Kuntoğlu, M., Aslan, A., Sağlam, H., Pimenov, D. Y., Surface. Micromachines, 15(11), Giasin, K., & Mikolajczyk, T. (2020). Optimization 1300. https://doi.org/10.3390/mi15111300 and analysis of surface roughness, flank wear and 5 [16] Hasçelik, A., Aslantas, K., & Yalçın, B. (2025). different sensorial data via tool condition monitoring Optimization of Cutting Parameters to Minimize Wall system in turning of AISI 5140. Sensors, 20(16), Deformation in Micro-Milling of Thin-Wall 4377. https://doi.org/10.3390/s20164377 Geometries. Micromachines, 16(3), 310. [6] Shen, Y., Yang, F., Habibullah, M. S., Ahmed, J., Das, https://doi.org/10.3390/mi16030310 A. K., Zhou, Y., & Ho, C. L. (2021). Predicting tool [17] Zheng, L., Chen, W., & Huo, D. (2020). Investigation wear size across multi-cutting conditions using on the tool wear suppression mechanism in non- advanced machine learning techniques. Journal of resonant vibration-assisted micro Intelligent Manufacturing, 32, 1753- milling. Micromachines, 11(4), 380. 1766. https://doi.org/10.1007/s10845-020-01625-7 https://doi.org/10.3390/mi11040380 [7] Kasprowiak, M., Parus, A., & Hoffmann, M. (2022). [18] Gan, L., Wang, L., & Huang, F. (2023). Adaptive Vibration suppression with use of input shaping backlash compensation for CNC machining control in machining. Sensors, 22(6), applications. Machines, 11(2), 193. 2186. https://doi.org/10.3390/s22062186 https://doi.org/10.3390/machines11020193 [8] Nguyen, D. K., Huang, H. C., & Feng, T. C. (2023). [19] Świć, A., Gola, A., Orynycz, O., Tucki, K., & Prediction of Thermal Deformation and Real-Time Matijošius, J. (2022). Technological Methods for Error Compensation of a CNC Milling Machine in Controlling the Elastic-Deformable State in Turning Cutting Processes. Machines, 11(2), 248. and Grinding Shafts of Low https://doi.org/10.3390/machines11020248 Stiffness. Materials, 15(15), 5265. [9] Li, Y., Li, Y. N., Li, X. W., Zhu, K., Zhang, Y. A., Li, https://doi.org/10.3390/ma15155265 Z. H., ... & Wen, K. (2023). Influence of material [20] Lv, Y., Liu, Y., Li, X., Lu, L., & Malik, A. (2024). removal strategy on machining deformation of Automated Shape Correction for Wood Composites in aluminum plates with asymmetric residual Continuous Pressing. Forests, 15(7), stresses. Materials, 16(5), 2033. 1118. https://doi.org/10.3390/f15071118 https://doi.org/10.3390/ma16052033 [21] Yi, J., Wang, X., Zhu, Y., Wang, X., & Xiang, J. [10] Hasçelik, A., Aslantas, K., & Yalçın, B. (2025). (2024). Deformation Control in Mesoscale Micro- Optimization of Cutting Parameters to Minimize Wall Milling of Curved Thin-Walled Deformation in Micro-Milling of Thin-Wall Structures. Materials, 17(20), Geometries. Micromachines, 16(3), 5071. https://doi.org/10.3390/ma17205071 310. https://doi.org/10.3390/mi16030310 [22] Korpysa, J., & Habrat, W. (2024). Dimensional Accuracy After Precision Milling of Magnesium 244 Informatica 49 (2025) 231-244 T. Liu et al. Alloys Using Coated and Uncoated Cutting Tools. Materials, 17(22), 5578. https://doi.org/10.3390/ma17225578 [23] Devi, C., Mahalingam, S. K., Cep, R., & Elangovan, M. (2024). Optimizing end milling parameters for custom 450 stainless steel using ant lion optimization and TOPSIS analysis. Frontiers in Mechanical Engineering, 10, 1353544. https://doi.org/10.3389/fmech.2024.13535 44 https://doi.org/10.31449/inf.v49i12.9142 Informatica 49 (2025) 245-254 245 An Enhanced FSO-BPNN Framework for Anomaly Detection and Early Warning in Power System Monitoring Na Li*, Guanghua Yang, Yuexiao Liu, Xiangyu Lu, Zhu Tang State Grid Beijing Electric Power Company, Beijing,100032, China E-mail:diance003@126.com *Corresponding Author Keywords: anomaly detection (AD), internet of things (IOT), monitoring, neural network, power system (PS) smart grid, predictive maintenance Recieved: May 7, 2025 The increasing complexity of contemporary power networks necessitates the development of enhanced early warning systems and intelligent monitoring to ensure stability and operational efficiency. Traditional approaches to risk prevention and predictive maintenance often fail due to limitations in identifying real-time abnormalities and adapting to dynamic system characteristics. To address these issues, the present research proposes an improved fish swarm optimization with Backpropagation Neural Network (IFSO-BPNN) for anomaly detection (AD) and fault detection (FD) early warning in power system (PS) monitoring that integrates an IFSO algorithm with a BPNN. The major goal is to increase the accuracy of AD and FD in smart grids by utilizing deep learning (DL) and optimization approaches. The IFSO method integrates adaptive weighting and behavioral dynamics into classic fish swarm optimization, improving overall search capabilities. By tweaking BPNN parameters using IFSO, the model achieves higher convergence rates and improved classification accuracy. The assessment dataset was compiled usingInternet of Things (IoT) sensors and pan/tilt camera-based surveillance systems at Beijing power plants, with preprocessing techniques such as min-max normalization and feature extraction using Independent Component Analysis (ICA) to improve model performance. Resultsfrom experiments show that the IFSO-BPNN model outperforms standard algorithms with an accuracy ofFD99.98% and AD 0.9980. These findings illustrate the system's capacity to detect anomalies quickly and perform preventive maintenance. The proposed method, which combines swarm intelligence with neural networks, helps to construct smarter, more robust power grids capable of meeting future energy demands with lower failure risks. Povzetek: Za odkrivanje napak (FD) in nepravilnosti (AD) v nadzoru elektroenergetskega sistema je razvit IFSO-BPNN (Izboljšana optimizacija jata rib in BPNN). Model izboljša kvaliteto z optimizacijo parametrov BPNN z IFSO, kar omogoča hitro zgodnje opozarjanje in prediktivno vzdrževanje. 1 Introduction difficult [4].Real-time data collection and analysis of electrical characteristics is part of PS monitoring, used to Artificial intelligence (AI), big data, and deep learning ensure system stability, identify problems, improve (DL) revolutionize power systems (PS) by enhancing performance, and assist in decision-making for feature modeling, control, and fault diagnosis; these are dependable and effective power grid operation [5].As presenting recent advances and applications in monitoring demonstrated by the arctic sky tragedy, the expansion of and performance analysis [1]. The expansion ofPS is the cruise industry needs advanced, dependable PS to hindered by growing power demand and environmental avoid blackouts, which endanger public safety, the objectives, which present challenges for transmission environment, financial stability, and reputation capacity and distance. Advanced, sustainable energy [6].Potential false alarms, reliance on data quality, solutions are being used to achieve carbon peaking and difficulty identifying new abnormalities, computational neutrality [2].Reconstruction errors and thresholding are complexity, difficulties with real-time implementation, used in AD(AD) to minimize false alarms and isolate fault and threshold setting are some drawbacks of AD and early areas by training a model to learn typical system warning in PS [7]. behaviorin an unsupervised manner [3].Approximately 70% of energy is produced by thermal power plants; new large-capacity units (600–1000+ MW) improve operating efficiency but make system coupling and integration more 246 Informatica 49 (2025) 245-254 N. Li et al. 1.1 Aim and contribution of the research Following data cleaning and feature extraction, supervisory control and data acquisition (SCADA)were The aim of the research is to develop a new method, processed using aConvolutional neural network - improved fish swarm optimization with Backpropagation bidirectional gated recurrent unit (CNN-BiGRU) with Neural Network (IFSO-BPNN), for detecting anomalies attention to identify wind turbine faults [10]. Accurate FD and faultsin PS by integrating BPNN and IFSO in actual wind farms was accomplished; however, it was algorithms. The goal is to increase the accuracy and constrained by the generalizability of the data source and efficiency of AD and fault detection (FD) in smart grids the possibility of overfitting to particular turbine models. while also enabling proactive maintenance. The research's The monitoring of wind turbine health was enhanced by key contributions include the following: utilizing mutual information to determine essential • IFSO Algorithm: Improves the global search parameters, support vector regression (SVR) for capability and adaptive weighting of classic Fish thresholding, and long short-term memory -autoencoder Swarm Optimization, resulting in less (LSTM-AE) for AD [11]. The outcome demonstrated convergence time and higher classification precise AD and successful identification of crucial accuracy in anomaly and fault identification. parameters. Real-time monitoring settings could show a • BPNN Optimization: IFSO is used to optimize decline in performance due to noisy data or inadequate BPNN parameters, which results in quicker temporal information. To optimize the monitoring and convergence and greater classification accuracy security of smart hospitals, machine learning (ML) and for real-time AD and FD. edge-based advertising on Contiki Coojawere applied to • Advanced-Data Preprocessing: Uses min-max identify IoT network intrusions and e-health incidents normalization and Independent Component [12]. The system was successful in identifying Analysis (ICA) for feature extraction, improving cyberattacks and e-health events, but it was very the model's performance in power system dependent on the reality of the simulated data, which could monitoring by efficiently preprocessing Internet not work effectively with complex or novel attack of Things (IoT) sensor and surveillance system patterns. data. Abnormalities in wind turbines were discovered and The next phase (phase 2) clearly explains the existing accurately analyzedutilizing a combination of methods. research about ADand early warning in PS monitoring. Local outlier factor (LOF) and adaptive K-means for Phase 3 presents the methodology, Phase 4 provides the preprocessing, Extreme Gradient Boosting (XGBoost) for result and discussion of existing vs proposed method, and diagnosis, and long short-term memory-stacked denoising Phase 5 deliversthe conclusion. autoencoder (LSTM-SDAE) for feature extractionwere employed [13]. The technique increased wind turbine 2 Related works dependability by efficiently identifying and diagnosing problems in real-time utilizing SCADA data. Performance The aim of the research [8] was to increase the was dependent on the caliber of preprocessing and could dependability of seismic stations. For reliable power be hampered by noisy data or hidden anomalies. The failure prediction, the SeismoGuard Ensemble, which research created an early warning system that incorporates comprises random forest (RF), support vector machine meteorological data to enhance PS dependability and (SVM), k-nearest neighbors (KNN), and logistic proactively reduce atmospheric dangers [14]. The regression (LR), along with IoT monitoring, was used. technology enhanced defect detection and prevented Results demonstrate that the approach attained 90% outages during severe weather; however, its performance accuracy and increased dependability. The dataset's reach depended on data quality and erratic weather patterns. The was restricted; however, the data contains long-term advancements in battery electric vehicle (BEV) testing with wider generalization across various situations. technology, platforms, charging, and monitoring were A combination of elliptic curve cryptography (ECC)- examined to address issues regarding safety, charging, and based token control with deep reinforcement learning range in new energy cars [15]. Although cutting-edge (DRL)-based sleep scheduling was used for secure and platforms and safety features dominate the BEV industry, adaptive power management under possible threat however, there were issues with battery lifecycle safety, conditions in order to improve the security and energy charging simplicity, and weather adaptation. The PS load efficiency of wireless sensor networks (WSNs) [9]. The margin was determined by utilizing an artificial neural approach achieved a 15% increase in energy efficiency network (ANN) trained on phasor measurement unit and a 20.01% power reduction. While simulation-based (PMU) data and model simulations to ensure voltage and outcomes were validated, more verification was required small-signal stability [16]. An ANN's ability to anticipate for scalability and real-world implementation under load margin effectively cannot exceed a dependence on various attack types. the quality of PMU data and model assumptions in actual systems. To increase safety in nuclear-powered marine An Enhanced FSO-BPNN Framework for Anomaly Detection... Informatica 49 (2025) 245-254 247 operations, developments in ship nuclear power framework was suggested; however, knowledge remains machinery (SNPM) design, fault diagnostics, and risk limited and needs to be verified. Table 1 provides the assessmentwere evaluated [17]. Design enhancements and related works summary table. investigation spaces were identified, and an integrated risk Table 1: Comparative Summary of the related works Reference Methods Results Limitations Duet al. [8] SeismoGuard Ensemble (RF, Achieved 90% accuracy, Limited dataset SVM, KNN, LR) + IoT improved dependability coverage; needs monitoring of seismic stations generalization and broader testing Qinet al.[9] ECC token control + DRL- 15% energy efficiency Simulation-based only; based sleep scheduling for gain, 20.01% power real-world scalability WSN reduction and threat resilience not verified Xianget al.[10] SCADA data + CNN-BiGRU + Accurate wind turbine Data source attention mechanism FD in real wind farms generalizability is limited; overfitting risk to specific turbine models Chen et al. [11] Mutual information + SVR for Accurate anomaly Real-time performance thresholds + LSTM-AE for detection; key could degrade under anomaly detection parameters identified noisy or incomplete data Said et al. [12] ML + edge-based intrusion Identified e-health events Simulated data could detection on Contiki Cooja for and IoT network fail under real, complex smart hospitals intrusions accurately attack patterns Zhang et al. [13] LOF + adaptive K-means Real-time, accurate Sensitive to data quality; preprocessing + XGBoost + ADand diagnosis in wind hidden anomaly types LSTM-SDAE turbines may be missed Božičeket Early warning system using Prevented outages and Dependent on weather al.[14] meteorological data improved detection unpredictability and during extreme weather data quality He et al. [15] BEV platform, Technological Issues remain in battery charging/swapping stations, dominance and safety safety, weather and monitoring platform improvements in the adaptability, and BEV market charging ease Bento et al. [16] ANN trained on PMU data + Accurate load margin Performance hinges on model-based simulation prediction ensuring PMU data and voltage and small-signal assumptions in stability simulation models Adumene et al. SNPM designs + fault hybrid risk framework; Incomplete knowledge [17] diagnosis + risk assessment identified design base; needs validation progress and framework integration scale power networks. The research fills a gap by merging 2.1 Research gap an IFSO method with a BPNN for PS anomaly and fault identification. Compared to earlier techniques, this The method additionally solves past techniques' approach improves accuracy, convergence speed, and FD drawbacks, such as restricted data generalization, resilience, especially in noisy situations. overfitting, simulation reliance, and data quality sensitivity. The proposed approach, IFSO-BPNN, provides a scalable, real-time solution for proactive maintenance and problem detection in complex, large- 248 Informatica 49 (2025) 245-254 N. Li et al. 3 Research methodology 3.2 Data preprocessing via min-max normalization This section discusses IoT sensor-based data collection in PS and introduces the IFSO-BPNN approach for anomaly Min-max normalization is a common method used for and fault identification, as well as early warning in PS numerical sensor and camera data from Beijing power monitoring. Figure 1 shows the methodology flow, which plants to scale characteristics between 0 and 1, in which includes data pretreatment, feature extraction, and model the values of a feature are translated into a preset range, optimization. usually [0-1]. The method retains data connections, hence being suitable for a wide range of ML applications. The transformation is carried out using the following Equation (1). 𝑥=𝑚𝑖𝑛 𝑋𝑛𝑒𝑤 = (1) 𝑚𝑎𝑥(𝑥)−𝑚𝑖𝑛(𝑥) 𝑋𝑛𝑒𝑤 = The adjusted value obtained after scaling the data 𝑋 = outdated value, 𝑚𝑎𝑥(𝑥)= dataset’s highest possible value.𝑚𝑖𝑛(𝑥) = dataset's lowest possible value. The normalizing technique improves AD and FD in PS monitoring by ensuring that all data points have a consistent scale, which increases predictive model accuracy. 3.3 Feature extraction using independent component analysis (ICA) ICA is a current statistical technique that attempts to break down observable data into statistically independent components. The ICA was used on sensor and surveillance data to reduce dimensionality and extract essential Figure 1: Flow of the proposed method features, which improved the IFSO-BPNN model's capacity to detect abnormalities in PS monitoring as a 3.1 Data collection linear mixture of independent components, expressed as follows in Equation (2). The system configuration includes a pan/tilt integrated camera, a series of local storage DVR hosts, a 1-terabyte 𝑦 = 𝐵. 𝑇 (2) dedicated hard disk, and equipment from major domestic video equipment manufacturers. A wireless networking Where: 𝑦 represents the observed data vector, 𝐵denotes module is an important element that allows direct the mixing matrix, and 𝑇 denotes the separate components. connection across 4G or 5G wireless networks. The In ICA, components are assumed to be statistically research is centered on power stations surrounding independent and non-Gaussian, with a square and Beijing, where the distribution stations lack wired unknown mixing matrix𝐵. To extract the components, networks and must communicate over wireless networks. calculate the inverse 𝑋 of matrix 𝐵 as follows in Equation To achieve that, on-site terminal equipment is required to (3). access different network types at the distribution station, such as 2G/3G/4G, GSM, CDMA, and wired networks. 𝑇 = 𝑋. 𝑦 (3) Many of these stations are found in basements. In the event of a severed wireless connection between the station and ICA divides data into statistically independent the platform, short messages transmitted to the terminal components, helping in AD and FD in PS. While the equipment at the distribution station allow for simple technique does not give direct variance or ordered data, the permission and re-establishment of communication. The enhanced sparsity-based technique improves feature data were split into an 8:2 ratio, 80% for training, and 20% extraction and speeds up convergence for real-time for testing dataset. applications such as early warning systems. ICA has been widely applied in disciplines like face recognition and dimensionality reduction. PS monitoring, which extracts essential characteristics from sensor data, catches An Enhanced FSO-BPNN Framework for Anomaly Detection... Informatica 49 (2025) 245-254 249 complicated, non-Gaussian patterns that standard 𝐹(𝑥)𝑥. 𝑡𝑎𝑛ℎ(𝑠𝑜𝑓𝑡𝑝𝑙𝑢𝑠(𝑥)). where softplus approaches typically overlook, resulting in improved AD, function𝑓(𝑥) = log (1 + 𝑒𝑥).Mish is a self-regulatory FD and maintenance efficiency. activation that improves accuracy and generalization instead of standard function. The process is smooth and 3.4 Detection and early warning in PS non-monotonic, allowing for modest negative outputs monitoring using improved fish swarm while retaining strong positive flow, avoiding problems optimization with backpropagation like dead neurons in ReLU.𝑥: Input to the neuron. neural network (IFSO-BPNN) 𝑆𝑜𝑓𝑡 𝑝𝑙𝑢𝑠 (x): A smooth variant of ReLU.𝑡𝑎𝑛ℎ(⋅): Implements smooth limiting behavior for high input The IFSO-BPNN enhances AD and FD in PSby values. Data from the power system is collected, optimizing BPNN parameters with the IFSO algorithm, standardized, and sent to the network for training. increasing classification accuracy, and allowing for real- Normalization guarantees that each input feature time predictive maintenance. Figure 2 displays the contributes evenly to model training. During forward proposed method’s flow diagram for power system propagation, input data is transferred through the layers as monitoring. the model produces predictions. Backpropagation then changes the weights and biases depending on the loss function, which is commonly Mean Squared Error (MSE) and computed as follows in Equation (5). 1 2 𝑀𝑆𝐸 = ∑𝑁 𝑗=1(𝑥 − 𝑥 5 𝑁 𝑝𝑟𝑒𝑑 𝑎𝑐𝑡𝑢𝑎𝑙) ( ) To improve the model's capacity to detect anomalies and fault, increase system dependability, and provide early alerts for proactive PS repair. Loss function: In the PS anomaly and fault detection, the loss function is critical for reducing prediction errors and improving model parameters. The BPNN's output layer computes the error between the expected output and the actual observed detection using the MSE and an appropriate activation function. The error gradient of each neuron in the output layer could be computed as follows in Equation (6). Figure 2: Flow diagram for the proposed method. 𝛿𝑜𝑢𝑡 = (𝑥𝑝𝑟𝑒𝑑 − 𝑥𝑡𝑟𝑢𝑒). 𝜎′(𝑤) (6) 3.4.1 Back-propagation neural network (BPNN) The BPNN is a multi-layer feed-forward artificial neural 𝑥𝑝𝑟𝑒𝑑 : predicted output (anomaly, and fault score). 𝑥𝑡𝑟𝑢𝑒: network designed to identify anomalies in PS. The True label (0 for no abnormality and 1 for anomaly). 𝜎′(𝑤) architecture consists of an input layer, one or more hidden is the derivative of the activation function for the neuron's layers, and an output layer. Sensor readings, system input 𝑤.The gradient of the hidden layers is affected performance measurements, and ambient parameters are primarily by the output error, but also by the gradients of all sent into the input layer. The hidden layers discover the following layers. The gradient of a hidden layer neuron complicated patterns in the data, whereas the output layer 𝐺𝑗 could be calculated using the chain rule in Equation (7). anticipates anomalies and faults such as system malfunctions or failures. Each neuron's output is defined 𝛿ℎ𝑖𝑑𝑑𝑒𝑛 = ∑𝑖 𝑧𝑗.𝑖 . 𝛿𝑖. 𝜎′(𝑤𝑗) (7) by applying an activation function to the weighted sum of inputs in Equation (4). 𝛿ℎ𝑖𝑑𝑑𝑒𝑛: Error gradient for a hidden layer neuron. 𝑧𝑗.𝑖: Weight coupling hidden layer cell 𝐺𝑗 with output 𝑥 = ∑𝑛 𝑗=1 𝑧𝑗 . 𝑦𝑖 + 𝑎 (4) neurons. 𝛿𝑖: The error gradient of the output neuron.𝜎′(𝑤𝑗): Derivative of the activation function for Where 𝑦𝑖is the input, 𝑧𝑗is the weight, 𝑎is the bias, and 𝜎 (⋅ the buried layer input 𝑤𝑗 .Gradient descent is used to ) is the exponential activation function (TanhExp) 𝑓(𝑥) = update weights and biases during training to minimize the 𝑥. tanh (𝑒𝑥), generally mish activation function 𝑡𝑎𝑛ℎ. loss function. The rules for updating the weights (𝑧) and The Mish function is smooth and comparable to TanhExp. biases (𝑎) in each round are as follows in Equations (8-9). The formula is provided as follows. 250 Informatica 49 (2025) 245-254 N. Li et al. 𝑧(𝑛+1) 𝜕 = 𝑧(𝑛) 𝑃 − 𝜂. (8) detection accuracy are enhanced by the adaptive 𝜕𝑧 technique. 𝑎(𝑛+1) = 𝑎(𝑛) 𝜕𝑃 − 𝜂. (9) 𝜕𝑎 The artificial fish swarm algorithm (AFSA) uses swarming and following behaviors to determine convergence speed. The current weights and biases at iteration 𝑛 are denoted However, narrow distances can cause local optima and by 𝑧(𝑛)and𝑎(𝑛). The learning rate (𝜂) is a hyperparameter delayed convergence. Randomization changes that controls the step size. The gradients of the loss swimming's step size to prevent premature convergence. 𝜕𝑃 𝜕𝑃 function about weights and biases are ∂ and , The algorithm focuses on determining the optimal position 𝜕𝑧 𝜕𝑎 of fake fish for efficient attribute reduction, and eliminates respectively. The learning rate 𝜂 adjusts the model's search behavior to save execution time. The enhanced weights and biases to reduce prediction errors for ADinPS. swarming and subsequent behaviors are defined as Equations (12-13). 3.4.2 Improved fish swarm optimization (IFSO) FSO was selected over PSO, GA, and DE because of its 𝑌𝑛𝑒𝑥𝑡 = 𝑌𝑗 + 𝑠𝑡𝑒𝑝 × (𝑌𝑑 − 𝑌𝑗)𝑖𝑓 𝐺(𝑌𝑑) > 𝐺(𝑌𝑗) greater global search capabilities and adaptive behavior, which improve convergence and classification accuracy in (12) AD and FD. An IFSO is proposed to increase detection 𝑌𝑗 = 𝑌𝑑 𝑖𝑓 𝐺(𝑌𝑑) > 𝐺(𝑌𝑗) (13) accuracy and convergence speed. For balanced exploration and exploitation, the system incorporates adaptive control 𝑌𝑛𝑒𝑥𝑡: The fake fish's next position.𝑌𝑗: The fake fish's overstep size and visual field, which shrinks with current location.𝑌𝑑: The position of the swarm's iterations. By eliminating default search behaviors and center.𝑠𝑡𝑒𝑝: The step size for movement is determined by crowding conditions, swarming and following techniques a random component. 𝐺(𝑌𝑑): The fitness value at the are improved. Fish retry with modified settings when an center position. 𝐺(𝑌𝑗): The fish's fitness value at that improved solution is discovered. To preserve the quality of present location. These changes improve the algorithm's global optimization, an extinction-regeneration system efficiency, resulting in faster convergence and higher removes the most susceptible fish and replaces it with a performance. more suitable one. This improved method efficiently Improved Search Behavior: In the AFSA, searching for optimizes BPNN parameters for AD and FD in PS. behavior entails exploring the available domain to discover The classic Fish Swarm Algorithm (FSA) has fixed visual alternatives. The number of tries has a significant impact and step sizes, which can hinder convergence. To improve on search efficiency, frequently resulting in premature or AD performance, an adaptive piecewise function is inefficient searches. To solve these things, extend the proposed to gradually decrease visual and step sizes with viewing field when no superior location is discovered after iterations, finding a balance between speed and accuracy. a certain number of difficulties. When a suitable place is Step𝑆𝑆 (𝑖𝑡𝑒𝑟) and adaptive 𝑉(𝑖𝑡𝑒𝑟) are defined as follows located, the fish takes one step towards that, with a in Equations (10-11). maximum step size of 𝑠𝑡𝑒𝑝𝑛𝑒𝑤 = 2 × 𝑠𝑡𝑒𝑝. Without false, the fish moves randomly. IFSO's capacity was 𝑖𝑡𝑒𝑟 log (𝑚𝑖𝑛𝑣/𝑚𝑎𝑥 improved to efficiently tune BPNN parameters, hence 𝑉(𝑖𝑡𝑒𝑟) = int (𝑚𝑎𝑥𝑣 × ( 𝑣) ) )(10) log (𝑚𝑎𝑥𝑔𝑒𝑛) increasing accuracy and convergence in PS anomaly and 𝑖𝑡𝑒𝑟 log (𝑚𝑖𝑛 / fault detection. 𝑆𝑆(𝑖𝑡𝑒𝑟) = int (𝑚𝑎𝑥𝑠 × ( 𝑠 𝑚𝑎𝑥𝑠) ) ) (11) log (𝑚𝑎𝑥𝑔𝑒𝑛) Mechanism of Extinction and Rebirth: The algorithm uses an extinction mechanism to remove the least suitable fish, 𝑉(𝑖𝑡𝑒𝑟) enhancing swarm adaptability but decreasing swarm size : The artificial fish's field of vision at iteration iter. and randomness. A regeneration mechanism is then 𝑆𝑆(𝑖𝑡𝑒𝑟): The maximum step the fish can take during included to restore swarm size by regenerating highly iteration. 𝑚𝑎𝑥𝑣: Step size and initial (maximum) visual adaptable fish, ensuring resilience and enhancing range. 𝑚𝑖𝑛𝑣: The smallest step size and visual range for efficiency by shortening iteration durations while efficient searching. The maximum number of iterations maintaining high fitness levels. The IFSO-BPNN is𝑚𝑎𝑥𝑔𝑒𝑛 . 𝑖𝑡𝑒𝑟: The number of the current iteration. For approach attempts to discover and detect deviations in discrete issues, 𝑖𝑛𝑡(. . . ) rounds values to integers. Values PSmore efficiently by optimizing neural network are rounded to integers, with a minimum step and visual parameters, assuring faster convergence, and improving sizes set to 1 for discrete issues such as attribute reduction prediction accuracy for proactive maintenance. Algorithm in Equations (10-11); both the visual and step sizes use an 1 displays IFSO-BPNN. exponential decrease from maximum to minimum across iterations, allowing for quick global search at the beginning and accurate local search at the final stage. The provided AD, and FD framework's convergence and An Enhanced FSO-BPNN Framework for Anomaly Detection... Informatica 49 (2025) 245-254 251 Algorithm 1: IFSO-BPNN 4.1 Experimental setup Step 1: Initialize the BPNN parameters Initialize BPNN with input layer, hidden layers, and output The IFSO-BPNN technique is implemented on a machine layer equipped with an Intel i7 CPU, 16GB RAM, and a 512GB Set learning rate η and number of iterations max_iter SSD. Python 3.9 is used for implementation, including Step 2: Initialize the Fish Swarm Optimization (IFSO) libraries like NumPy, TensorFlow, Scikit-learn, and parameters Matplotlib for processing and visualization. Table 2 Initialize fish swarm population size, maximum visual displays the hyperparameters of the proposed method. field (max_v), and step size (max_s) Set the minimum values for visual field (min_v) and step Table 2: Hyperparametric for proposed method size (min_s) Step 3: Data Preprocessing Hyperparameter Range/Value Preprocess data: BPNN Learning Rate (η) 0.01 to 0.1 Normalize sensor readings using min-max Max Iterations (max_iter) 100 to 1000 normalization Swarm Population Size 50 to 200 Perform feature extraction using Independent Max Step Size (max_s) 0.1 to 1.0 Component Analysis (ICA) Min Step Size (min_s) 0.01 to 0.1 Step 4: Training the BPNN with IFSO optimization Learning Rate (η) for 0.001 to 0.01 for each iteration in range(max_iter): BPNN for each fish in the swarm: Fitness Function Error of BPNN model visual = V(iter) predictions step = SS(iter) MSE Threshold for 0.001 if G(Y_d) > G(Y_j): Convergence Y_j = Y_d Activation Function for Mish, TanhExp, or ReLU for each fish in the swarm: BPNN BPNN.weights = optimize_with_fish_swarm(Y_j) BPNN.biases = optimize_with_fish_swarm(Y_j) 4.2 Performance outcome for epoch in range(max_epochs): output = BPNN.forward(input_data) Figures 3 and 4 show the ROC curve and confusion matrix error = calculate_MSE(output, expected_output) for anomaly detection and fault detection, respectively. gradients = backpropagate(error) The performance was evaluated based on the false positive BPNN.weights = BPNN.weights - η * gradients.weights rate, the true positive rate for the ROC curve, and the BPNN.biases = BPNN.biases - η * gradients.biases predicted and actual for the confusion matrix. Step 5: Extinction and Regeneration remove_weakest_fish() regenerate_strong_fish() Step 6: Anomaly and Fault Detection anomaly_score = BPNN.predict(test_data) fault_score = BPNN.predict(test_data) if anomaly_score> threshold or fault_score> threshold: trigger_early_warning() Step 7: Return the optimized BPNN model for PS monitoring Return BPNN model optimized using IFSO 4 Result and discussion This section compares the result of the proposed method, Figure 3: Anomaly detection (a) Roc curve, and (b) an enhanced IFSO-BPNN framework, for AD and FD confusion matrix. early warning in PS monitoring with existing methods. The evaluation was conducted using parameters such as accuracy (Acc), success rate (SR), misclassification instances (MI), error rate (ER), precision (Pre), recall (Rec), and F1 score (F1). 252 Informatica 49 (2025) 245-254 N. Li et al. Table 3: FD metrics values for proposed method. Metrics LSTM [18] IFSO-BPNN [Proposed] Acc (%) 91.21 98.5 SR(%) 92.42 96.85 MI 17 9 ER (%) 8.76 5.15 Figure 4: fault detection (a) Roc curve, and (b) confusion matrix. 4.3 Parameter explanation Accuracy (Acc):Acc is defined as the ratio of accurately predicted occurrences (including true positives and true negatives) to total instances in a dataset, which measures the overall performance of PS monitoring and fault detection.Success rate (SR): The smart grid system is Figure 5(a): Acc and SR value for FD. calculated as the proportion of accurately discovered faults and successful predictions to improveFD and maintenance accuracy.Misclassification instances (MI): The events occur when the model incorrectly identifies problems or normal conditions to demonstrate the possible flaws in identifying power defects.Error rate (ER):The fraction of misclassified cases, revealing the model's errors with an emphasis on decreasing mistakes in FD for PS.Precision (Pre) is the fraction of successfully diagnosed errors among all expected anomalies, demonstrating detection accuracy. Recall (Rec) measures the model's ability to detect all real abnormalities. F1 Score (F1) balances precision and recall. These metrics assess the IFSO-BPNN model's ability to accurately detect and monitor PS faults. 4.4 Comparison phase The proposed method, IFSO-BPNN, is compared to the Figure 5(b): MI and ER FD value for proposed method. existing methods like Long Short-Term Memory (LSTM) [18] for FD, k-Nearest Neighbors (KNN), Decision tree Table 4 and Figure 6show the comparison of the proposed classifier (DTC), and Random Forest (RF) [19] for ADand method and existing methods to evaluate the metric early warning in PS monitoring with evaluation metrics. valuesused to predict ADand early warning of PS Table 3 and Figure 5 (a-b) display the comparison of monitoring. The proposed IFSO-BPNN (0.9980) method metric values for the proposed method and existing achieves greater Acc than KNN (0.9729), DTC (0.9937) methods to predict FD and FD in early warning of PS and RF (0.9976). monitoring. The proposed IFSO-BPNN (98.5%) method achieves greater Acc than LSTM (91.21%). An Enhanced FSO-BPNN Framework for Anomaly Detection... Informatica 49 (2025) 245-254 253 Table 4: Metrics values for proposed vs existing methods. Metrics KNN DTC RF IFSO- 5 Conclusions [19] [19] [19] BPNN [Proposed] The improved early warning model, combining IFO with Pre 0.9732 0.9937 0.9976 0.9978 a BPNN (IFSO-BPNN), was presented to improve FD and Rec 0.9729 0.9937 0.9976 0.9977 predictive maintenance in smart power systems. The F1 0.9729 0.9937 0.9976 0.9979 method aims to optimize neural network parameters for Acc 0.9729 0.9937 0.9976 0.9980 higher detection accuracy. The results demonstrated exceptional performance with FD accuracy (98.5%) and AD accuracy (0.9980) higher than existing methods. To address statistical validation, the IFSO-BPNN model has limited specificity, required more processing resources, and relied on precise parameter adjustment, which could leave an impact on real-time performance and generalizability across different power systems. The dataset's limited coverage of Beijing's local distribution stations, as well as a lack of sample size and class distribution information, limit its generalizability and model performance assessment. The future scope may extend the dataset to cover varied power systems, and providing precise details on sample size and class distribution would improve model resilience, generalization, and performance evaluation. Future research should focus on increasing specificity, testing in Figure 6: Evaluation metrics values for the proposed a variety of grid scenarios, and incorporating real-time method. adaptive processes to widen and improve the system's FD capabilities and use confidence intervals and standard In this research, both BPNN and IFSO-BPNN techniques deviations to demonstrate dependability. Future directions were trained for FD and AD in PS. The numerical results include statistical validation methods, such as confidence of the ablations study for FD and AD in PS are displayed intervals and standard deviations, to support the reliability in Table 5, indicating that IFSO-BPNN performs better of results, providing clearer justification for performance than BPNN. metrics and model robustness. Future work will concentrate on providing thorough feature extraction, Table 5: Outcome of ablation study dimensionality reduction using ICA, and using correlation Method AD Acc (%) FD Acc (%) reduction methods for better analysis. Future research aims BPNN 98.0 98.2 to enhance model performance and generalization by IFSO-BPNN 99.8 98.5 improving feature extraction, incorporating diverse data sources, and reducing dimensionality. 4.5 Discussion References The proposed IFSO-BPNN method achieves higher Acc, Pre, Rec, F1 and SR and significantly reduces MI and ER [1] Wang G, Xie J, and Wang S (2023). Application of compared to existing methods like LSTM, KNN, DTCand artificial intelligence in power system monitoring and RF. Existing models struggle with real-time adaptation and fault diagnosis. Energies, 16(14), 5477. FD accuracy. The IFSO method overcomes these https://doi.org/10.3390/en16145477 constraints by improving global search and optimizing [2] Chen Q, Li Q, Wu J, He J, Mao C, Li Z, and Yang B BPNN parameters for improved performance. The (2023). State monitoring and fault diagnosis of HVDC connection helps electricity systems identify faults and system via KNN algorithm with knowledge graph: a provide early warnings. The key benefit is the substantial practical China power grid case. Sustainability, 15(4), dependability and precision in predictive maintenance, 3717. https://doi.org/10.3390/su15043717 which improves the robustness and efficiency of PS. [3] He K, Wang T, Zhang F, and Jin X (2022). Anomaly Deploying the IFSO-BPNN model in smart grids provides detection and early warning via a novel multiblock- real-time defect detection, such as detecting transformer based method with applications to thermal power overheating early on, averting blackouts, lowering plants. Measurement, 193, 110979. maintenance costs, and enhancing energy distribution https://doi.org/10.1016/j.measurement.2022.110979 reliability across locations. 254 Informatica 49 (2025) 245-254 N. Li et al. [4] Stanković AM, Tomsovic KL, De Caro F, Braun M, the world: achievements in technology system Chow JH, Čukalevski N and Zhao S. (2022). Methods architecture and technological breakthroughs. Green for analysis and quantification of power system Energy and Intelligent Transportation, 1(1), 100020. resilience. IEEE Transactions on Power Systems, https://doi.org/10.1016/j.geits.2022.100020 38(5), 4774–4787. [16] Bento ME (2022). Monitoring of the power system https://doi.org/10.1109/TPWRS.2022.3212688 load margin based on a machine learning technique. [5] Bolbot V, Theotokatos G, Hamann R, Psarros G, and Electrical Engineering, 104(1), 249–258. Boulougouris E (2021). Dynamic blackout probability https://doi.org/10.1007/s00202-021-01274-w monitoring system for cruise ship power plants. [17] Adumene S, Islam R, Amin MT, Nitonye S, Yazdi M, Energies, 14(20), 6598. and Johnson KT (2022). Advances in nuclear power https://doi.org/10.3390/en14206598 system design and fault-based condition monitoring [6] Baba M, Nor NB, Sheikh MA, Baba AM, Irfan M, towards the safety of nuclear-powered ships. Ocean Glowacz A, and Kumar A(2021). Optimization of Engineering, 251, 111156. phasor measurement unit placement using several https://doi.org/10.1016/j.oceaneng.2022.111156 proposed case factors for power network monitoring. [18] Veerasamy V, Wahab NIA, Othman ML, Energies, 14(18), 5596. Padmanaban S, Sekar K, Ramachandran R, and Islam https://doi.org/10.3390/en14185596 MZ (2021). LSTM recurrent neural network classifier [7] Florkowski M (2021). Anomaly detection, trend for high impedance fault detectionin solar PV evolution, and feature extraction in partial discharge integrated power system. IEEE Access, 9, 32672– patterns. Energies, 14(13), 3886. 32687. https://doi.org/10.3390/en14133886 https://doi.org/10.1109/ACCESS.2021.3060800 [8] Du J, Wang X, and Zhang H (2025). Secure power [19] Mokhtari S, Abbaspour A, Yen KK, and Sargolzaei A management in wireless sensor networks for power (2021). A machine learning approach for anomaly monitoring using deep reinforcement learning. detection in industrial control systems based on Informatica, 49(19). measurement data. Electronics, 10(4), https://doi.org/10.31449/inf.v49i19.7125 407.https://doi.org/10.3390/electronics10040407 [9] Qin G, Juan M, and Rui MH (2025). IoT-based intelligent power supply management using ensemble learning for seismic observation stations. Informatica, 49(8). https://doi.org/10.31449/inf.v49i8.6502 [10] Xiang L, Yang X, Hu A, Su H, and Wang P (2022). Condition monitoring and Anomaly detection of wind turbine based on cascaded and bidirectional deep learning networks. Applied Energy, 305, 117925. https://doi.org/10.1016/j.apenergy.2021.117925 [11] Chen H, Liu H, Chu X, Liu Q, and Xue D (2021). Anomaly detection and critical SCADA parameters identification for wind turbines based on LSTM-AE neural network. Renewable Energy, 172, 829–840. https://doi.org/10.1016/j.renene.2021.03.078 [12] Said AM, Yahyaoui A, and Abdellatif T (2021). Efficient Anomaly detection for smart hospital IoT systems. Sensors, 21(4), 1026. https://doi.org/10.3390/s21041026 [13] Zhang C, Hu D, and Yang T (2022). Anomaly detection and diagnosis for wind turbines using long short-term memory-based stacked denoising autoencoders and XGBoost. Reliability Engineering & System Safety, 222, 108445. https://doi.org/10.1016/j.ress.2022.108445 [14] Božiček A, Franc B, and Filipović-Grčić B (2022). Early warning weather hazard system for power system control. Energies, 15(6), 2085. https://doi.org/10.3390/en15062085 [15] He H, Sun F, Wang Z, Lin C, Zhang C, Xiong R, and Zhai L (2022). China's battery electric vehicles lead https://doi.org/10.31449/inf.v49i12.9062 Informatica 49 (2025) 255-268 255 Automated AutoCAD Drawing Assessment via Image Processing and Vector Transformation Techniques Zhengkai Xiong, Jiaming Ge, Rong Wei* Department of Mechanical and Electrical Engineering, Cangzhou Technical College, Cangzhou City, Hebei Province, 061000, China E-mail: xzk870036@163.com, JiamingGe68@163.com, weirong525@163.com *Corresponding author Keywords: graphics processing, computer graphics. information extraction, computer graphics examination system, computer-aided design (CAD) Received: April 28, 2025 Conventional assessment practices in computer graphics courses, particularly those that utilize AutoCAD, often rely on manual grading or basic template-matching strategies. These methods are ineffective and biased, particularly when used for extensive evaluations. Intelligent evaluation methods and automated image processing must be integrated as educational technology continues to evolve. The purpose of the proposed effort is to develop and put into use an intelligent AutoCAD computer drawing evaluation system that uses image processing technologies. Enhancing assessment accuracy, automating scoring, and utilizing robotic technologies to combine virtual drawing analysis and actual drawing validation are the objectives. The system evaluates student drawings using MATLAB-based techniques, including vector transformation, grayscale conversion, binarization, and histogram similarity. It extracts components using DXF file parsing, performs geometric matching, and features extraction. A feedback-driven retransmission method ensures packet correctness. A servo motor-powered drawing computer duplicates input drawings, and performance is assessed using torque analysis, picture entropy, consistency, and smoothness criteria. The system could accurately reproduce student drawings with an accuracy of more than 0.1 cm and an average drawing speed of 1.75 cm/s. The system's dependability was confirmed when evaluation ratings for example drawings nearly matched hand grading. Within the robotic arm's torque limits, moment and motion analysis verified operational safety and accuracy. The proposed approach automates computer graphics analysis by combining hardware and software elements for perceptive evaluation. However, limitations on robot motion and image quality sensitivity were limitations, requiring future improvements. Povzetek: Predstavljen je inteligentni sistem za avtomatsko ocenjevanje risb AutoCAD z obdelavo slik in vektorsko transformacijo. Uporablja DXF analizo, primerjavo slik in robotsko reprodukcijo za natančno in objektivno ocenjevanje. 1 Introduction automates various engineering and design processes, despite its high skill and work requirements [4]. Recent advancements in generative models in language Screencasts enhance concurrent learning in CAD-based and imaging have transformed the perception of computers and technical drawing classes, providing flexible, self- as co-creators, enabling creative AI to actively participate paced learning options for students lacking prior CAD in idea exploration [1]. Augmented Reality (AR) enhances experience and limited curriculum time [5]. Conventional learning in graphic design education by providing CAD systems enhance manufacturing productivity in dynamic, 3D-registered visuals, improving students' industries like metallurgy, glass working, and woodturning practical interaction with intricate mechanical structures by facilitating detailed 3D modeling and group technology and spatial comprehension [2]. AutoCAD is a popular for small-batch production [6]. program for creating technical drawings and The goal of the research is to create and put into use an documentation in design and architecture, but beginners intelligent AutoCAD computer drawing evaluation system may face challenges due to standardized teaching that uses image processing technologies. Enhancing strategies [3]. assessment accuracy, automating scoring, and utilizing Automatic List Processing (AutoLISP), a key component robotic technologies to combine virtual drawing analysis of AutoCAD, is a software development tool that and actual drawing validation are the objectives. 256 Informatica 49 (2025) 255-268 Z. Xiong et al. • To create an automatic AutoCAD assessment restricted validation across several CAD platforms and system that combines image processing methods with reliance on teacher-defined coefficients [8]. Sulfur DXF file structure parsing for precise and impartial Hexafluoride (SF6) dial pointer recognition system that is grading. accurate and effective, utilizing Computer Aided eXtended • To use sophisticated vector transformation Application (CAXA) secondary development for techniques, like skeleton extraction, binarization, and automated CAD drawing generation, open-source grayscale conversion, to transform visual drawing inputs computer vision library (OpenCV)-based angle detection, into formats that could be analyzed. and socket communication. The method achieved a 0.69° • To put into practice a feedback-driven average error, exceeding accuracy requirements; retransmission algorithm that replicates annealing restrictions include dependence on particular applications principles for effective drawing packet delivery and and restricted adaptability to varying dial designs [9]. correction. Following an experiment with focus groups, a literature- • To create a robotic drawing platform with servo informed questionnaire was given to 59 students and 21 motors that could physically replicate digital inputs, educators to assess preferences between hand drafting and confirming the accuracy of vector interpretations. CAD in architectural working drawings. CAD was • To test mechanical drawing precision and selected for effectiveness and accuracy; restrictions compare automated scores with manual grading to assess include interest dependence on duplicate instructions and the accuracy and dependability of the suggested solution. restricted understanding of context during site visits [10]. System organization: Related research on AutoCAD To enhance the teaching of cosmetics design by assessment is reviewed in Section 2. The image processing incorporating graphic design software, evaluating efficacy, methods, methodology, and DXF file analysis are paintbrush choice, and digital design efficiency through explained in Sections 3–5. Results, experiments, and comparative tests, two-stage questionnaires, and expert system implementation are presented in Sections 6–11. assessments. Computer drawing reduced design time in The investigation is concluded in Section 12, which also half and increased the efficacy of instruction; however, the suggests potential enhancements for evaluation accuracy method had drawbacks, such as a learning curve at first and and scalability. a dependence on particular software capabilities like symmetry functions [11]. Simple brush painting that is 2 Related work automated and realistic. The approach, which was evaluated on the FaceX dataset using Python and Employing task performance metrics and rubric-based TensorFlow, merges an attention mechanism (AM) with a imagination evaluation with undergraduate students Long Short-Term Memory (LSTM) network. The model's compare the AutoCAD 2025 and AutoCAD Mechanical accuracy was 98.63% and the F1 score was 98.75%; 2025 CAD tasks' efficiency and creativity. Efficiency and however, that requires a lot of processing power and the creativity were increased by AutoCAD Mechanical; outcome could differ depending on the dataset [12]. The however, short-term evaluation, a single discipline focus, computer vision system uses wavelet denoising, multi- and a lack of user input analysis cloud were some of the feature fusion, transfer of style enhancement, and drawbacks [7]. Create an automated evaluation tool for recognition models trained on WikiArt and OilPainting CAD models in mechanical courses that uses a model- datasets to classify painting styles and analyze sentiment. based methodology to assess parametric, feature-based, The model attained 90% sentiment accuracy and over 95% and geometric aspects with parameterization. Although the style classification; however, performance could differ CAD Model Automatic Assessment (MAA) Tool when applied to less structured, real-world artwork that efficiently automates model evaluation, limitations include wasn't part of benchmark datasets [13]. Table 1 provides the related works summary table. Table 1: Comparative Summary of the related works Reference Method Dataset Result Limitation Gutiérrez et Comparison of Undergraduate AutoCAD Short-term study, single- al. [7] AutoCAD 2025 vs. mechanical Mechanical improved discipline focus, no user AutoCAD Mechanical engineering students efficiency and feedback 2025 using performance creativity metrics and creativity rubrics Eltaief et al. CAD Model Automatic Mechanical CAD Efficient automation Limited cross-platform [8] Assessment (MAA) models in an of model evaluation validation, depends on Tool using parametric, academic setting teacher-set parameters Automated AutoCAD Drawing Assessment via Image Processing and... Informatica 49 (2025) 255-268 257 geometric, and feature- based evaluation Zhang et al. SF6 dial pointer Dial images with 0.69° average error, Limited generalizability, [9] recognition using angle readings high precision depends on specific OpenCV, CAXA, and software socket communication Fakhry et al. Survey with 59 students Architecture CAD preferred for Risk of overusing copy- [10] and 21 educators coursework and field accuracy and paste, lack of site context comparing CAD vs. visits efficiency integration hand drafting Hsu et al. Integration of graphic Cosmetology Halved design time, Initial learning curve, [11] design software in students and experts improved software-dependent (e.g., makeup design teaching instructional mirror function) via experiments and effectiveness questionnaires Zhang [12] LSTM and attention- FaceX dataset 98.63% accuracy, High processing cost, based model for 98.75% F1 score dataset-sensitive automated brush painting (Python + TensorFlow) Cheng et al. Computer vision using WikiArt and 95%+ style Reduced accuracy on [13] wavelet denoising, OilPainting datasets classification, 90% non-benchmark, real- feature fusion, and style sentiment accuracy world art transfer Research fills a critical gap by focusing on the absence of the calculation amount is small, the operation speed is fast, intelligent, automatic assessment systems for AutoCAD- and it is the most used method now in calculate the based drawings in educational settings. Research Equation (1). combines image processing and vector transformation techniques to provide accurate, objective, and scalable 1 |𝑔 −𝑠 | 𝑆𝑖𝑚(𝐺, 𝑆) = ∑𝑁= (1 − 𝑖 𝑖 ) (1) 𝑁 𝑖 1 assessment, whereas existing solutions concentrate on 𝑀𝑎𝑥(𝑔𝑖,𝑠𝑖) manual review or limited automation. The research helps to modernize CAD education, lessen the workload of Here, 𝐺 and 𝑆 are the histograms of the target image and instructors, and improve the learning experience for the source image𝑁 is the amount of color space models, is students with limited CAD competency by bringing the image attribute of the block area of the target image, automated outcomes into line with human grading and is the image attribute of the block area of the source standards and increasing the efficiency of drawing image. gi 𝑠𝑖 interpretation. The histogram-based method was chosen. After all, it is more appropriate for real-time AutoCAD examination 3 Image processing applied to systems because it is faster to execute and has less computer graphics examination- processing complexity while keeping competitive accuracy. The applicability of techniques like SSIM and related technologies cosine similarity in time-sensitive evaluation environments was diminished by the reality that they only The use of MATLAB for graphics processing is because slightly increased accuracy but arrived with much longer MATLAB has strong matrix operation capabilities, so the processing times. processed graphics are represented in the form of matrices or vectors [14-15]. 4 Image acquisition and processing Funded by: Daqing Normal University Youth Fund Research Project (No.: 9ZQ08; Teaching Research Project of Heilongjiang Bayi Agricultural University (Project The typical AutoCAD computer drawing examination Title: Research and Application of Paperless Exam System system design mathematical model can be expressed by in Computer Graphics Courses. The images produced by Equation (2), and the calculation of the optimal computer the differences are similar) The measure of the degree, the processing analysis 𝑃 = {𝑢1, 𝑢2, ⋯ , 𝑢𝑘}: normalization can be well realized by using the histogram, 258 Informatica 49 (2025) 255-268 Z. Xiong et al. 𝑚𝑖𝑛 𝑍 (𝑃𝛼) = ∑𝑘=1 terms of accuracy and diversity 𝑖=1 𝑑(𝑢𝛼 , 𝑢𝛼 ) + 𝑑(𝑢 , 𝑢𝑘) (2) thorough analysis in 𝑖 𝑖+1 𝛼1 typical illustration based on an image processing technique In the formula, 𝛼𝑖 is used to describe the reorganization of include [16-17] in Equation (6). the K computer-processed analysis point order and 𝑑(𝑢𝛼 , 𝑢𝛼 ) describes the Manhattan distance between 𝐸𝑣𝑎𝑙(𝜋𝑝) = 𝜆𝐴𝑐𝑢(𝜋𝑝) + (1 − 𝜆)𝐷𝑖𝑣(𝜋𝑝) 𝑖 𝑖+1 two points. (6) The path optimization model for examining AutoCAD drawing elements during assessment is represented by 𝜆 ∈ [0,1] : The exactness and variety of drawings handed equation (2). Practically speaking, each point 𝑢 degree of importance in the complete analysis 𝑖 represents out are the a feature or object like a wall, door, or window that was criteria. Equation (6) uses the diversity 𝑝𝑟𝑜(𝜋𝑝) of the taken from a student's artwork and represented as spatial image processing method's basic processing in Equation coordinates or data blocks. The function (7) to get the probability 𝐷𝑖𝑣(𝜋𝑝) of choosing every 𝑑(𝑢𝛼 , 𝑢𝛼 )computes the distance in meters between 𝑖 𝑖+1 drawing processing basic analyzing technique as the consecutive features, a pertinent metric in CAD since evaluation's basic processing objects are frequently aligned to orthogonal grids. The variable 𝛼𝑖 indicates a sequence of such features as 𝐷𝑖𝑣(𝜋 𝑟𝑜(𝜋 ) = 𝑝)𝑝 identified by the system. Finding the most effective 𝑝 𝐵 ∑𝑝=1𝐷𝑖𝑣(𝜋𝑝) structural match between the student's layout and the (7) reference drawing is part of minimizing the sum of these distances. to restore the drawing's natural color and recognition, the The formula above quantifies the spatial deviation, for drawing processing network module works to optimize instance, if the student's layout rearranges or misaligns the each reconstructed drawing's color and spatial placement four rooms that are connected linearly in the correct [18–19]. The formula defines the drawing processing drawing. That allows the system to assess not only the network module's loss function Lip. The following elements' existence but also their arrangement in a processing losses are displayed: unmasked regions, sequence that makes sense geometrically and conceptually. masked regions 𝐿1 𝑖𝑛𝑝 𝑠𝑡𝑦𝑙𝑒 + 𝐿 2 𝑠𝑡𝑦𝑙𝑒 : Style loss, anti-loss 𝐿𝑡𝑜𝑡𝑎𝑙 The following is the specific discriminant in Equation (3). : Total difference loss, and𝐿𝑝𝑒𝑟 : Perceptual loss in 𝑑(𝑖, 𝑗, 𝑢): Inputs parameters function. Equation (8). 𝑖𝑛𝑝 [0,0,0], 𝑖𝑓 𝑖 ≥ 𝑎 𝑎𝑛𝑑 𝑗 ≥ 𝑎 𝑎𝑛𝑑 𝑢 ≥ 𝑎 𝐿 1 𝑡𝑜𝑡𝑎𝑙 = 2𝐿𝑣𝑎𝑙𝑖𝑑 + 12𝐿ℎ𝑜𝑙𝑒 + 0.04𝐿𝑝𝑒𝑟 + 100(𝐿𝑠𝑡𝑦𝑙𝑒 + 𝑑(𝑖, 𝑗, 𝑢) = {[𝑖, 𝑗, 𝑢], 𝑖𝑓 𝑖 < 𝑎 𝑎𝑛𝑑 𝑗 < 𝑎 𝑎𝑛𝑑 𝑢 < 𝑎 𝐿 2 𝑠 𝑡 𝑦𝑙𝑒) + 100𝐿𝑎𝑑𝑣 + 0.3𝐿𝑣𝑎𝑟 (8) [𝑖, 𝑗, 𝑢], 𝑖𝑓 𝑖 > 𝑏 𝑎𝑛𝑑 𝑗 > 𝑏 𝑎𝑛𝑑 𝑢 > 𝑏 (3) 𝐿: Loss function. The weight of each loss term in Manhattan distance (MD) is determined by examining 50 The corresponding drawing processing drawing tests. The actual and unmasked processing modes are used, information feature vector 𝜒𝑖 The expression in Equation with 𝑀 representing the irregular binary mask, 𝐼𝑑𝑎𝑚 (4). representing the damaged mode, and 𝐼𝑖𝑛𝑝 representing the outcome mode in Equation (9-10). 𝑙𝜀(𝑔) = (1 − 𝜌)𝑙𝜀(𝑔 − 1) + 𝛾𝑓(𝜒𝑖(𝑔)) (4) 𝐿𝑣𝑎𝑙𝑖𝑑 = ‖𝑀 × (𝐼𝑖𝑛𝑝 − 𝐼𝑑𝑎𝑚)‖ (9) 1 𝑓 : Represents the adaptive function corresponding to the feature drawing feature vector 𝜒𝑖 of the drawing process. 𝐿ℎ𝑜𝑙𝑒 = ‖(1 − 𝑀) × (𝐼𝑖𝑛𝑝 − 𝐼𝑑𝑎𝑚)‖ (10) 𝛾𝜒 1 𝑖(𝑔) : The corresponding drawing processing analysis of the 𝜀𝑡ℎ dispensation in the actual application process. Rotate and restore the identification points against the Equation (5) contains for processing 𝜋𝑝 in Drawing original image. ℎ Is the connection point of the opening Processing𝐼𝐼. draw, placed in the drawing parallel to the identification graph 0  k  h (𝑣𝑥′ ′ 2𝑘+1,𝑖 , 𝑣𝑦2𝑘+1,𝑖) for each 𝐴𝑐𝑢(𝜋𝑝) = 𝑁𝑀𝐼(𝜋𝑝 , 𝜋 ∗) (5) identification point the following rotation operation is performed in Equation (11) 𝜋𝑝 and 𝜋𝑞 represent the processing of drawing processing. If less information is shared with the drawing processing base drawing, the base drawing is less accurate. Drawings based on image processing techniques that define the Automated AutoCAD Drawing Assessment via Image Processing and... Informatica 49 (2025) 255-268 259 𝑑𝑥′ ′ ℎ 𝑘 = 𝑣𝑥2𝑘+1,𝑖 − 𝑣𝑥𝑘,𝑖 𝑣𝑥 ′ 2𝑘+1,𝑖 𝑣𝑥𝑘,𝑖 { [ 𝑞𝑥 𝑖−2 𝑣𝑥𝑘,𝑖−𝑣𝑥𝑘,𝑖 𝑗 = ∑ 𝑖 ∑ = ∑ 𝑖 ?̃? {𝑖| =𝑗} 𝑘=1 𝑑𝑦′ ′ ′ ] = [𝑣𝑦 ] + (𝑣𝑥 𝑖𝛼 = 𝑐?̃? 𝑘+1,𝑖−𝑣𝑥𝑘,𝑖)𝑝𝑘 {𝑖| =𝑗} 𝑗𝛼 𝑘 = 𝑣𝑦2𝑘+1,𝑖 − 𝑣𝑦𝑘,𝑖 𝑣𝑦2𝑘+1,𝑖 𝑘,𝑖 { 𝑐 𝑐 ℎ 𝑐𝑜𝑠(−𝜃) − 𝑠𝑖𝑛(−𝜃) 𝑑𝑥′ 𝑖−2 𝑣𝑦 𝑞𝑦 𝑘,𝑖−𝑣𝑦𝑘,𝑖 [ ] × [ 𝑘] 𝑗 = ∑ 𝑖 ∑ = ∑ 𝑖 ?̃? 𝛼 = 𝑐?̃? (11) {𝑖| =𝑗} 𝑘=1 (𝑣𝑦 𝑠𝑖𝑛(−𝜃) 𝑐𝑜𝑠(−𝜃) 𝑑𝑦′ 𝑐 𝑘+1,𝑖−𝑣𝑦𝑘,𝑖)∘𝑝 {𝑖| =𝑗} 𝑖 𝑗𝛼 𝑘 𝑐 𝑘 (17) For each drawing 𝑃𝑖 and ?̃?𝑖 , calculate its barycentric Aggregated gradients 𝑞𝑥𝑗 , 𝑞𝑦𝑗 for group 𝑗 are defined by coordinates after removing the last point. In the formula, ℎ equations (14)–(17) based on value changes and is the length (number of nodes) of the i-th graph after the spacing(𝛥𝑥 original graph is split, ℎ′ 𝑖 , 𝛥𝑦𝑖). Weighted by coefficients 𝛼 and 𝑖 is inserted into the length that transformation parameter ?̃? has been found, and Equation (12–13) computes the next 𝑖, gradients are calculated using either straight finite differences or multistep two offset values. approximations, depending on whether 𝑥⁄𝑦 differences are ℎ′ zero. Calculate numeric information values for each plot 1 𝑣𝑥′ 1 𝑖−1 ′ ℎ𝑖−1 𝑖 = ∑ 𝑘=1 𝑣𝑥 𝑣𝑥 ℎ′ 𝑘 𝑖 = ∑ 𝑘=1 𝑣𝑥ℎ 𝑘 in Equation (18). { 𝑖−1 { 𝑖−1 ℎ′ 1 (12) ℎ 𝑣𝑦′ 1 𝑣𝑦 𝑖 = ∑ 𝑖−1 𝑖 = ∑ 𝑖−1 =1 𝑣𝑦 ′ ℎ′ 𝑘 𝑘 ℎ 𝑘=1 𝑣𝑦𝑘 𝑖−1 𝑞𝑥 𝑖−1 𝑗+𝑞𝑦𝑗 1, > 1 ?̃?𝑗 = { 2 𝑞𝑥 (18) 𝑗+𝑞𝑦𝑗 1 ℎ −1 0, < 1 𝛥𝑥𝑖 = ∑ 𝑖 𝑘=1 (𝑣𝑥 2 2ℎ 𝑘+1 − 𝑣𝑥𝑘)𝑝𝑘 { 𝑖−2 1 (13) ℎ 1 𝛥𝑦 𝑖− 𝑖 = ∑𝑘=1 (𝑣𝑦2ℎ 𝑘+1 − 𝑣𝑦𝑘)𝑝𝑘 𝑖−2 Once the digital data has been extracted, verify that the identification is valid. The method recognizes similarities Anzhao Equations (13-17) respectively compute the between the unique recognition and the extracted recognition points of the perpendicular and parallel acceptance by using the correlation coefficient organized of each drawing. 𝑐𝑜𝑟(𝑚, ?̃?)in Equation (19). ①𝛥𝑥 ∑𝑛−1 𝑖 ≠ 0&&𝛥𝑦𝑖 ≠ 0 𝑖=0 (𝑚𝑐𝑜𝑟(𝑚, ?̃?) = 𝑖?̃?𝑖) (19) √∑𝑛−1𝑚2 𝑛−1 2 𝑖=0 𝑖 √∑𝑖=0 ?̃?𝑖 𝑣𝑥′−𝑣𝑥 𝑞𝑥 𝑖 𝑗 = ∑ 𝑖 𝑖 = ∑ 𝑖 ?̃? 𝑖| =𝑗} 𝛥𝑥 𝑖𝛼 = 𝑐?̃? 𝑖 {𝑖| =𝑗} 𝑗 ∘ 𝛼{ 𝑐 𝑐 (14) The digital information ?̃? is taken from the recognized 𝑣𝑦′ graphics, and the correlation coefficient between the 𝑞𝑦 𝑖−𝑣𝑦𝑖 𝑖 = ∑ 𝑖 ?̃? = 𝑐?̃? { 𝑗 = ∑ | =𝑗} 𝛥𝑦 {𝑖| =𝑗} 𝑖𝛼 𝑗 ∘ 𝛼{𝑖 𝑐 𝑖 𝑐 digital data m and m is𝑐𝑜𝑟(𝑚, ?̃?). ②𝛥𝑥𝑖 = 0 , 𝛥𝑦𝑖 ≠ 0 5 DXF file format ℎ𝑖−2 𝑣𝑥 𝑞𝑥 𝑘,𝑖−𝑣𝑥𝑘,𝑖 𝑗 = ∑ 𝑖 ∑ {𝑖| =𝑗} 𝑘=1 = ∑ 𝑖 ?̃? = 𝑐?̃? (𝑣𝑥 DXF is a data interchange file. AutoCAD supports saving 𝑐 𝑘+1,𝑖−𝑣𝑥 =𝑗} 𝑖𝛼 𝑗𝛼 𝑘,𝑖)𝑝𝑘 {𝑖| 𝑐 { a n d reading of DXF format files to exchange data with 𝑣𝑦′ 𝑞𝑦 𝑖−𝑣𝑦𝑖 𝑗 = ∑ 𝑖 = ∑ 𝑖 ?̃? = 𝑐?̃? other applications. The DXF file is an ASCII file, so it is 𝑖| =𝑗} 𝛥𝑦𝑖 {𝑖| =𝑗} 𝑖𝛼 𝑗𝛼{ 𝑐 𝑐 helpful to use it as the evaluation basis for the answer, (15) check the correctness and rationality of the software production, and design appropriate scoring rules [20-21]. ③𝛥𝑥𝑖 ≠ 0, 𝛥𝑦𝑖 = 0 The proposed AutoCAD test system incorporates image- based similarity scoring with rule-based methods like 𝑣𝑥′ 𝑞𝑥 𝑖−𝑣𝑥𝑖 vector comparisons and DXF parameter matching, 𝑗 = ∑ 𝑖 = ∑ 𝑖 ?̃? = 𝑐?̃? 𝑖| =𝑗} 𝛥𝑥 𝑗𝛼{ { 𝑐 𝑖 {𝑖| =𝑗} 𝑖𝛼 𝑐 e n s uring both structural and visual accuracy in assessment, ℎ𝑖−2 𝑣𝑦 𝑦 𝑞𝑦 aligning with human evaluation standards, and ensuring a 𝑗 = ∑ 𝑖 ∑ 𝑘,𝑖−𝑣 𝑘,𝑖 = ∑ 𝑖 ?̃? = 𝑐?̃? {𝑖| =𝑗} 𝑘=1 (𝑣𝑦 𝑐 𝑘+1,𝑖−𝑣𝑦𝑘,𝑖)∘𝑝𝑘 𝑖| =𝑗} 𝑖𝛼{ 𝑗𝛼 𝑐 comprehensive evaluation process. (16) The DXF file consists of six parts: HEADER, CLASSES, TABLES, BLOCKS, ENTITIES, and OBJECTS. Each ④𝛥𝑥𝑖 = 0 ,𝛥𝑦𝑖 = 0 segment starts with group 0-SECTION and ends with group 0-ENDSEC. The format used here is FORTRAN I3, and the next line of different integer values represents different parameter values, such as strings, integer values, real numbers, etc., which have different meanings. 260 Informatica 49 (2025) 255-268 Z. Xiong et al. The DXF parsing method compares graphic primitives 7 Development tool selection based on spatial coordinates from entity definitions. The alignment between student and reference drawings is Because the system is built in AutoCAD, it is executed in evaluated using coordinate-based geometric matching. A AutoCAD, the commonly used software is Visual LISP, tolerance criterion accounts for small inconsistencies. VisualC, VisualB a total of three kinds [22], Table 2 is a Only relevant geometric aspects are highlighted by comparison of the above three kinds of software. filtering layer and annotation data. The system manages multiple DXF layers by parsing each layer separately and comparing those using specific Table 2: Feature comparison of three programming drawing components. DXF entity attributes interpret line languages. styles, allowing for small variations. DXF text entities extract annotations, including text and dimensions, and Easy to runnin flexibilit confidentiali assess alignment consistency, position, and content about learn and g y ty the standard drawing. use speed Geometric matching of drawing elements and DXF file Visua Differen good good quick parsing are the primary methods used for scoring, while l C ce image features help confirm the correctness of robotic Visua Differen reproduction. l good Difference slow ce LISP 6 Examination system processing flow Visua good good good quick l B When candidates open the examination questions, they are automatically loaded into the examination system through As can be seen from Table 2, the Visual LISP language is the interface of VBA and AutoCAD. After the examinee easy to learn but lacks flexibility when completing completes the answer, the submit action is executed, and complex system software. Visual C has strong functions the automatic evaluation engine is started. The software and high flexibility, but it is relatively complex, and runs in the background, firstly outputs the candidates' programmers have high requirements for computer answers to the DXF file, detects the corresponding knowledge, making it difficult to use and grasp quickly. standard answer DXF file according to the test question Visual B has the advantages of both Visual LISP and Visual number, compares, and calculates the score according to C. the scoring parameter table of the test question package. Disadvantage, this is one of the reasons for AutoCAD to Then upload it to the server and record it. The design switch to Visual B support. Therefore, to develop this process of the computer drawing test system is displayed system, it is appropriate to use Visual B. in Figure 1. AutoCAD has built-in VBA comprehensive development tools. (VisualBasicforApplication). 8 Design process of computer graphics examination system Compile the source program with GCC. The driver of the camera is uvcvideo, which supports two formats, YUYV and MPJEG. The device supports the image file imagebmp.bmp captured by the USB camera, such as streaming I/O operations. Use the ARM-Xilinx-Lnux cross-compilation environment to cross-compile the source files, and copy the executable files generated by compilation to SD. Use the command ARM-Xilinx-Linux-gnuueab-gccv4l2 grab. Compile the c-ozed-camera program, copy the compiled Figure 1: Processing flow of computer graphics executable file zeed-camera to the Zed-board, connect the examination system. USB camera to the Zedboard, connect the cd to the /dev folder, and use the ls command to confirm whether the dev directory is There are videoO devices. Executable files, if Automated AutoCAD Drawing Assessment via Image Processing and... Informatica 49 (2025) 255-268 261 any. Before executing the file, command chmod+x zed- camera or chmod777zed-camera to obtain the executable permission of the file. The former is only valid for the current user; the latter is valid for all users. Execute the executable program according to the command zed- camera, and as shown in Figure 2, code 1 shows the HyperTerminal code. Code 1: Information is displayed on the HyperTerminal. Support format: 1. 𝑌𝑈𝑉 4: 2: 2(𝑌𝑈𝑌𝑉) Figure 2: The program obtains the picture successfully. 2.𝑀𝐽𝑃𝐸𝐺 𝑓𝑚𝑡. 𝑡𝑦𝑝𝑒: 1 9 Vector transformation of images Using a vector drawing computer as input, before drawing, 𝑝𝑖𝑥. 𝑝𝑖𝑥𝑒𝑙𝑓𝑜𝑟𝑚𝑎𝑡: 𝑌𝑈𝑌𝑉 the target image needs to be converted into a vector diagram suitable for the execution of the drawing 𝑝𝑖𝑥. ℎ𝑒𝑖𝑔ℎ𝑡: 480 computer. As shown in Figure 3, the process includes grayscale conversion, binarization, isolated pixel removal, 𝑝𝑖𝑥. 𝑤𝑖𝑑𝑡ℎ: 640 edge refinement [23-24], position restriction, continuous curve detection, synthesis, and other actions. 𝑝𝑖𝑥. 𝑓𝑖𝑒𝑙𝑑: 1 𝑖𝑛𝑖𝑡/𝑑𝑒𝑣/𝑣𝑖𝑑𝑒𝑜0 [𝑂𝐾] 𝑔𝑟𝑎𝑏 𝑦𝑢𝑦𝑣 𝑂𝐾 𝑠𝑎𝑣𝑒/𝑢𝑠𝑟/𝑖𝑚𝑎𝑔𝑒_𝑦𝑢𝑣. 𝑦𝑢𝑣 𝑂𝐾 Figure 3: The vector transformation process of the image. 𝑐ℎ𝑎𝑛𝑔𝑒 𝑡𝑜 𝑅𝐺𝐵 𝑂𝐾 The system uses an image-to-vector procedure to extract vector features from rasterized student outputs for 𝑠𝑎𝑣𝑒/𝑢𝑠𝑟/𝑖𝑚𝑎𝑔𝑒_𝑏𝑚𝑝. 𝑏𝑚𝑝 𝑂𝐾 comparison with the standard, ensuring consistent evaluation despite the vector-based nature of AutoCAD The USB camera used supports both YUYV and MJPEG. drawings, which standardizes various input formats like The collected pictures in the two formats are saved in the scanned or non-DXF submissions in Equation (20). /usr folder and can be displayed in the picture browser. For processing convenience, first, convert the 3-channel color image collected by the camera into a single-channel A complete digital image processing system requires an grayscale image. image display system in addition to the image collection system. Add a display interface developed by Qt on Linux 𝑓 = 𝑓𝑅 ∗ 0.299 + 𝑓𝐺 ∗ 0.587 + 𝑓𝐵 ∗ 0.114 (20) to display the collected images. Here𝑓𝑅,𝑓𝐺,𝑓𝐵 respectively represent a 3-component image in the RGB space, and f represents a transformed grayscale image. An adaptive threshold technique was used to transform the acquired grayscale image into a binary image [25–26]. Establish the binarization threshold as 𝑇 (𝑎 ≤ 𝑇 ≤ 𝑏) and ascertain the image's gray value range [𝑎, 𝑏] in Equation (21). 262 Informatica 49 (2025) 255-268 Z. Xiong et al. 1, 𝑓(𝑥, 𝑦) ≥ 𝑇 As shown in Figure 6, using different thicknesses to 𝑓𝑇(𝑥, 𝑦) = { (21) 0, 𝑓(𝑥, 𝑦) < 𝑇 represent different vector curves, the image consists of 4 curves in total. The plotter uses it to verify the accuracy of the vector transformation by creating a physical reference by converting standard images to vector paths. The following assures that student drawings are appropriately interpreted by the system's scoring engine, which is based on picture similarity and DXF matching. As a result, both digital and tangible elements support evaluation accuracy and consistency. Among them, 𝑓𝑇 represents the transformed binary image. An example of the transformed effect is shown in Figure 4. Figure 6: Initial vector curve. To further advance the drawing effectiveness of the drawing computer and reduce the number of actions of the drawing computer to raise and drop the pen, the divided vector curves are merged, and the adjacent non-closed vector curves are converted into closed curves. As shown in Figure 7, after merging and optimization, the vector curves in the figure are reduced from 4 to 3. Figure 4: Example of a binarized image. Through skeleton extraction, the image edge curve with a single pixel width is obtained as shown in Figure 5. Figure 7: Merged and optimized vector curves. Vector curves were successfully decreased from 4 to 3 by the merging stage without sacrificing structural integrity in Figure 7 enhancing drawing efficiency. After the above series of image processing operations, the target image to be drawn can be converted into a vector curve recognizable by the drawing computer, and the drawing operation can Figure 5: Refinement of the image. be performed after being downloaded to the computer actuator. A single-pixel-width edge was consistently formed by the Therefore, image entropy can be selected as a refinement process in Figure 5. For the drawn image to be characterization feature of Chinese paintings, calligraphy the center of the canvas, the image must be snapped in images, and man-made images [27-28]. Equation (22) uses position, leaving only the valid portion of the image. The the image entropy𝑝(𝑧𝑖)(𝑖 = 0,1,2,⋯ , 𝐿 − 1)as a random method in this paper first obtains the contour of the variable that represents grayscale, where 𝐿 is the number effective image through the edge detection algorithm, of identifiable grayscales and the related histogram. determines the four outermost pixel points, and then cuts 𝐿−1 the rectangle determined by the four points to obtain the 𝑒 = −∑𝑖=0 𝑝(𝑧𝑖) 𝑙𝑜𝑔2 𝑝 (𝑧𝑖) (22) required image and its coordinate information. Automated AutoCAD Drawing Assessment via Image Processing and... Informatica 49 (2025) 255-268 263 From the nature of entropy, the average uncertainty of the reference directory. Go to the directory where the equal probability distribution source is the largest, and the installation files were extracted and enter the following uncertainty of the random variable distribution is the command. largest at this time. The images of Chinese painting and The "Compile and Make Runtime Library Files" part calligraphy are obtained from nature, and artificial images supports the image capture component for image-to-vector are produced by people's subjective thoughts, so the transformation, creating reference vector diagrams and images of Chinese painting and calligraphy are more verifying robotic drawing reproduction. To ensure end-to- complicated than artificial images. end integrity of the suggested evaluation workflow, The image's edge, with grayscale variation, boundary, and supporting visual comparison and physical drawing direction, contains the most image information. Systematic validation, even though not directly related to CAD measure measures regional difference, maximal when all scoring. Algorithm 1 displays the command to compile the levels are equal [29]. The system employs DXF parsing executable file. and vector manipulation to address edge scenarios, compensate for scaled or rotated drawings, reduce partial Algorithm 1: Library Files to installation and extracted occlusions, and validate components across layers before command scoring, thereby preserving grading accuracy and enhancing robustness, while addressing incorrect layer 𝑑𝑑 𝑖𝑓 =/𝑑𝑒𝑣/𝑧𝑒𝑟𝑜 𝑜𝑓 = 𝑞𝑡_𝑙𝑖𝑏_𝑒𝑥𝑡4. 𝑖𝑚𝑔 𝑏𝑠 = utilization. The appropriate histogram is making𝑝(𝑧𝑖)(𝑖 = 1𝑀 𝑐𝑜𝑢𝑛𝑡 = 80 0,1,2,⋯ , 𝐿 − 1), where 𝐿 is the number of distinct gray levels, and Equation (23) defines the consistency𝑈. 𝑚𝑘𝑓𝑠. 𝑒𝑥𝑡4 − 𝐹 𝑞𝑡_𝑙𝑖𝑏_𝑒𝑥𝑡4. 𝑖𝑚𝑔 𝑈 = ∑𝐿−1𝑖=0 𝑝2(𝑧𝑖) (23) 𝑐ℎ𝑚𝑜𝑑𝑔𝑜 + 𝑤 𝑞𝑡_𝑙𝑖𝑏_𝑒𝑥𝑡4. 𝑖𝑚𝑔 𝑚𝑜𝑢𝑛𝑡 𝑞𝑡_𝑙𝑖𝑏_𝑒𝑥𝑡4. 𝑖𝑚𝑔 − 𝑜 𝑙𝑜𝑜𝑝/𝑚𝑛𝑡 From the perspective of the generation mechanism of Chinese paintings, calligraphy images, and artificial images, Chinese paintings, and calligraphy images have 𝑐𝑝 − 𝑟𝑓/𝑢𝑠𝑟/𝑙𝑜𝑐𝑎𝑙/𝑇𝑟𝑜𝑙𝑙𝑡𝑒𝑐ℎ/𝑄𝑡 − 4.7.3/∗/ local obvious recognition and other local features [30]. 𝑚𝑛𝑡𝑐ℎ𝑚𝑜𝑑 𝑔𝑜 − 𝑤 𝑞𝑡_𝑙𝑖𝑏_𝑒𝑥𝑡4. Therefore, consistency can be a feature that distinguishes these two images. 𝑖𝑚𝑔 The second-order moment (homogeneous 𝑢𝑚𝑜𝑢𝑛𝑡/𝑚𝑛𝑡 variance𝜎2(𝑧) = 𝜇2(𝑧)) is another important feature of the identification feature. It represents the measure of gray- Therefore, the library files under the /usr/local/troltech/Qt- level contrast and can establish a descriptor about 4.7.3/ folder are all included in the newly made 80M image smoothness, which is expressed by Equation (24). file. The library is ready. 1 𝑅 = 1 − 1+𝜎2 (24) First, the AutoCAD exam questions are classified (𝑧) according to the knowledge points of the exam. The important functions and knowledge points of AutoCAD A region's relative brightness smoothness is measured by are drawing and editing of graphics, dimensions and its cleanliness. 𝑅 = 0 : In the area with constant dimensions, text styles and annotations, setting of brightness; 𝑅 = 1 in the area where the gray level value environment variables, query, view scale, block, pattern deviates significantly. filling, etc. [14]. Each exam question sets a scoring parameter table according to the knowledge points. 10 Compile and make runtime library The image processing technique is a simulation algorithm files that mimics the solid annealing process in real life. This process involves heating and cooling a solid, causing disorder and increasing internal energy [16]. The particles In the directory f where the project is located, use the are then sorted slowly, reaching equilibrium at every command qmake-Project to obtain the project file lab2 to temperature. The image processing method consists of two generate qtcamera.pro. Then use the qmake command to stages: drawing processing and recognition. The specific generate the makefile file, and use make to compile the steps include achieving the ground state at room executable file. temperature, minimizing internal energy, and achieving The execution of the Qt software depends on the the equilibrium state at every temperature. executable library, which is created and mounted to the 264 Informatica 49 (2025) 255-268 Z. Xiong et al. 1) Each rendering packet is sent at a specific time interval, method needs to update the feedback matrix to determine and the source node gets ACK or NAK feedback the different types of packets, and the image processing information to create a feedback matrix T that keeps the method not only combines all lost packets for update up to current. The origin node processes N retransmission but also the number of recommended rendering packets for K receiving nodes. packets is determined by Mmax. 2) After the resource node has processed N packets, it enters the retransmission phase with time. All missing 2) The image processing method is not affected by the packets make up the largest coefficient 𝐷 = distribution of lost packets, and the number of {𝑋1, 𝑋2, 𝑋3, ⋯ , 𝑋𝑛} in the set 𝐺 = recommended packets is determined mainly by the {𝑔𝑖1, 𝑔𝑖2, 𝑔𝑖3, ⋯ , 𝑔𝑖𝑛}(1 ≤ 𝑖 ≤ 𝑀𝑚𝑎𝑥()) coefficient vector receiving node with the most lost packets. Algorithm 2 𝑀𝑚𝑎𝑥 (chosen randomly from a limited domain) to shows the pseudocode of the main module. recommend all missing plotting packets.𝐹𝑞 generates 𝑀𝑚𝑎𝑥 recommendation packages. The maximum number Algorithm 2 Pseudocode of the main module of lost packets at all nodes of 𝑀𝑚𝑎𝑥 is Equation (25). function main(): 𝑀 𝑚𝑎𝑥 {∑𝐾𝑗=1𝑇(𝑖, 𝑗)} (25) 𝑖∈{1,2,⋯,𝐾} 𝑚𝑎𝑥 Input 3)After resending the recommended drawing package, each receiving node approximates and shows the image = getImageInput() arrangement of its own recommendation vector matrix G. 𝑀𝑚𝑎𝑥 If 𝑟𝑖 ≠ 𝑁, the node means that G does not reach the dxf = getDXFInput() complete permutation, then the node needs to notify the source node and resend some recommended packets, and Preprocessing G can be a complete permutation. Here by indicating the required recommendation grouping, the specific situation gray = toGrayscale(image) in Equation (26) binary = binarize(gray) 𝑁 − 𝑟𝑖 , 𝑟𝑁𝑖 = { 𝑖 ≤ 𝑁 (26) 0, 𝑟𝑖 ≥ 𝑁 edges = refineEdges(binary) in the formula𝑖 = 1,2,⋯ , 𝐾. vector_img = extractVectors(edges) In the drawing resend phase, if the receiving node receives DXF Feature Extraction the recommended drawing packet, it is 0. When a node loses two recommended packages Ri𝑀𝑚𝑎𝑥 Ni 𝑅𝑖then𝑁𝑖 = student_feats = parseDXF(dxf) 2。 ref_feats = loadReference () 4) The source node updates based on the feedback value of each receiving node, and generates the recommended Scoring group key algorithm in the new retransmission stage.𝑁𝑖 Mmax 𝑀𝑚𝑎𝑥(4)。 hist_score = histogramSimilarity (vector_img, ref_feats) dist_score = sum(manhattanDist(student_feats[i], 5) 3) and 4) are repeated until all receiving node vector ref_feats[i]) for i in range(len(ref_feats))) matrices are equal to N. That is, 𝑀𝑚𝑎𝑥 , with no lost packets, the receiving node can decode the original comp_score = completeness (student_feats, ref_feats) drawing packets using Gaussian elimination. Final Score It can be seen that the differences between the ERA and AutoCAD computer drafting methods presented here are mainly shown in the following points. final_score = 0.5 * hist_score + 0.3 * (1 - normalize(dist_score)) + 0.2 * comp_score 1) Image processing methods have low complexity in combining lost packets. The AutoCAD computer drawing print("Score:", final_score) Automated AutoCAD Drawing Assessment via Image Processing and... Informatica 49 (2025) 255-268 265 Retransmission & Drawing (Optional) furniture is lacking, and the score of the project test paper is 92. Teachers who graded the papers by hand achieved packets = 92 grades. The input B student diagram completes the feedbackRetransmit(preparePackets(vector_img)) display of the axis, doors and windows, part of the wall, and part of the size. The placement of furniture, stairs, and if final_score>= threshold: a part of the wall was incomplete, and the score for the program volume was 80. Teachers who manually graded the papers achieved 80 grades. According to the output drawRobot(pathPlan(packets)) grades of students A and B, the program calculation results based on the similarity principle are consistent with the 11 Examples and results analysis integer part of the manually collected results, and the decimal point can be rounded off. In the specific operation of Auto CAD computer drawing Image similarity and CAD content evaluation are included pictures, by comparing the answer pictures of the two in the scoring. First, histogram-based similarity is used to students with the correct answer pictures, this operation compare answer photos to standard images. After that, can also test the rationality of the computer program parsed DXF files are used to evaluate CAD-specific system. The graph is divided into three different types of aspects, such as axes, walls, doors, and dimensions. CAD drawings, the graphs of the same size as the standard Student A and B's scores (92 and 80) match the hand- picture are cut out, a folder is created to save them together, assessed results, indicating both visual accuracy and and then the similarity calculation method is used to content completeness, which could be explained by the calculate the corresponding answer. The import system of blended technique. the A and B pictures given by the two students is shown in Using the above design scheme, a suspended drawing Figures 8 and 9 below. computer drawing examination system based on a servo motor drive is developed. The main parameters are shown in Table 3. Table 3: The table of drawing computer parameters Hangi white Suppl maxim Rotatin ng qualit board y um g point y(kg) height volta torque( speed(ra spaci (m) ge(V) kg/cm) d/min) ng(m) 0.335 0.86 0.45 11.1 2.22 300 Figure 8: Calculation program for reading A students' scores (92). The servo motor-powered drawing machine physically reproduces digital drawings to verify the accuracy of the system's vector interpretation. The machine provides real execution for verifying vector outputs and evaluating picture processing integrity, ensuring the practical robustness of the suggested AutoCAD evaluation system. Vectorizer outputs maintained an average speed of 1.75 cm/s and a drawing accuracy of better than 0.1 cm, Figure 9: Calculation program for reading B students' according to quantitative validation using the drawing scores (80). robot (Table 3). That suggests the vector conversion pipeline had little distortion. However, when the result is Histogram-based comparison methods were used to weak contrast edges or overlapping stroke regions during calculate the similarity scores between the student binarization, the method introduces small alignment errors drawings and the reference. Early instances (Students A that could impact snapping accuracy or curve continuity. and B) are depicted in Figures 8 and 9, which closely Usually, post-processing techniques like coordinate match grades assigned by humans. anchoring and contour-based cropping help to reduce these In the input drawing of student, A, the student has inaccuracies. As demonstrated by student score completed the axis, walls, doors and windows, as well as comparisons (e.g., students A and B matched human- some dimension display, the configuration of stairs and evaluated scores), the vectorization accuracy generally closely matches manual grading outputs, indicating that 266 Informatica 49 (2025) 255-268 Z. Xiong et al. the vector transformation procedure is both mathematically robust and pedagogically reliable for AutoCAD examination assessment. To facilitate the drawing computer to adjust the drawing position to adapt to different types of whiteboards, an easy- to-operate GUI user interface is designed, which can easily perform target image input, motor position adjustment, whiteboard parameter setting, drawing start and stop control, etc. To validate the strength of the adopted design and the proposed method, a drawing experiment was carried out using the developed drawing computer drawing examination system. After determining the Base and whiteboard height information, perform moment analysis on each force point on the whiteboard, and the results are shown in Figure 10. Among them, Motor Load represents the moment received. It can be seen that the torque is too large only in the area of y=0, and the other areas meet the requirement that the load torque is less than 30% of the Figure 11: Limiting the movement position of the maximum torque. drawing computer. Table 4: The experimental results of the drawing speed and accuracy test of the drawing computer. set Measured require actual (cm distance(c distance(c d speed(cm/ ) m) m) time(s) s) 20 19.9 11.38 1.749 0.1 10 9.9 5.71 1.733 0.1 5 5.0 2.81 1.780 0.0 Figure 10: Moment analysis of each point on the From the results in Table 3, it can be seen that the average whiteboard. drawing speed of the computer is 1.75 cm/s, the drawing accuracy is better than 0.1 cm, and the drawing speed and The dead zone positions of the two suspension points are accuracy meet the design requirements. To verify the removed, and the positions are limited according to the drawing effect of the drawing computer, several pictures principle that the load moment is less than 30% of the were randomly selected for drawing, and the drawing maximum moment, and the results shown in Figure 11 are results are demonstrated in Figure 12. obtained. It can be seen that the motion position does not include the upper left and upper right fan-shaped areas, this is because the two hanging points are positions that the drawing computer cannot reach. In addition, to prevent the motor torque from being too large, the moving area of the drawing computer is limited (the rectangular box area in Figure. 11). To verify the drawing speed and accuracy of the drawing computer, the moving distances of 20 cm, 10 cm, and 5 cm were set respectively, and the average value Figure 12: Image rendering example was obtained after 10 experiments in each group. The outcomes are illustrated in Table 4. The proposed AutoCAD assessment system integrates CAD-specific content analysis with histogram-based image similarity for objective, consistent scoring. AutoCAD uses a servo motor-driven drawing machine for physical verification, ensuring higher accuracy, less subjectivity, and practical dependability. The system Automated AutoCAD Drawing Assessment via Image Processing and... Informatica 49 (2025) 255-268 267 surpasses conventional approaches in automation, [3] Quiminsao CMD and Sumalinog JA (2023). Factors accuracy, and adaptability for CAD-based examination affecting the students’ achievement and attitude in and assessment activities. learning AutoCAD. Australian Journal of Engineering and Innovative Technology, 5(3), 130– 12 Conclusions 140. http://dx.doi.org/10.34104/ajeit.023.01300140 [4] Türkyılmaz T (2023). Visual Basic drawing codes from 2D AutoCAD drawings and machine parts For the calculation criteria of the similarity of the pictures, applications. Journal of Innovative Engineering using MATLAB to calculate the picture, the system of Applications, 13(2), Article 4. drawing the similarity of the pictures, using the students to http://dx.doi.org/10.7176/JIEA/13-2-04 draw pictures and compare the correct answers, using three [5] de Almeida JS and Baratto NS (2022). Evaluation of different types of pictures and the money-making answers screencasts settings applied to CAD online teaching. to compare and calculate, calculate and export the correct In: Más allá de las líneas. La gráfica y sus usos: XIX answer. Realize the function of only scoring by computer. Congreso Internacional de Expresión Gráfica It improves work efficiency and saves manpower and Arquitectónica, pp. 639–642. Universidad material resources for manual scoring. The development Politécnica de Cartagena. and application market of computer-only scoring systems http://dx.doi.org/10.31428/10317/11414 is ideal. However, a small sample size is a disadvantage. [6] Li X, Wang X, Li J, Zhang M, Al Ansari MS, and To ensure statistical robustness and show the Goyal B (2023). Development of NC program generalizability of the system's evaluation accuracy, future simulation software based on AutoCAD. Computer- research will incorporate a broader display of similarity Aided Design and Applications, Special Issue, S3, scores across a varied collection of student submissions. 72–83. Method also features a graphical user interface for user http://dx.doi.org/10.14733/cadaps.2023.S3.72-83 convenience. Complex overlapping designs and problems [7] Gutiérrez de Ravé S, Gutiérrez de Ravé E, and with contrast could pose problems for the system. Future Jiménez-Hornero FJ (2025). Enhancing efficiency research could include AI for adaptive scoring refinement, and creativity in mechanical drafting: A comparative study of general-purpose CAD versus specialized assist 3D CAD review, and improve edge recognition. To toolsets. Applied System Innovation, 8(3), 74. demonstrate the method's statistical robustness and https://doi.org/10.3390/asi8030074 generalizability, future research will need to conduct [8] Eltaief A, Ben Amor S, Louhichi B, Alrasheedi NH, additional examination using a more extensive and varied and Seibi A (2024). Automated assessment tool for dataset. A major focus of future research will be thorough 3D computer-aided design models. Applied Sciences, statistical validation using larger datasets. The system's 14(11), 4578. https://doi.org/10.3390/app14114578 analytical depth and evaluation precision will be improved [9] Zhang N, Li F, and Zhang E (2023). The machine in subsequent work by incorporating quantitative tracking vision dial automatic drawing system—Based on mechanisms during the vector transformation process. The CAXA secondary development. Applied Sciences, result will allow the calculation of vectorization errors and 13(13), 7365. https://doi.org/10.3390/app13137365 the identification count of geometric shapes. Large-scale [10] Fakhry M, Kamel I, and Abdelaal A (2021). CAD quantitative validation will be incorporated into future using preference compared to hand drafting in research. Future research will take into account more architectural working drawings coursework. Ain extensive statistical comparisons for a thorough Shams Engineering Journal, 12(3), 3331–3338. assessment. To enhance item detection and analysis https://doi.org/10.1016/j.asej.2021.01.016 accuracy, future studies will use CNNs trained on [11] Hsu HH, Wu CF, Cho WJ, and Wang SB (2021). annotated CAD datasets to improve drawing structure Applying computer graphic design software in a recognition. computer-assisted instruction teaching model of makeup design. Symmetry, 13(4), 654. References https://doi.org/10.3390/sym13040654 [12] Zhang J (2025). Attention mechanism-enhanced [1] Ibarrola F, Lawton T, and Grace K (2023). A model for automated simple brush stroke painting. collaborative, interactive, and context-aware drawing Informatica, 49(20). agent for co-creative design. IEEE Transactions on https://doi.org/10.31449/inf.v49i20.7688 Visualization and Computer Graphics, 30(8), 5525– [13] Cheng J, Yang L, and Tong S (2024). Recognition and 5537. https://doi.org/10.1109/TVCG.2023.3293853 analysis of painting styles with the help of computer [2] Fraile-Fernández FJ, Martínez-García R, and vision techniques. Informatica, 48(21). Castejón-Limas M (2021). Constructionist learning https://doi.org/10.31449/inf.v48i21.6891 tool for acquiring skills in understanding [14] Zhao YQ (2017). Transcending images and forms: standardized engineering drawings of mechanical The theory of expressive aesthetic value of traditional assemblies in mobile devices. Sustainability, 13(6), 3305. https://doi.org/10.3390/su13063305 268 Informatica 49 (2025) 255-268 Z. Xiong et al. Chinese freehand painting. Journal of Aesthetic [28] Timoftei S (2018). Industrial robot in fine art: Can Education, 10(8), 3023–3034. an industrial robot draw a binary image? IOP [15] Lu G (2018). An analysis of the application of Conference Series: Materials Science and traditional painting and calligraphy elements in the Engineering, 78(4), 110–120. design of theme hotels. Journal of Heihe University, https://doi.org/10.1088/1757-899X/399/1/012019 32(4), 329–335. [29] Feng M, Ying L, Sun G, Dong Y, Zhang F, and Liu [16] Jian M, Dong J, Gong M, Yu H, Nie L, and Yin Y Y (2018). Adaptive processing of dimensioning tire (2020). Learning the traditional art of Chinese patterns in engineering drawings. Chinese Journal calligraphy via three-dimensional reconstruction and of Automotive Engineering, 15(8), 370–374. assessment. IEEE Transactions on Multimedia, [30] Wang G, Liu S, Liu Q, Zhao S, Zhao X, and Li Y 22(4), 970–979. (2017). Micro-arrayed stretch drawing process of https://doi.org/10.1109/TMM.2019.2931390 nanocrystalline Ni-Co foils with soft-male-die. [17] Wang G, Zhao S, Liu S, and Siyu L (2017). Micro- Journal of Materials Processing Technology, arrayed stretch drawing process of nanocrystalline 240(4), 806–820. Ni-Co foils with soft-male-die. Journal of Materials https://doi.org/10.1016/j.jmatprotec.2016.10.038 Processing Technology, 78(4), 110–120. [18] Lee IK, Lee SY, Kim DH, Lee JW, and Lee SK (2018). Wire drawing process design for fine rhodium wire. Transactions of Materials Processing, 15(8), 370–374. [19] Wang Y (2018). Digital subsistence of Chinese calligraphy fonts. Packaging Engineering, 215(1), 806–820. [20] Yang Q (2018). Technical operation analysis of Photoshop in Premiere header image processing. China Computer & Communication, 93(3), 1–8. [21] Nakagawa M, Sutou K, and Hayakawa T (2020). Reproduction of additive-type fluorescence moiré fringes by image drawing software and study of accuracy of fluorescence imprint alignment. Japanese Journal of Applied Physics, 5(5), 445– 452. https://doi.org/10.35848/1347-4065/ab5cbe [22] Lin J and Chen H (2019). Application of image processing technology in graphic design. Modern Electronics Technique, 73(7), 40–55. [23] Cao G (2018). The history and aesthetic features of Chinese literati painting. Journal of Tianjin Academy of Fine Arts, 29(7), 143–148. [24] Cheng H, Huijie L, Luo R (2019). Research on geometric characteristics of asphalt mixture aggregate based on image processing. Journal of Wuhan University of Technology (Transportation Science & Engineering), 21(5), 773–791. [25] Yin Y and Antonio J (2020). Application of 3D laser scanning technology for image data processing in the protection of ancient building sites through deep learning. Image and Vision Computing, 102(5), 173–196. https://doi.org/10.1016/j.imavis.2020.103982 [26] Yang S, Zhang X, and Wang F (2017). Application of map GIS image analysis system in making design drawing of regional gravity points. Geological Survey of China, 32(4), 329–335. [27] Wang C and Han D (2017). Research on the construction of graphic image cooperative processing system based on HTML5 technology. Boletin Tecnico / Technical Bulletin, 55(15), 375– 384. https://doi.org/10.31449/inf.v49i12.8907 Informatica 49 (2025) 269-280 269 Optimization of Dynamic Energy Management Strategy for New Energy Vehicles Based on Multi-Agent Reinforcement Learning Xiaoyu Zhang Automotive Academy, Henan Communications Vocational and Technical College, Zhengzhou Henan, 450000, China E-mail: zxiaoyhappy@163.com Keywords: battery degradation, energy management strategies, fuel economy, new energy vehicle (NEV), power distribution, scalable satin bowerbird optimizer-driven multi-agent deep Q-Network (SSB-MADQN) Received: April 14, 2025 The development of New Energy Vehicles (NEVs), such as battery electric vehicles, is vital to addressing global issues like environmental pollution and fossil fuel depletion. However, optimizing their energy management strategies (EMSs) is complex due to conflicting goals, dynamic driving conditions, and system nonlinearity. This study proposes a dynamic EMS based on Multi-Agent Reinforcement Learning (MARL) using a Scalable Satin Bowerbird Optimizer-driven Multi-Agent Deep Q-Network (SSB- MADQN). The approach aims to enhance fuel economy, maintain battery State of Charge (SOC), and reduce battery degradation in real-time driving scenarios. Prior to training, data preprocessing— including min-max normalization and Principal Component Analysis (PCA)—improves learning efficiency. The MADQN framework consists of agents representing subsystems such as the engine, battery, and regenerative braking, each trained using a deep Q-network with three hidden layers (128-64-32 neurons). The dataset comprises 5,000 samples with 13 features, including vehicle speed, power demand, and battery performance. Evaluated on HWFET and WLTC driving cycles, the proposed strategy reduces fuel consumption by 0.912 L (WLTC) and 0.681 L (HWFET) compared to traditional methods. It effectively regulates SOC and reduces high-power discharge events, confirming the robustness of MARL for adaptive and efficient EMS in NEVs. Povzetek: Raziskava predlaga dinamično strategijo upravljanja z energijo (EMS) za NEV na osnovi MARL (SSB-MADQN). Optimizira porabo goriva, stanje napolnjenosti baterije (SOC) in zmanjšuje degradacijo, s čimer izboljša učinkovitost v realnem času. 1 Introduction (RL) [4]. Figure 1 shows the dynamic energy management strategy for NEVs. The growing demand for NEVs, which includes hybrids and battery electric vehicles, occurs because they serve as an environmentally friendly replacement for traditional internal combustion engine vehicles that offer improved air quality decreased greenhouse gas emissions, and reliable energy systems [1]. Strong worldwide climate change understanding, along with decreasing fossil fuel reserves, has made NEV development essential for countries implementing sustainable transportation solutions [2]. Conventional EMS approaches, such as rule- based, fuzzy logic, or model predictive control methods, rely on pre-defined heuristics or offline optimization and often fail to adapt in real-time to complex, dynamic Figure 1: Dynamic energy management strategy for environments like varying road gradients, traffic NEVs conditions, and driving behaviours [3]. The growing complexity of NEVs and their need for adaptive, real-time Reinforcement learning has shown significant promise in decision-making have thus pushed the investigation EMS optimization by enabling systems to accumulate toward leveraging artificial intelligence (AI) techniques reward functions, such as fuel efficiency or battery health such as machine learning (ML) and reinforcement learning [5]. However, most existing RL-based EMS frameworks 270 Informatica 49 (2025) 269-280 X. Zhang et al. operate under a single-agent paradigm, where the entire for new energy vehicles (NEVs). Current systems face decision-making process is centralized, which limits challenges in optimizing fuel efficiency, battery health, scalability and does not fully represent the distributed and driving performance simultaneously, especially under nature of NEV components. In reality, energy management dynamic driving conditions. By leveraging Multi-Agent involves coordination between multiple subsystems [6]. Reinforcement Learning (MARL) and the novel SSB- The vehicle dynamics are modeled to include real-world MADQN approach, this research aims to reduce fuel constraints such as regenerative braking, load variations, consumption while maintaining optimal battery SOC and and battery degradation metrics [7]. Despite various minimizing degradation, ultimately contributing to more conventional EMS strategies yielding acceptable sustainable and efficient NEV operation in real-world performance under ideal conditions, they often fail in scenarios. unpredictable or highly dynamic driving environments. By The research is comprised of the following sections: In leveraging the strengths of multi-agent systems and Section 2, a list of relevant works was presented. In metaheuristic-optimized DL models, it offers a robust, Section 3, the methodology is described. In Section 4, the adaptive, and intelligent EMS that is both scalable and findings are presented. The discussion portion is provided energy-efficient. It highlights the transformative potential in Section 5, and Section 6 contains the conclusion. of AI-driven strategies in the automotive domain, particularly for real-time optimization and sustainable 2 Related work energy utilization in NEVs. To address these limitations, MARL has emerged as an A novel multiple-input and multiple-output (MIMO) innovative solution for optimizing EMS in a decentralized control technique based on Multi-Agent Deep and cooperative manner. In MARL-based EMS, different Reinforcement Learning (MDARL) was examined in [8] vehicle components are modeled as intelligent agents, such for the multi-mode photovoltaic EV. Two learning agents as a battery agent and an engine agent that learn to make would collaborate under the MDARL, utilizing the deep decisions based on local observations and collaborate to deterministic policy gradient (DDPG) algorithm by achieve a global objective. It allows for distributed control, implementing a handshaking technique that provided a reduced computational complexity, and more effective relevance ratio. To improve fuel economy, [9] provided a adaptation to real-time driving dynamics. A novel MARL- unique EV EMS based on the MDARL architecture. Under based EMS framework is proposed using an SSB- power limits, the EMS effectively achieved optimal power MADQN. The SSB is a nature-inspired metaheuristic transmission between the engine and battery. algorithm based on the mating behavior of satin The optimal functioning of a fleet of EVs that were bowerbirds, known for balancing exploration and directed to supply power to a group of clients at various exploitation efficiently. The aim is to enhance fuel places was covered in [10]. MARL was used in a economy, sustain battery SOC, and decrease battery Decentralised Markov Decision Procedure reformulation degradation under dynamic driving conditions. framework to be practicable for a fleet of EVs to function well and provide energy to numerous clients at various 1.1 Key contribution places. A unique optimum energy management approach based on the suggested MDARL technique was presented Data Collection: The dataset captures real driving in [11]. It used a deep neural network to train a strategy conditions, fuel consumption, power distribution, and based on multi-agent deep deterministic policy gradient battery health metrics specific to NEV scenarios. (MADDPG) learning capacity and stacked denoising auto- Data preprocessing: Applied data cleaning and min-max encoders. By considering the different characteristics of normalization to standardize input variables, ensuring both electrical and thermal energies. consistent scale and reducing data noise for learning A MADRL optimization approach was proposed in [12] stability. for energy control with EV charging development. To Feature extraction: Used PCA to extract 12 principal determine the optimal choice, the aggregator and components, preserving 95% variance for improved prosumers were designed to be intelligent agents that training efficiency and dimensionality reduction. communicate with one another. Utilizing EV battery Proposed method: SSB-MADQN, a MARL-based scheduling, prosumers might save on power costs. A new framework with decentralized agents and a Satin Multi-Agent ActorCritic (MA2C) system was examined in Bowerbird-optimized DQN for dynamic NEV energy [13], which was specifically designed for mixed-traffic management. situations. The MA2C algorithm offers an extensive method of managing urban traffic that prioritizes 1.2 Motivation effectiveness, safety, and passenger security. To effectively recommend public charging stations, [14] The motivation for this research is driven by the need for anticipated a Multi-Agent Spatio-Temporal Reinforcement more effective and adaptive energy management strategies Learning (Master) that takes into consideration several Optimization of Dynamic Energy Management Strategy for New... Informatica 49 (2025) 269-280 271 long-term spatiotemporal characteristics. The Demand agent reinforcement learning (MARL) for energy Response potential in smart homes using a multi-agent management in smart systems. It highlights diverse reinforcement learning framework enhanced with applications ranging from EVs and smart grids to smart BiLSTM and Attention Mechanism for improved data homes using algorithms like MADDPG, MA2C, and efficiency and handling stochastic household loads [15]. BiLSTMA-MADDPG. While most approaches show The BiLSTMA-MADDPG model improves data improved performance in energy savings and efficiency, efficiency, convergence speed, and scalability in common limitations include coordination complexity, high controlling household appliances under limited training computational needs, and data inefficiency. samples. Table 1 presents recent advancements in multi- Table 1: Contrast examination of traditional works Ref. Year Area Focused Algorithms Limitations Performance [8] 2023 Energy Management in MADRL, DDPG, Requires careful Energy savings can range Multi-mode plug-in Hand-shaking tuning of DDPG from 4% to 23.54% when hybrid EVs Strategy, Relevance parameters; compared to a single-agent Ratio learning system and a rule-based performance is system. sensitive to learning rate [9] 2025 Hybrid EVs, Energy MADRL, MADDPG Complexity in Fuel consumption was Management Strategy multi-agent reduced by 26.91% coordination, (WLTC) and 8.41% simulation-based (HWFET), improving validation only EMS robustness. [10] 2022 Smart Grids, Multi- MARL, Decentralized High initial Significant reduction in Agent Systems, EVs. Markov Decision training complexity simulation time; superior Process (Dec-MDP), assumes accurate scalability and efficiency Actor-Critic Networks agent-environment modeling. [11] 2023 Optimal Energy MADRL, Stacked Requires high Achieved optimal dispatch Management, Smart Denoising Auto- computational of electric and thermal Grid, Multi-Energy Encoders framework resources, energies, and reduced MicroGrids. complexity in emissions and costs. decentralized implementation, and training convergence [12] 2023 Smart Grid Energy MADRL, Real-Time High Mean power consumption Management, EV Pricing, Smart Agent computational was reduced by 9.04% (vs. Scheduling, Solar Interaction requirements for no EV usage) and reduced Photovoltaic (PV) real-time DRL. by 39.57% (vs. Integration conventional pricing) [13] 2024 Smart Cities, MA2C, Complexity of Outperforms existing Autonomous Vehicles, Reinforcement multi-agent models in lane-changing Sustainable Mobility Learning, Actor-Critic coordination; efficiency, safety, comfort, Architecture Requires realistic and inter-vehicle traffic data for cooperation. deployment [14] 2021 EVs Charging MA2C Framework, Required Outperforms 9 baseline Recommendation, Centralized Attentive coordination approaches in Smart Mobility, DRL Critic, Delayed Access among distributed recommending charging Strategy agents stations. [15] 2023 Demand Response in BiLSTMA-MADDPG Non-stationary Improved data efficiency, Smart Homes (Multi-Agent RL) environment; data faster convergence, and inefficiency better scalability with small samples. 272 Informatica 49 (2025) 269-280 X. Zhang et al. 3 Methodology power become visible through off-diagonal scatter plots. Figure 3 shows the data exploration. The methodology involves modeling the NEV's energy system as a multi-agent environment with engine and battery agents. Real-time driving data undergoes data cleaning and min-max normalization, and PCA for feature extraction. AnSSB-MADQN is employed to optimize power distribution. Trained on WLTC and HWFET cycles, this strategy improves fuel efficiency, stabilizes SOC, and reduces battery degradation, enabling adaptive, real-time energy management under dynamic driving conditions. Figure 2 presents the proposed methodology’s overview. Figure 3: Data exploration outcomes 3.2 Data preprocessing using data cleaning To clean the NEV energy management dataset, missing values should be handled through mean or median Figure 2: Proposed methodology overflow imputation techniques while maintaining sparse data rows. Convert data types to ensure consistency across numerical and categorical fields. The data types should be converted 3.1 Data collection to achieve numerical and categorical field consistency. The NEV energy management dataset was collected from Reduction of redundant data will occur by eliminating the Kaggle source. It is meant to assist in finding the most duplicate records. The system needs to identify and handle effective ways to save energy in NEVs, using the approach unusual cases found in energy consumption alongside of MARL. It battery degradation trends. A final test must verify the data includes data about real-world traffic, energy distribution, balance between driving cycles and efficiency classes. mileage, and battery health for multiple driving routines. 70% of the dataset was used for training and 30% for 3.2.1 Min-Max normalization testing to evaluate performance under diverse scenarios. The process of min-max normalization transforms new Source:https://www.kaggle.com/datasets/ziya07/nev- energy vehicle energy management datasets into energy-management-dataset/data standardized ranges, which improves both model performance and speed of convergence, and accuracy 3.1.1 Data Description during energy efficiency optimization. Using linear The NEV Energy Management Dataset features 5,000 modifications of the original data, min-max normalization records with 13 attributes for measuring vehicle speed creates a balanced set of value comparisons between the along with acceleration, power demand, fuel usage, and data before and after the execution, as follows in Equation battery performance across different driving conditions. (1). The system combines essential variables such as engine 𝑊−min (𝑊) power, battery power and SOC, battery degradation, and 𝑊𝑛𝑒𝑤= …. (1) max(𝑊)−min (𝑊) regenerative braking power to assess energy efficiency and sustainability levels. 𝑊𝑛𝑒𝑤- The adjusted value derived from the normalized 3.1.2 Data Exploration outcomes The pair plot demonstrates the relationship dynamics 𝑊- Old Value between speed, power demand, battery power, SOC, and max(𝑊)-The dataset's maximum value fuel consumption variables for designing a dynamic min (𝑊)- The dataset's minimum value energy management strategy in NEVs. The diagonal presentation displays distribution patterns to identify normal or skewed data shapes. The correlations and strong positive associations between power demand and battery Optimization of Dynamic Energy Management Strategy for New... Informatica 49 (2025) 269-280 273 3.3 Feature extraction using PCA applying min-max normalization, PCA reduced the feature space to 6 principal components, maintaining more than The dynamic energy management technique becomes 95% of the total variance while minimizing duplication, more efficient by eliminating unnecessary variables and boosting the energy management model's learning focusing exclusively on critical factors. This results in efficiency. Figure 4 shows the PCA-based feature faster convergence and more accurate decision-making via contribution to the first principal component, which the MARL framework for energy distribution. PCA was explains the most variation. This information assists in used to minimize the dimensionality of the dataset while determining the most significant elements for EMS retaining the majority of its informational richness. In optimization. Notably, this representation is based on the addition, 5 derived characteristics were designed to PCA loading matrix before dimensionality reduction. capture complicated energy dynamics such as power Figure 4 shows PCA-Based Feature Importance Output for fluctuation, energy trends, and driving cycle behavior, Energy Management Optimization. which are crucial for intelligent EMS control. By eliminating the class label, each observation in a data set of 𝑙 observations is mathematically 𝑚-dimensional. Assuming that 𝑤1, 𝑤2, … . . , 𝑤𝑙 ∈ ℜ𝑚 . The subsequent procedures for calculating PCA. Determine the mean vector µ in 𝑚-dimensions by Equation (2). 1 𝜇 = ∑𝑙 𝑗 1 𝑤 𝑙 = 𝑗 (2) Determine the observed data's estimated matrix of covariance 𝑇 by Equation (3). 1 𝑠 𝑇 = ∑𝑙 (𝑤 ) 𝑤 − ) 𝑙 𝑗=1 𝑗 − 𝜇 ( 𝑗 𝜇 (3) Figure 4: PCA-based feature importance output for Determine the associated eigenvectors and eigenvalues of energy management optimization 𝑇, whereby 𝜆1 ≥ 𝜆2 ≥ ⋯ ≥ 𝜆𝑙 ≥ 0. Determine the 𝑙 primary components from the 𝑙 original variables by • Data Cleaning (13 features): Outliers, Equation (4). impossible values (e.g., negative fuel), and missing values were handled through imputation 𝑧1 = 𝑏11𝑤1 + 𝑏12𝑤2 + ⋯ + 𝑏1𝑙𝑤𝑙 and filtering. 𝑧2 = 𝑏21𝑤1 + 𝑏22𝑤2 + ⋯ + 𝑏2𝑙𝑤𝑙 ⋯ (4) • Normalization (13 features): Each feature was 𝑧𝑙 = 𝑏𝑙1𝑤1 + 𝑏𝑙2𝑤2 + ⋯ + 𝑏𝑙𝑙𝑤𝑙 scaled to a standard range (mean = 0, std = 1) for consistent learning performance. It is orthogonal that 𝑧𝑙 are uncorrelated. As much of the • PCA Application: Principal component analysis initial variation in the data set can be explained by 𝑧1, as reduced the final 18-dimensional space to 6 much of the residual variance can be explained by 𝑧2, etc. principal components, capturing >95% variance, In the most useful data sets, a small number of bigger enhancing model training speed and eigenvalues often outnumber the others, as follows in generalization. Equation (4). Where the proportion maintained in the data format is denoted by 𝑧𝑙. While the original dataset contained 13 attributes, 5 additional derived features were introduced through 𝜆 𝛾 1+𝜆2+⋯+𝜆 𝑙 = 𝑛 ≥ 80% (5) feature engineering to enhance the model's ability to 𝜆1+𝜆2+⋯+𝜆𝑛+⋯+𝜆𝑙 capture dynamic driving patterns and battery behavior. For instance, ΔSOC (change in State of Charge) reflects short- Principal Component Analysis (PCA) was applied to term battery discharge rates, offering temporal insights that reduce the dimensionality of the input space. Although the static SOC cannot. Similarly, features like speed trend and original dataset consisted of 13 attributes, only 12 numeric regenerative efficiency were designed to capture vehicle features were used for PCA, excluding the non-numeric acceleration patterns and energy recovery rates, target column. PCA transformed this 12-dimensional respectively. These engineered features provide higher- feature space into 6 uncorrelated principal components, level abstractions that improve the learning model’s capturing over 95% of the total variance and improving contextual awareness. PCA was then applied to this 18- model training efficiency by eliminating redundancy. After dimensional space to reduce redundancy, improve 274 Informatica 49 (2025) 269-280 X. Zhang et al. generalization, and retain the most informative patterns by selecting 6 principal components that preserved over 95% of the variance. 3.4 SSB-MADQN The SSB-MADQN is a novel framework for dynamic energy management in NEVs. It integrates the SBO to enhance agent policy optimization and exploration within a MADQN environment. By enabling decentralized cooperation among energy management agents, SSB- MADQN effectively balances power delivery among both the engine and battery, optimizes fuel consumption, and mitigates battery degradation under diverse driving cycles. The scalable design ensures adaptability across vehicle platforms, while the optimizer enhances learning efficiency, making SSB-MADQN a robust solution for real-time, intelligent NEV energy management. 3.4.1 MADQN Figure 5: MADQN architecture The MADQN enables dynamic energy management in NEVs by allowing multiple agents (engine, battery, motor) In multi-agent reinforcement learning, the replay buffer to learn cooperative strategies. Through DRL, each agent holds all agents' experiences, which frequently include optimizes energy distribution, improving efficiency, shared observations, actions, and rewards to capture inter- reducing fuel consumption, and adapting to varying agent relationships. Each agent's training is stabilized by driving conditions in real time. It uses a model-free the target network, which provides constant Q-value reinforcement learning strategy, which eliminates the need targets and is updated on a regular or soft basis. Q-value to explicitly understand the environment's dynamics. updates are changed by taking into account not just an Agent 1 observes state 𝑡𝑠 and chooses the optimal action agent's action and reward, but also the effect of other at time 𝑠 to move to state 𝑡𝑠+1 in traditional Q-learning, agents' activities, employing centralized training and based on a value model-free approach. The agent then decentralized execution. This allows agents to develop changes the Q-value after receiving an instant benefit coordinated methods while functioning independently 𝑟(𝑡𝑠, 𝑏, 𝑡𝑠+1)at time 𝑠 + 1, as shown in Equation (6). during deployment. A replay buffer is used to retain the agent's experiences, a 𝑄𝑠+1(𝑡𝑠, 𝑏𝑠) ← (1 − 𝛼)𝑄𝑠(𝑡𝑠, 𝑏𝑠) + 𝛼[𝑟(𝑡𝑠, 𝑏𝑠, 𝑡𝑠+1) + target network (𝜃𝑡𝑔) replicates the main network to offer 𝛾 max 𝑄𝑠(𝑡𝑠+1, 𝑏)] (6) 𝑏 a steady target for learning, and a main network In reinforcement learning, 𝛾 is a discount factor, parameterized by (𝜃𝑛) is used to estimate Q-values in the 𝛾 𝑚𝑎𝑥′ 𝑏𝑅𝑠(𝑡′, 𝑏′) is the discounted reward, and 𝛼 ∈ [0,1] multi-agent environment. First, agent 1 observes the is the learning rate. The Q-values for every potential state energy demand signal and its states at the time 𝑠 and action for agent 1 are stored in a two-dimensional communicates with neighboring agents (states (𝑡𝑠) and look-up column with dimensions 𝒯 × ℬ. Consequently, policies), and selects an action (𝑏𝑠). For example, suppose the number of actions and states in a complex system that Agent 1 is unable to fulfill the energy storage request. causes the Q-table's size to grow exponentially. Figure 5 Suppose that three collaborative NEV modules (engine, presents the MADQN architecture. Every edge server is battery, motor) ({𝑖, 𝑟} ∈ 휀𝑛𝑏) with a strategy for new regarded as an agent in EV. Figure 5 depicts the MADQN energy 𝑞𝐹,𝑗𝑖and 𝑞𝐹,𝑖𝑞 , where 𝑞𝐹,𝑗𝑖 < 𝑞𝐹,𝑖𝑞, have the framework utilized in the caching environment, with matching content. This situation results in the selection of architectural details. The neural networks (Main and the neighboring agent with energy cost, as shown in Target) are implemented as multilayer perceptrons, with an Equation (7). input layer matching the state dimension (e.g., 50 features), two hidden layers of 128 and 64 neurons, arg max𝑏∈ℬ𝑄(𝑡𝑠, 𝑏) 𝑜 = 1 − 𝜖1 − 𝜖2 respectively, employing ReLU activation, and an output 𝑏 layer representing the number of potential actions (e.g., 𝑠 = { 𝑟𝑎𝑛𝑑𝑜𝑚 𝑏 ∈ ℬ 𝑜 = 𝜖 (7) 1 two for binary caching decisions). These features are 𝑂𝑡ℎ𝑒𝑟 𝑟𝑒𝑝𝑙𝑎𝑐𝑒𝑚𝑒𝑛𝑡 𝑝𝑜𝑙𝑖𝑐𝑦 𝑏 ∈ ℬ 𝑜 = 𝜖2 critical to understanding the model's structure and ensuring repeatability. Furthermore, it has 𝜖1 and 𝜖2 set to decrease with time. Consequently, the model will eventually choose the best Optimization of Dynamic Energy Management Strategy for New... Informatica 49 (2025) 269-280 275 course of action. It is suggested to explore if the agent does ➢ Logistic Chaos's initialization: not function well. A collection of recent rewards (𝑅𝐺) is Although the algorithm's initial population utilizes a tracked, and ∈𝑦 (𝑤ℎ𝑒𝑟𝑒 ∈𝑦 {1, 2})is updated, as shown random initialization mode according to natural law, a in Equation (8). The step sizes for modifying the better initialization approach would greatly accelerate the probability ∈𝑦 are 𝛿+ and 𝛿−, and 𝑟𝑡ℎ is a reward intelligent optimization algorithm's convergence speed. threshold. The population is also initialized by the SB using random values. A logistic chaos map was created to improve the ∈ starting population's diversity, which in turn led to a better- 𝑦+ 𝛿+, 𝔼(𝑄𝐺) < 𝑟𝑡ℎ ∈𝑦= { (8) ∈𝑦− 𝛿−, 𝔼(𝑄𝐺) ≥ 𝑟 starting population, which improved the algorithm's 𝑡ℎ accuracy and speed of convergence. Equation (10) illustrates the logistic chaos map calculating method. The agent moves on to the next state (𝑡𝑠 + 1) for the selected action (𝑏𝑠), preserves moving in the replay buffer 𝑊𝑗+1 = 𝜇𝑊𝑗 ∗ (1 − 𝑊𝑗) (10) of size, and receives an instant benefit (𝑟𝑠 + 1). During the training stage, agent 1 uses mini-batch descent to train the primary network after selecting a mini-batch of size 𝐴 The control parameters 𝜇 have a value range of 0 𝑡𝑜 4. There will be more confusion when the number of 𝜇 is from the replay buffer. In every 𝐼 step, the target network replicates the primary network to provide learning higher. The chaotic initialization effect will be amplified𝜇. stability, as follows in Equation (9). Equation (11) is used as the population initialization. 𝑄 𝑝𝑜𝑝(𝑗). 𝑃𝑜𝑠𝑖𝑡𝑖𝑜𝑛 = 𝑌(𝑗, : ).∗ (𝑉𝑎𝑟𝑀𝑎𝑥 − 𝑉𝑎𝑟𝑀𝑖𝑛) + 𝑠+1 = (𝑡𝑠, 𝑏𝑡) ← (1 − 𝛼)𝑄𝑠(𝑡𝑠, 𝑏𝑡; 𝜃𝑛) + 𝛼[𝑟(𝑡𝑠, 𝑏𝑡 , 𝑇𝑠+1) + 𝛾 𝑚𝑎𝑥 𝑄 𝑏 𝑠(𝑡𝑠+1, 𝑏; 𝜃𝑠ℎ) 𝑉𝑎𝑟𝑀𝑖𝑛 (11) − ➢ The cauchy variation method: 𝑄𝑠(𝑡𝑠, 𝑏𝑡; 𝜃𝑛)] + ∑𝑖𝜖𝑀 𝑤𝑗𝑖𝑄𝑠−1(𝑡𝑠, 𝑏𝑡; 𝜃𝑛) (9) 𝑓 Instead of using the conventional SB mutation technique, which produces a shorter peak dispersed at the origin and Where 𝜔𝑗𝑖 is modeled as inversely proportional to the a longer spread in the remainder, the Cauchy mutation EMS(𝑟𝐹, 𝑦𝑥) among 𝑖and 𝑗, and is used to highlight the strategy guarantees more disruption near the current effect of neighbor 𝐼 on agent 1. population. Equation (12) shows the Cauchy variation approach. 3.4.2 SSB The traditional Satin Bowerbird (SB) optimizer struggles 𝑊𝑠+1 𝑗,𝑖 = 𝑊𝑏𝑒𝑠𝑡 + 𝐶𝑎𝑢𝑐ℎ𝑦(0,1)⨁ 𝑊𝑏𝑒𝑠𝑡(𝑠) (12) to effectively manage the complex, dynamic, and multi- objective nature of energy management strategies in new Where 𝑊𝑏𝑒𝑠𝑡(𝑠) is the location of an individual that energy vehicles (NEVs). It lacks the scalability and the requires variation, and 𝐶𝑎𝑢𝑐ℎ𝑦 (0,1) is the typical ability to deal with several competing priorities, including Cauchy distribution. Equation (13) computes the relevant fuel consumption, battery capacity, and reducing battery variation probability. degradation. The basic SB algorithm lacks mechanisms for efficiently navigating high-dimensional search spaces or 𝑖𝑡 20 adapting to rapidly changing driving conditions. It also 𝑂𝑡 = − exp (1 − ) + 𝑜 (13) 𝑀𝑎𝑥𝐼𝑡 falls short in maintaining solution diversity and handling trade-offs among multiple objectives, often leading to Both the current is represented by 𝑀𝑎𝑥𝐼𝑡, where 𝑜 is set at premature convergence or local optima. Furthermore, its 0.05. The procedure of the Cauchy mutation will not be limited ability to handle real-time updates and high- carried out if 𝑞 and < 𝑃𝑠. Table 2 shows the dimensional decision spaces reduces its effectiveness in hyperparameters of SSB. dynamic driving conditions, prompting the need for SSB’s chaotic initialization improves exploration by improved approaches like the Scalable SB (SSB) ensuring diverse initial solutions, avoiding local optima, optimizer. SSB efficiently balances energy distribution and speeding up convergence. The Cauchy variation, with between battery and engine systems, adjusts to various its heavy-tailed distribution, enables larger step sizes, driving schedules, speeds up how policies are learned and improving the algorithm's capacity to escape local minima helps achieve better fuel efficiency, fewer emissions, and and strike a better balance between exploration and longer life of the vehicle battery in complex driving exploitation. These traits exceed typical heuristics, situations. allowing for faster and more efficient optimization. 276 Informatica 49 (2025) 269-280 X. Zhang et al. Table 2: Hyperparameters of SSB No. Hyperparameter Symbol / Name Typical Value Description / Range 1 Population size P 5 – 50 Number of candidate solutions (bowerbirds) 2 Maximum iterations MaxIter 10 – 100 Maximum SBO optimization cycles 3 Attraction coefficient α 0.05 – 0.3 Strength of movement toward better solutions 4 Random scaling factor rand () [0, 1] Random noise for solution diversification 5 Learning rate search LR_range [0.0001, 0.01] Search space for learning rate range 6 Epsilon search range ε_range [0.1, 1.0] Exploration rate range 7 Discount factor search γ_range [0.8, 0.99] Reward discount factor range range 8 Fitness function F(x) Avg episodic Evaluate solution quality reward 9 Movement formula x_new = x + α * rand () — Bowerbird movement update * (x_best - x) 10 Dimensionality of D 3 Parameters optimized (LR, ε, solution γ) 4 Results and discussion 4.1 Confusion matrix The result comparison parameters, such as EMS The results of the confusion matrix are shown in Figure 6. optimization results for different strategies under WLTC, The model accurately predicted all classes: 152 samples as EMS optimization results for different strategies under class 0, 777 as class 1, and 71 as class 2, with zero HWFET, and control action, are used to demonstrate the misclassifications. This indicates that the energy comparison of the proposed model, SSB-MADQN, for management model is highly effective in correctly energy management strategy for new energy with the categorizing vehicle energy efficiency levels or strategies existing techniques, such as MADDPG [9] and Deep Q- with no false positives or negatives across all classes. The learning Adaptive Moment Estimation (DQL-AMSGrad) predicted classes represent EMS efficiency levels: 0 [16]. The experimental setup is presented in Table 3. (High), 1 (Medium), and 2 (low). Table 3: Experimental setup Projects Environment Operating System Windows 10(x64) CPU i5-9500HF CPU@2.40GHz Memory Size 32GB GPU NVIDIA GeForce GTX 2080 Ti CUDA Version 10.2 Python Version 3.8 Episode count 1000 Batch size 64 Convergence Training stops when criteria reward, loss, episodes, or epsilon criteria are met. Figure 6: Confusion matrix outcomes Optimization of Dynamic Energy Management Strategy for New... Informatica 49 (2025) 269-280 277 4.2 Battery degradation distribution The distribution of battery degradation in NEV highlights a concentration of around 10%. It suggests significant wear under certain conditions, necessitating a dynamic energy management strategy. By integrating real-time degradation data, NEVs can optimize engine-battery energy distribution, extend battery life, and improve energy efficiency, especially under high-degradation scenarios. It supports adaptive, data-driven decision-making for sustainable vehicle performance. Figure 7 presents the distribution of battery degradation outcomes. Figure 8: Graphical representation of WLTC 4.4 HWFET According to the HWFET driving cycle, SSB-MADQN performs better than MADDPG when optimizing the EMS Figure 7: Distribution of battery degradation outcomes system. It achieves a higher terminal SOC (0.603 vs. 0.556), reduced equivalent fuel consumption (0.681 L vs. 4.3 WLTC 0.734 L), and better fuel efficiency (4.121 L/100km vs. 4.446 L/100km), indicating improved energy recovery and The EMS optimization results under the WLTC driving reduced fuel usage in dynamic energy management for cycle show that the proposed SSB-MADQN method NEVs. Figure 9 presents the EMS optimization under outperforms the existing method, MADDPG. SSB- HWFET. MADQN achieves a higher terminal SOC (0.643 vs. 0.598), lower equivalent fuel consumption (0.912 L vs. 0.977 L), and improved fuel efficiency (3.864 L/100km vs. 4.199 L/100km), demonstrating its effectiveness in dynamic energy management for NEVs by enhancing energy utilization and reducing fuel use. Figure 8 presents the EMS optimization under WLTC. Figure 9: Graphical Representation of HWFET 278 Informatica 49 (2025) 269-280 X. Zhang et al. 4.5 Control action smoother transitions and a peak of 1.7, reflecting improved A comparison of control action variations over time in responsiveness and stability. It suggests SSB-MADQN's dynamic energy management for NEVs. DQL-AMSGrad superior performance in managing energy distribution shows fluctuating control values, peaking at 1.5, indicating dynamically and efficiently in NEV systems. Table 4 and moderate adaptability. The proposed SSB-MADQN model Figure 10 show control action outcomes. consistently yields slightly higher control actions, with Table 4: Control action outcomes Model 10 20 30 40 50 60 70 80 90 100 DQL- 1.3 0.4 0.3 1.0 0.1 0.8 1.2 1.5 0.1 0.3 AMSGrad [16] SSB- 1.5 0.6 0.7 1.2 0.2 1.0 1.4 1.7 0.3 0.6 MADQN [proposed] 4.6 Performance metrics summary of SSB- MADQN for NEV energy management The primary performance metrics of the proposed multi- agent deep reinforcement learning framework applied to dynamic energy management in new energy vehicles (NEVs). Metrics include fuel consumption, battery SOC limits, battery degradation rate, and computational efficiency during both training and real-time inference. These results demonstrate the framework’s effectiveness in balancing energy usage and system longevity. Table 5 displays the SSB-MADQN performance. Figure 10: Graphical representation of control action Table 5: Key results of SSB-MADQN performance Performance metric SSB-MADQN (Proposed) Fuel Usage 3.4 L/100km SOC Bounds 20% – 80% Degradation Rate (%) 0.72% Training Time 4.1 hours Inference Time 14 ms 5 Comparative analysis with existing complex multi-agent interactions in dynamic NEV energy environments. Such technology mandates a large amount systems of training material alongside powerful computing capabilities. The A dynamic EMS for NEVs optimizes power distribution integration of DQL-AMSGrad with adaptive learning rates between the battery and engine in real-time, enhancing facilitates better convergence, but it performs poorly with energy efficiency, reducing emissions, and adapting to the continuous action spaces regularly found in NEV varying driving conditions. MADDPG faces limitations in energy systems. The decision-making processes of these scalability and convergence stability when managing Optimization of Dynamic Energy Management Strategy for New... Informatica 49 (2025) 269-280 279 methods show poor adaptation to sudden driving condition References changes, along with restricted performance across different driving cycles, which affects real-time decisions [1] Wang, Y., Wu, Y., Tang, Y., Li, Q., & He, H. (2023). in NEVs. The proposed SSB-MADQN enhances Cooperative energy management and eco-driving of scalability and convergence stability by integrating the plug-in hybrid electric vehicle via multi-agent SSB with MADQN, enabling efficient exploration and reinforcement learning. Applied Energy, 332, 120563. exploitation in complex NEV environments. The system https://doi.org/10.1016/j.apenergy.2022.120563 successfully deals with complex action spaces together [2] Yang, N., Han, L., Liu, R., Wei, Z., Liu, H., & Xiang, with dynamic driving conditions because it learns quickly C. (2023). Multiobjective intelligent energy and provides reliable real-time energy management management for hybrid electric vehicles based on functionality that outperforms MADDPG and DQL- multiagent reinforcement learning. IEEE AMSGrad by showing better adaptability and Transactions on Transportation Electrification, 9(3), generalization over several driving cycles. The proposed 4294-4305. strategy relies heavily on high-quality simulations, which https://doi.org/10.1109/TTE.2023.3236324 may not fully capture real-world complexities. [3] Gautam, A. K., Tariq, M., Pandey, J. P., Verma, K. S., Additionally, there is a lack of real-world validation, and & Urooj, S. (2022). Hybrid sources powered electric the interpretability of multi-agent reinforcement learning vehicle configuration and integrated optimal power models remains a challenge, hindering broader practical management strategy. IEEE Access, 10, 121684- adoption. 121711.https://doi.org/10.1109/ACCESS.2022.32177 71 6 Conclusion [4] Jiang, Q., & Wang, H. (2025). Risk Assessment Method for New Energy Vehicle Supply Chain Based Energy efficiency and operational performance in NEVs on Hierarchical Holographic Model and Matter have significantly improved through the application of AI- Element Extension Model. Informatica, 49(7). driven optimization strategies. The suggested SSB- https://doi.org/10.31449/inf.v49i7.6953. MADQN architecture used MARL to allow cooperative [5] Hu, H., Yuan, W. W., Su, M., & Ou, K. (2023). agents to control the engine and battery's power allocation Optimizing fuel economy and durability of hybrid fuel in real time under various driving circumstances. Data cell electric vehicles using deep reinforcement preprocessing methods, such as data cleaning and min- learning-based energy management systems. Energy max normalization, and PCA employed for feature Conversion and Management, 291, 117288. extraction, ensured consistency, reduced dimensionality, https://doi.org/10.1016/j.enconman.2023.117288 and enhanced model learning. Experimental results [6] Bakare, M. S., Abdulkarim, A., Shuaibu, A. N., & revealed notable improvements, with fuel consumption Muhamad, M. M. (2024). Energy management reduced under WLTC compared to MADDPG, achieving controllers: strategies, coordination, and a final consumption of 3.864 L/100km, and similarly under applications. Energy HWFET with a reduction to 4.121 L/100km. These Informatics, 7(1),57.https://doi.org/10.1186/s42162- outcomes confirm the effectiveness of intelligent EMS in 024-00357-9 achieving adaptive and globally optimized energy [7] Rawat, R., Borana, K., Gupta, S., Ingle, M., strategies for NEVs. The limitations of relying solely on Dibouliya, A., Bhardwaj, P., & Rawat, A. (2025). simulation-based testing and plans to incorporate real- Enhancing OSN Security: Detecting Email Hijacking world ECU-in-the-loop evaluation to enhance validation. and DNS Spoofing Using Energy Consumption and Another key challenge is the interpretability of the MARL Opcode Sequence Analysis. Informatica, 49(2). model, for which we plan to adopt explainability https://doi.org/10.31449/inf.v49i2.6956. techniques such as SHAP or LIME to analyze Q-values [8] Hua, M., Zhang, C., Zhang, F., Li, Z., Yu, X., Xu, H., and better understand agent decisions. Additionally, & Zhou, Q. (2023). Energy management of multi- potential deployment on edge computing platforms like mode plug-in hybrid electric vehicle using multi-agent NVIDIA Jetson is being considered to assess real-time deep reinforcement learning. Applied Energy, 348, feasibility. The proposed approach shows strong potential 121526.https://doi.org/10.1016/j.apenergy.2023.1215 for real-time EMS in NEVs by leveraging decentralized 26 agents and a powerful optimizer for high-dimensional [9] Li, X., Zhou, Z., Wei, C., Gao, X., & Zhang, Y. (2025). spaces. However, to strengthen its scientific contribution, Multi-objective optimization of hybrid electric future work should focus on improving algorithm vehicles energy management using multi-agent deep transparency, ensuring rigorous experimentation, and reinforcement learning framework. Energy and incorporating advanced statistical techniques for deeper AI, 20, 100491. validation and performance comparison. https://doi.org/10.1016/j.egyai.2025.100491 280 Informatica 49 (2025) 269-280 X. Zhang et al. [10] Alqahtani, M., Scott, M. J., & Hu, M. (2022). Dynamic energy scheduling and routing of a large fleet of electric vehicles using multi-agent reinforcement learning. Computers & Industrial Engineering, 169, 108180. https://doi.org/10.1016/j.cie.2022.108180 [11] Monfaredi, F., Shayeghi, H., & Siano, P. (2023). Multi-agent deep reinforcement learning-based optimal energy management for grid-connected multiple energy carrier microgrids. International Journal of Electrical Power & Energy Systems, 153, 109292. https://doi.org/10.1016/j.ijepes.2023.109292 [12] Kaewdornhan, N., Srithapon, C., Liemthong, R., & Chatthaworn, R. (2023). Real-time multi-home energy management with EV charging scheduling using multi-agent deep reinforcement learning optimization. Energies, 16(5), 2357. https://doi.org/10.3390/en16052357 [13] Louati, A., Louati, H., Kariri, E., Neifar, W., Hassan, M. K., Khairi, M. H., ... & El-Hoseny, H. M. (2024). Sustainable smart cities through multi-agent reinforcement learning-based cooperative autonomous vehicles. Sustainability, 16(5), 1779.https://doi.org/10.3390/su16051779 [14] Zhang, W., Liu, H., Wang, F., Xu, T., Xin, H., Dou, D., & Xiong, H. (2021, April). Intelligent electric vehicle charging recommendation based on multi-agent reinforcement learning. In Proceedings of the Web Conference 2021 (pp. 1856-1867). https://doi.org/10.1145/3442381.3449934 [15] Al-Saffar, M., & Gül, M. (2023). Data-efficient MADDPG based on self-attention for IoT energy management systems. IEEE Access, 11, 109379- 109389. https://doi.org/10.1109/ACCESS.2023.3322193. [16] Montaleza, C., Arévalo, P., Gallegos, J., & Jurado, F. (2024). Enhancing energy management strategies for extended-range electric vehicles through deep Q- learning and continuous state representation. Energies, 17(2), 514.https://doi.org/10.3390/en17020514 s https://doi.org/10.31449/inf.v49i12.7724 Informatica 49 (2025) 281–298 281 Hybrid Machine Learning and Optimization Algorithms for pH- Based Water Quality Classification Xiaolin Li 1, * and Baomeng Pang2 1School of Intelligent Manufacturing, Qingdao Huanghai University, Qingdao, Shandong, 266427, China 2Shandong HI-SPEED Maintenance GROUP CO., LTD, Jinan, Shandong, 250000, China E-mail: lxl123101321@163.com *Corresponding author Keywords: water quality, PH level, machine learning, support vector classifier, extra trees classifier, optimization algorithms Received: December 2, 2024 Water quality—defined through its physical, chemical, and biological parameters—is essential for critical applications such as drinking and irrigation. Among these parameters, pH plays a significant role by influencing metal solubility and nutrient availability, thereby impacting aquatic ecosystems. In this study, Support Vector Classifier (SVC) and Extra Trees Classifier (ETC) were employed to classify water quality based on pH values. To boost classification accuracy, the models were hybridized using two advanced metaheuristic algorithms: Transit Search Optimization Algorithm (TSOA) and Chaos Game Optimization (CGO), resulting in hybrid variants ETTS, ETCG, SVTS, and SVCG. Comprehensive experiments were conducted using standard evaluation metrics. The ETTS model achieved the best performance, with training accuracy of 0.910 and testing accuracy of 0.778, along with a precision of 0.911, recall of 0.910, and F1 score of 0.910 in training. In contrast, the base ETC model recorded training and testing accuracies of 0.881 and 0.750, respectively. Similarly, SVTS and SVCG outperformed the base SVC model, with SVTS achieving training and testing accuracies of 0.894 and 0.760, compared to SVC’s 0.850 and 0.745. The proposed hybrid framework outperforms traditional SVC and ETC models and demonstrates superior classification performance compared to standard non-optimized baselines. This underscores the value of integrating advanced optimization techniques with machine learning for robust and reliable water quality assessment. The framework is a promising tool for environmental monitoring, promoting sustainable water resource management and public health protection. Povzetek: Študija je razvila hibridne modele strojnega učenja za klasifikacijo kakovosti vode na podlagi pH-vrednosti. Kombinacija klasifikatorjev Extra Trees (ETC) in Support Vector Classifier (SVC) z metahevrističnimi algoritmi TSOA in CGO (npr. ETTS, SVTS) je izboljšala klasifikacijo. Model ETTS je dosegel najboljšo zmogljivost, kar potrjuje prednost hibridnega okvira za okoljsko spremljanje. 1 Introduction enhance problems relating to water shortages or pollution. Efficiency in water management and water quality prediction plays an important role in ensuring safety and 1.1 Background sustainability in the use of water [2]. These are some of Water is as familiar a material as air, earth and concrete, the issues that emanate from a lack of adequate Water is necessary for life for humans and other forms of hydrological cycles, methods of water management, and life, much like the other three materials—well, maybe knowledge concerning the various human activities with exception of concrete. It is voluminous: about 3.5 % impacting catchments of water. To this end, technological of the land area is permanently flooded, whereas two and policy development remains highly critical to ensure thirds of the world is under the oceans. About the the sustainability of the use and delivery of water, hydrosphere, water is continuously evaporating from the protection of public health, and economic development Earth's surface into condensing in the atmosphere, [3]. reappearing as liquid. Earth's supply of water is now at an Water quality is basically related to its physical, all-time high and will never be depleted [1]. Although chemical, and biological characteristics, making it suitable abundant, the water resources distributed unevenly in for various purposes, such as drinking, gardening, and different regions in some serious respects impede certain leisure activities. During any water quality assessment, regions. As the population rises, industrialization turbidity, the microbiological content, and concentrations increases, and even more factors such as climate change of both organic and inorganic compounds are amongst the 282 Informatica 49 (2025) 281–298 X. Li et al. more commonly measured parameters [4]. The datasets. Additionally, real-time pH prediction, a critical degradation of water quality is a consequence of the parameter in assessing water quality, has not been current process of urbanization, agricultural runoff, and extensively explored using hybrid ML-optimization industrial wastes. Some contaminants such as heavy techniques, especially in scenarios where both historical metals, pesticides, and viruses may result in serious and real-time data are available. human health hazards and ecosystem health. Good water To address these gaps, this study proposes a novel grading control will require technological advancement, framework that integrates SVM, ETC, TSOA and CGO. community participation, and regulatory mechanisms. The These techniques are applied to predict and classify water implementation of best practices in pollution prevention, pH levels using historical and sensor-based real-time wastewater treatment, and watershed management will datasets. The objectives of this research are: ensure the sustainability of water resources through better • To develop and compare ML models capable of maintenance of their quality [5]. accurately predicting water pH levels using both One of the factors influencing the 𝑝𝐻 of water and historical and real-time input data; hence its chemical behavior and its biological availability • To optimize model performance using the Chaos is the concentration of hydrogen ions in it. Basically, 𝑝𝐻 Game Optimization algorithm, ensuring more is the measure of the concentration of hydrogen ions in reliable and efficient learning from complex water. It runs on a scale from 0 to 14, with 7 to 8 being datasets; considered neutral, 0 to 7 considered acidic, and 8 to 14 • To evaluate the classification capabilities of the considered basic. 𝑃𝐻 influences the solubility of metals Extra Trees Classifier and SVM in distinguishing and nutrients' availability, along with activity concerning water quality categories based on pH thresholds; aquatic organisms. • To demonstrate the feasibility of a hybrid ML- Machine learning, as a multidisciplinary subset of optimization approach for proactive and artificial intelligence, develops algorithms with which sustainable water quality monitoring. computers can evaluate, comprehend, and predict data [6– 9]. It has powerful capabilities for identification, data 2 Related works analysis, and decision making and has already revamped many disciplines. The application of machine learning Idroes et al. [15] conducted a study to predict urban air techniques is on the increase in environmental research to quality in DKI Jakarta, Indonesia, using the CATBoost enhance our understanding and management through the machine learning algorithm, which is known for handling modeling of environmental processes, analysis of large- categorical features effectively, managing missing values, scale information, and predictions of future conditions and reducing the risk of overfitting. The research utilized [10]. The most promising application would, therefore, be air quality data collected from Jakarta's monitoring in the monitoring of water quality through management stations over the period of 2010 to 2021. The dataset using machine learning. With the derivation of large data included five key pollutants: PM₁₀, SO₂, CO, O₃, and NO₂. sets from sensors and satellite images, coupled with After a preprocessing stage that involved data cleaning historical records, it will be possible for machine learning and normalization, the authors split the dataset into models to develop leading trends, anomalies, and training (80%) and testing (20%) subsets. The CATBoost predictions of water quality parameters with high accuracy model was trained and evaluated using standard [11] [12]. These capabilities enable more proactive and performance metrics, where it achieved high accuracy effective water management strategies, reducing (0.9781), precision (0.9722), and recall (0.9728). A pollution, optimizing resource allocation, and protecting feature importance analysis revealed that ozone (O₃) was public health. The integration of machine learning into the the most significant contributor to air quality variation, water quality monitoring system is one of the huge leaps followed by PM₁₀. Sasmita et al. [16] investigated the forward in environmental science and technology [13] classification of air quality levels in Indonesia using the [14]. Plume Air Quality Index (PAQI), which incorporates pollutant concentrations such as PM₂.₅, PM₁₀, NO₂, and 1.2 Research gaps and objectives O₃. The study focused on evaluating classification performance using Decision Tree and K-Nearest Neighbor Despite the increasing application of ML in water quality (k-NN) algorithms, applied to secondary data collected prediction, significant challenges persist. Traditional from 33 provincial capitals between July 1 and December approaches often struggle with the nonlinearity and 31, 2022. Unlike prior studies that typically assessed complex variability of environmental data, which limits model performance solely based on accuracy, this their predictive accuracy and generalizability across research adopted a more comprehensive evaluation diverse contexts. Furthermore, while various studies have approach by incorporating precision, recall, and F1-score employed models like MLR, ANN, and SVM, many lack alongside accuracy. The results demonstrated that the the integration of robust optimization algorithms to fine- Decision Tree classifier outperformed k-NN, achieving tune model parameters and enhance performance. performance scores of 90.67% accuracy, 90.61% Another notable gap is the underutilization of precision, 90.67% recall, and 90.63% F1-score. These ensemble tree-based methods such as the ETC, which are findings suggest that tree-based models can provide robust known for their resilience to noise and their ability to classification capabilities for air quality indexing, capture intricate relationships within high-dimensional supporting more reliable monitoring and decision-making Hybrid Machine Learning and Optimization Algorithms for pH-Based… Informatica 49 (2025) 281–298 283 regarding urban environmental health. Putra et al. [17] through accurate pollutant classification. Saxena and addressed the critical issue of deteriorating air quality in Shekhawat [18] proposed a novel mathematical Indonesia’s major cities, with a focus on Jakarta, where framework to compute a Cumulative Index (CI) for air urbanization and anthropogenic activities such as quality classification based on the concentrations of four vehicular emissions, industrialization, and waste major pollutants: SO₂, NO₂, PM2.5, and PM10. This CI accumulation have significantly impacted atmospheric served as a compact, interpretable metric reflecting the conditions. Their study aimed to classify daily air quality combined impact of pollutants on air quality. Using these using machine learning algorithms—specifically the C5.0 CI values as input features, they developed a two-class algorithm and Random Forest—based on the Air Pollution Support Vector Machine (SVM) model to classify air Standard Index (ISPU). These models were applied to quality as either good or harmful. To optimize the datasets from 2017 and 2018, consisting of pollutant performance of the SVM, the authors employed the Grey parameters including CO, NO₂, SO₂, PM, O₃, and NO. Wolf Optimizer (GWO) for parameter tuning, aiming to Their classification approach emphasized the importance maximize classification accuracy. The methodology was of accurately identifying air quality categories to support tested on real datasets from three major Indian cities— policy-making. The models demonstrated high predictive Delhi, Bhopal, and Kolkata. The results indicated that the accuracy, with C5.0 and Random Forest achieving proposed classifier effectively distinguished between the 99.74%, 99.22%, and 99.97% accuracy on the 2017 two air quality categories, with high classification dataset and 98.28%, 98.85%, and 97.42% on the 2018 performance across all test locations. The study concluded dataset, respectively. The analysis identified O₃ (ozone) as that the CI-based classification framework was both the most influential factor in classifying air quality, with computationally efficient and aligned well with actual air most days falling under the "Moderate" ISPU category. quality data, making it a promising tool for public health This work highlights the potential of decision tree-based and environmental monitoring. The summary of the algorithms in supporting urban air quality management previous studies reported in Table 1. Table 1: The summary of the related works. Study Methodology Dataset Metrics’ results Key Findings Idroes et al. CATBoost machine Air quality data from Accuracy: 0.9781, Ozone (O₃) and PM₁₀ [15] learning for air quality Jakarta monitoring Precision: 0.9722, most significant prediction. stations (2010-2021). Recall: 0.9728 pollutants. Pollutants: PM₁₀, SO₂, CO, O₃, NO₂. Sasmita et Classification using Secondary data from Accuracy: 90.67%, Decision Tree al. [16] Decision Tree and k-NN 33 provincial capitals Precision: 90.61%, outperformed k-NN algorithms. in Indonesia (2022). Recall: 90.67%, F1: for classification Pollutants: PM₂.₅, 90.63% tasks. PM₁₀, NO₂, O₃. Putra et al. Classification using C5.0 Air quality data (2017- C5.0: 99.74% Ozone (O₃) as the [17] and Random Forest 2018). Pollutants: CO, (2017), 98.28% most influential algorithms. NO₂, SO₂, PM, O₃, NO. (2018), RF: 99.22% factor in classifying (2017), 98.85% air quality. (2018) Saxena and Support Vector Machine Real datasets from Classification CI-based Shekhawat (SVM) classification with three Indian cities performance: High classification [18] Grey Wolf Optimizer (Delhi, Bhopal, accuracy for all test framework is (GWO) for parameter Kolkata). Pollutants: locations computationally tuning. SO₂, NO₂, PM₂.₅, PM₁₀. efficient. level of the water, whether it be basic, alkaline, or acidic. 3 Materials and methodology Data recording over some period gathered daily data on water quality. In this case, the 'Date' variable provides for the exact day (a day in every two weeks) certain data was 3.1 Data gathering taken and offers a time-series track showing Water quality data were collected in a systematic manner environmental change over time. Salinity, representing the and analyzed for different environmental parameters and concentration of dissolved salts in water, can directly their relations to 𝑝𝐻 values. The dataset used in the present influence pH levels by altering the ionic balance and study derived from [19] incorporates 1320 records in buffering capacity of the water body. Variations in salinity total, and each of the following input parameters has been may therefore contribute to shifts in pH, particularly in included in the dataset: Date, Salinity, Dissolved Oxygen, estuarine and coastal environments. Dissolved oxygen Secchi Depth, Water Depth, Water Temperature, and Air (DO), essential for aquatic life, can also impact pH Temperature. The output variable analyzed here is the 𝑝𝐻 through biological processes such as respiration and 284 Informatica 49 (2025) 281–298 X. Li et al. photosynthesis, which either consume or release CO₂, dots below 0.4 meters for water depth highlights that most thereby influencing acidity. Secchi Depth, a measure of water samples were taken from shallow depths, with water transparency determined by noting the depth at deeper samples being rare. The output 𝑝𝐻 plot illustrates which a Secchi disk disappears, can serve as an indirect the red dots form distinct horizontal bands, suggesting that indicator of photosynthetic activity, which affects CO₂ 𝑝𝐻 measurements are discrete rather than continuous. levels and thus the pH. Water Depth at the sampling This discrete distribution is crucial for classifying water location affects both light availability and thermal quality based on 𝑝𝐻 levels. stratification, which can influence biological activity and To support the development and execution of the chemical reactions that regulate pH. Water Temperature proposed models, a high-performance desktop and Air Temperature offer insight into thermal conditions workstation was utilized. This system is equipped with an that affect metabolic rates of organisms and chemical Intel® Core™ i7-3770K processor clocked at 3.50 GHz equilibria, both of which can influence pH values. The and complemented by 16 GB of RAM, ensuring efficient primary focus of this study was on pH levels, a key processing and multitasking capabilities. The operating parameter in assessing water quality. In the dataset, pH system used was Windows 11 Pro (64-bit), running on an values were categorized and analyzed as follows: Acidic x64-based architecture. Visual computations and (pH < 7) with 433 instances, Neutral (pH = 7) with 617 graphical rendering were handled by an NVIDIA GeForce instances, and Basic (pH > 7) with 280 instances. Each of GT 640 graphics card, which contributed to a responsive the variables was examined in relation to these pH and stable graphical environment. A 1 TB internal hard categories to explore their predictive relevance. disk served as the primary storage medium, providing Figure 1. consists of several parallel plots, the 𝑥 − ample space for managing datasets and associated files. 𝑎𝑥𝑖𝑠 in each plot represents the total number of samples, All programming tasks were conducted using Python. providing a consistent framework for comparing the The scikit-learn library formed the foundation for building distribution of each parameter. 𝑇ℎ𝑒 𝑦 − 𝑎𝑥𝑖𝑠, 𝑣𝑎𝑟𝑖𝑒𝑠 and assessing machine learning algorithms. Data according to the parameter being measured, showing the preparation and numerical analysis were facilitated by specific quantity for each sample. The red dots effectively Pandas and NumPy, respectively. To aid in visual illustrate the range and concentration of values for each interpretation of results, Matplotlib was employed, parameter, offering an unambiguous graphic depiction of enabling clear and informative graphical outputs the data's distribution. For instance, the clustering of red throughout the analysis process. Figure 1: The parallel plot of the inputs and outputs variables Hybrid Machine Learning and Optimization Algorithms for pH-Based… Informatica 49 (2025) 281–298 285 𝑁 3.2 Support vector classification ∑𝑎𝑖𝑦𝑖 = 0 (6) 𝑖=1 Support Vector Classification (SVC) is a supervised 0 ≤ 𝑎𝑖 ≤ 𝐶𝑠𝑣𝑐 𝑖 = 1, . . . , 𝑁 (7) learning algorithm rooted in the structural risk A kernel function, denoted as 𝐾(𝑥𝑖 , 𝑥𝑗), computes the minimization principle of Support Vector Machines inner product between pairs of input samples implicitly (SVM) [20]. It operates by mapping input features into a mapped into a high-dimensional feature space, enabling higher-dimensional space through non-linear kernel nonlinear classification without explicitly performing the transformations, enabling the separation of data that is not transformation. Common kernel types include linear, linearly separable in the original feature space. In this polynomial, radial basis function (RBF), and sigmoidal transformed space, SVC constructs an optimal hyperplane kernels, among others. For a kernel to be valid, it must that maximizes the margin — defined as the distance satisfy Mercer's conditions—specifically, it must be between the hyperplane and the closest data points from symmetric and positive semi-definite. Extensive studies each class, known as support vectors — while have shown that the RBF kernel, formally defined in Eq. simultaneously minimizing classification errors [21]. This (8), is particularly effective for classification problems balance between margin maximization and error due to its localized response and flexibility. Accordingly, minimization contributes to the model’s generalization the RBF kernel is adopted in our methodology, where the capability and robustness. hyperparameter 𝛾 governs the inverse of the squared ‖𝑊‖2 𝑁 𝑚𝑖𝑛𝑤,𝑏,∈ + 𝐶 2 𝑠𝑣𝑐 ∑ ∈𝑖 (1) radius of influence of the support vectors, effectively controlling the decision boundary's smoothness and 𝑖=1 𝑦𝑖(𝑤 𝑇 . ∅(𝑥𝑖) + 𝑏) ≥ 1 −∈𝑖 𝑖 = 1, . . . , 𝑁 (2) sensitivity to individual data points. ∈𝑖≥ 0 𝑖 = 1, . . . , 𝑁 (3) 𝐾(𝑥𝑖 , 𝑥𝑗) = ∅(𝑥𝑖) 𝑅∅(𝑥𝑗) (8) The function ∅(𝑥𝑖) represents a nonlinear mapping = 𝑒𝑥𝑝(−𝛾‖𝑥𝑗 − 𝑥𝑖‖) that projects each input observation 𝑥𝑖, defined by its Once the optimization process is completed and the explanatory variables, into a higher-dimensional feature optimal weight vector and bias term are obtained, the space where linear separation of classes becomes more trained model can be used to generate predictions for feasible. Within this space, 𝑤 denotes the weight vector unseen samples by evaluating the decision function as that defines the orientation of the separating hyperplane, defined in Eq. (9). while 𝑏 is the bias term that shifts the hyperplane to −1 𝑖𝑓 𝑤𝑇∅(𝑥 𝑆𝑉𝐶 𝑦𝑖 = { 𝑖) + 𝑏 ≤ 0 (9) achieve optimal separation. The parameter 𝐶𝑠𝑣𝑐 serves as 1 𝑖𝑓 𝑤𝑇∅(𝑥𝑖) + 𝑏 > 0 a regularization factor that balances the trade-off between maximizing the margin and minimizing classification 3.3 Extra trees classifier errors. The slack variables ∈𝑖 quantify the degree to which The Extra trees classifier, proposed by Geurts et al. [22], individual observations violate the margin constraints, represents an advanced ensemble learning technique that allowing for soft-margin classification to accommodate builds upon and extends the Random Forest framework. misclassified or non-linearly separable data points. Unlike traditional ensemble methods that rely on Determining the optimal hyperplane, as formulated in bootstrapped datasets and deterministic split criteria, Extra Eq. (4), entails maximizing the margin between classes in Trees introduces two levels of randomness to enhance the high-dimensional feature space. This objective is model diversity and generalization. First, it selects split mathematically achieved by minimizing the Euclidean thresholds at random rather than searching for the most norm of the weight vector, which directly corresponds to optimal ones. Second, instead of using bootstrap maximizing the margin width. Simultaneously, the model sampling, it grows each decision or regression tree using incorporates a penalty for misclassified instances to ensure the entire training dataset. This approach not only a balance between model complexity and classification accelerates the training process but also reduces variance, accuracy. Ultimately, the predicted output labels indicate making Extra Trees particularly effective for high- the class membership of each sample, based on their dimensional and noisy datasets. position relative to the decision boundary. 𝐷(𝑥𝑖) = 𝑊𝑇𝜑(𝑥𝑖) Extra Trees operates by introducing controlled + 𝑏 (4) randomness into the decision tree construction process, The computational complexity of the primal particularly for numerical features. At each node, the formulation is primarily dependent on the number of input algorithm selects K random features and determines split features (dimensionality), whereas the dual formulation's thresholds uniformly at random, rather than through complexity scales with the number of training samples. traditional optimization. The minimum number of samples Therefore, in scenarios involving high-dimensional required to allow further splitting is defined by 𝑛 feature spaces, it is often more computationally efficient 𝑚𝑖𝑛, ensuring regularization. Unlike methods that rely on and advantageous to employ the dual form of the model, bootstrap resampling, Extra Trees trains each of its 𝑀 trees as outlined in Eqs. (5–7). 𝑁 𝑁 on the entire original dataset, promoting stability and 1 𝑚𝑎𝑥𝑎 ∑ 𝑎 − ∑ 𝑎 𝑎 𝑦 𝑦 𝐾 𝑥 , minimizing bias. For prediction, the ensemble outputs are 𝑖 2 𝑖 𝑗 𝑖 𝑗 ( 𝑖 𝑥𝑗) (5) combined using majority voting in classification tasks or 𝑖=1 𝑖=1 averaged in regression settings. This explicit 286 Informatica 49 (2025) 281–298 X. Li et al. randomization strategy—both in attribute selection and the search space, with randomness and minimal cut-point determination—significantly reduces variance movement controlled using factorial-based adjustments. and enhances generalization performance, especially in 𝑆𝑒𝑒𝑑1 𝑖 = 𝑥𝑖 + 𝛼𝑖 × (𝛽𝑖 × 𝐺𝐵 − 𝑀𝐺𝑖), 𝑖 (12) high-dimensional and noisy contexts. Although the = 1, 2, … , 𝑛. algorithm exhibits a time complexity of 𝑁 log𝑁, its Assuming that 𝑋𝑖 represents theith potential solution computational efficiency is bolstered by the lightweight and the randomly generated factorial used to describe nature of the node-splitting process. The key the limitations of seeds on movement is called 𝛼𝑖. To hyperparameters—𝐾, 𝑛𝑚𝑖𝑛, and 𝑀—govern the diversity simulate the potential to roll a pair of dice 𝛽𝑖 and 𝛾𝑖 stand of splits, regularization, and ensemble size, respectively. for a random number of 0 𝑜𝑟 1. While the algorithm supports fine-tuning, default 𝑆𝑒𝑒𝑑2 𝑖 = 𝐺𝐵 + 𝛼𝑖 × (𝛽𝑖 × 𝑋𝑖 − 𝛾𝑖 × 𝑀𝐺𝑖) (13) parameter configurations often yield strong performance, 𝑆𝑒𝑒𝑑3 𝑖 = 𝑀𝐺𝑖 + 𝛼𝑖 × (𝛽𝑖 × 𝑋𝑖 − 𝛾𝑖 × 𝐺𝐵), making Extra Trees both effective and computationally (14) 𝑖 = 1, 2, … , 𝑛. autonomous. A fourth seed is produced by using an additional technique to carry out the mutation phase in the search 3.4 Chaos game optimization space's position updates of the qualified seeds. This update The amalgamation of basic principles of chaotic games of the seed's position is based on arbitrary modifications and fractals provide a mathematical model for the to the choice variables chosen at random. algorithm 𝐶𝐺𝑂 [23]. The 𝐶𝐺𝑂 algorithm examines several 𝑆𝑒𝑒𝑑4 = 𝑋 𝑘 𝑖 𝑖(𝑥𝑖 = 𝑥𝑘 𝑖 + 𝑅), 𝑘 = [1, 2, … , 𝑑]. (15) potential solutions (𝑋) for this goal, that depicts a few A random integer in the interval [1, 𝑑] is denoted by suitable seeds within a sierpinski triangle, so that a group 𝑘, and 𝑅 is a uniformly distributed random number in the of answers that have developed by chance and selection region [0,1]. changes is maintained by many natural evolution The 𝐶𝐺𝑂 algorithm's exploration and exploitation rate algorithms. According to this technique, a few chosen can be controlled and modified by varying the movement variables (𝑥𝑖,𝑗) reflect where these eligible seeds are limits of the seeds, represented by four different located inside the triangle formed by sierpinski. with every formulations for 𝛼𝑖. potential solution (𝑋𝑖). 𝑅𝑎𝑛𝑑 𝑑 2 × 𝑅𝑎𝑛𝑑 𝑋1 𝑥 1 2 𝑗 1 𝑥1 ⋯ 𝑥1 ⋯ 𝑥1 𝛼𝑖 = { (𝛿 × 𝑅𝑎𝑛𝑑) + 1 (16) 𝑋 𝑥1 2 2 𝑥 𝑗 𝑑 2 2 ⋯ 𝑥2 ⋯ 𝑥 2 (휀 × 𝑅𝑎𝑛𝑑) + (∼ 휀) ⋮ = ⋮ ⋮ ⋮ ⋮ ⋱ ⋮ 𝑋 = 𝑋 (10) In this case, 𝛿 and 휀 are random integers Rand is a 𝑖 𝑥1 2 𝑑 𝑖 𝑥 𝑗 ⋮ 𝑖 ⋯ 𝑥𝑖 ⋯ 𝑥 random number with a uniform distribution in the interval 𝑖 𝑋𝑛] ⋮ ⋮ ⋮ ⋮ ⋱ ⋮ [0,1]. [ [𝑥1 𝑛 𝑥2 𝑗 𝑛 ⋯ 𝑥𝑛 ⋯ 𝑥𝑑 The process involves evaluating new seeds against 𝑛 ] According to the sierpinski triangle, where 𝑛 existing ones to determine their eligibility for inclusion is the within the area used for searching. The new solution number of eligible seeds and 𝑑 is the seed's dimension. candidates' quality is evaluated, with better candidates Based on random starting positions, these qualifying seeds retained and seeds with low fitness values removed. The are arranged in the search space. 𝑗 𝑥 ( 𝑗 𝑗 replacement procedure is employed to simplify the 𝑗 0) = 𝑥𝑖,𝑚𝑖𝑛 + 𝑟𝑎𝑛𝑑. (𝑥𝑖,𝑚𝑎𝑥 mathematical model and ensure a more efficient 𝑗 𝑖 = 1, 2, … , 𝑛. (11) − 𝑥 mathematical method. 𝑖,𝑚𝑖𝑛), { 𝑗 = 1, 2, … , 𝑑. 𝑗 In this approach, 𝑥𝑗 (0) represents the initial position 3.5 Transit search algorithm 𝑗 𝑗 of qualified seeds. The values 𝑥𝑖,𝑚𝑖𝑛 and 𝑥𝑖,𝑚𝑎𝑥 define the Host star number (𝑛𝑠) and the definition of signal-to-noise lower and upper bounds for the jth decision variable of the ratio (𝑆𝑁) is algorithm structure. The transit model ith candidate. A random number between 0 and 1 guides determines 𝑆𝑁 Standard deviation of measurements made the movement direction. outside of transit is used to estimate noise. There is always Qualified seeds symbolize core concepts from chaos noise in photons received from stars. The starting theory. These seeds represent candidate solutions in an population for 𝑇𝑆 is equal to the product of 𝑛𝑠 and 𝑆𝑁 optimization problem, where higher and lower fitness [24]. values indicate better and worse suitability, respectively. • Galaxy phase To explore the search space, qualified seeds are used After identifying habitable zones, the program to construct a Sierpinski triangle—a structure made from chooses a galactic center at random from the search space. three points: the current candidate (𝑋𝑖), the group mean The optimal stellar systems are found by evaluating (𝑀𝐺𝑖), and the global best (GB). This triangle is a basis random regions 𝐿𝑅. With the capacity to support life, the for generating new seeds using a chaos game approach. regions that have been identified with the best fitness are Each triangle uses a virtual die with green and red chosen, and the algorithm starts with these regions. faces to decide movement: green directs the seed toward 𝐿𝑅,𝑙 = 𝐿𝐺𝑎𝑙𝑎𝑥𝑦 + 𝐷 − 𝑁𝑜𝑖𝑠𝑒 the global best (GB), and red toward the group mean (17) 𝑙 = 1, … , (𝑛𝑠 × 𝑆𝑁) (𝑀𝐺𝑖). A random binary value (0 or 1) determines the face. This process allows seeds to move stochastically within Hybrid Machine Learning and Optimization Algorithms for pH-Based… Informatica 49 (2025) 281–298 287 𝑐1𝐿𝐺𝑎𝑙𝑎𝑥𝑦 − 𝐿𝑟 𝑖𝑓 𝑧 = 1 (𝑁𝑒𝑔𝑎𝑡𝑖𝑣𝑒 𝑅𝑒𝑔𝑖𝑜𝑛) 𝐷 𝐷 = { 18) 𝑐1𝐿𝐺𝑎𝑙𝑎𝑥𝑦 + 𝐿𝑟 𝑖𝑓 𝑧 = 2 (𝑃𝑜𝑠𝑖𝑡𝑖𝑣𝑒 𝑅𝑒𝑔𝑖𝑜𝑛) ( 𝑐4𝐿𝑅,𝑖 − 𝑐3𝐿𝑟 𝑖𝑓 𝑧 = 1 (𝑁𝑒𝑔𝑎𝑡𝑖𝑣𝑒 𝑅𝑒𝑔𝑖𝑜𝑛) (21) = { 𝑁𝑜𝑖𝑠𝑒 = (𝑐2) 3𝐿𝑟 (19) 𝑐4𝐿𝑅,𝑖 − 𝑐3𝐿𝑟 𝑖𝑓 𝑧 = 2 (𝑃𝑜𝑠𝑖𝑡𝑖𝑣𝑒 𝑅𝑒𝑔𝑖𝑜𝑛) 𝐿𝐺𝑎𝑙𝑎𝑥𝑦 denotes where the center of the galaxy is 𝑁𝑜𝑖𝑠𝑒 = (𝑐 )3 5 𝐿𝑟 (22) located, and in the optimization problem, two coefficients The next stage involves utilizing Eq. (20) to (22) to are present ranging from 𝑧𝑒𝑟𝑜 to 𝑜𝑛𝑒, denoting an choose a star from each of the areas that have been chosen accidental integer 𝑐1 and an accidental vector to belong to a stellar system. 𝐿𝑠 indicates where the stars 𝑐2 representing the number of variables. To demonstrate are located. the variation in the research area's situation, one definition In addition to the coefficient 𝑐5, which is a random vector of parameter 𝐷 is the difference between the galaxy's between 0 and 1, the coefficients 𝑐3 and 𝑐4 are random center and its present condition. This region may be found values between 0 and 1. either on the back of the galaxy or in the front (positive Before beginning iterations, the suggested method portion) of its middle area. Here, parameter zone (𝑧) is a executes the galaxy phase once to choose appropriate randomly generated number that is either one or two. The situations for the primary stages (2– 5). 𝑁𝑜𝑖𝑠𝑒 parameter is used to eliminate noise from received • Transit Phase signals to improve location accuracy. To minimize To identify the transit, a re-measurement of the light computational value, the coefficient 𝑐2 with a power of 3 received from the beginning is required to identify any is used, as noise cannot noticeably deviate from desired potential decrease in the received light signals. 𝐿𝑆 and its situations. corresponding fitness 𝑓𝑆 have two meanings (𝑀1 and 𝑀2). 𝐿𝑠,𝑖 = 𝐿𝑅,𝑖 + 𝐷 − 𝑁𝑜𝑖𝑠𝑒 𝑖 = 1, … , 𝑛𝑠 (20) The light spectrum (star class) that the telescope between the telescope and the star since the light comes receives and the star's distance from the observer may be from the star. used to determine the luminosity of the star. It is evident 𝑐8𝐿𝑇 + 𝑅𝐿𝐿𝑆,𝑖 𝐿 that a short distance results in a higher photon count. The 𝑧 = 𝑖 = 1, … , 𝑛 ) 2 𝑠 (30 star's luminosity is acquired by: 𝐿𝑆,𝑛𝑒𝑤,𝑖 𝑅 𝑅𝐿 = (31) 𝑖 𝐿𝑆,𝑖 𝑛 𝐿 𝑠 The planet's original position upon detection is 𝑖 = 𝑖 = 1, … , 𝑛𝑠 𝑅 ∈ {1, … , 𝑛 (23) (𝑑 2 𝑖 𝑠} 𝑖) demonstrated by 𝐿𝑧 and luminance ratio is determined by 𝑑𝑖 = √(𝐿𝑠 − 𝐿 2 𝑇) 𝑖 = 1, … , 𝑛𝑠 (24) 𝑅𝐿. Also, 𝑐8 has a random value between 0 and 1. Here, Star I's luminance and rank are depicted by the 𝐿𝑚,𝑗 variables 𝐿𝑖 and 𝑅𝑖. Additionally, the space between the 𝐿𝑧 + 𝑐9𝐿𝑟 𝑖𝑓 𝑧 = 1 𝑓𝑜(𝑟3 𝐴𝑝ℎ𝑒𝑙𝑖𝑜𝑛 𝑟𝑒𝑔𝑖𝑜 star I and the telescope are covered by 𝑑𝑖 . Since it is = {𝐿𝑧 − 𝑐9𝐿𝑟 𝑖𝑓 𝑧 = 2 𝑗 = 1, … , 𝑆𝑁 𝑓𝑜𝑟2 𝑃) 𝑒𝑟𝑖ℎ𝑒𝑙𝑖𝑜𝑛 𝑟𝑒𝑔𝑖 chosen at random at the beginning of the method, the 𝐿𝑧 + 𝑐10𝐿𝑟 𝑖𝑓 𝑧 = 3 𝑓𝑜𝑟 𝑁𝑒𝑢𝑡𝑟𝑎𝑙 𝑟𝑒𝑔𝑖 location of the telescope 𝐿𝑇 remains constant throughout ∑𝑆𝑁 𝑗=1 𝐿 𝑚𝑗 (3 the optimization. 𝐿𝑃 = 𝐿 𝑆𝑁 3) 𝑆,𝑛𝑒𝑤,𝑖 = 𝐿𝑆,𝑖 + 𝐷 − 𝑁𝑜𝑖𝑠𝑒 𝑖 = 1, … , 𝑛𝑠 (25) To validate travel and reducing the noise's influence, 𝐷 = 𝑐6𝐿𝑆,𝑖 (26) one of the most crucial factors is 𝑆𝑁The planet's position 𝑁𝑜𝑖𝑠𝑒 = (𝑐 3 7) 𝐿𝑆 (27) inside its star system is specified by analyzing the quantity The coefficients 𝑐6 and 𝑐7 are a random vector from 0 of signals received, which is derived from the planet's to 1 and a random integer from −1 to 1. The amount of estimated position. Several 𝑆𝑁 signals are taken into new luminosity, 𝐿𝑖,𝑛𝑒𝑤 is determined by: account for this reason in the 𝑇𝑆 algorithm Eq. (32). The 𝑅 𝑛𝑒𝑤 𝑖, coefficient 𝑐 𝑛 9 is an accidental number ranging from −1 to 𝐿 𝑠 𝑖,𝑛𝑒𝑤 = 2 𝑖 = 1, … , 𝑛𝑠 (28) 1. 𝑐10 is a random vector with values in the range of −1 to (𝑑𝑖,𝑛𝑒𝑤) 1. Once signals 𝐿𝑚 have been determined, the average The new 𝐿𝑆 and the position of the telescope may be of 𝑆𝑁 signals are used to adjust the detected final planet used to compute the parameter 𝑑𝑖,𝑛𝑒𝑤 . It is possible to position 𝐿𝑃. The terms Aphelion and Perihelion refer to assess the possibility of transit by comparing 𝐿𝑖 and 𝐿𝑖,𝑛𝑒𝑤. the relative furthest and closest distances, in astronomy, If 𝑇 = 1, the phase of the planet is utilized; if not, the between a planet (such as Earth) and the Sun or another phase of the neighbor e is used in this iteration. host star. Three zones—Aphelion, Perihelion, and Neutral 𝐼𝑓 𝐿𝑖,𝑛𝑒𝑤 < 𝐿𝑖 𝑃𝑇 = 1 (𝑇𝑟𝑎𝑛𝑠𝑖𝑡) regions (the area between Aphelion and Perihelion areas), (29) 𝐼𝑓 𝐿𝑖,𝑛𝑒𝑤 ≥ 𝐿𝑖 𝑃𝑇 = 0 (𝑁𝑜 𝑇𝑡𝑎𝑛𝑠𝑖𝑡) Eq. (32), are affected by the TS technique, which estimates This probability 𝑃𝑇 is represented by the numbers0 the planet's orbital location using the zone parameter (𝑧) (non-transit) and 1 (probability of transit). If 𝑃𝑇 = 1, if in the planet phase. the planet phase cannot be used, this iteration uses the • Neighbor Phase neighbor phase. In this phase, the present planet of the star will take its • Planet Phase position whether the neighbor has superior circumstances Initially, at this stage, the discovered initial position compared to the current planet. of planet is identified. The quantity of light that the (𝑐11𝐿𝑠,𝑛𝑒𝑤 + 𝑐12𝐿𝑟) telescope receives decreases during a planet's transit 𝐿𝑧 = (34) 2 288 Informatica 49 (2025) 281–298 X. Li et al. 𝐿𝑛,𝑗 number 𝑐𝑘, which can be 1, 2, 3, or 4. A random power 𝐿𝑧 − 𝑐13𝐿𝑟 𝑖𝑓 𝑧 = 1 𝑓𝑜𝑟 𝐴𝑝ℎ𝑒𝑙𝑖𝑜𝑛 𝑟𝑒𝑔𝑖𝑜𝑛 between 1 and (𝑛𝑠 ∗ 𝑆𝑁) is represented by 𝑃. (35) = {𝐿𝑧 + 𝑐13𝐿𝑟 𝑖𝑓 𝑧 = 2 𝑓𝑜𝑟 𝑃𝑒𝑟𝑖ℎ𝑒𝑙𝑖𝑜𝑛 𝑟𝑒𝑔𝑖𝑜𝑛 𝑐16𝐿𝑃 + 𝑐15𝑘 𝑖𝑓 𝑐𝑘 = 1 (𝑆𝑡𝑎𝑡𝑒 1) 𝐿𝑧 + 𝑐14𝐿𝑟 𝑖𝑓 𝑧 = 3 𝑓𝑜𝑟 𝑁𝑒𝑢𝑡𝑟𝑎𝑙 𝑟𝑒𝑔𝑖𝑜𝑛 𝑐16𝐿𝑃 − 𝑐15𝑘 𝑖𝑓 𝑐 𝐿 𝑘 = 2 (𝑆𝑡𝑎𝑡𝑒 2) 𝐸,𝑗 = { ∑𝑆𝑁 (37) 𝐿𝑃 − 𝑐15𝐾 𝑖𝑓 𝑐𝑘 = 3 (𝑆𝑡𝑎𝑡𝑒 3) 𝑗=1 𝐿𝑛.𝑗 𝐿 (36) 𝑁,𝑖 = 𝐿𝑃 + 𝑐15𝐾 𝑖𝑓 𝑐𝑘 = 4 (𝑆𝑡𝑎𝑡𝑒 4) 𝑆𝑁 𝐾 = (𝑐17) 𝑃 Eq. (34) is used to estimate the neighbor 𝐿𝑧 𝐿𝑟 (38) beginning position Considering its host star 𝐿𝑠,𝑛𝑒𝑤 and an accidental place 𝐿𝑅. 𝐿𝑁 determines the neighbor planet’s 3.6 K-Fold Cross validation ultimate position planets Eq. (35) and (36). The K-fold cross-validation is a widely utilized and reliable coefficients 𝑐11 and 𝑐12 in Eq. (41) handle a randomized approach for evaluating and selecting models, especially integer in the range of 0 to 1. Moreover, the in classification and regression tasks. This technique coefficients 𝑐13 and 𝑐14 represent a vector with a random involves dividing the dataset into k equally sized subsets number and a range of −1 to 1, respectively. (folds). During each iteration, one-fold is reserved for • Exploitation phase validation while the remaining k−1 folds are used for The ideal planet for every star is identified in the training. This process is repeated k times, ensuring that earlier stages. Finding a planet by itself is meaningless. every subset serves once as the validation set. In this study, Understanding the features of the planet and the a 5-fold cross-validation scheme (k = 5) was adopted to circumstances that support life is essential. This is carried thoroughly evaluate the proposed models and improve out during the TS algorithm's Exploitation step. This stage their generalization capability by systematically rotating expresses a revised definition of the 𝐿𝑃. 𝐿𝑃 in the present the training and testing partitions. As illustrated in Fig. 2, phase 𝐿𝐸 alludes to the features of the planet. Using Eq. the Support Vector Classifier (SVC) model demonstrated (37), (38), the planet’s ultimate properties are adjusted its peak performance during Fold 5, achieving a maximum 𝑆𝑁 times (𝑗 = 1,… , 𝑆𝑁) by adding new knowledge (𝐾). Accuracy of 0.82. Similarly, the Extra Trees Classifier 𝑐15 is an accidental number ranging from zero 𝑡𝑜 𝑡𝑤𝑜, (ETC) also recorded its highest accuracy in Fold 5, with and 𝑐16 is an accidental number ranging from 𝑧𝑒𝑟𝑜 to an Accuracy of 0.846, indicating consistent model 𝑜𝑛𝑒. 𝑐17 is an accidental vector ranging from zero to one. performance across folds. The knowledge index is represented by the random Figure 2: The results of 5-Fold Cross validation. 3.7 Evaluation metrics • Accuracy Accuracy is the ratio of correctly predicted The evaluation metrics of the classification models observations to the total observations. It is a general provide a quantitative measure of the performance of the measure of a model’s effectiveness. models [25]. In this study, four fundamental evaluation Accuracy serves as a baseline metric to understand the metrics were employed to assess the performance of the overall performance of the model. However, it may be classification models: Accuracy, Precision, Recall, and misleading when dealing with imbalanced datasets, which F1-score. These metrics provide a comprehensive is why complementary metrics are also considered. understanding of model performance, especially in the context of imbalanced or complex classification problems. Hybrid Machine Learning and Optimization Algorithms for pH-Based… Informatica 49 (2025) 281–298 289 • Precision in environmental science are helpful and important to Precision is the ratio of correctly predicted positive predict occurrences such as the spread of pollution, observations to the total predicted positives. It reflects climate change, and water resource availability. The how well the model avoids false positives. models are helpful in supporting sustainability Precision is particularly valuable in scenarios were management and conservation. Water quality prediction predicting a false positive may lead to unnecessary actions grounded on models such as 𝐸𝑇𝐶 and 𝑆𝑉𝐶 is among the or costs. most vital inputs into the planning and regulation of water • Recall (Sensitivity) quality. Recall is the ratio of correctly predicted positive Most advanced optimization techniques, such as observations to all actual positives. It shows how well the 𝑇𝑆𝑂𝐴 and 𝐶𝐺𝑂, have been employed in the enforcement model detects actual positive cases. Recall is emphasized of 𝑆𝑉𝐶 and 𝐸𝑇𝐶 for the much more improved when it is more critical to identify all positive cases, even classification of water quality according to 𝑝𝐻. As a at the cost of some false positives. result, the base models 𝐸𝑇𝐶 and 𝑆𝑉𝐶 are involving the • F1-score application of optimizers to constitute hybrid models such The F1-score is the harmonic mean of Precision and as 𝐸𝑇𝑇𝑆, 𝐸𝑇𝐶𝐺, 𝑆𝑉𝑇𝑆, and 𝑆𝑉𝐶𝐺. Performance checking Recall. It provides a single metric that balances both of the derived hybrid models is to be done for water concerns, particularly useful when class distribution is quality prediction with respect to the 𝑝𝐻 level. uneven. The F1-score provides a consolidated metric for • Hyperparameters’ results overall classification performance, particularly useful In machine learning, hyperparameters are essential when neither precision nor recall alone is sufficient for settings defined prior to training that influence model model evaluation. performance and learning behavior. Unlike trainable 𝑇𝑃 + 𝑇𝑁 parameters, hyperparameters must be optimized to Accuracy = (39) 𝑇𝑃 + 𝐹𝑃 + 𝑇𝑁 + 𝐹𝑁 achieve the best results. In this study, random search was 𝑇𝑃 used to tune the hyperparameters of the proposed SVC- 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 = (40) 𝑇𝑃 + 𝐹𝑃 and ETC-based hybrid models. 𝑇𝑃 As shown in Tables 2 and 3, ETC-based models were 𝑅𝑒𝑐𝑎𝑙𝑙 = (41) 𝑇𝑃 + 𝐹𝑁 optimized using parameters such as n_estimators, 2. 𝑇𝑃 max_depth, min_samples_split, min_samples_leaf, and 𝐹1 − 𝑠𝑐𝑜𝑟𝑒 = (42) 2. 𝑇𝑃 + 𝐹𝑃 + 𝐹𝑁 max_leaf_nodes. For example, ETTS used n_estimators = 143 and max_leaf_nodes = 1431, while ETCG had higher 4 Result and riscussion values like n_estimators = 1805 and max_leaf_nodes = 17090. Prediction is actually something quite central to scientific SVC-based models were tuned with C and gamma. research and practical decision-making, dealing with the SVTS used C = 103.098, gamma = 138.373, while SVCG estimation of the future state or event given current and had C = 679.000, gamma = 111.500. The base SVC and historical data. Precise predictions are important in diverse ETC models retained simpler, default configurations. This fields such as meteorology to finance, for which the tuning improved accuracy and computational efficiency information furnished stands useful in planning, risk across all hybrid models. management, and policy development. Predictive models Table 2: The results of Hyperparameters for ETC-based hybrid models. Hyperparameter Models n_estimators max_depth min_samples_split min_samples_leaf max_leaf_nodes ETTS 143 143 0.001 0.000 1431 ETCG 1805 142 0.972 0.500 17090 ETC 100 None 2.000 1.000 None Table 3: The results of Hyperparameters for SVC-based hybrid models. Hyperparameter Models C gamma SVTS 103.098 138.373 SVCG 679.000 111.500 SVC 1.000 scale • Convergence curves successive iterations, with the y-axis representing model Figure 3 illustrates the convergence curves of the accuracy and the x-axis denoting the number of iterations. proposed hybrid models, which combine machine learning The convergence behavior varies notably across the classifiers (SVC and ETC) with metaheuristic hybrid configurations. The SVTS model (SVC optimized optimization algorithms (TSOA and CGO). The figure by TSOA) exhibits a steady, linear improvement in captures the progression of classification accuracy across accuracy, reflecting a stable convergence pattern. In 290 Informatica 49 (2025) 281–298 X. Li et al. contrast, the SVCG model (SVC optimized by CGO) sharper rise in accuracy, ultimately reaching a highly demonstrates a less consistent trajectory, with noticeable competitive performance level. fluctuations in accuracy, though an overall upward trend Among all models, ETTS achieved the highest final is still evident. accuracy of 0.84, showcasing the effectiveness of the Similarly, the ETTS model (ETC optimized by TSOA optimizer with the ETC classifier. Conversely, TSOA) shows a smooth and consistent increase in SVCG attained the lowest peak accuracy of approximately accuracy, indicating robust convergence characteristics. 0.77, suggesting less stable convergence when SVC is The ETCG model (ETC optimized by CGO) achieves a paired with CGO. Figure 3: The convergence curve of the four presented hybrid models Table 4 presents the performance metrics—Accuracy, enhancements are also visible in Precision, Recall, and F1 Precision, Recall, and F1 Score—for both the training and Score. During testing, although the performance gap testing phases of the base classifiers (ETC and SVC) and slightly narrows, SVTS still outpaces the base model with their corresponding hybrid variants (ETTS, ETCG, SVTS, an accuracy of 0.760, whereas SVCG and SVC followed and SVCG). Additionally, Figure 4 complements these at 0.755 and 0.745, respectively. results with 3D bar plots that provide a visual The visualized results in Figure 4 reinforces these representation of the metric distributions for each model, findings. The 3D bar plots clearly illustrate the consistent highlighting comparative strengths in both learning and superiority of hybrid models, particularly ETTS, across all generalization capabilities. evaluation metrics. The visual spacing between the bars Comparing the base model ETC with its hybrids, reflects the degree of improvement, emphasizing how ETTS and ETCG, it is evident that both optimized variants optimization algorithms—especially TSOA—enhance consistently outperform the base model in both training both model learning (training performance) and and testing phases. For example, in the training stage, generalization (testing performance). The graphics also ETTS achieved the highest accuracy (0.910), followed highlight that the ETTS model maintains the most closely by ETCG (0.897), while ETC lagged at 0.881. balanced and highest-performing profile among all tested Similar trends are observed across Precision, Recall, and classifiers. F1 Score. These performance gains continue in the testing In summary, the combination of numerical evidence phase, where ETTS and ETCG maintained superior from Table 4 and graphical insights from Figure 4 generalization, with accuracies of 0.778 and 0.770, confirms that hybrid models deliver significantly respectively, compared to ETC’s 0.750. improved performance over their base classifiers. ETTS Likewise, for the SVC-based models, both SVTS and stands out as the most effective model, demonstrating the SVCG outperformed the baseline SVC during training. highest overall accuracy and stability across all metrics in SVTS achieved an accuracy of 0.894, and SVCG recorded both training and testing phases. 0.879, compared to SVC’s 0.850. Performance Hybrid Machine Learning and Optimization Algorithms for pH-Based… Informatica 49 (2025) 281–298 291 Table 4: ETC and SVC base models achieved results through the performance evaluators Metrics Section Model Accuracy Precision Recall F1 _Score ETTS 0.910 0.911 0.910 0.910 ETCG 0.897 0.901 0.897 0.897 ETC 0.881 0.888 0.881 0.880 Training SVTS 0.894 0.894 0.894 0.894 SVCG 0.879 0.879 0.879 0.879 SVC 0.850 0.850 0.850 0.849 ETTS 0.778 0.781 0.778 0.778 ETCG 0.770 0.780 0.770 0.769 ETC 0.750 0.762 0.750 0.749 Testing SVTS 0.760 0.764 0.760 0.760 SVCG 0.755 0.757 0.755 0.755 SVC 0.745 0.746 0.745 0.745 292 Informatica 49 (2025) 281–298 X. Li et al. Figure 4: 3D bar plot for the performance of the models in train and test phases. Table 4 outlines the performance metrics of base Neutral condition. Whereas 𝐸𝑇𝑇𝑆 achieves higher scores models and hybrid models. In a similar manner, Table 5 with 0.860 in precision, 0.901 in recall, and 0.880 in present the models' precision, Recall, and 𝐹1 𝑠𝑐𝑜𝑟𝑒 but in 𝐹1 𝑠𝑐𝑜𝑟𝑒. These numbers highlight the enhanced a more detailed breakdown of machine learning models performance of 𝐸𝑇𝑇𝑆, particularly in recall and that applied to water quality classification based on pH 𝐹1 𝑠𝑐𝑜𝑟𝑒𝑠, demonstrating the effectiveness of levels and categorized into Acidic, Basic, and Neutral optimization. Both 𝐸𝑇𝐶 and 𝑆𝑉𝐶 show substantial conditions. The performance comparison of 𝐸𝑇𝐶 with improvements in precision, recall, and 𝐹1 𝑠𝑐𝑜𝑟𝑒𝑠 when 𝐸𝑇𝑇𝑆 reveals significant improvements across all 𝑝𝐻 optimized with 𝑇𝑆𝑂𝐴 and 𝐶𝐺𝑂, respectively. For conditions. For the Acidic condition, 𝐸𝑇𝐶 displays an instance, in the acidic condition, 𝑆𝑉𝐶 achieves a precision 𝐹1 𝑠𝑐𝑜𝑟𝑒 of 0.842, recall of 0.804, and precision of of 0.800 while 𝑆𝑉𝑇𝑆 outperforms 𝑆𝑉𝐶 by improvement 0.883 , whereas 𝐸𝑇𝑇𝑆 improves these metrics to 0.874 in in precision to 0.865. The optimized models demonstrate precision, 0.868 in recall, and 0.871 in 𝐹1 𝑠𝑐𝑜𝑟𝑒. In the superior capability in accurately classifying water quality, Basic condition, with a precision of 0.919, recall of 0.732, with 𝐸𝑇𝑇𝑆 and 𝐸𝑇𝐶𝐺 performing notably well in various and 𝐹1 𝑠𝑐𝑜𝑟𝑒 of 0.815, 𝐸𝑇𝐶 trails behind 𝐸𝑇𝑇𝑆, which metrics. Among all the models evaluated, the 𝐸𝑇𝑇𝑆 model performs better with a precision of 0.890, recall of 0.807, emerges as the best performer, achieving the highest and 𝐹1 𝑠𝑐𝑜𝑟𝑒 of 0.846. 𝐸𝑇𝐶 reports an 𝐹1 𝑠𝑐𝑜𝑟𝑒 of overall accuracy in 𝑝𝐻 − 𝑏𝑎𝑠𝑒𝑑 water quality 0.852, recall of 0.919, and precision of 0.794 for the classification. Table 5: Model performance in the three different conditions Metric Model Condition P-value precision recall f1-Score Acidic 0.874 0.868 0.871 0.032 ETTS Basic (alkaline) 0.890 0.807 0.846 0.027 Neutral 0.860 0.901 0.880 0.018 Acidic 0.887 0.834 0.860 0.04 ETCG Basic (alkaline) 0.922 0.764 0.836 0.035 Neutral 0.821 0.921 0.868 0.022 Acidic 0.883 0.804 0.842 0.045 ETC Basic (alkaline) 0.919 0.732 0.815 0.039 Hybrid Machine Learning and Optimization Algorithms for pH-Based… Informatica 49 (2025) 281–298 293 Neutral 0.794 0.919 0.852 0.025 Acidic 0.865 0.841 0.853 0.048 SVTS Basic (alkaline) 0.841 0.811 0.826 0.041 Neutral 0.852 0.883 0.867 0.029 Acidic 0.840 0.825 0.832 0.052 SVCG Basic (alkaline) 0.827 0.804 0.815 0.047 Neutral 0.849 0.872 0.860 0.031 Acidic 0.800 0.801 0.801 0.059 SVC Basic (alkaline) 0.814 0.764 0.788 0.053 Neutral 0.833 0.855 0.844 0.010 Figure 5 depicts a line plot illustrating the numerical predicts 558, 348, and 205 samples in neutral, acidic, and differences in how well different machine learning models alkaline groups. While 𝐸𝑇𝑇𝑆 improves upon this with a perform when used to classify water quality based on 𝑝𝐻. predicted value of 547, 376, and 226 samples in neutral, This figure's main purpose is to compare various models' acidic, and alkaline, indicating an enhancement in efficaciousness visually. Particularly focusing on the accuracy. This improvement is quantified as a percentage performance improvements achieved by incorporating difference in the accuracy of the models, with 𝐸𝑇𝑇𝑆, in sophisticated optimization algorithms. 𝐸𝑇𝐶 and its hybrid general, showing lower percentage differences compared version, 𝐸𝑇𝑇𝑆, show distinct differences. 𝐸𝑇𝐶 correctly to 𝐸𝑇𝐶, highlighting its enhanced predictive capability. Figure 5: Line plot representing the number of correct predictions by ETC-based models A comprehensive evaluation of each model's accuracy performance. 𝐸𝑇𝑇𝑆 predicts acidic samples with can be done thanks to the confusion matrix, which is 376 correct, seven misclassified as alkaline, and 50 as depicted in Figure 6 and compares actual and predicted neutral. For alkaline samples, 𝐸𝑇𝑇𝑆 predicts classifications. An illustration of the confusion matrix 226 correctly, with 15 misclassified as acidic and 39 as created by different machine-learning models for neutral. Neutral samples are predicted with 547 correctly, determining the pH-level-𝑏𝑎𝑠𝑒𝑑 classification of water 39 as acidic, and 21 as alkaline. Comparatively, the 𝐸𝑇𝑇𝑆 quality is shown in Figure 6. Each model's accuracy can model outperforms its base model 𝐸𝑇𝐶, especially in be thoroughly evaluated thanks to the confusion matrix, predicting neutral samples with significantly higher which displays actual versus predicted classifications. accuracy. In acidic classification, 𝐸𝑇𝑇𝑆 shows slight 𝐸𝑇𝐶 predicts acidic samples with 348 correct, three improvement with fewer misclassifications. For alkaline misclassified as alkaline, and 82 as neutral. For alkaline predictions, both models show comparable performance, samples, it predicts 205 samples correctly, with 12 though 𝐸𝑇𝑇𝑆 has a marginally better accuracy. Among all samples misclassified as acidic and 63 samples as neutral. models, the best performance is observed in the 𝐸𝑇𝑇𝑆 Neutral samples are predicted, with 558 samples model, indicating its superior capability in accurate 𝑝𝐻 − correctly, 34 as acidic, and 15 as alkaline. When 𝑏𝑎𝑠𝑒𝑑 water quality classification. optimized using the Transit Search Optimization Algorithm, the hybrid model (𝐸𝑇𝑇𝑆) shows improved 294 Informatica 49 (2025) 281–298 X. Li et al. Acidic Alkaline Neutral Acidic Alkaline Neutral Acidic 376 7 50 Acidic 361 4 68 Aalkaline 15 226 39 Aalkaline 12 214 54 Neutral 39 21 547 Neutral 34 14 559 ETTS ETCG Acidic Alkaline Neutral Acidic 348 3 82 Aalkaline 12 205 63 Neutral 34 15 558 ETC Figure 6: Confusion matrix for the accuracy of each model. To evaluate the classification performance of the Performance across specific pH categories is also models in predicting pH-based water quality, the Receiver shown: Operating Characteristic (ROC) curves in Figure 7 are • The acidic class (brown line) demonstrates analyzed. These curves illustrate the trade-off between the moderate sensitivity at the outset, improving with true positive rate and the false positive rate at various higher false positive rates. threshold settings, providing a visual assessment of each • The basic (alkaline) class (cyan line) exhibits the model's diagnostic ability. most favorable curve, with a sharp ascent The micro-average ROC curve (green dashed line) indicating excellent classification performance at aggregates the contributions of all classes, treating each low false positive rates. prediction equally. It reflects the classifier's overall ability • The neutral class (purple line) shows a more across all samples. The curve’s steep initial rise indicates gradual increase, reflecting a balanced but less strong overall performance, with high sensitivity achieved pronounced trade-off between true and false at low false positive rates. positives. The macro-average ROC curve (red dashed line) Overall, the cyan curve representing basic pH calculates the average performance across classes by conditions shows the highest classification accuracy, assigning equal weight to each one, regardless of class while the green micro-average curve confirms the imbalance. It provides a balanced view of performance robustness of the models in handling all classes and shows a smoother increase in true positive rate collectively. compared to the micro-average. Figure 7: The ROC curves for the performance of the most efficient hybrid models • Wilcoxon test the Wilcoxon test statistic for each model when compared Figure 8 presents a radar plot of the Wilcoxon test pairwise, quantifying relative performance in terms of statistics for all single and hybrid models: SVC, SVTS, statistical ranking. SVCG, ETC, ETTS, and ETCG. The plotted values reflect From the figure: Hybrid Machine Learning and Optimization Algorithms for pH-Based… Informatica 49 (2025) 281–298 295 • SVC records the highest Wilcoxon statistic • ETCG and ETC show intermediate values (7725 (13,521), indicating that its performance and 10,063.5), reflecting moderate performance significantly differs—statistically outperforming consistency. or underperforming—relative to others. The shaded blue region visually represents the • ETTS also scores high (12,648.5), suggesting a distribution and spreads of the Wilcoxon test statistics strong and consistent performance validated by across all models. A wider area suggests higher variability statistical evidence. in model ranks, while more compact regions suggest more • In contrast, SVTS and SVCG have lower stability. statistics (9313 and 10,945.5, respectively), Overall, the Wilcoxon analysis complements pointing to less statistical dominance or more accuracy-based evaluation by statistically confirming the variability across comparisons. comparative significance of the observed model performance differences. Figure 8: The results of Wilcoxon test for models’ performance. applicable to broader real-world conditions. Additionally, 5 Discussion the integration of deep learning architectures—such as recurrent neural networks (RNNs) or convolutional neural networks (CNNs)—can be investigated for their potential 5.1 Limitations of the study to capture temporal or spatial correlations in water quality While the proposed hybrid models (ETTS, ETCG, SVTS, trends. Furthermore, an ensemble framework combining and SVCG) demonstrated superior classification multiple hybrid models could be tested using voting or performance over their baseline counterparts, the study stacking strategies to further improve classification presents several limitations that warrant attention. First, performance. the dataset used for model training and evaluation comprises only 1,320 daily records, which may limit the 5.3 Practical implications of the study generalizability of the models across diverse geographical The findings of this study highlight the practical viability regions or seasonal variations. A larger and more of hybrid machine learning and optimization frameworks heterogeneous dataset could improve robustness and in environmental monitoring applications. By accurately reduce the risk of overfitting. Secondly, the models focus classifying water quality based on pH levels, the proposed solely on pH as the output classification parameter, models can assist water resource managers, environmental potentially neglecting the complex interactions of other agencies, and public health officials in making informed water quality indicators (e.g., turbidity, nitrate levels) that decisions regarding water treatment and ecosystem may jointly influence classification outcomes. preservation. The enhanced predictive accuracy of the hybrid models ensures timely identification of acidic or 5.2 Potential future studies alkaline deviations, which are critical for preventing metal Building upon the promising results of this study, future toxicity, preserving aquatic biodiversity, and maintaining research can explore several enhancements. One key water usability for irrigation and drinking purposes. direction is the expansion of the dataset, both temporally Moreover, the lightweight nature of the models (especially and spatially, to include diverse water bodies, seasonal ETC and SVC) makes them suitable for deployment in dynamics, and additional environmental indicators. This embedded or real-time monitoring systems, offering would allow for the training of more generalizable models 296 Informatica 49 (2025) 281–298 X. Li et al. scalable solutions for smart water quality surveillance in (KNN) classifier and reported an accuracy of 0.9067. In both urban and rural settings. contrast, the present study's ETC+TSOA model attained an accuracy of 0.91, outperforming the KNN-based model 5.4 Comparison between the results of and demonstrating competitive results relative to more present study and previous works complex ensemble methods. While the accuracy of the ETC+TSOA model is Table 6 presents a comparative analysis between the slightly lower than that of RFR and CATBoost, it is proposed hybrid model (ETC+TSOA) from the present important to note that the proposed model leverages study and several existing state-of-the-art methods in the advanced metaheuristic optimization to enhance model domain of water quality classification. The comparison is performance while maintaining a balance between based on classification accuracy, which is a key interpretability, computational efficiency, and performance metric. Among the referenced studies, Putra generalization capability. This underscores the value of et al. [17] achieved the highest accuracy (0.9828) using a hybrid machine learning and optimization approaches, Random Forest Regressor (RFR), followed closely by especially in resource-constrained or real-time Idroes et al. [15] with a CATBoost model (0.9781). environmental monitoring contexts. Sasmita et al. [16] employed a K-Nearest Neighbors Table 6: The Comparison between the results of present study and previous works. Article Reference Model Metrics Accuracy Idroes et al. [15] CATBoost 0.9781 Sasmita et al. [16] KNN 0.9067 Putra et al. [17] RFR 0.9828 Present study - ETC+TSOA 0.91 improved Precision by 2.36%, increased Accuracy and 6 Conclusion Recall by 2.67%, and a better 𝐹1 𝑆𝑐𝑜𝑟𝑒 by 2.67% also. For 𝑆𝑉𝐶 models, 𝑆𝑉𝑇𝑆 increased Accuracy and Recall by Water quality is a very important aspect in which 2.01%, increased Precision by 2.41%, and also increased environmental health and safety can be ensured. For the 𝐹1 𝑆𝑐𝑜𝑟𝑒 by 2.01% from the base 𝑆𝑉𝐶. Similarly, understanding aquatic ecosystems for the purpose of 𝑆𝑉𝐶𝐺 also outperformed 𝑆𝑉𝐶, with increases of 1.34% in monitoring and management, proper classification of Accuracy and Recall, and it boosted Precision by 1.47%. water quality is required, mainly based on their 𝑝𝐻 levels. 𝐸𝑇𝑇𝑆 turned out to be the best improvement among all, This research article applied various methods of artificial with the highest scores on all metrics. intelligence and optimization algorithms for the High capability of hybrid models to provide more categorization of the quality of water based on pH levels, reliable and accurate pH-based water quality prediction hence providing a robust framework for environmental underlines the potential for such advanced techniques in monitoring. In this research, the dataset used contains environmental monitoring and management. These results 1320 records in total; each record has information on the demonstrate how combining machine learning with following input parameters: Date, Salinity, Dissolved advanced optimization algorithms yields significantly Oxygen, secchi Depth, Water Depth, Water Temperature, higher predictive accuracy and reliability for 𝑝𝐻-based and Air Temperature. The output parameter in this water quality classification. The usefulness of hybrid analysis is 𝑝𝐻, or the level of acidity, alkalinity, and models in these applications, due to their increased neutrality indicative of water. These are daily records; accuracy, makes them very handy tools in the prediction hence, they provide a holistic view of how the respective of water quality, therefore helping in water body environmental matters are changing from day to day. management and conservation. In the presented study, SVC and 𝐸𝑇𝐶 were used for water quality prediction by considering pH as one of the main influential parameters. In the present study, a more References advanced class of optimizers in the form of the Transit [1] Boyd, C.E (2019). Water quality: an introduction. Search Optimization Algorithm and Chaos Game Springer Nature. Optimization were coupled with the svcand 𝐸𝑇𝐶 to [2] Mekonnen, M.M. and A.Y. Hoekstra (2016). Four improve their corresponding predictive accuracies. The billion people facing severe water scarcity. obtained results reflected that the hybrid models 𝐸𝑇𝑇𝑆, Science Advances, Science, 2(2), p. e1500323. 𝐸𝑇𝐶𝐺, 𝑆𝑉𝑇𝑆, and 𝑆𝑉𝐶𝐺 outperformed their base model https://doi.org/10.1126/sciadv.1500323. with a significant difference in performance. [3] Vorosmarty, C., P. Green, J. Salisbury and R. Comparing 𝐸𝑇𝑇𝑆, when all models are taken into Lammers (2000). Global Water Resources: consideration against the 𝐸𝑇𝐶 base model, it improves Vulnerability from Climate Change and Accuracy by 3.73%, with increased Precision by 2.49%, Population Growth. Science, Science, 289, p. 284. boosted Recall by 3.73%, and increased 𝐹1 𝑆𝑐𝑜𝑟𝑒 by https://doi.org/10.1126/science.289.5477.284. 3.87%. On the other hand, 𝐸𝑇𝐶𝐺 outperforms 𝐸𝑇𝐶 with Hybrid Machine Learning and Optimization Algorithms for pH-Based… Informatica 49 (2025) 281–298 297 [4] Chapman, D (1992). Water Quality Assessments - environmental monitoring. Leuser Journal of A Guide to Use of Biota, Sediments and Water in Environmental Studies, Heca Sentra Analitika, Environmental Monitoring - Second Edition. 1(2), pp. 62–68. Taylor & Francis, https://doi.org/10.60084/ljes.v1i2.99. https://doi.org/10.1201/9781003062103. [16] Sasmita, N.R., S. Ramadeska, Z.M. Kesuma, T.R. [5] Schwarzenbach, R., B. Escher, K. Fenner, T. Noviandy, A. Maulana, M. Khairul and R. Hofstetter, C. Johnson, U. Gunten and B. Wehrli Suhendra (2024). Decision Tree versus k-NN: A (2006). The Challenge of Micropollutants in Performance Comparison for Air Quality Aquatic Systems. Science (New York, N.Y.), Classification in Indonesia. Infolitika Journal of Science, 313, pp. 1072–1077. Data Science, Heca Sentra Analitika, 2(1), pp. 9– https://doi.org/10.1126/science.1127291. 16. https://doi.org/10.60084/ijds.v2i1.179. [6] Yang, X (2025). Economic Cost Prediction Model [17] Putra, F.M. and I.S. Sitanggang (2020). for Building Construction Based on CNN-DAE Classification model of air quality in Jakarta using Algorithm. Informatica, Slovenian Society decision tree algorithm based on air pollutant Informatika, 49(5). standard index, In IOP Conf Ser Earth Environ Sci, https://doi.org/10.31449/inf.v49i5.7029. IOP Publishing, Purpose-led Publishing, p. 12053. [7] Dash, C.S.K., S.C. Nayak, A.K. Behera and S. DOI: 10.1088/1755-1315/528/1/012053 Dehuri (2023). A Neuro-Fuzzy Predictor Trained [18] Saxena, A. and S. Shekhawat (2017). Ambient air by an Elitism Artificial Electric Field Algorithm quality classification by grey wolf optimizer-based for Estimation of Compressive Strength of support vector machine. Journal of Environmental Concrete Structures. Informatica, Slovenian and Public Health, Wiley Online Library, 2017(1), Society Informatika, 47(5). pp. 3131083. https://doi.org/10.31449/inf.v47i5.3951. https://doi.org/10.1155/2017/3131083. [8] Benkaddour, M.K (2021). CNN based features [19]https://www.kaggle.com/datasets/supriyoain/water- extraction for age estimation and gender quality-data. classification. Informatica, Slovenian Society [20] Vapnik, V (1998). Statistical Learning Theory. Informatika, 45(5). New York. John Willey & Sons. Inc. https://doi.org/10.31449/inf.v45i5.3262. [21] Maldonado, S., J. Pérez, R. Weber and M. Labbé [9] Maktum, T., N. Pulgam, V. Chandgadkar, P. (2014). Feature selection for support vector Pathak and A. Solanki (2025). A Machine machines via mixed integer linear programming. Learning Based Framework for Bankruptcy Information Sciences, Elsevier, 279, pp. 163–175. Prediction in Corporate Finances Using https://doi.org/10.1016/j.ins.2014.03.110. Explainable AI Techniques. Informatica, [22] Geurts, P., D. Ernst and L. Wehenkel (2006). Slovenian Society Informatika, 49(15). Extremely randomized trees. Machine Learning, https://doi.org/10.31449/inf.v49i15.6745. Springer Nature, 63, pp. 3–42. [10] Mitchell 1951-, T.M (1997). Machine Learning. https://doi.org/10.1007/s10994-006-6226-1. McGraw-Hill. [23] Talatahari, S. and M. Azizi (2021). Chaos game [11] Al-Adhaileh, M. and F. Alsaade (2021). optimization: a novel metaheuristic algorithm. Modelling and Prediction of Water Quality by Artificial Intelligence Review, Springer Nature, Using Artificial Intelligence. Sustainability, 54(2), pp. 917–1004. MDPI, 13, p. 4259. https://doi.org/10.1007/s10462-020-09867-w. https://doi.org/10.3390/su13084259. [24] Hippke, M. and R. Heller (2019). Optimized transit [12] Zhou, J., Y. Wang, F. Xiao, Y. Wang and L. Sun detection algorithm to search for periodic transits (2018). Water Quality Prediction Method Based of small planets. Astronomy & Astrophysics, EDP on IGRA and LSTM. Water, MDPI, 10(9). Sciences, 623, p. A39. https://doi.org/10.3390/w10091148. https://doi.org/10.1051/0004-6361/201834672. [13] Zhang, Y., P. Thorburn, M. Vilas and P. Fitch [25]https://medium.com/@impythonprogrammer/evalu (2019). Machine learning approaches to improve ation-metrics-for-classification- and predict water quality data. fc770511052d#:~:text=Accuracy. https://doi.org/10.36334/MODSIM.2019.D5.ZH ANGYIF. [14] Hastie, T., R. Tibshirani, J.H. Friedman and J.H. Friedman (2009). The elements of statistical learning: data mining, inference, and prediction. Springer Nature. https://doi.org/10.1007/978-0- 387-21606-5. [15] Idroes, G.M., T.R. Noviandy, A. Maulana, Z. Zahriah, S. Suhendrayatna, E. Suhartono, K. Khairan, F. Kusumo, Z. Helwani and S. Abd Rahman (2023). Urban air quality classification using machine learning approach to enhance 298 Informatica 49 (2025) 281–298 X. Li et al. https://doi.org/10.31449/inf.v49i12.9298 Informatica 49 (2025) 299–318 299 Hybrid Machine Learning Framework for Type 2 Diabetes Prediction Using Metaheuristic Optimization Algorithms Naiyue Zhang1, Ying Liu2, * and Zheng Zhang3 1Department of Computer Application Engineering, Hebei Software Institute, Baoding 071000, China 2Department of Internet Commerce, Hebei Software Institute, Baoding 071000, China 3BeiJing HuaDe Eye Hospital, Beijing 100020, China E-mail: chanyu13579@163.com *Corresponding Author Keywords: diabetes, machine learning, gaussian process classification, henry gas solubility enhancement schemes (HGSO), and metaheuristic algorithms The general basis of diabetes prediction using machine learning involves the application of algorithms that take an overall look at multiple features like BMI and glucose levels, age, genetic predispositions, and other conditions that may predict the likelihood of developing diabetes. The data-driven schemes, such as neural networks or DTs, find patterns in past data and use these to provide reliable predictions about future diabetes cases. These schemes keep learning and improving; they grow with new inputs. ML now helps in early detection by the use of large datasets, thus enabling early actions such as lifestyle changes or medical therapies. Finally, it enhances healthcare by providing individualized risk assessment and thus enables timely actions to diminish the burden of diabetes. In addition, the application of ML schemes, including Gaussian Process Classification-GPC, Linear Discriminant Analysis-LDA with Henry Gas Solubility Optimization-HGSO, Chaos Game Optimization-CGO, and Chef-Based Enhancement scheme-CBOA, has greatly benefited the process of prediction. These schemes were combined with optimizers, guided by the objective of this work, which deals with predicting the type of diabetes and the diagnosis of persons vulnerable to it. This was a strategic fusion aimed at creating new hybrid schemes with increased precision in prediction. Further analysis showed that the GPCB model was the best, with an impressive 0.981 during training. By contrast, the GPCG and GPHG schemes are relatively less accurate, with an accuracy of 0.963 and 0.946, respectively. These results justify the utility of the integrated approach, where advanced ML algorithms were able to generate predictive schemes superior in terms of accuracy and efficiency compared to the classical methods. Povzetek: V članku je opisan sistem za napovedovanje sladkorne bolezni tipa 2 s pomočjo strojnega učenja. Algoritem GPCB združuje klasifikacijo Gaussovega procesa z metahevrističnimi optimizacijskimi algoritmi za kvalitetno diagnozo. 1 Introduction insulin to meet its needs, thereby resulting in high blood sugar levels [8]. Gestational diabetes develops during Type 2 diabetes, another name for diabetes, is a long-term pregnancy when fluctuations in hormones compromise metabolic illness marked by elevated blood glucose levels insulin activity, increasing the risk of complications for because of either the pancreas's insufficient production of both mother and child [9], [10]. Diabetes' persistent High insulin or its inability to utilize insulin effectively [1]. blood sugar levels can cause a stream of issues affecting Insulin is a hormone produced by the pancreas that many organ systems [11]. These include cardiovascular regulates blood sugar levels by allowing the absorption of disorders including strokes and heart attacks; nerve glucose into the cells to use it as energy [2]. Whenever the damage; diabetic neuropathy; kidney damage causing mechanism is disturbed, glucose builds up in the numbness, tingling (diabetic nephropathy), and circulation and causes hyperglycemia [3]. Diabetes is discomfort; as well as eye disturbances that can cause sorted into 3 types: type I, type II, and gestational diabetes blindness due to diabetic retinopathy, if not addressed [4]. Type 1 diabetes, which is frequently diagnosed in [12], [13]. Diabetes also raises the risk of ulcers in feet and childhood or adolescence, is caused by the immune system amputations owing to impaired circulation and damage to erroneously targeting and killing the insulin-generating nerves [14]. beta cells in the pancreas [5]. This involves lifetime insulin Management includes frequent testing of blood treatment to control blood sugar levels. Type 2 diabetes, glucose, proper nutrition, regular physical activity, and the most prevalent kind, usually develops in adulthood and insulin therapy or medication when necessary. Other is often associated with overweight, lack of exercise, and treatments for type 2 diabetes include weight loss and genetic risk [6], [7]. Type 2 diabetes develops when the smoking cessation. People with diabetes need training and body becomes resistant to or cannot produce sufficient support, as enabling them with skills for optimum self- 300 Informatica 49 (2025) 299–318 N. Zhang et al. management reduces complications, reflecting a LR, and SVM, might also classify individuals into types collaborative approach by all involved [15]: providers of of diabetes based on sets of different variables. This will healthcare, the patient, and family members [16], [17]. enable individual risk assessments and prevention Type 2 is a complex metabolic condition that casts ripples methods based on an individual profile, and in time, will in personal life since it has myriad implications for many allow healthcare professionals to offer more personalized facets [18], [19]. It presents physically as a constellation and effective preventative treatment [29]. of symptoms that include chronic thirst and frequent urination, fatigue, and unexpected weight gain or loss 1.1 Objectives [20], [21]. This chronic fight against blood sugar This article proposes developing a scheme for diagnosing becoming normal turns out to be an everyday obsession types of diabetes and predicting the likelihood of a person with food intake, medication routines, and even social being affected with it. In order to solve this issue, the use interactions [22]. Besides the physical discomforts, type 2 of ML schemes including LDA and GPC is chosen, along diabetes also has a great psychological and emotional with 3 optimizers: CGO, HGSO, and CBOA. The impact. The constant monitoring required to manage the integration of these optimizers with the schemes leads to disease can lead to feelings of anxiety, stress, and some new hybrid model generation, which is supposed to depression. The fear of complications is huge, with every give better performance in the prediction process. Further, increase or decrease in glucose triggering a snowball these newly designed hybrid schemes are evaluated for effect of questions about what this could mean for long- their performances using different plots and tables. It is term health and well-being. expected that through their dense analysis, information Type 2 diabetes can negatively affect social about the most effective performance of the different relationships and interactions. Even going out for meals schemes can be extracted, along with the potential deficit may become a maze of counting carbohydrates and in functionality among them. Such an inclusive strategy administering insulin, while social events may become will provide thorough knowledge about various schemes' distressing in their demand to explain dietary restrictions strengths and flaws that help in formulating approaches or personally withdraw to check blood glucose levels [23]. related to the diagnosis and prediction of diabetes. The stigma associated with diabetes can also make people Gaussian Process Classification (GPC) and Linear feel isolated or humiliated, disrupting interpersonal Discriminant Analysis (LDA) were picked owing to their interactions [24]. Besides that, type 2 diabetes may lead to complimentary capabilities in modeling classification serious financial burdens. Pharmaceutical treatment, challenges. GPC is a non-parametric, probabilistic model apparatus for blood glucose monitoring, and frequent that captures complicated, nonlinear interactions and medical consultations are not cheap, especially when offers uncertainty estimates, making it suited for the insurance coverage is inadequate. Further, loss of working nuanced and high-risk nature of diabetes prediction. days due to poor health or visiting doctors may affect Conversely, LDA is a basic yet powerful linear classifier earnings and professional development [25]. that performs well when class distributions are nearly Notwithstanding such constraints, persons with type 2 Gaussian. Its interpretability and minimal computing cost diabetes often show remarkable resilience and make it suitable for baseline comparison. LDA is good for resourcefulness [26]. Most learn to manage the efficiency and understanding, while GPC is good for complexity of their disease through education, proactive making strong, adaptable models of complicated health self-management, and support networks and feel data patterns. Together, they make a balanced framework. empowered by taking responsibility for their health. However, the pervasive nature of type 2 diabetes ensures its impacts are felt at all levels of life, making 2 Material and methods comprehensive approaches to prevention, treatment, and care of utmost importance. 2.1 Data collection Machine learning algorithms can predict the risk a person has for diabetes and even define which type of Prior to model training, the dataset underwent several diabetes the person is most probable to get, considering his preprocessing procedures to enhance data quality and or her medical history, life style habits, biomarkers, and model performance. Missing values were addressed using genetic trends. These algorithms are trained on large mean imputation for numerical features. Outliers were datasets consisting of data from diabetic and non-diabetic detected and mitigated using z-score normalization. All patients through a method called supervised learning. The continuous features were standardized to zero mean and computers learn to find, through patterns and links in data, unit variance. Categorical variables, if any, were encoded small signs and risk factors associated with different types using one-hot encoding. Feature selection was conducted of diabetes [27]. For example, ML schemes for the using mutual information to retain only the most relevant diagnosis of type 2 diabetes consider age, BMI, family predictors. The final dataset was randomly shuffled and medical history of diabetes, cholesterol levels, blood split into training and testing sets using an 70:30ratio to pressure, and glucose tolerance. These combined ensure unbiased model evaluation. Fig. 1 displays the far- indicators may, therefore, enable the model to project the reaching consequences of diabetes on a person's life, likelihood of a person developing type 2 diabetes over a spanning blood pressure to pregnancy, as it affects an specific period [28]. Other ML methods, including DT, Hybrid Machine Learning Framework for Type 2 Diabetes Prediction… Informatica 49 (2025) 299–318 301 individual's well-being and lifestyle in general. This study importance in effective management and tries to make meaning out of the interaction of diabetes reduction of adverse effects of diabetes on with these major determinants, therefore, basically general health. Pregnancy complicates the care of determining the trend of the illness. diabetes because of fluctuating hormonal changes and increased insulin resistance. • High blood pressure worsens diabetes • Gestational diabetes may be developed during complications by essentially destroying blood pregnancy, increasing the risk for complications vessels and organs. High blood pressure and in both mother and child, including macrosomia, atherosclerosis accelerate the narrowing of preeclampsia, and anomalies at birth. Women arteries, which limits blood flow, thereby with previous diabetes have difficulties worsening the common diabetes consequences of managing blood sugar levels, again increasing heart disease, stroke, and kidney failure. risks for adverse outcomes such as preterm birth Hypertension further increases the risk for and cesarean section delivery. Close monitoring, diabetic retinopathy, which can cause visual dietary modification, and medication may be impairment or even total blindness. It also leads necessary to achieve appropriate risk reduction to peripheral artery disease, which raises the and optimal health for both mother and fetus. chances of foot ulcers and amputations in Such cooperation between obstetricians, diabetic patients. Good management of blood endocrinologists, and diabetes educators forms pressure through lifestyle modifications, the very foundation for the best pregnancy medication, and regular checks is of utmost outcomes among women with diabetes. Figure 1: The plot illustrating the Contour - color fill between the input and output 302 Informatica 49 (2025) 299–318 N. Zhang et al. 𝜋2 +(𝜇1 − 𝜇2) 𝑇Σ−1(𝜇1 − 𝜇2)) + 2 ln ( ) 2.2 Linear discriminant analysis (LDA) 𝜋1 An instance 𝑥's intended class is: Linear Discriminant Analysis (LDA) is a statistical 1, 𝑖𝑓 (𝑥) < 0, approach used to separate two or more classes by ?̂?(𝑥) = { (7) 2, 𝑖𝑓 𝛿(𝑥) > 0. identifying a linear combination of characteristics that best When both categories have identical priors, 𝜋1 = 𝜋differentiates them. It assumes that the different classes 2 , Eq. (5) takes a particular form: create data based on Gaussian distributions with the same 𝑇 covariance matrix. LDA is computationally efficient, 2(Σ−1(𝜇2 − 𝜇1)) 𝑥+(𝜇1 − 𝜇2) 𝑇Σ−1(𝜇1 − 𝜇2) (8) interpretable, and particularly successful when the = 0, relationship between features and labels is nearly linear, Whose statement on the left can be interpreted as making it suited for baseline comparison in medical 𝛿(𝑥) in Eq. (7). classification problems like diabetes prediction. LDA assumes that the 2 categories' matrices of covariance 2.3 Gaussian process classification (GPC) are similar [30], and one of the 2 categories has a greater Gaussian Process Classification (GPC) puts a average than the other, as seized 𝜇1 < 𝜇2. One of these Gaussian process prior over a latent function to predict the examples is the one provided for 𝑥 ∈ 𝑅 classes: chance of being in a certain class. This lets GPC capture Σ1 = Σ2 = Σ. (1) nonlinear patterns in big datasets in a flexible way and 1 (𝑥 − 𝜇1) 𝑇Σ−1(𝑥 − 𝜇1) 𝑒𝑥𝑝 (− )𝜋 measure how uncertain predictions are, which is very 2 1 √(2𝜋)𝑑|Σ| important for medical diagnostics. GPC is better for risk- 1 (𝑥 − 𝜇 𝑇 −1 2) Σ (𝑥 − 𝜇2) sensitive predictions like figuring out how likely someone = 𝑒𝑥𝑝 (− )𝜋 , √(2𝜋)𝑑|Σ| 2 2 is to have diabetes since it changes its complexity (𝑥 − 𝜇1) 𝑇Σ−1(𝑥 − 𝜇1) dependent on the input. This is different from fixed ⟹ 𝑒𝑥𝑝 (− )𝜋 2 1 (2) parametric models. (𝑥 − 𝜇 Given a set of N training input points, in typical 2) 𝑇Σ−1(𝑥 − 𝜇2) = 𝑒𝑥𝑝 (− )𝜋 2 2 , classification using Gaussian methods, procedure 𝑋 = (𝑎) 1 [𝑥1, … , 𝑥𝑁] 𝑇 and their associated class designations 𝑌 = ⇒ − (𝑥 − 𝜇 )𝑇Σ−1 2 1 (𝑥 − 𝜇1) + ln(𝜋1) [𝑌1, … , 𝑌𝑁] 𝑇 , one would like to forecast the class 1 participation percentage of a fresh test point 𝑥 = − (𝑥 − 𝜇2) 𝑇Σ−1(𝑥 − 𝜇 ×. This may 2) + ln (𝜋2 2) be accomplished by utilizing a latent function f, which is The simple logarithm of the equation's sides is found then mapped onto the [0; 1] interval utilizing the probit by (𝑎). The equation may be written as: operator. For binary classification, use the notion that y (𝑥 − 𝜇1) 𝑇Σ−1(𝑥 − 𝜇1) = (𝑥 𝑇 − belongs to {0,1}, where 1 displays the positive class and 0 𝜇𝑇)Σ−1(𝑥 − 𝜇1) = 𝑥 𝑇Σ−11 𝑥 − 𝑥𝑇Σ−1𝜇1 − displays the negative. Therefore, the likelihood of class 𝜇𝑇 ( Σ−11 𝑥 + 𝜇𝑇 − (𝑎) 1Σ 1𝜇1 𝑥𝑇Σ−1𝑥 + 𝜇𝑇 −1 3) membership 𝑝(𝑦 = 1|𝑥) might be expressed as Φ(f(x)), = 1Σ 𝜇1 − where Φ(. ) is the probit purpose. Gaussian procedure 2𝜇𝑇1Σ −1𝑥 classification is then performed by applying a GP prior to Where (𝑎) is because 𝑥𝑇Σ−1𝜇1 = 𝑥 𝑇Σ−1𝑥 since Σ−1 the latent function of 𝑓(𝑥). A GP [31] is a random is balanced and Σ−𝑇 = Σ−1. As a result, it is observed: procedure completely described by a mean function 1 1 − 𝑥𝑇Σ−1𝑥 − 𝜇𝑇Σ−1𝜇 + 𝜇𝑇 1 2 2 1 1 1Σ − 𝑥 + ln(𝜋1) 𝑚(𝑥) = 𝔼[𝑓(𝑥)] and a positive definite covariance method 𝕜(𝑥; ?́?) = 𝕧[𝑓(𝑥); 𝑓(?́?)]. To project an 1 1 (4) = 𝑥𝑇Σ−1𝑥 − 𝜇𝑇 −1 2 2Σ 𝜇2 + 𝜇 𝑇 −1 2Σ 𝑥 + ln (𝜋2) additional test point 𝑥× , first calculate the range of the 2 related latent variable 𝑓×. As an outcome of multiplying both sides of the equation by 2, the expression that follows is obtained: 𝑝(𝑓×|𝑥×, 𝑋, 𝑦) = ∫𝑝(𝑓×|𝑥×, 𝑋, 𝑓) 𝑝(𝑓|𝑋, 𝑦)𝑑𝑓 (9) 2(Σ−1(𝜇2 − 𝜇1)) 𝑇𝑥 + (𝜇1 − 𝜇2) 𝑇Σ−1(𝜇1 − 𝜇2) 𝜋 Where 𝑓 = [𝑓1, … , 𝑓𝑁] 𝑇, and then using this 2 + 2 ln ( ) = 0 (5) distribution, calculate the class participation distribution: 𝜋1 𝑝(𝑦× = 1|𝑥The equation of a line may be represented as 𝑎𝑇𝑥 + ×, 𝑋, 𝑦) (10) 𝑏 = 0. T As a result, if the Gaussian distributions of the 2 = ∫Φ(𝑓×) 𝑝(𝑓×|𝑥×, 𝑋, 𝑦)𝑑𝑓× classes are considered, and the covariance matrices are considered to be equal, a line displays the categorization choice border. This approach is called LDA because the 2.4 HGSO choice border between the 2 classes is linear. The The following subsection describes the motivation for expressions were relocated to the correct side, which HGSO, which depends on the act of Henry's law. related to the second class, to create Eq. (5). Therefore, if used 𝛿(𝑥): ℝ𝑑 → ℝ as the left-hand side calculation 2.4.1 Henry’s Law (function) in Eq. (6). In 1803, William Henry created Henry's Law, a gas law 𝑇 (𝑥) ∶= 2(Σ−1(𝜇2 − 𝜇1)) 𝑥 (6) [32]. Henry's law reads as follows: "At a temperature that remains constant, the amount of a given gas that dissolves Hybrid Machine Learning Framework for Type 2 Diabetes Prediction… Informatica 49 (2025) 299–318 303 in a given type and volume of liquid is inversely related to 𝑗. The optimal gas for the entire colony is then determined the partial pressure that exists for that gas in equilibrium by rating the gasses. with that liquid." Consequently, Henry's law is greatly Step 4: Update Henry’s coefficient. dependent on temperature [33] and displays that a gas's Eq. (18), which updates Henry's factor, is as follows: solubility (𝑆𝑔) is directly proportional to its relative 𝐻𝑗(𝑡 + 1) = 𝐻𝑗(𝑡) pressure (𝑃𝑔), as represented in the subsequent equation: 1 1 𝑆𝑔 = 𝐻 × 𝑃𝑔 (11) × 𝑒𝑥𝑝 (−𝐶𝑗 × ( − )) , 𝑇(𝑡) (18) 𝑇(𝑡) 𝑇𝜃 Where 𝐻 is Henry's stable, which is particular to the = exp (−𝑡/𝑖𝑡𝑒𝑟) given gas-solvent mixture at a certain temperature, and 𝑇 displays the temperature, 𝑇𝜃 displays a constant 𝑃𝑔 is the gas's relative pressure. equal to 298.15, iter is the overall count of cycles, and 𝐻𝑗 𝑑𝑙𝑛𝐻 −∇𝑠𝑜𝑙𝐸 = (12) is Henry's factor for cluster 𝑗 in this equation. 𝑙(1/𝑇) 𝑅 Step 5: Update solubility. Furthermore, the impact of temperature dependency The following formula is used to modify the on Henry's law variables has to be addressed. The Van't solubility: Hoff equation describes how Henry's law constants vary 𝑆𝑖,𝑗(𝑡) = 𝐾 × 𝐻𝑗(𝑡 + 1) × 𝑃𝑖,𝑗(𝑡) (19) when a system's temperature varies: 𝑆𝑖,𝑗 is the soluble content of gas 𝑖 in cluster 𝑗, 𝑃𝑖,𝑗 is 𝐻(𝑇) = exp (𝐵/𝑇) × 𝐴 (13) the amount of partial pressure on gas 𝑖 in cluster 𝑗, and 𝐾 Where 𝐻 is an expression of 2 parameters, 𝐴 as well is a value that is constant. as 𝐵, which are the 2 factors that determine H's 𝑇 Step 6: Update position. dependency. In addition, one can generate a function The position was revised below: based on 𝐻 at the standard temperature 𝑇 = 298.15𝐾. 𝑋𝑖,𝑗(𝑡 + 1) = 𝑋𝑖,𝑗(𝑡) −∇𝑠𝑜𝑙𝐸 𝐻(𝑇) = 𝐻𝜃 × 𝑒𝑥𝑝 ( (1/𝑇 − 1/𝑇𝜃)) (14) +𝐹 × 𝑟 × 𝛾 × (𝑋𝑖,𝑏𝑒𝑠𝑡(𝑡) − 𝑋𝑖,𝑗(𝑡)) 𝑅 +𝐹 × 𝑟 × 𝛼 × (𝑆𝑖,𝑗(𝑡) × 𝑋𝑏𝑒𝑠𝑡(𝑡) − 𝑋𝑖,𝑗(𝑡)) (20) The Van't Hoff formula applies if −∇𝑠𝑜𝑙𝐸 is a stable, hence Eq. (14) may be rewritten as follows: 𝐹𝑏𝑒𝑠𝑡(𝑡) + 휀 𝛾 = 𝛽 × 𝑒𝑥𝑝 (− ) , 휀 = 0.05 𝐻(𝑇) = 𝑒𝑥𝑝(−𝑐 × (1/𝑇 − 1/𝑇𝜃) × 𝐻𝜃) (15) 𝐹𝑖,𝑗(𝑡) + 휀 Where 𝑋𝑖,𝑗 displays the location of gas 𝑖 in cluster 𝑗, 2.4.2 HGSO mathematical scheme and 𝑟 and 𝑡 are the random constant and cycle time, This part describes the mathematical formulas for the respectively. The best gas in cluster j is indicated by 𝑋𝑏𝑒𝑠𝑡 , suggested HGSO method. The mathematical procedures while the best gas in the entire swarm is shown by 𝑋𝑖,𝑏𝑒𝑠𝑡. are outlined below: In addition, 𝛾 displays gas 𝑗′𝑠 capacity to interact with Step 1: Initialization process. other gases in cluster 𝑖, 𝛼 displays the effect of other gases The count of gases (population size N) and the on gas i in cluster j and is equal to 1, and 𝛽 is a constant. placements of gases have been set up using the subsequent The fitness of gas i in cluster j is denoted by 𝐹𝑖,𝑗 , whereas equation: 𝐹𝑏𝑒𝑠𝑡 displays the fitness of the best gas in the overall 𝑋𝑖(𝑡 + 1) = 𝑋𝑚𝑖𝑛 + 𝑟 × (𝑋𝑚𝑎𝑥 − 𝑋𝑚𝑖𝑛) (16) system. 𝐹 is the flag that modifies the direction of the where t is the repetition time, 𝑋𝑚𝑖𝑛 and 𝑋𝑚𝑎𝑥 are the search agent and gives variety (±). 𝑋𝑖,𝑏𝑒𝑠𝑡and 𝑋𝑏𝑒𝑠𝑡 are issue bounds, 𝑟 is a random number between 0 and 1, and the 2 parameters that control the exploration and 𝑋𝑖 is the location of the ith gas in population 𝑁. The below exploitation capabilities. Particularly, 𝑋𝑖,𝑏𝑒𝑠𝑡 displays the equation is used to establish the count of gasses 𝑖, Henry's best gas 𝑖 in cluster 𝑗, whereas𝑋𝑏𝑒𝑠𝑡 displays the best gas constant of type 𝑗 (𝐻𝑗(𝑡)) partial pressure 𝑃𝑖,𝑗 of gas 𝑖 in in the whole swarm. cluster 𝑗, and −∇𝑠𝑜𝑙𝐸/𝑅steady value of type 𝑗 (𝐶𝑗). Step 7: Escape from local optimum. 𝐻𝑗(𝑡) = 𝑙1 × 𝑟𝑎𝑛𝑑(0,1), 𝑃𝑖,𝑗 The purpose of this phase is to leave the local (17) = 𝑙2 × 𝑟𝑎𝑛𝑑 (0,1), 𝐶𝑗 = 𝑙3 × 𝑟𝑎𝑛𝑑(0,1) optimum. The count of worst agents 𝑁𝑤 can be chosen and where 𝑙1, 𝑙2, and 𝑙3 are designated as constants with ranked using the following equation: corresponding amounts of 5𝐸 − 02, 100, and 1𝐸 − 02. 𝑁𝑤 = 𝑁 × (𝑟𝑎𝑛𝑑(𝑐2 − 𝑐1) + 𝑐1), 𝑐1 (21) Step 2: Clustering. = 0.1 𝑎𝑛𝑑 𝑐2 = 0.2 In proportion to the count of gas types, the entire The count of search agents is denoted by 𝑁. number of agents is split into equal clusters. Every cluster Step 8: Update the position of the worst agents. has the same Henry's constant measurement (𝐻𝑗) since 𝐺(𝑖,𝑗) = 𝐺min (𝑖,𝑗) + 𝑟 × (𝐺max (𝑖,𝑗) − 𝐺min (𝑖,𝑗)) (22) they all contain the same gases. In Eq. (22), 𝐺(𝑖,𝑗) displays gas 𝑖's position in cluster 𝑗, Step 3: Evaluation. 𝑟 is a random integer, and 𝐺min (𝑖,𝑗) and 𝐺max (𝑖,𝑗) represent The gas having the largest equilibrium state among the problem boundaries. The steps of the process are the others of its sort is identified by analyzing each cluster depicted in Fig. 2. 304 Informatica 49 (2025) 299–318 N. Zhang et al. Figure 2: The flowchart of the HGS. Hybrid Machine Learning Framework for Type 2 Diabetes Prediction… Informatica 49 (2025) 299–318 305 as well as the lowest and highest levels of eligibility are 2.5 Chaos game optimization (CGO) connected. The basic idea of this mathematical model is to create The reasons behind the groundbreaking metaheuristic the general shape of a Sierpinski triangle by producing algorithm known as CGO and its computational several appropriate seeds inside the search area. In this architecture are covered in this section. way, fresh seeds are also produced via the Sierpinski triangle technique. An intermediate triangle with three 2.5.1 Mathematical model seeds is created as follows for each appropriate seed in the This section presents an optimization technique based on search field 𝑋𝑖: the ideas of chaos theory. The mathematical foundation of • Positioning of the previously identified Global the CGO algorithm is developed based on the basic Best (GB), concepts of fractals and chaotic games. The CGO • The average group's location (𝑀𝐺𝑖), algorithm considers several solution candidates (X) that • The 𝑖th resolution competitor (𝑋𝑖) is the chosen suggest certain able seeds within a Sierpinski triangle seed. because many natural evolution algorithms keep an array Although the mean values of randomly chosen of solutions that evolve through random modifications and eligible seeds with an equal chance of integrating the selections. Each solution candidate (𝑋𝑖) in this method currently regarded starting eligible seed (𝑋𝑖) are reflected 𝑗 contains a set of choice factors (𝑥𝑖 ) that represent where in the 𝑀𝐺𝑖, the GB is the best solution candidate with the the eligible seeds are located inside a Sierpinski triangle. highest eligibility levels. Together with the identified The enhancement scheme uses the Sierpinski triangle to eligible seed (𝑋𝑖), the GB and 𝑀𝐺𝑖 create a Sierpinski explore potential solutions. In the enhancement scheme, triangle. In order to generate some more seeds that can be the Sierpinski triangle is used to look for possible regarded as fresh eligible seeds for finishing the Sierpinski solutions. The quantitative treatment of these aspects is triangle, a temporary triangle is made inside the search given below: area for each of the first eligible seeds, as was previously 𝑋1 indicated. Four strategies are suggested to accomplish this 𝑋 2 aim. The 𝑖th permanent triangle (ith repetition) includes a 𝑋 = ⋮ Sierpinski triangle's three vertices [GB (green seed), 𝑀𝐺𝑖 𝑋𝑖 (red seed), and 𝑋𝑖 (blue seed)] in addition to the n ⋮ appropriate seeds that were accessible in the previous [𝑋𝑛] cycle. This homemade triangle uses the chaotic game 𝑥 1 2 𝑗 𝑑 1 𝑥1 … 𝑥1 … 𝑥 (23) 1 principle to produce fresh seeds using one die and three 𝑥12 𝑥2 𝑗 2 … 𝑥2 … 𝑥𝑑 seeds. 𝑋𝑖 is used to hold the first seed, GB for the second, 2 and 𝑀𝐺𝑖 for the third. For the first seed, a die with three = ⋮ ⋮ … ⋮ ⋱ ⋮ 𝑖 = 1,2, . . , 𝑛. , { green and three red faces was utilized. Upon rolling the 𝑥 1 𝑖 𝑥2 𝑗 𝑖 … 𝑥𝑖 … 𝑥𝑑 𝑖 = 1,2, … , 𝑑. 𝑖 dice, the seed in the 𝑋𝑖is shifted to the 𝑀𝐺𝑖 (red face) or ⋮ ⋮ … ⋮ ⋱ ⋮ the GB (green face) based on the resulting color. This [𝑥1𝑛 𝑥2𝑛 … 𝑥𝑖 𝑑 𝑛 … 𝑥𝑛 ] element is replicated using a random number generation For each seed in the Sierpinski triangle (search area), method that generates just 2 values, 0 as well as 1, the count of permissible seeds, or potential solutions, is n; enabling the choice of red or green faces. When the green and d is the seed's size. Random selection is used to face is visible, the 𝑋𝑖 seed advances in the direction of the determine where these appropriate seeds are initially GB; it moves toward the 𝑀𝐺𝑖. Even if each green or red placed in the search space. face has an equal chance of appearing in the game, the 𝑗 𝑥𝑖 (0) = 𝑥 𝑖 𝑚𝑖𝑛 + 𝑟𝑎𝑛𝑑 potential of getting two equivalent random integers for the 𝑗 𝑗 𝑖 = 1,2, … , 𝑛. (24) GB and the 𝑀𝐺𝑖 is also taken into account. The direction ∙ (𝑥𝑖,𝑚𝑎𝑥 − 𝑥𝑖,𝑚𝑖𝑛), { 𝑗 = 1,2,… , 𝑑. of the 𝑋𝑖 's seed advancement is a line segment that The beginning position of the eligible seeds is defined connects the GB with the 𝑀𝐺𝑖. The flow of seeds within 𝑗 𝑗 𝑗 by 𝑥𝑖 ; 𝑥𝑖,𝑚𝑎𝑥 as well as 𝑥𝑖,𝑚𝑖𝑛 indicate the maximum and the search area must be restricted because of the chaotic lowest permitted values for the ith solution candidate's jth game method; hence, this component is controlled by choice variable; rand is a random integer within the range certain at-random factorials that were created: 1 [0,1]. The way dynamical systems, often known as self- 𝑆𝑒𝑒𝑑𝑖 = 𝑋𝑖 + 𝛼𝑖 × (𝛽𝑖 × 𝐺𝐵 − 𝛾𝑖 ×𝑀𝐺𝑖), 𝑖 (25) similar and self-organizing systems, behave, as was = 1,2, … , 𝑛. previously described, and display specific fundamental 𝑋𝑖 displays the 𝑖𝑡ℎ resolution candidate, GB denotes patterns serves as the foundation for the core ideas of the global best discovered thus far, and 𝑀𝐺𝑖 displays the chaos theory. The fundamental dynamical system patterns mean of a few selected, qualified seeds. While 𝛽𝑖 and 𝛾𝑖 according to chaos theory are exhibited by eligible seeds, indicate a random integer between 0 and 1 to enable die which are acquired beginning positions. It is possible to rolling, 𝛼𝑖 is a randomly generated factorial to reflect seed ascertain whether these seeds are suitable to function as movement limitations. Three blue and three red-faced dice fundamental patterns (self-similarity) for an optimization are used for the next seed (GB). Either the 𝑀𝐺𝑖 (red face) issue by employing potential solutions (𝑋). The candidates or the 𝑋𝑖 (blue face) receives the seed in the GB, for the solutions with the greatest and worst fitness values depending on the color that emerges from rolling the dice. 306 Informatica 49 (2025) 299–318 N. Zhang et al. The model used in this section is the same as the original constructed. For the variables that violate the technique, a seed. If a blue face emerges, the seed travels to the 𝑋𝑖; if a 𝑗 boundary change is ordered if the 𝑥𝑖 is beyond the red face appears, the seed goes to the 𝑀𝐺𝑖. Another seed, parameter's range. The most repetitions that can be done like the first, can travel towards a location on the in which the optimization process takes place serves as the connecting lines between 𝑋𝑖and 𝑀𝐺𝑖 . This motion is basis for the termination criterion. restricted by randomly produced factorials. 𝑆𝑒𝑒𝑑21 = 𝐺𝐵 + 𝛼𝑖 × (𝛽𝑖 × 𝑋𝑖 − 𝛾𝑖 ×𝑀𝐺𝑖), 𝑖 (26) 2.6 Chef-Based Enhancement scheme = 1,2, … , 𝑛. (CBOA) where each of the variables 𝛽𝑖 and 𝛾𝑖 is a random value of 0 or 1 to simulate the option of rolling a die, and A metaheuristic method called CBOA was just introduced 𝛼𝑖 is the randomly generated factorial for characterizing by [34]. The CBOA's mathematical representation and the mobility limitations of the seeds. The remaining natural architecture are covered in this section. requirements are the same as those listed for the initial seed. The third seed is employed to roll a die with green 2.6.1 Mathematical model of CBOA and blue faces, 𝑀𝐺𝑖. The seed is directed toward either the Below is a presentation of the CBOA mathematical model 𝑋𝑖 (blue face) or the GB (green face) depending on the using the situation from Section 2.1. First, the color. An approach for generating random numbers is used initialization stage of the algorithm is initiated, much like to duplicate this element. It yields just 2 values, 0 and 1, in other metaheuristics. There are 2 populations as a result so that users may select between the blue or green faces. of the CBOA: elite agents and candidate solutions. Additionally, the lines connecting the 𝑋𝑖 and GB can be Therefore, as shown by Eq. (30), a matrix may be used to followed by the seed. Some random factorials are also represent the CBOA members. used to achieve this goal, such as: 𝑋1 𝑥1,1 … 𝑥1,𝑑𝑖𝑚 𝑆𝑒𝑒𝑑31 = 𝑀𝐺𝑖 + 𝛼𝑖 × (𝛽𝑖 × 𝑋𝑖 − 𝛾𝑖 × 𝐺𝐵), 𝑖 𝑋 = [ ⋮ ] = [ ⋱ ] (27) (30) = 1,2, … , 𝑛. 𝑋𝑁 𝑥 𝑁×1 𝑁,𝑑𝑖𝑚 … 𝑥𝑁,𝑑𝑖𝑚 𝑁×𝑑𝑖𝑚 In order to generate the fourth seed, an additional where 𝑁 is the population size, dim is the issue length method is employed to carry out the modification stage in (𝑎 ∈ [1, 𝑁], 𝑏 ∈ [1, 𝑑𝑖𝑚]), 𝑋 is the CBOA population the qualifying seeds' position updates within the search matrix, and 𝑥𝑎,𝑏 indicates the value of the bth problem area. Changes in this seed's position are made depending parameter for the ath CBOA member. CBOA members' on arbitrary adjustments made to the randomly chosen locations are established using Eq. (31): decision criteria. Eq. (28) depicts a schematic depiction of 𝑥𝑎,𝑏 = 𝐿𝑂𝑊𝑏 + 𝑟𝑎𝑛𝑑 ∙ (𝑈𝑃𝑏 − 𝐿𝑂𝑊𝑏) (31) the specified procedure for the 4th seed; it has the Where rand is an arbitrary number in the range of [0, following mathematical representation: 1], 𝐿𝑂𝑊𝑏 and 𝑈𝑃𝑏 are the lower and upper limits of the 𝑆𝑒𝑒𝑑4 = 𝑋 𝑘 𝑖 𝑖(𝑥𝑖 = 𝑥 𝑘 𝑖 + 𝑅), 𝑘 = [1,2, … , 𝑑]. (28) 𝑏𝑡ℎ problem factor, correspondingly. Each member's goal Where 𝑘 is an integer at random in the interval [1, 𝑑] function may be determined and expressed as a vector and 𝑅 is a random number with uniform distribution in the according to Eq. (32): region [0, 1]. Four formulations for 𝛼𝑖, which controls the 𝐹𝑖𝑡𝑥𝑋1 mobility limitations of the seeds, are provided in order to 𝐹𝑖𝑡 = [ ⋮ ] (32) alter the exploration and exploitation rate of the CGO 𝐹𝑖𝑡𝑋𝑁 𝑁×1 algorithm. 𝐹𝑖𝑡 symbolizes the values of objective functions, 𝑅𝑎𝑛𝑑 whereas 𝐹𝑖𝑡𝑋𝑎 displays the value of a member. The 2 × 𝑅𝑎𝑛𝑑 𝛼𝑖 = { (𝛿 × 𝑅𝑎𝑛𝑑) + 1 (29) objective function's value is used as the selection criteria for selecting the best candidate solution. The optimal (휀 × 𝑅𝑎𝑛𝑑) + (~휀) member of the population and potential solution is the one In this case, δ as well as ε are indeterminate numbers that has the highest value for the objective function. It's in the interval [0,1], and Rand is a randomly dispersed, time to complete the CBOA's processing steps after the equally distributed number in that interval. Given the self- algorithm has been launched. The CBOA is composed of similarity problems in the fractals, the eligibility of the two demographic groups: elite agents and candidate new and existing seeds should be jointly assessed to solutions. These two groups' update procedures are decide if the additional seeds ought to be included in the different. Its elements are changed at each cycle, and the search space's overall count of eligible seeds. The best values of the aim function are computed and evaluated. As new solution candidates are retained after being vetted; a result, the best member is changed after each repetition. seeds with the lowest fitness values, or the lowest degrees Upon comparing the values of the objective function, elite of self-similarity, are removed. It is important to note that agents are selected from among the CBOA members with the mathematical method reduces the mathematical the highest values. The values of the goal function are used model's complexity by using substitution. Actually, the to sort the population matrix in decreasing order. entire form of the Sierpinski triangle has been completed using all of the qualifying seeds found in the search region. 𝑗 To cope with the solution variables 𝑥𝑖 breaching the boundaries of the factors, a mathematical flag is Hybrid Machine Learning Framework for Type 2 Diabetes Prediction… Informatica 49 (2025) 299–318 307 𝑆𝑋1 get the aim function's ideal value. This updating technique ⋮ is beneficial since every person searches for better 𝑆𝑋 = 𝑆𝑋 𝑁𝐶 opportunities in the vicinity, independent of the location ⋮ of other community members. This idea is to use Eqs. (37) [ 𝑆𝑋𝑁 ]𝑁×1 to (38) to produce a random position around each culinary 𝑠𝑥1,1 𝑠𝑥1,𝑑𝑖𝑚 (33) instructor in the search space for each issue variable 𝑏 ∈ ⋮ ⋮ [1, 𝑑𝑖𝑚]. If this random site increases the goal function's 𝑠𝑥𝑁𝐶,1 𝑠𝑥𝑁𝐶,𝑑𝑖𝑚 = value, it can be updated. Eqs. (39) to (40) are used to 𝑠𝑥 … 𝑁𝐶+1,1 𝑠𝑥 𝑁𝐶+1,𝑑𝑖𝑚 model this scenario. ⋮ ⋮ (𝑙𝑜𝑐𝑎) (𝑙𝑜𝑐𝑎𝑙) 𝐿𝑂𝑊 [ 𝑠𝑥𝑁,1 𝑠𝑥𝑁,𝑑𝑖𝑚 ] 𝑏 = 𝐿𝑂𝑊𝑏 /𝑖𝑡𝑒𝑟 (37) 𝑁×𝑑𝑖𝑚 (𝑙𝑜𝑐𝑎𝑙) 𝑆𝐹𝑖𝑡𝑋 𝑈𝑃 38) 1 𝑏 /𝑖𝑡𝑒𝑟 ( ⋮ (𝑙𝑜𝑐𝑎𝑙) (𝑙𝑜𝑐𝑎𝑙) Here, 𝐿𝑂𝑊 𝑏 and 𝑈𝑃𝑏 show the local 𝑆𝐹𝑖𝑡𝑋 𝑁𝐶 𝑆𝐹𝑖𝑡 = boundaries of the 𝑏𝑡ℎ issue variable, where 𝑖𝑡𝑒𝑟 is a 34) 𝑆𝐹𝑖𝑡𝑋 ( 𝑁𝐶+1 parameter for repetition. ⋮ ( 𝑠𝑥 𝐶𝑆𝑆 ) ( [ 𝑆𝐹𝑖𝑡𝑋 𝑎,𝑏 = 𝑠𝑥 𝑊 𝑙𝑜𝑐𝑎𝑙) 𝑎,𝑏 + 𝐿𝑂 𝑏 𝑁 ] 𝑁×1 ( ) ( Where NC is the count of chef instructors, 𝑆𝑋 denotes +𝑟𝑎𝑛𝑑. (𝑈𝑃 𝑙𝑜𝑐𝑎𝑙 𝑐𝑎𝑙) (39) 𝑏 − 𝐿𝑂𝑊 𝑙𝑜 𝑏 ), 𝑗 − 1, 𝑁𝐶, 𝐽 the sorted demographic matrix, and SFit displays the = 1,… , 𝑑𝑖𝑚𝑚 (𝐶𝑆𝑆) (𝐶𝑆𝑆) ascending objective function value vector. Following that, 𝑆𝑋 , 𝑆𝐹𝑖𝑡 𝑆𝑋 𝑎 𝑎 𝑎 = { < 𝐹𝑖𝑡𝑎 (40) changes will be made in 2 steps for each group, from 1 to 𝑆𝑋𝑎 , 𝑒𝑙𝑠𝑒 𝑁𝐶 and 𝑁𝐶 + 1 𝑡𝑜 𝑁. 𝑁𝐶 has started to represent one- (𝐶𝑆𝑆) 𝑆𝑋𝑎 is the new location for the ath-ranked fifth of the entire population in the first group division. For membership according to the chef's next strategy called instance, 𝑁𝐶 = 6 if there are 30 populations in the (𝐶𝑆𝑆) (𝐶𝑆𝑆) 𝐶 𝑆𝑆, 𝑠𝑥 beginning. All cycles or the end of the epochs result in the 𝑎,𝑏 displays its 𝑏𝑡ℎ manage, and 𝑆𝐹𝑖𝑡𝑎 is the goal variable value. availability of a single chef. Step 2- candidate solutions ' updates As per the Step 1- Updating for chef instructors: CBOA, candidate solutions pursuing culinary arts use Chef instructors use the two best chef instructors' these three methods to enhance their cooking abilities: strategies to hone their culinary skills. At first, they try to A chef trains each student, randomly assigning them acquire chef educator methods by imitating the best elite to a class. This method has the benefit of having a chef agent. This plan describes the global exploration and mentor the pupils, which helps them acquire new skills. It capabilities of the CBOA. The primary benefit of this alludes to users who have moved to the other search zone upgrade is that before instructing candidate solutions, chef in the technique. If the best chef instructor teaches pupils, educators may test their skills against the best chefs. This on the other hand, there won't be a worldwide search since method allows for the upgrading of candidate solutions, there will be a computational bias in favor of the best. The not only the most gifted individuals. By doing this, it guidance and training of the elite agent determine each prevents the algorithm from being stuck in the local culinary student's new role. This situation is expressed in optimum and promotes more precise and effective Eq. (41). scanning over the many search space regions. In this ( example, freshly established cooking teacher posts are 𝑠𝑥 𝑆𝐹𝑆 ) 𝑎,𝑏 = 𝑠𝑥𝑎,𝑏 + 𝑟𝑎𝑛𝑑 (41) filled using Eq. (35). ∙ (𝐶𝐼𝑅 − 𝐼𝑛𝑑 ∙ 𝑠𝑥 𝑎,𝑏 𝑎,𝑏) 𝑠𝑥 (𝐶𝐹𝑆) 𝑎,𝑏 = 𝑠𝑥𝑎,𝑏 + 𝑟𝑎𝑛𝑑 Based on the learner's initial strategy, known as SFS, (35) ∙ (𝐵𝑒𝑠𝑡𝐶𝑏 − 𝐼𝑛𝑑 ∙ 𝑠𝑥𝑎,𝑏) the updated position for the 𝑎th-sorted member is 𝑠𝑥 𝐶 ( ) 𝑎,𝑏 𝐹 𝑆 specifies the first strategy for switching expressed as 𝑠𝑥 𝑆𝐹𝑆𝑎,𝑏 , where 𝐶𝐼𝑅 is the elite agent and 𝑅 𝑎,𝑏 chef instructors, and 𝐶𝐹𝑆 indicates the new role for the is an arbitrary index in the interval [0, 𝑁𝐶]. New locations ath-ordered member in the bth manage. The best chef are found using Eq. (42). instructor in the bth coordinate, or 𝑆𝑋1 in the 𝑆𝑋 matrix, is 𝑆𝑋𝑆𝐹𝑆 (𝑆𝐹𝑆) , 𝑆𝐹𝑖𝑡 represented by 𝐵𝑒𝑠𝑡𝐶𝑏. I nd is a randomly chosen number 𝑆𝑋 𝑎 𝑎 𝑎 = { < 𝐹𝑖𝑡𝑎 (42) 𝑆𝑋 from the set {1,2}, and rand is an arbitrary number in the 𝑎, 𝑒𝑙𝑠𝑒 (𝑆𝐹𝑆) interval [0,1]. Eq. (36) is used to determine this condition: 𝑆𝐹𝑖𝑡𝑎 is the ultimate value for SFS. (𝐶𝐹𝑆) (𝐶𝐹𝑆) 𝑆𝑋 𝑖𝑡 The CBOA's technique involves treating every factor 𝑆𝑋 = { 𝑎 , 𝑆𝐹 𝑎 < 𝐹𝑖𝑡𝑎 𝑎 (36) as a skill. Each student learns and mimics one of the chef 𝑆𝑋𝑎, 𝑒𝑙𝑠𝑒 (𝐶𝐹𝑆) instructor's skills. 𝐴𝑛 instructor chosen at random from the In this equation, 𝑆𝐹𝑖𝑡𝑎 displays the objective collection 𝐶𝐼𝑅 is used (𝑅 is selected from [1, 𝑁𝐶]). This is (𝐶𝐹𝑆) function of 𝑆𝑋𝑎 , and Fita is the fitness function ath comparable to changing just one variable instead of every member. Based on the second method, each culinary possible answer in terms of algorithms. This enhances teacher strives to develop their abilities via individual global exploration and search. In order to recreate this practice. This method intends to increase CBOA's situation, the first lead instructor, represented by the exploitation capabilities and local search. Every elite 𝐶𝐼𝑅 vector, is randomly selected for each culinary learner 𝑎 agents culinary expertise identifies the factors needed to 𝑠𝑥𝑎 (a CBOA member selected at random from Ra's index 308 Informatica 49 (2025) 299–318 N. Zhang et al. from [1, 𝑁𝐶]). To represent a talent of the selected head also allowing the algorithm to find more practical answers instructor, the cth coordinate of the vector of 𝑠𝑥𝑎, the that are closer to previously discovered solutions. When culinary pupil, is picked at random from [1, 𝑑𝑖𝑚]. 𝐶𝐼𝑅 is every obstacle is viewed as a skill, kids will work to 𝑐 this value. In this case, Eq. (43) may be used to calculate improve these skills in order to become more fit. Thus, Eq. the new location: (45) is used to find new locations. The selection of HGSO, CGO, and CBOA stems from (𝑆𝑆𝑆) 𝐶𝐼𝑅 , 𝑏 = 𝑐 𝑠𝑥𝑎, = { 𝑎,𝑏 𝑏 (43) 𝑠𝑥 their distinct abilities to enhance exploration and 𝑎,𝑏 , 𝑒𝑙𝑠𝑒 exploitation during model optimization critical in high- where 𝑏 is the problem size ([1, 𝑑𝑖𝑚]), 𝑎 matches the dimensional, nonlinear domains like diabetes prediction. population and takes a value in the range of [𝑁𝐶 + HGSO draws on thermodynamic principles to escape local 1, 𝑁𝐶 + 𝑁], c is a random integer selected from optima, improving convergence reliability. CGO [1, 𝑑𝑖𝑚], and SSS is the student's next strategy. leverages fractal-inspired chaotic dynamics, offering Consequently, the location update is established using Eq. effective global search in complex spaces. CBOA mimics (44). human learning strategies to balance global and local (𝑆𝑆𝑆) (𝑆𝑆𝑆) 𝑆𝑋𝑎 , 𝐹𝑖𝑡𝑆𝑎 < 𝐹𝑖𝑡 𝑆𝑋𝑎,𝑏 = { 𝑎 (44) refinement. While these optimizers are general-purpose, 𝑆𝑋𝑎 , 𝑒𝑙𝑠𝑒 their adaptability makes them suitable for fine-tuning (𝑆𝑆𝑆) 𝑆𝑋𝑖 relates to the new position of 𝑎𝑡ℎ ranked model parameters in sensitive health-related tasks. These member based on 𝑆𝑆𝑆. schemes were integrated to boost classification Using one of the two last methods, personal activities performance beyond what standalone models achieve. or research, each culinary student aims to grow personally. Although formal ablation studies were not conducted here, This is the algorithm's exploitation stage. The benefit of the comparative evaluation highlights clear improvements this approach is that it makes local search stronger while in predictive metrics, justifying their inclusion. (𝑙𝑜𝑐𝑎𝑙) (𝑙𝑜𝑐𝑎𝑙) (𝑙𝑜𝑐𝑎𝑙) (𝑆𝑇𝑆) 𝑠𝑥 𝑂 = { 𝑎,𝑏 + 𝐿 𝑊𝑏 + 𝑟𝑎𝑛𝑑 ∙ (𝑈𝑃𝑏 − 𝐿𝑂𝑊 𝑠𝑥 𝑏 ) 𝑎,𝑏 (45) 𝑠𝑥𝑎,𝑏 , 𝑒𝑙𝑠𝑒 where 𝑟 dim is a random number chosen where in the further analysis the sign TP designates (𝑆𝑇𝑆) from [1, 𝑑𝑖𝑚] and 𝑠𝑥𝑎,𝑏 displays the updated calculated the case of a positive forecast of the good luck, FP - the state of the ath member based on the student's third abbreviation of fall positive - is used in the case when the strategy (𝑆𝑇𝑆). Eq. (46) displays the changes: outcome of a case is bad. In the case when the forecast is (𝑆𝑇𝑆) (𝑆𝑇𝑆) negative and the real result is really negative TN gives the 𝑆𝑋𝑎 , 𝐹𝑖𝑡𝑆 𝑆𝑋 𝑎 < 𝐹𝑖𝑡 𝑎,𝑏 = { 𝑎 (46) same result. The FN means a bad forecast when the real 𝑠𝑥𝑎,𝑏 , 𝑒𝑙𝑠𝑒 result is good. (𝑆𝑇𝑆) Fit 𝑆𝑋𝑎 displays the desired function value of (𝑆𝑇𝑆) 𝑆𝑋𝑎 as 𝑆𝑇𝑆. Culinary learners and elite agents discuss 3 Result and discussion 𝐶𝐵𝑂𝐴 tactics. The results obtained from these hybrid schemes are represented comprehensively with various graphs and 2.7 Performance evaluator tables. These tools systematically compare and contrast A variety of indicators are utilized to assess classifier each model's performance for an in-depth assessment of performance. The term "accuracy" refers to the proportion the functions of each model. From a careful study of the of accurately predicted observations. Three commonly results represented in the graphs and tables, insightful used metrics are recall, accuracy, and precision. Total analysis is performed to identify the best model that accuracy, which encompasses both real negatives and performs well in terms of predictive accuracy and positives, is referred to as accuracy. Unbalanced datasets suitability for the prediction process. Moreover, this can lower accuracy. Recall finds only positives and review also points out schemes with flaws or limits, assumes minimal mistakes. The F1 score is helpful in adding a critical perspective to the work, especially in schools with different distributions since it balances respect of their applicability to real-life scenarios. This recollection and accuracy. It can handle both false strong assessment methodology allows researchers to negatives and real positives. These measures assist in make informed decisions on model selection and estimating the efficacy of ML schemes. optimization for prediction tasks, helping to advance not TP + TN only the science but also practical applications behind Accuracy = (47) TP + TN + FP + FN predictive modeling. TP Precision = (48) TP + FP 3.1 Convergence curve TP TP Recall = TPR = = (49) The convergence curve has a significant influence on P TP + FN 2 × Recall × Precision prediction processes since it displays the rate at which a F1 score = (50) scheme learns. A steep slope in the convergence curve Recall + Precision displays that convergence happens fast, and hence, the Hybrid Machine Learning Framework for Type 2 Diabetes Prediction… Informatica 49 (2025) 299–318 309 model quickly learns the pattern and forecasts stabilize. In iterations, revealing learning stability and showing which contrast, a shallow curve indicates slower convergence, schemes reach optimal accuracy most efficiently during which means the model takes longer to comprehend training. It can be seen from this figure that, among the patterns, and hence, the predictions are highly LDCB, LDCG, and LDHG schemes, the LDCG model, unpredictable throughout training. This helps to which has reached an accuracy of 0.930, has been understand this curve for optimizing the training tactics outperformed by the LDCB model with 0.968 accuracy, and finding a balance between underestimating and whereas its accuracy is higher than that of the LDHG overfitting. The suggestions made include those of model, which stands at 0.921. Similarly, among the learning rate changes, batch size changes, and model GPHG, GPCG, and GPCB schemes, the GPHG schemes topology for best prediction performance with no showed an accuracy of 0.942, proving that their accuracy convergence or wasted time in unnecessary training. The is the lowest compared to the GPCG model, which had an convergence curve in Fig. 3 illustrates and compares the accuracy of 0.960, and the GPCB model, which had an results of the hybrid schemes presented. Fig. 3 displays the accuracy of 0.980. Their optimal condition was achieved convergence behavior of each hybrid model across after 60 cycles. Figure 3: 3D The convergence curve for the 3 schemes outperforms the precision value of the LDCG model, 3.2 Schemes comparison which stands at 0.935, during the training phase. Upon comparing the outcomes of the schemes during Table 1 displays the outcomes of both the LDR and GPC the testing phase, it becomes apparent that the recall value schemes, as well as their respective hybrid forms in of the hybrid forms of GPC schemes exceeds that of the different phases. Table 1 summarizes the accuracy, hybrid form of the LDR model. Specifically, during the precision, recall, and F1-scores of all models during testing phase, it is evident that LDCG, with a recall value training, testing, and overall phases, enabling side-by-side of 0.922, demonstrates weaker functionality than GPCG, evaluation of classifier performance. In the training phase, which achieves a recall value of 0.957. However, it becomes apparent that the functionality of the LDR following the LDCB model with a recall value of 0.961, model, boasting an accuracy of 0.916, falls short than the LDCG model boasts the highest value among its group another base model, GPC, achieving 0.937 accuracy in the members. Conversely, GPCG, with a recall value of 0.957, same phase. Similarly, its hybrid counterpart, the LDHG signifies that its performance surpasses that of the GPHG model, with an accuracy of 0.926, also lags behind the and GPC schemes, which have recall values of 0.935 and GPHG model with 0.946 accuracy. Furthermore, the 0.909, in that order, although it does not outperform precision value of the GPCG model, reaching 0.963, GPCB, with a recall value of 0.978, during the testing phase. Table 1: The outcome of the showcased developed schemes Metric values Section Model Accuracy Precision Recall F1-score LDR 0.916 0.917 0.916 0.917 LDHG 0.926 0.925 0.926 0.925 Train LDCG 0.935 0.935 0.935 0.935 LDCB 0.972 0.972 0.972 0.972 310 Informatica 49 (2025) 299–318 N. Zhang et al. GPC 0.937 0.937 0.937 0.937 GPHG 0.946 0.947 0.946 0.946 GPCG 0.963 0.963 0.963 0.963 GPCB 0.981 0.981 0.981 0.981 LDR 0.874 0.876 0.874 0.875 LDHG 0.913 0.913 0.913 0.913 LDCG 0.922 0.921 0.922 0.921 LDCB 0.961 0.961 0.961 0.961 Test GPC 0.909 0.914 0.909 0.910 GPHG 0.935 0.937 0.935 0.936 GPCG 0.957 0.961 0.957 0.957 GPCB 0.978 0.979 0.978 0.978 LDR 0.904 0.905 0.904 0.904 LDHG 0.922 0.922 0.922 0.922 LDCG 0.931 0.931 0.931 0.931 LDCB 0.969 0.969 0.969 0.969 All GPC 0.928 0.929 0.928 0.929 GPHG 0.943 0.944 0.943 0.943 GPCG 0.961 0.962 0.961 0.961 GPCB 0.980 0.981 0.980 0.980 The 3D wall plot of Fig. 4 visualizes model accuracy between its measures, which remain around 0.928 with comparison across three different phases, namely regard to both accuracy and recall, demonstrating an Training, Testing, and All. By taking into account the overall robust behavior in performance. In sharp contrast, performances for all the phases of three schemes, a the LDHG model displays very consistent results in all number of thrilling trends can be found out. First and four metrics, reaching a stable performance of 0.922 in all, foremost, during the All phase, the LDR model performed reflecting a balanced performance considering different best among them with a marvelous score of its precision evaluation standards. In contrast, the GPHG model has metric 0.905, which really exhibits the competency of this strengths and weaknesses mixed up on the metrics. model with a touch towards precision. With that said, GPC Although it has a very commendable score in the precision outcompetes all its contenders during the same stage with metric of 0.944, the value is low in other metrics, having outstanding precision and F1 score records at an 0.943 for accuracy, recall, and F1 score, showing its astonishing 0.929, while it preserves high consistency relative weakness in those aspects. Hybrid Machine Learning Framework for Type 2 Diabetes Prediction… Informatica 49 (2025) 299–318 311 Figure 4: 3D Walls-plot for the performance of the schemes across phases Table 2 presents a comparison of the functional conditions. For instance, the LDR model showcases an performance of schemes under both healthy and diabetes accuracy of 0.93 under healthy conditions, aligning with 312 Informatica 49 (2025) 299–318 N. Zhang et al. the precision value of the LDHG model. However, the Nevertheless, the hybrid forms of the GPC model LDCB model emerges as the top performer with a showcase superior functionality in contrast to the LDA precision value of 0.97, indicating its superiority over the scheme and its variants. LDCG model, which achieves a precision value of 0.94, Furthermore, under diabetes conditions, the LDCB as well as other preceding schemes. Among the hybrid model exhibits a higher recall value of 0.95, surpassing the versions of the GPC model, the GPCB and GPCG recall values of the LDCG, LDHG, and LDA schemes, schemes emerge with the highest accuracy under healthy which stand at 0.90 and 0.88, in that order. Moreover, the conditions, boasting precision values of 0.99 and 0.98, recall value of the LDCB model exceeds that of the GPC respectively. Following closely, the GPHG model and GPHG schemes, which are 0.91 and 0.94, achieves a precision value of 0.97, while the GPC model respectively. However, it falls short of surpassing the records a precision value of 0.95, indicating slightly recall values of the GPCG and GPCB schemes, which are weaker functionality compared to the former schemes. 0.96 and 0.98, respectively. Table 2: Categorization of assessment criteria for the performance of the developed schemes Metric Model Condition values LDR LDHG LDCG LDCB GPC GPHG GPCG GPCB Healthy 0.93 0.93 0.94 0.97 0.95 0.97 0.98 0.99 Precision Diabetes 0.85 0.90 0.91 0.96 0.88 0.90 0.93 0.97 Healthy 0.92 0.95 0.95 0.98 0.94 0.94 0.96 0.98 Recall Diabetes 0.88 0.88 0.90 0.95 0.91 0.94 0.96 0.98 Healthy 0.93 0.94 0.95 0.98 0.94 0.96 0.97 0.98 F1-score Diabetes 0.86 0.89 0.90 0.95 0.90 0.92 0.95 0.97 The column line symbol plot in Fig. 5 provides a values. Conversely, under the healthy condition, both comparison between the values recorded in both healthy GPC and GPHG schemes achieve 468 and 471 out of 500 and diabetic situations and the values predicted by the measured values, respectively, indicating lower accuracy schemes. Under the diabetes condition, it is evident that compared to the GPCG and GPCB schemes, which the LDCB model, with 254 out of 268 measured values, achieve 480 and 491 out of 500 measured values, demonstrates higher accuracy than the LDCG model, respectively. Besides, the GPCG and GPHG schemes which achieves 240 out of 267 measured values. attain values of 258/268 and 253/268, respectively, under Similarly, the base model, LDR, performs better with 236 the diabetes condition, which indicates moderate out of 268 measured values compared to the LDHG performance by the GPCB model, with attained values of model, which also achieves 236 out of 268 measured 262/268, and the GPC model, at 245/268. Figure 5: Column-line symbol plot to represent the difference among the schemes To avoid overfitting, the model's performance was performance in all phases give us an idea of how strong checked at three different phases: training, testing, and they are. In the future, we will use cross-validation and overall. Also, the fact that the training and testing explicit regularization approaches to better control measures show the same patterns means that the model is overfitting and make the model more generalizable. generalizing instead of overfitting. Even though there wasn't a formal validation set, the hybrid schemes' Hybrid Machine Learning Framework for Type 2 Diabetes Prediction… Informatica 49 (2025) 299–318 313 The ROC is a measure that fundamentally depends on the suggested schemes are carefully analyzed with the how well binary classifiers work. It compares the false help of the ROC curve, which is a perfect inseparable tool positive rate (1-specificity) to the true positive rate used to analyze the performance of the classifier. It is (sensitivity) at various thresholds. This graph conveys observed, upon detailed analysis, that GPCB and GPCG useful information about the capability of the classifier to are ahead of their competitors in reaching a TPR value of differentiate classes in all possible threshold settings. The 1.0 at an earlier stage and hence delivers exceptional ROC is a tool that actually enables the researchers to study performance in classification problems. After that, LDCB the compromise between true positives and false positives, and GPHG come very close as the second and third thus giving a complete view of the efficiency of the schemes, reaching a TPR of 1.0 just a little later but with classifier. Besides, the ROC's AUC gives a quantitative a sharp increase, further establishing their effectiveness. measure of the discriminatory power of a classifier, where In sharp contrast, the LDR model lags far behind its larger AUC means better performance. Also, the ROC plot counterparts since its vector has the gentlest slope among allows for better selection of the optimal cut-off value to the compared schemes. Nevertheless, the LDR model classify the samples according to the needs of the specific eventually attains 1.0 TPR but takes its time in comparison application, considering sensitivity and specificity to get with the others. The above analysis displays how different the same result desired. Therefore, the ROC curve schemes may perform to the extent and also how often the displays a very important means for testing, comparing, ROC curve proves useful for making subtle choices and fine-tuning binary classification schemes, thus regarding classifier behavior, which might not be contributing to enhanced ML model predictive power in a immediately apparent in other forms, and helps drive slew of applications. Moreover, in Fig. 6, the outcomes of better decisions for predictive modeling tasks. Figure 6: ROC curves depict the performance of the most efficient hybrid schemes The SHAP additive explanations in Fig. 7 depict the nutrition, regular physical activity, and medication is effects of various factors such as glucose or BMI considered a significant approach to diabetic indicators that influence the possibility of diabetes. The prevention and management. BMI, which is following explanation succinctly defines the effects of determined using weight and height measures, is such factors on the occurrence of diabetes. another widely accepted indicator of body fatness • High levels of blood glucose, normally due to associated with the risk of developing diabetes. excessive consumption of sugar or reduced action of • A high BMI means excess adipose tissue interferes insulin, may eventually lead to the development of with insulin action, apart from increasing the diabetes. Blood glucose that remains high over a inflammatory component, leading to insulin continuous period places a load on the pancreas resistance and impaired glucose tolerance. The secreting insulin, and, with time, may make it lose its underlying fat also secretes hormones and cytokines, efficiency. This can result in insulin resistance-a further dampening metabolic processes and condition whereby cells become unable to efficiently increasing diabetes risk. Also, a higher BMI is more act in response to insulin signals, causing more often than not associated with other risk factors like accumulation of glucose. Besides, high levels of sedentary lifestyle and lousy food, increasing the glucose can cause the damage of blood vessels and chances of diabetes. By enhancing insulin sensitivity neurons, which raise the risk of complication and overall metabolic health, dietary and activity development in diabetic patients. Hence, keeping changes that control body mass index (BMI) can blood glucose within the norm through proper lower the risk of diabetes. Therefore, maintaining a 314 Informatica 49 (2025) 299–318 N. Zhang et al. healthy BMI is crucial for both preventing and treating diabetes. Figure 7: The sensitivity analysis results Table 3 provides the results of a 5-fold cross-validation strong generalization and low variance. In contrast, the for the GPC and LDR models, assessing their stability and LDR model shows slightly lower accuracy across all folds, generalization across different subsets of the dataset. Each with values ranging from 0.887 to 0.904. The results fold (K1 to K5) represents an independent split where the clearly suggest that GPC outperforms LDR not only in model was trained on 80% of the data and tested on the individual experiments but also in terms of cross-validated remaining 20%. The GPC model demonstrates reliability. These findings reinforce the robustness of GPC consistently high performance across all folds, with for diabetes prediction tasks under varying training-test accuracy values ranging from 0.916 to 0.928, indicating partitions. Table 3: K-fold cross validation. K Fold Number Models K1 K2 K3 K4 K5 GPC 0.920 0.927 0.924 0.916 0.928 LDR 0.887 0.895 0.901 0.896 0.904 Table 4 presents the results of the Wilcoxon signed- significant result with a p-value of 0.0679, while others rank test conducted to compare the performance such as GPC-CBOA and LDR-based hybrids did not show differences between baseline classifiers and their hybrid statistically significant improvements, as their p-values optimized variants. The test evaluates whether observed exceeded 0.1. The stat column represents the test statistic differences in classification performance are statistically for ranking the difference between paired models. These significant. A lower p-value (typically < 0.05) indicates a findings validate that only specific optimizer integrations statistically meaningful improvement. Among the models, particularly with GPC deliver meaningful predictive the GPCHG scheme achieved a p-value of 0.0348, advantages, supporting the selective use of metaheuristics indicating a statistically significant enhancement over the in medical classification contexts like Type 2 diabetes base GPC model. Similarly, GPCG produced a marginally prediction. Hybrid Machine Learning Framework for Type 2 Diabetes Prediction… Informatica 49 (2025) 299–318 315 Table 4: Wilcoxon test. Models stat P value GPC 644 2.25E-01 GPC Henry gas solubility optimization 338 3.48E-02 GPC chaos game Optimization 155 6.79E-02 GPC Chef-Based Optimization Algorithm 48 4.39E-01 LDR 1200 2.45E-01 LDR-Henry gas solubility optimization 824 4.39E-01 LDR-chaos game Optimization 675 6.80E-01 LDR-Chef-Based Optimization Algorithm 125 4.14E-01 GPC 644 2.25E-01 GPC-Henry gas solubility optimization 338 3.48E-02 GPC-chaos game Optimization 155 6.79E-02 GPC-Chef-Based Optimization Algorithm 48 4.39E-01 4 Conclusion • Limitations: There are several drawbacks to projection using ML The various advantages of early detection of diabetes by techniques. The most critical problem of overfitting that using ML are: it enables early interference, thus most schemes biased the training data and gather noise preventing the development of complications such as rather than underlying patterns, which is poor in cardiovascular diseases and neuropathy; ML algorithms generalization in unknown data. When the schemes are sift through enormous volumes of data to spot patterns that relatively simple to represent the complexity of the data, are so subtle they could indicate diabetes risk, hence underfitting happens with poor accuracy in the forecast. improving their accuracy. This will, therefore, be enabling Biases in training data can persist in ML schemes, leading personalized treatment plans for better patient care. Also, to biased forecasts, especially in sensitive domains like automating diagnostics cuts down the healthcare costs and healthcare and criminal justice. Furthermore, ML workload for medical staff. In a nutshell, ML aims at early algorithms need big, high-quality datasets for training, diabetes detection, providing an improvement for patient which are not always available, especially in specialist outcomes through easy healthcare access, thus adopting a sectors or when dealing with sensitive data. The dynamic proactive stance towards the disease's management. nature of real-world data makes it challenging to sustain However, this work aims to project diabetes using ML model correctness over time; hence, regular monitoring schemes comprising GPC and LDA, coupled with 3 and updating become necessary. To solve these optimizers: Henry Gass Solubility Optimization, Chef limitations, several methods have been tried to reduce Base Enhancement Algorithm, and Chaos Game overfitting, such as regularization; feature engineering to Optimization. With the view of improving the accuracy of make the schemes perform better; and algorithms that are the prediction, it was decided to couple the schemes with fair-aware to reduce biases. All of the above can be further the optimizers. These results mean that the model GPC improved by enhancing openness and interpretability of and its hybrid forms provide better performance than the schemes, thus building trust and enabling their adoption in LDA scheme and its hybrids. Comparing results in GPC, applications of importance. This calls for more research GPHG, GPCG, and GPCB, for instance, out of these, the and development on these issues so that the MLC forecasts best result was from the GPCB model in the "All" phase, become increasingly accurate and dependable. with an accuracy value of 0.980. In that respect, the GPCG model stands out as the second-best model with an accuracy of 0.961, while the GPHG model gives medium performance in this comparison, with an accuracy of 0.943. In this comparison, the GPC model has the weakest functionality, with an accuracy of 0.928. 316 Informatica 49 (2025) 299–318 N. Zhang et al. References type 2 diabetes,” Endocr Rev, vol. 37, no. 3, pp. 190–222, 2016. Publisher: Oxford Academic. [1] S. M. Haffner, “Epidemiology of type 2 diabetes: https://doi.org/10.1210/er.2015-1116. risk factors,” Diabetes Care, vol. 21, no. [13] L. S. Greci et al., “Utility of HbA1c levels for Supplement_3, pp. C3–C6, 1998. Publisher: diabetes case finding in hospitalized patients with American Diabetes Association. hyperglycemia,” Diabetes Care, vol. 26, no. 4, pp. https://doi.org/10.2337/diacare.21.3.C3. 1064–1068, 2003. Publisher: American Diabetes [2] Y. Wu, Y. Ding, Y. Tanaka, and W. Zhang, “Risk Association. factors contributing to type 2 diabetes and recent https://doi.org/10.2337/diacare.26.4.1064. advances in the treatment and prevention,” Int J [14] K. Plis, R. Bunescu, C. Marling, J. Shubrook, and Med Sci, vol. 11, no. 11, p. 1185, 2014. Publisher: F. Schwartz, “A machine learning approach to National Library of Medicine. predicting blood glucose levels for diabetes https://doi.org/10.7150/ijms.10001. management,” in Workshops at the Twenty-Eighth [3] E. Wilmot and I. Idris, “Early onset type 2 AAAI conference on artificial intelligence, 2014. diabetes: risk factors, clinical impact and Publisher: AAAI. management,” Ther Adv Chronic Dis, vol. 5, no. [15] R. E. Glasgow, “A practical model of diabetes 6, pp. 234–244, 2014. Publisher: Sage management and education,” Diabetes Care, vol. Publications. 18, no. 1, pp. 117–126, 1995. Publisher: American https://doi.org/10.1177/2040622314548679. Diabetes Association. [4] G. L. Robertson, “Diabetes insipidus,” Endocrinol https://doi.org/10.2337/diacare.18.1.117. Metab Clin North Am, vol. 24, no. 3, pp. 549–572, [16] S. Nam, C. Chesla, N. A. Stotts, L. Kroon, and S. 1995. Publisher: Elsevier. L. Janson, “Barriers to diabetes management: https://doi.org/10.1016/S0889-8529(18)30031-8. patient and provider factors,” Diabetes Res Clin [5] J. R. GREEN, G. C. BUCHAN, E. C. ALVORD Pract, vol. 93, no. 1, pp. 1–9, 2011. Publisher: JR, and A. G. SWANSON, “Hereditary and Elsevier. idiopathic types of diabetes insipidus,” Brain, vol. https://doi.org/10.1016/j.diabres.2011.02.002. 90, no. 3, pp. 707–714, 1967. Publisher: National [17] P. J. Watkins, P. L. Drury, K. W. Taylor, and W. Library of Medicine. G. Oakley, Diabetes and its management. Wiley https://doi.org/10.1093/brain/90.3.707. Online Library, 1990. [6] M. Babey, P. Kopp, and G. L. Robertson, [18] S. H. Ley, O. Hamdy, V. Mohan, and F. B. Hu, “Familial forms of diabetes insipidus: clinical and “Prevention and management of type 2 diabetes: molecular characteristics,” Nat Rev Endocrinol, dietary components and nutritional strategies,” vol. 7, no. 12, pp. 701–714, 2011. Publisher: The Lancet, vol. 383, no. 9933, pp. 1999–2007, Nature. https://doi.org/10.1038/nrendo.2011.100. 2014. Publisher: The Lancet. [7] R. D. Lawrence, “Three types of human diabetes,” [19] S. Vijan, “Type 2 diabetes,” Ann Intern Med, vol. Ann Intern Med, vol. 43, no. 6, pp. 1199–1208, 152, no. 5, pp. ITC3-1, 2010. Publisher: ACP. 1955. Publisher: ACP. https://doi.org/10.7326/0003-4819-152-5- https://doi.org/10.7326/0003-4819-43-6-1199. 201003020-01003. [8] R. D. Lawrence, “Types of human diabetes,” Br [20] J. E. B. Reusch and J. E. Manson, “Management Med J, vol. 1, no. 4703, p. 373, 1951. Publisher: of type 2 diabetes in 2017: getting to goal,” JAMA, National Library of Medicine. vol. 317, no. 10, pp. 1015–1016, 2017. Publisher: https://doi.org/10.1136/bmj.1.4703.373. Jama Network. DOI: 10.1001/jama.2017.0241. [9] A. Ota and N. P. Ulrih, “An overview of herbal [21] E. Ahmad, S. Lim, R. Lamptey, D. R. Webb, and products and secondary metabolites used for M. J. Davies, “Type 2 diabetes,” The Lancet, vol. management of type two diabetes,” Front 400, no. 10365, pp. 1803–1820, 2022. Publisher: Pharmacol, vol. 8, p. 224659, 2017. Publisher: The Lancet. frontiers. [22] R. A. DeFronzo et al., “Type 2 diabetes mellitus,” https://doi.org/10.3389/fphar.2017.00436. Nat Rev Dis Primers, vol. 1, no. 1, pp. 1–22, 2015. [10] W. J. Pories, J. H. Mehaffey, and K. M. Staton, Publisher: Nature. “The surgical treatment of type two diabetes https://doi.org/10.1038/nrdp.2015.19. mellitus,” Surgical Clinics, vol. 91, no. 4, pp. [23] D. H. Laursen, A. Frølich, and U. Christensen, 821–836, 2011. Publisher: Surgical Clinics. “Patients’ perception of disease and experience [11] J. Zhang, Y. Deng, Y. Wan, J. Wang, and J. Xu, with type 2 diabetes patient education in D “Diabetes duration and types of diabetes treatment enmark,” Scand J Caring Sci, vol. 31, no. 4, pp. in data-driven clusters of patients with diabetes,” 1039–1047, 2017. Publisher: Wiley Online Front Endocrinol (Lausanne), vol. 13, p. 994836, Library. https://doi.org/10.1111/scs.12429. 2022. Publisher: frontiers. [24] O. Peleg, E. Hadar, and A. Cohen, “Individuals https://doi.org/10.3389/fendo.2022.994836. with type 2 diabetes: an exploratory study of their [12] Y. Yang and L. Chan, “Monogenic diabetes: what experience of family relationships and coping it teaches us on the common forms of type 1 and with the illness,” Diabetes Educ, vol. 46, no. 1, pp. Hybrid Machine Learning Framework for Type 2 Diabetes Prediction… Informatica 49 (2025) 299–318 317 83–93, 2020. Publisher: Sage Publications. https://doi.org/10.1177/0145721719888625. [25] E. A. Beverly, C. K. Miller, and L. A. Wray, “Spousal support and food-related behavior change in middle-aged and older adults living with type 2 diabetes,” Health Education & Behavior, vol. 35, no. 5, pp. 707–720, 2008. Publisher: Sage Publications. https://doi.org/10.1177/1090198107299787. [26] J. L. Browne, A. Ventura, K. Mosely, and J. Speight, “‘I call it the blame and shame disease’: a qualitative study about perceptions of social stigma surrounding type 2 diabetes,” BMJ Open, vol. 3, no. 11, p. e003384, 2013. Publisher: BMJ Journals. https://doi.org/10.1136/bmjopen-2013- 003384. [27] A. Dagliati et al., “Machine learning methods to predict diabetes complications,” J Diabetes Sci Technol, vol. 12, no. 2, pp. 295–302, 2018. Publisher: Sage Publications. https://doi.org/10.1177/1932296817706375. [28] M. Soni and S. Varma, “Diabetes prediction using machine learning techniques,” International Journal of Engineering Research & Technology (IJERT), vol. 9, no. 9, pp. 921–925, 2020. Publisher: IJERT. [29] L. Kopitar, P. Kocbek, L. Cilar, A. Sheikh, and G. Stiglic, “Early detection of type 2 diabetes mellitus using machine learning-based prediction models,” Sci Rep, vol. 10, no. 1, p. 11981, 2020. Publisher: Nature. https://doi.org/10.1038/s41598-020-68771-x. [30] H. Sifaou, A. Kammoun, and M.-S. Alouini, “High-dimensional linear discriminant analysis classifier for spiked covariance model,” Journal of Machine Learning Research, vol. 21, no. 112, pp. 1–24, 2020. Publisher: JMLR. Available: https://www.jmlr.org/papers/v21/19-428.html. [31] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning, vol. 1. Springer, 2006. [32] Y. Li, W. Zhang, L. Wang, F. Zhao, W. Han, and G. Chen, “Henry’s law and accumulation of weak source for crust-derived helium: A case study of Weihe Basin, China,” Journal of Natural Gas Geoscience, vol. 2, no. 5–6, pp. 333–339, 2017. Publisher: Elsevier. https://doi.org/10.1016/j.jnggs.2018.02.001. [33] J. Staudinger and P. V Roberts, “A critical review of Henry’s law constants for environmental applications,” Crit Rev Environ Sci Technol, vol. 26, no. 3, pp. 205–297, 1996. Publisher: Taylor & Francis. https://doi.org/10.1080/10643389609388492 [34] E. Trojovská and M. Dehghani, “A new human- based metahurestic optimization method based on mimicking cooking training,” Sci Rep, vol. 12, no. 1, p. 14861, 2022. Publisher: Nature. https://doi.org/10.1038/s41598-022-19313-2. 318 Informatica 49 (2025) 299–318 N. Zhang et al. https://doi.org/10.31449/inf.v49i12.10230 Informatica 49 (2025) 319–332 319 A Comprehensive Evaluation Model for the State of Electric Energy Metering Devices Based on Fuzzy Analytic Hierarchy Process Chen Xu*, Zhang Chao, Zhang HaoMiao, Su YingChun, Yan Yu, Xu YinZhe State Grid Ningxia Marketing Service Center (State Grid Ningxia Metrology Center) ,Yin chuan 750000, Ning xia, China E-mail: cxhbdl@126.com *Corresponding uthor Keywords: fuzzy analytic hierarchy process, electric energy metering device, state evaluation, comprehensive evaluation model Received: July 17, 2025 Accurately evaluating the status of electric energy metering devices is the foundation for ensuring their stable operation on smart grids, and is conducive to the development of equipment management towards refinement and intelligence. This article proposes a comprehensive evaluation model through the fuzzy analytic hierarchy process (F-AHP), which is characterized by establishing a multi-index system and taking into account subjective opinions and objective data, thereby improving the scientificity of its evaluation and enhancing its anti-interference ability. It starts with establishing a hierarchical structure, dividing the functions of indicators such as structural reliability, measurement accuracy, communication stability, and environmental adaptability. Then, based on the fuzzy decision matrix assignment, the importance of each indicator is calculated, and the assignment and overall score of the indicators are obtained, completing the quantitative evaluation of the health of the measuring device. In the experimental verification, 50 typical electric energy metering device samples were selected for state evaluation modeling. The average CI value of the model was 0.016, the coefficient of variation CV was 0.069, and the accuracy of state recognition reached 92.5%. The evaluation results have high stability, can effectively identify boundary fuzzy samples, and have strong robustness and practical value. The results indicate that the evaluation model proposed in this article can better solve multiple practica l cases and the overall evaluation error does not exceed 5%; Compared with traditional AHP and weighted average method (WAM), this model performs better in state recognition accuracy and boundary blurring processing ability. Simultaneously conducting noise experiments and sensitivity analysis, and proving that the model has high stability and reliability under various abnormal conditions. Povzetek: F-AHP model z večkazalčno hierarhijo izboljša ocenjevanje stanja merilnih naprav, združuje subjektivne in objektivne podatke ter krepi robustnost. Na 50 vzorcih doseže 92,5 -odstotno točnost, nizko varianco ter boljšo prepoznavo mejnih primerov. 1 Introduction factor, and cannot provide sufficient measurement scales. The proportion of weights is too subjective and the With the development of smart grids, the position of definition is not clear enough, which cannot adapt to the energy metering devices in the operation and production operation of large-scale equipment. of power grid enterprises is becoming increasingly Different factors can affect the operating status of important. They not only serve as the basic unit of energy metering devices, such as installation location, wear measurement for billing and metering, but also perform and tear of components, data transmission quality, power important tasks such as data collection, load monitoring, supply quality, and changes in grid noise. There are not only and equipment status recognition, playing an important quantitative factors that can be quantified to a single value, role in ensuring the quality of power supply and serving but also qualitative evaluation factors that cannot be customer rights. With the rapid development of smart directly quantified. Whether the instrument interface grids and the increasing number of connected devices, it docking is reasonable is an indicator that cannot be directly is essential to accurately grasp the operating status of measured. Historical data shows that the trend of error rate metering devices and be able to detect risks early. changes has a strong human explanatory factor. Therefore, However, traditional evaluation methods often rely on it is not easy to simultaneously balance "orderliness" and human visual inspection or judgment based on a single "fuzziness" using only traditional analytic hierarchy 320 Informatica 49 (2025) 319–332 C. Xu et al. process or fuzzy mathematics methods. To more grey model, etc. [7]; The second is to use intelligent accurately and comprehensively characterize the overall analysis methods to achieve intelligent determination of working status of electric energy metering, it is device operating status, such as applying deep learning necessary to establish a performance evaluation model technology to establish multi-sensor models for data with clear hierarchy and measurement fuzzy anomaly analysis and anomaly source localization [8]; The acceptability requirements [3]. This article establishes a third is a comprehensive equipment operation status comprehensive performance evaluation model using the evaluation model formed by integrating multiple decision- fuzzy analytic hierarchy process to design a specific making methods such as fuzzy reasoning technology, grey evaluation system model for electric energy metering target theory, and analytic hierarchy process. devices in real-world operating scenarios. This model is Some scholars have discussed the challenges of state based on a multi-dimensional evaluation index system identification under special conditions such as nonlinear and integrates professional knowledge and real-time data. loads and power quality disturbances. For example, Shah After establishing a fuzzy judgment matrix, determining (2023) [9] designed an artificial intelligence based weights, and conducting consistency checks, it forms a nonlinear load detection and identification system, which comprehensive evaluation model with clear hierarchy, can reasonably identify power data containing noise and reasonable weights, and practical effectiveness, thus structural abnormalities; Yu et al. (2022) [10] established changing the shortcomings of traditional methods that an online power quality monitoring mode using grey target cannot cope with fuzziness and human factors. Through theory and achieved multi-layer classification and this model, equipment managers can achieve identification of key indicator trends in practical problems. quantitative diagnosis of equipment operating status, Zhang et al. (2022) [11] also proposed using Software identify operational defects, and assist in developing Defined Networking (SDN) to reconfigure the data differentiated maintenance and repair plans. transmission path for system architecture, in order to ensure The structure of this article is arranged as follows: the reliability and effectiveness of the acquisition process Chapter 2 provides an overview of the research status of of electricity metering data under diverse input conditions. existing power measurement equipment status It is worth noting that the application of Fuzzy Analytic evaluation; Chapter 3 further elaborates on the design Hierarchy Process (FAHP) in power status assessment and ideas and construction methods of the proposed model; evaluation has also received more attention. For example, Chapter 4 mainly presents the implementation methods Taherikhonakdar et al. (2023) [12] used a combination of and evaluation process of the model; Chapter 5 provides Fuzzy Analytic Hierarchy Process and Grey System to a discussion of an example and presents a comparative evaluate the status of 750kV energy metering devices. In F, analysis of the example, as well as an analysis of the they classified and rated the measured 750kV energy practicality and robustness of the model; The final metering devices, and obtained a more reasonable and chapter six provides a comprehensive summary of the comprehensive evaluation result of 750kV energy metering research content and prospects for future development devices. Paunkov et al. (2023) [13] proposed an adaptive directions. correction mechanism for real-time calibration of measurement deviation using fuzzy control rules, which 2 Related work achieved real-time adjustment of measurement deviation and improved the consistency and accuracy of device Although energy metering is becoming increasingly ratings. From this, it can be seen that the Fuzzy Analytic important in intelligent power grids, there are still many Hierarchy Process (FAHP) has the advantages of fuzzy challenges in identifying and optimizing measurement analytic hierarchy process due to its consideration of the deviations in energy meters. The various complex and fuzzy modeling process between multiple factors and the ever-changing environments in which electric energy allocation of weights for multiple factors. It is a powerful meters are used make errors in electric energy metering means to achieve "accurate comprehensive use" state devices not only caused by external electromagnetic evaluation. interference, nonlinear loads, harmonics, but also by The research on transfer learning and generative aging of the equipment itself and constraints on design models has also expanded the scope of multi characteristic accuracy [4]. Especially in situations where different analysis for device state assessment. Alrobaie et al. (2023) types of instruments are shared, voltage fluctuations are [14] utilized a balanced comprehensive evaluation method large, and large amounts of data are transmitted, for power quality issues based on CVAE-TS, which traditional methods are no longer able to meet the considers the effectiveness and wide applicability of the requirements of power network operation efficiency and evaluation method; Qu et al. (2024) [15] used an improved accuracy. Therefore, researchers hope to find new online XGBoost to construct an evaluation model for power inspection methods and self diagnostic models to use system stability transfer degree, which has good scalability digital technology to track the evolution process of in multi scenario analysis applications. This has laid a monitoring errors [5, 6]. theoretical foundation for the subsequent construction of an In recent years, the focus of discussions on abnormal adaptive state evaluation mode suitable for power grid energy metering has mainly been on anomaly detection measurement devices. schemes based on feature extraction and modeling, such Based on the existing research results, it can be found as equipment operation status classification detection that the main technologies at present have made certain and prediction based on gradient boosting tree (GBDT), progress, such as error detection and data processing, but A Comprehensive Evaluation Model for the State of Electric Energy… Informatica 49 (2025) 319–332 321 there are still many areas that urgently need consistency analysis, and state level grading standards, it improvement. One is that the current indicator system effectively solves the problems of current models in lacks strict hierarchical relationships and adaptation structural design, weight allocation, and result rules, which limits the performance that can be used in interpretation, and can provide reference for later complex situations. The other is that although expert maintenance plan formulation and maintenance evaluations have a certain degree of reliability and arrangement optimization. flexibility, cognitive biases or subjective uncertainties Table 1 compares the performance of existing may still occur in some situations, and fuzzy representative state evaluation methods in terms of data mathematical methods need to be introduced to establish type, evaluation path, accuracy, and robustness. It can be quantitative evaluation models. The third issue is that seen that the state evaluation model based on fuzzy analytic most of the models cannot clearly provide the level hierarchy process (F-AHP) proposed in this article is classification and visual effects of the results, which superior to traditional models in terms of accuracy and affects the effectiveness of the output [16]. In response applicability, especially in supporting hierarchical output to these shortcomings, this article establishes a state and fuzzy boundary recognition, which provides theoretical evaluation model based on fuzzy analytic hierarchy support for the subsequent construction of intelligent process to establish a hierarchical structure, fuzzy weight metering device management and control mechanisms. reconstruction model, and evaluation model that is easy to understand. Based on fuzzy judgment matrix, Table 1: Comparison between existing methods and the model proposed in this paper Method Technical Robustnes Sample Type Evaluation Metrics Name Approach s GBDT Smart meter Gradient Boosting Single error metric Moderate Model [7] time-series data Decision Tree Grey Target Power quality Grey decision Multi-feature trend Strong Theory monitoring data model analysis [10] SDN Prediction SDN monitoring Communication metrics Prediction optimization + Fair and control data focused Model [11] graph structure FAHP + 750kV high- Grey Multi-layer weight voltage Four state dimensions Moderate System fusion equipment [12] Three-phase F-AHP (Fuzzy Proposed meters, terminal Four-layer metrics + Analytic Hierarchy Strong Method devices (50 graded evaluation Process) samples) This article mainly emphasizes several key issues in applying on a large scale, the lack of a unified level the current state evaluation of electric energy metering judgment logic and hierarchical strategy for evaluation can devices. easily lead to monitoring delays and failure to identify risks Most existing models use a fixed weight in a timely manner. superposition method, which has not formed an effective In response to the above issues, improving the hierarchical structure and failed to reflect the relative scientific construction, fuzzy adaptability, and hierarchical importance between indicators. In reality, there are establishment of state assessment models has become the significant differences in the equipment level of each core content of current research. Therefore, this article will device, and the same evaluation may not highlight focus on the exposition of these two issues as the main line individual issues, which reduces the specificity of the to carry out the research in the following text. evaluation. Can the fuzzy analytic hierarchy process balance clear Existing research has shown that there is no structure and fuzzy information processing to enhance the emphasis on the processing and application of fuzzy scientific evaluation of the state of electric energy metering information. In the actual evaluation process, many devices? subjective and fuzzy factors, such as "connection How to build a multi-level classification system with standards" and "operational stability", have not been discriminative power, so that the evaluation model can considered in the system design, resulting in fixed adapt to the equipment management needs in different thinking in the evaluation results and insufficient application scenarios? response to complex and changing real states. This article proposes a comprehensive state evaluation In terms of evaluating output expression, there is a model based on fuzzy analytic hierarchy process to address lack of hierarchical expression, making it difficult to the above issues. Its main innovations lie in the following achieve refined management. When promoting and aspects. 322 Informatica 49 (2025) 319–332 C. Xu et al. Build a multidimensional indicator system that is the comprehensive status of the electric energy metering covers key elements such as structure, error, and device, while the criterion layer includes four key attributes: communication, and allocate weights through FAHP to structural reliability, measurement accuracy, enhance the hierarchical and explanatory power of the communication stability, and adaptability to operating model. environments, The indicator layer is further refined into Introduce fuzzy judgment matrix and consistency more than ten quantifiable or determinable specific check mechanism to enhance the ability to accommodate indicators (such as error drift rate, wiring standardization, subjective evaluation information and solve the signal packet loss rate, etc.). There are significant attribute instability problem in traditional AHP applications. differences and cognitive ambiguity among various A systematic evaluation workflow was developed indicators, making it suitable to use triangular fuzzy and examples were used to verify the effectiveness of the numbers to construct a judgment matrix and calculate model in identifying weak links and assisting precise relative weights and comprehensive scores. management. Experimental results also showed that this Compared with traditional single weighted sum model has advantages in stability and adaptability methods, FAHP has the following advantages: firstly, it compared to traditional models, and is easier to promote. allows experts to use fuzzy language (such as "slightly higher" or "significantly stronger") to evaluate when constructing the judgment matrix, and improves the 3 Design of model construction flexibility and closeness of judgment through fuzzy number methods transformation; Secondly, FAHP introduces the maximum In the comprehensive evaluation model proposed in this membership degree and fuzzy consistency check article, the selection of fuzzy analytic hierarchy process mechanism in the weight calculation process, which can as the core method is based on its advantages in dealing effectively reduce the impact of subjective errors on the with complex and multi-level indicator systems, which evaluation structure, thereby improving the consistency and combine structural clarity and fuzzy adaptability. robustness of the evaluation model. Although the traditional Analytic Hierarchy Process The difference between FAHP and evaluation models (AHP) has good structural modeling capabilities and is such as weighted average, entropy weight, and TOPSIS is suitable for multi factor evaluation problems, it often that FAHP can more clearly, accurately, reasonably, and exhibits limitations such as strong subjectivity and poor intuitively handle such problems when multiple indicators consistency of judgment matrices when facing practical coexist and subjective and objective factors are intertwined. problems such as fuzzy expert cognition and unclear When the equipment conditions become increasingly boundaries between indicators. The fuzzy analytic diverse and complex, and there is a certain degree of hierarchy process, by introducing fuzzy numbers and ambiguity in expert evaluations, the advantages and fuzzy judgment matrices, not only retains the superiority of this method in weight setting and result hierarchical logical structure of AHP, but also description will be fully reflected. In addition, this method significantly enhances the model's ability to does not require too much historical data and complex accommodate fuzzy information, improving the stability optimization algorithms to support, so it can be well applied and practicality of comprehensive evaluation. to online monitoring systems or distributed management This model divides the comprehensive status of systems, which greatly improves its computing speed and electric energy metering devices into three levels: target applicability. level, criterion level, and indicator level. The target layer Comprehensive status of Target electric energy metering Measurement Communication Adaptability to operating Structural reliability Criterion accuracy stability environment layerlayer rosion Installation Shell Sealing Cor Wiring measurement Error standard Calibration Signal packet Communicati anti- frequency loss rate on delay interference integrit performance degree of standardizati correctness error drift deviation terminals on ability ↑ Indicator layer (can be simplified or expanded according to actual needs) Figure 1:Structure diagram of comprehensive state evaluation model for electric energy metering devices based on fuzzy analytic hierarchy process (F-AHP) A Comprehensive Evaluation Model for the State of Electric Energy… Informatica 49 (2025) 319–332 323 As shown in Figure 1, this article uses the Fuzzy The indicator layer specifically refers to observable Analytic Hierarchy Process (F-AHP) to construct a indicators. Set indicators such as "external shell integrity", multi-level hierarchical structure consisting of the "joint corrosion condition", and "fixed fastening" based on objective layer, criterion layer, and indicator layer. It the "structural reliability" index of external structural integrates fuzzy judgment matrix, weight extraction, and integrity to measure the true degree of object damage; Set consistency check to achieve comprehensive evaluation indicators such as "error bounce rate", "standard error of multi-source indicator information. degree", and "regular calibration frequency" based on the "measurement accuracy" indicator to measure the 3.1 Construction of state indicator system measurement accuracy and precision of the instrument on for electric energy metering devices electrical data; Set indicators such as "communication stability" to measure the integrity and timeliness of This article uses the Analytic Hierarchy Process (AHP) communication information between equipment and central to evaluate the overall situation of power metering stations, including "communication delay degree", "data devices, and constructs a clear logical hierarchical model loss ratio", and "noise immunity"; The environmental to analyze the overall state of power metering devices tolerance index is composed of indicators such as during operation. Based on this, three different modules "adaptability to usage environment", "temperature range of are formed, including the target layer, criterion layer, working environment", "humidity range of working and indicator layer. Its core function is to transform the environment", "anti-interference degree of electromagnetic fuzzy status of power metering device operation and environment", and "outdoor protection category". management into a hierarchical system with a systematic These indicators together constitute the feature vector structure, and thus become a comparable and computable for evaluating the input of the model. Let the indicator data system. vector of the i-th measuring device be: At the target layer, the overall operational performance of the measuring device is defined as the X i=[xi1,xi2  ,xin] (1) evaluation criterion and is the ultimate object of the Among them, xij represents the observation or rating model. This layer contains four types of primary attribute values, including structural reliability, value of the i-th device on the jth indicator, and n is the total measurement accuracy, communication stability, and number of indicators. To eliminate the influence of environmental adaptability. These four types of attribute dimensionality, all indicators will be normalized in the values are the structural performance, metrological future. performance, communication performance, and Unlike traditional evaluation methods that simply environmental adaptability of the measuring device. weight and add various indicators, this paper establishes a They cover the main functions of the physical judgment matrix based on a fuzzy hierarchical structure for performance, metrological performance, communication weight extraction, and introduces fuzzy language variables performance, and environmental performance of the to quantitatively express qualitative indicators, thereby measuring device, and are all key elements for enhancing the model's ability to handle "subjective evaluating the operational quality of the measuring fuzziness" and "cross indicator correlation". device. Table 2:State index system of electric energy metering devices Dimension Metric Name Metric Type Reference Range Category Structural Intact / Minor Damage / Enclosure Integrity Qualitative Reliability Severe Damage Terminal Corrosion Structural Qualitative None / Mild / Severe Level Reliability Structural Installation Stability Qualitative Firm / Loose / Detached Reliability Measurement Error Drift Rate Quantitative 0% ~ 2% Accuracy Measurement Standard Deviation Quantitative 0 ~ 0.05 Accuracy Communication Packet Loss Rate Quantitative 0% ~ 5% Stability Communication Communication Quantitative 0ms ~ 300ms Latency Stability Communication Noise Immunity Qualitative Weak / Moderate / Strong Stability Temperature Environmental Quantitative -25℃ ~ +60℃ Adaptability Suitability Protection Rating Environmental Qualitative IP20 / IP54 / IP65 etc. (IP) Suitability 324 Informatica 49 (2025) 319–332 C. Xu et al. In the entire model system, the selection of weights for each level, and deblurring is applied to convert indicators follows the principle of "comprehensive the fuzzy numbers into clear weight values, ultimately coverage, quantifiability, and distinguishability", forming a standardized weight vector. This process ensures striving to ensure the evaluation accuracy and that the weight allocation of each indicator to the overall discriminative ability of the model while considering the state evaluation results is interpretable. To adapt to feasibility of the project. practical application scenarios, the model also introduces a hierarchical synthesis mechanism, which weights and 3.2 Principles and applications of fuzzy summarizes the evaluation values of each sub indicator to obtain the comprehensive score of each device's state. At analytic hierarchy process the same time, to avoid extreme value interference, a This article uses the Fuzzy Analytic Hierarchy Process normalization processing function is set up within the (FAHP) to solve the multi factor and multi-level system to standardize the mapping of the original scores, uncertainty problems encountered in the overall state thereby making different devices comparable. evaluation process of power metering equipment. Compared with the traditional Analytic Hierarchy Process (AHP), FAHP based on fuzzy mathematical 3.3 Hierarchical structure and weight theory can better adapt to the fuzziness and subjectivity calculation of evaluation model of expert judgment, improving the scientific and robust To achieve a systematic evaluation of the operating status overall evaluation. of electric energy metering devices, this paper constructs a The core idea of FAHP is mainly to model the three-level fuzzy analytic hierarchy process model. The evaluation level range, construct a fuzzy judgment model structure consists of a target layer, a criterion layer, matrix, calculate fuzzy weight vectors, and perform and an indicator layer from top to bottom, with clear hierarchical summarization. This method uses the three- hierarchical logic and comprehensive evaluation (l dimensions. It can effectively cover multiple key aspects ij,mij,uij) sided fuzzy number to score experts, involved in device operation, such as performance, indicating the weight relationship and objectivity environment, maintenance, and faults. between different indicators, thereby reducing subjective The target layer is set as the "comprehensive state level misjudgments caused by human operation. Specifically, of electric energy metering devices", representing the it represents the lowest judgment value, and if it is the overall goal that needs to be judged ultimately; The criteria most likely judgment value, it is the highest judgment layer includes four dimensions: "structural value. To achieve fuzzy quantification of subjective reliability,""metrological accuracy,""communication judgments, this article adopts a nine-level fuzzy stability,"and "en vironmental adaptability,Evaluation language scale, and its corresponding relationship with dimensions are constructed from the perspectives of triangular fuzzy numbers is shown in Table 3. equipment stability, external environmental resilience, implementation of operation and maintenance systems, and Table 3:Mapping table of fuzzy language and fault susceptibility; The indicator layer is refined into triangular fuzzy numbers several observable sub indicators, such as measurement accuracy, voltage load response, resistance to temperature Triangular Fuzzy Fuzzy Term and humidity fluctuations, calibration frequency, fault Number (l, m, u) repair cycle, etc., to ensure that the evaluation of each Equally Important (1, 1, 1) dimension has practical operability and measurement basis. Slightly More (1, 2, 3) Important In the stage of determining model weights, the Fuzzy Moderately Analytic Hierarchy Process (FAHP) is used for weight (2, 3, 4) Important calculation. Firstly, organize multiple experts in power Clearly More equipment operation and maintenance, as well as (4, 5, 6) Important measurement technicians, to conduct pairwise comparisons Strongly Important (6, 7, 8) around the elements of the criterion layer and indicator Extremely Important (8, 9, 10) layer, and construct a fuzzy judgment matrix. The relative importance range of each comparison result is expressed as Among them, (l, m, u) respectively represent the (lij,mij,u lower limit, median, and upper limit of the uncertain a triangular fuzzy number of ij) , effectively interval for judgment. Experts use this as a basis for quantifying the fuzziness in subjective judgments. language evaluation when constructing a fuzzy judgment Subsequently, the weight calculation and consistency check matrix, and further use it for weight calculation and are completed through the following steps: consistency testing. ①Fuzzy synthesis weight calculation: using the fuzzy In the matrix construction stage, the relative weights arithmetic mean method to perform fuzzy synthesis between each criterion are compared using fuzzy calculation on each judgment matrix, obtaining the fuzzy pairwise comparisons to generate a fuzzy judgment weight vector of each layer element; matrix, and fuzzy consistency checks are used to ensure ②De fuzzification processing: Convert triangular fuzzy that the judgment logic is reasonable. Subsequently, the numbers into corresponding clear weight values. The commonly used methods are "maximum membership fuzzy synthesis algorithm is used to calculate the fuzzy A Comprehensive Evaluation Model for the State of Electric Energy… Informatica 49 (2025) 319–332 325 Indicator data Constructing a Consistency Consist collection and Fuzzy Judgment check ency preprocessing Matrix passed? Generate a Fuzzy Weight comprehensive comprehensive calculation and yes status evaluation and normalization Reconstruct no Figure 2:Model implementation and evaluation flowchart 4 Model implementation and degree method" or "center average method". In this study, the latter is chosen to improve computational efficiency; evaluation process ③Normalization adjustment: Scale each weight so This study constructed a hierarchical comprehensive that the sum of each weight is 1, to ensure comparability evaluation model from the data collection layer to the and accuracy of the model's weighting calculation. processing layer, and from the evaluation layer to the ④Consistency test: Use the CR ratio to determine warning layer. According to its order, it can be divided into: whether the consistency of each judgment matrix is good. firstly, using a predetermined set of state indicators to If cr<0.1, it is considered that the consistency of each collect standardized basic data; Then, construct a fuzzy judgment matrix is excellent and the calculation result is decision matrix and verify its consistency to ensure that the acceptable. weights of each indicator value in each layer are reasonable; Finally, weighting the elements at different levels in After the consistency verification is completed, the fuzzy the system and using them as weighting vectors to analytic hierarchy process (F-AHP) is used for multi-level participate in the fuzzy evaluation system in the data correlation to quantify the membership degree of following text is beneficial for the classification of various electrical measurement and metering equipment power measurement and control equipment status. It not states and determine the operating level of the equipment. only enhances the scientificity and practicality of the Fully considering the actual situation of the power grid, it model, but also improves the state analysis and decision- has high applicability and openness. The workflow is making performance of the measurement and control shown in Figure 2. equipment. xij - min( x j ) x′ 4.1 Indicator data acquisition and ij = max( x j ) - min( x j )standardization processing (2) The first step in model implementation is to obtain the For negative indicators (the smaller the value, the raw indicator data of the energy metering device. The better the state), use the reverse standardization formula: selected state indicators in this article cover five max( x j ) - x dimensions: structural reliability, measurement accuracy, ′ ij xij = communication stability, and environmental adaptability. max( x j ) - min( x j ) The relevant data mainly comes from multiple channels (3) such as on-site inspection records of enterprises, online Among them, xij represents the original value of the , monitoring systems, device self diagnosis modules, and jth indicator in the i-th object, and xij is its standardized historical maintenance archives, ensuring the value. This standardization process can unify all indicator comprehensiveness and representativeness of data data into the [0,1] interval, avoiding interference from sampling. numerical dimensions in the calculation of model weights, Due to differences in the measurement units and ensuring the fairness and scientificity of the evaluation numerical ranges of each indicator, direct use for system, and laying a data foundation for the subsequent evaluation may result in weight shift and result distortion. construction of fuzzy judgment matrices and weight Therefore, it is necessary to standardize the original data. analysis. The standardization method is divided into two categories based on indicator attributes: for positive indicators (the larger the value, the better the state), the range standardization method is used: 326 Informatica 49 (2025) 319–332 C. Xu et al. 4.2 Construction of fuzzy judgment also further enhances the credibility of the total weights, matrix and consistency test and provides a scientific basis for our later fuzzy analytic hierarchy process (F-AHP). On the basis of standardized indicator data, in order to achieve the importance ranking of factors between different evaluation levels, it is necessary to construct a 4.3 Calculation of comprehensive fuzzy judgment matrix and conduct consistency checks. evaluation value and classification of status Its core lies in introducing subjective judgment through levels expert scoring method, while combining fuzzy After completing the weight calculation and indicator mathematics to handle ambiguity and uncertainty, to standardization processing, the most important model enhance the adaptability and practical operability of the evaluation task to be executed next is the allocation of model. comprehensive evaluation values and state levels. By using The basic steps for constructing a fuzzy judgment the fuzzy analytic hierarchy process (F-AHP), qualitative matrix are as follows: Firstly, based on the hierarchical evaluation is transformed into quantitative evaluation to structure model, the importance of each indicator in the accurately reflect the status of energy metering equipment. same layer is compared pairwise, and a judgment matrix Based on the constructed weight vector w=(w1, w2,..., is established by referring to the 1-9 nine level scaling wn) and the indicator membership matrix R, use fuzzy method operations to comprehensively evaluate and calculate the A=(aij ) nn comprehensive membership vector B. (4) B=W R=(b Among them, 𝑎𝑖𝑗represents the importance of the 1,b2 ,...,bm) (7) i-th indicator relative to the j-th indicator. In the fuzzy Among them, B is the overall membership degree of analytic hierarchy process (F-AHP), the elements of the each state level, W is the weight vector, and the judgment matrix are represented in the form of triangular membership matrix R is an n × m dimensional matrix, fuzzy numbers, namely ?̃?𝑖𝑗 = representing the membership values of each evaluation indicator at different state levels, reflecting the degree of (𝑙𝑖𝑗, 𝑚𝑖𝑗, 𝑢𝑖𝑗),representing the lowest possible value, fuzziness of the equipment belonging to the four categories the most reliable value, and the highest possible value, of "excellent, good, medium, and poor" on each indicator. respectively, reflecting the expert's judgment of the The construction method is usually based on expert scoring importance of the i-th indicator relative to the j-th or fuzzy quantification rules, mapping each original indicator under uncertain conditions. For example, indicator value to the [0,1] interval through a membership experts believe that "slightly important" can be function to form a membership vector. For example, a lower represented as a fuzzy number (2, 3, 4), while "extremely communication packet loss rate can correspond to a higher important" can be represented as (8, 9, 9). When ?̃?𝑖𝑗=(l, membership degree in the "excellent" state, while in the m, u), its reciprocal can be expressed as (1/u, 1/m, 1/l), "poor" state, the membership degree is close to 0. After satisfying a fuzzy symmetry relationship where they are vertically arranging the fuzzy membership vectors of all reciprocal to each other. After completing the indicators, a complete membership matrix R is formed. m preliminary judgment matrix, calculate the eigenvectors is the number of state levels, and the sample is calculated. and normalize them using the following method to obtain bk represents the membership degree of the sample at the the preliminary weights of each indicator kth state level, and the higher the value, the closer the sample is to the kth state level. The weighted sum of n a1/n  membership degrees that can ultimately be used to j=1 ij w comprehensively evaluate the value is calculated as follows. i = n n  a1/n m i=1 j=1 ij (5) S = bk vk To ensure the consistency of the judgment results, it k =1 (8) is necessary to perform consistency checks on the Among them, vk is the membership value judgment matrix. The specific process includes: corresponding to the state level, which is generally assigned calculating the maximum eigenvalue λmax , consistency based on the state level, such as excellent, To achieve index CI, and consistency ratio CR, where: quantitative grading of equipment operating status, this λ - CI article maps the comprehensive score S to four status levels, CI = max n ,CR = n -1 RI defined as follows: Excellent (Level I) =4 points, Good (6) (Level II) =3 points, Fair (Level III)=2 points, Poor (Level Among them, RI is a random consistency index, IV)=1 point. The scoring criteria for each level are shown which is obtained based on its parameter size n and can in Table 2. This assignment scheme adopts linear be directly consulted. If CR<0.10, it indicates that its equidistant scores to reflect the balance of level differences, matrix has met the consistency check requirements; On facilitating weighted operations and membership analysis. the contrary, it is necessary to adjust the original At the same time, it has scalability and can be adjusted to a assignment and perform another operation. This not only percentage system or non-linear weight structure according ensures the systematic rigor of the model structure, but to business needs. A Comprehensive Evaluation Model for the State of Electric Energy… Informatica 49 (2025) 319–332 327 Table 4: Comprehensive evaluation values and status classification standards for electric energy metering devices Comprehensive Score Status Level Status Description Range 3.5–4.0 Excellent Good condition, stable operation 2.5–3.4 Good Slight fluctuations, basically normal 1.5–2.4 Fair Operational fluctuations, attention needed 1.0–1.4 Poor Abnormal condition, maintenance required The grading criteria in Table 4 refer to the principle This is used to test and verify the adaptability and stability of linear distribution and set the scoring interval of the model. Taking into account both existing and new boundaries based on expert experience and opinions. equipment types for the selected samples, the voltage level Due to the final score S ∈ [1,4] and a total interval length involves urban-rural differences, meeting the of 3, it is divided into three complete intervals and one comprehensive and rigorous requirements of the overall compensated low interval (1.0-1.4) using the equidistant evaluation process. It should be noted that although the data method, aiming to improve the recognition sensitivity of obtained this time has real-time and practical relevance, it "poor" level devices. This design facilitates the is highly likely that some indicator data may be incomplete implementation of a hierarchical response mechanism due to human inspection errors or system failures, and some and also has good scalability. samples may have subjective descriptions or abnormal missing items. All sample data comes from the enterprise's own measurement equipment operation and maintenance 5 Analysis of experimental results management system. The data has been anonymized and This article proposes a model analysis and evaluation only retains information related to the device's operating based on the fuzzy hierarchy process for the state status, without involving user privacy. Each indicator data evaluation of electric energy metering devices. The includes quantitative values (such as error drift rate, experimental data is based on real-time data from the communication packet loss rate) and qualitative scores power distribution network and includes various (such as protection level, installation tightness). The operating modes and environmental conditions. By qualitative items are consistently scored by two operation analyzing and comparing the effects of different weights and maintenance experts and mapped to a three-level rating and classification choices on the model, it is proven that value. There are a small number of missing fields in the data, the model method proposed in this article can distinguish which will be filled in using industry standard empirical equipment states and has a better ability to classify values or adjacent device means. All raw data undergo equipment. Finally, the experimental results of each interval normalization before being input into the model to stage were analyzed and discussed, and the applicability eliminate the influence of dimensionality and ensure that all and stability were explored. indicators have a unified dimension between [0,1] before participating in fuzzy synthesis operations. 5.1 Experimental data sources and case selection 5.2 Display of model evaluation results The case data of this study is selected from the historical After constructing the Fuzzy Analytic Hierarchy Process archives of the power metering equipment management (F-AHP) model, this article conducted a comprehensive system, covering various forms such as metering state rating and grading of the 50 selected samples of equipment forms, three-phase smart meters, electric energy metering devices. According to the comprehensive substations, and power quality normalized scores of various indicators multiplied by their monitoring terminals. It is scattered in the supply and weights, the comprehensive evaluation value of each object distribution grids of urban and rural areas, presenting is calculated, and based on the preset membership function, significant differences in external environment and load its status is divided into four levels: "excellent, good, changes. The original data includes eight main indicators medium, and poor". From the overall evaluation results, including equipment reliability, counting accuracy, most of the electric energy metering devices are in the connection consistency, communication performance, "good" or "medium" level range, indicating that the working environment, and failure rate, as well as various operating status of the metering devices in the current secondary indicators. The data has strong system is generally controllable. However, some samples representativeness and completeness, and is suitable for have problems such as unstable communication, poor the design and evaluation of Fuzzy Analytic Hierarchy environmental adaptability, and decreased metering Process (F-AHP) in this article. In order to ensure the accuracy, which need to be brought to the attention of the universality and effectiveness of the case selection, the operation and maintenance department. research team selected 50 typical samples for modeling analysis. The selection principles mainly include 5.3 Comparative analysis with traditional completeness, comprehensive coverage of relevant indicator types, and typicality, which fully reflect the evaluation methods real differences in different installation positions, In order to comprehensively verify the effectiveness of the working conditions, and types of measuring equipment. proposed F-AHP model in the state evaluation of electric 328 Informatica 49 (2025) 319–332 C. Xu et al. energy metering devices, we selected the widely used compared and analyzed the comprehensive performance of traditional Analytic Hierarchy Process (AHP) and the three methods. The experimental sample consists of 10 Simple Weighted Average Method (WAM) as control representative sets of electric energy metering devices, and objects, and classified the same batch of sample data into the data is sourced from on-site monitoring records in actual state levels under a unified indicator system. We also operating environments. Table 5: Comprehensive performance comparison of different methods Fuzzy Boundary Method Average CI Average CV State Classification Sample Recognition Type Value ↓ Value ↓ Accuracy ↑ Ability F-AHP 0.016 0.069 92.5% High AHP 0.082 0.125 78.0% Medium WAM – 0.109 81.3% Low Note: CI is a consistency evaluation index for hierarchical structure weight fusion is the key to improving judgment matrices in AHP methods and is not applicable the overall evaluation quality. to methods such as WAM that do not have pairwise comparison structures. Therefore, this item is empty. 5.4 Model stability and robustness As shown in Table 5, the F-AHP model outperforms verification AHP and WAM in key indicators such as grade In order to further evaluate the applicability and stability of discrimination accuracy, consistency ratio (CI), and the proposed F-AHP model in the actual state evaluation of evaluation stability (measured by coefficient of variation electric energy metering devices, this study empirically (CV)). Specifically, the average CI of the F-AHP model verifies the stability and robustness of the model from three is 0.016, which is much lower than the traditional AHP's dimensions: input disturbance response, consistency 0.082, indicating that it has better consistency in the fluctuation amplitude, and extreme value adaptability. By multi-level weight processing process; In terms of CV, introducing perturbation factors and boundary condition the average value of F-AHP is 0.069, indicating that it perturbations on the original dataset, and comparing the has the smallest fluctuation in ratings among different fluctuation of results under different evaluation models, the samples and has stronger evaluation robustness. At the performance reliability of the F-AHP model in complex same time, the F-AHP model performs particularly well application scenarios is revealed. in handling state fuzzy boundary samples. It uses triangular fuzzy numbers to construct a judgment matrix, Firstly, in the input disturbance test, we randomly which reflects subjective judgment uncertainty while perturbed the indicator data of 10 sets of electric energy enhancing the model's ability to identify critical state metering device samples with amplitudes of ± 5% and ± devices, avoiding the problems of "fuzzy concentration" 10%, respectively, and observed whether the and "level distortion" in traditional methods. The so- comprehensive evaluation score of the model and its called 'fuzzy boundary samples' refer to samples whose corresponding level deviated. The results show that when comprehensive rating results are close to the critical the disturbance amplitude is less than 10%, more than 80% values of two state levels (such as 2.49 or 3.51). In actual of the sample levels remain unchanged in the F-AHP model, equipment status assessment, this type of sample and the change in the comprehensive score is controlled judgment is the most sensitive and susceptible to weight within 0.06 (as shown in Figure 5), indicating that the disturbances or changes in individual indicators. In this model has good input robustness. article, it is defined that when the score S of a sample Secondly, in the consistency ratio volatility test, we falls within the range of 0.1 above or below a certain conducted 500 Monte Carlo random perturbation level boundary (such as S ∈ [2.4, 2.6]), it is considered a experiments on the constructed fuzzy judgment matrix and fuzzy boundary sample. We will calculate whether recorded the consistency ratio CI values obtained from each different models experience "state level jumps" (such as calculation. The statistical results show that the CI value result changes under ± 10% perturbations) on this type fluctuation range of the F-AHP model is concentrated of sample, and judge their boundary recognition ability between [0.011, 0.021], with a standard deviation of 0.0026, based on this. The F-AHP model only showed a skip which is much lower than the fluctuation standard deviation level in 1 out of 10 boundary samples, outperforming of the AHP model of 0.0093 (see Table 6), indicating that traditional AHP (3 cases) and WAM (4 cases), indicating F-AHP can maintain stable consistency control ability its strong boundary control ability. under complex weight combinations. The experimental results show that the F-AHP Thirdly, in the extreme boundary sample test, we model balances accuracy, stability, and interpretability selected 5 groups of samples located near the boundary of in state evaluation tasks with multiple indicators, levels, the level division and observed the trend of their final state and fuzzy information, demonstrating significant determination under the condition of weight perturbation comprehensive advantages and having good practical range of ± 15%. The F-AHP model can effectively buffer application prospects. Compared with traditional boundary samples through weight processing in the form of methods, its innovation in fuzzy logic modeling and fuzzy numbers, with only one group of samples experiencing a level transition (from "level II" to "level I"), A Comprehensive Evaluation Model for the State of Electric Energy… Informatica 49 (2025) 319–332 329 while in traditional AHP models, there are three groups evaluation scores under different disturbance amplitudes experiencing a level change under the same conditions. clearly reflects the stability of its score curve at various This further demonstrates the robust control capability disturbance levels. of the F-AHP model in boundary fuzzy regions. As shown in Figure 3, the variation trend of F-AHP model 1 0,9 0,8 0,7 0,6 0,5 Original Score Disturbance +5% Disturbance −5% Disturbance +10% Disturbance −10% S1 S2 S3 S4 S4 Figure 3: Evaluation score fluctuation curve of F-AHP model under different disturbance amplitudes Table 6:Comparison of stability indicators between F-AHP and AHP models under different testing dimensions Test Dimension Indicator F-AHP Model AHP Model Mean score fluctuation Input disturbance stability 0.037 0.089 rate Consistency ratio fluctuation CI standard deviation 0.0026 0.0093 Extreme sample rank jump Transition frequency 10% (1/10) 30% (3/10) rate Robust boundary control Fuzzy buffering effect Strong Weak ability From the above experimental results, it can be seen metering devices to solve the state evaluation problem of that the F-AHP model exhibits better stability and power equipment. robustness than traditional methods in dealing with input disturbances, consistency changes, and boundary 6.1 Scope and limitations analysis of the disturbances. This is mainly due to the introduction of model triangular fuzzy numbers and weight fuzzy fusion strategy in the construction of fuzzy judgment matrix in This study proposes and implements a method for the model, effectively alleviating the excessive evaluating the overall state of electric energy measuring sensitivity of subjective weighting to the final result. At instruments using the Fuzzy Analytic Hierarchy Process (F- the same time, the introduction of a hierarchical structure AHP), which is widely flexible and can be used for ensures the coordination and response balance between evaluating the overall state of different power measurement different dimensions in a complex indicator system, tools. Especially when there are complex data sources and enabling the entire model system to maintain good vague or subjective information between measurement tool evaluation reliability and systematicity when facing indicators, it can effectively quantify fuzzy information, multi-source heterogeneous and uncertain data inputs in making the state evaluation results highly professional and actual power application scenarios. practical; The multi-level hierarchical structure and automatic weight adjustment function have played an important role in the inspection and evaluation of newly put 6 Discussion and expansion into operation devices, normal operation monitoring, and The F-AHP model constructed in this study handling of aging and failure exit equipment. However, the demonstrates the stability of evaluation results and the application of the model is still influenced by the rationality ability to distinguish important information in a complex of the evaluation index system design and the credibility of and diverse information environment, which is much expert evaluation data, because establishing a fuzzy higher than traditional empirical methods and fuzzy judgment matrix relies on the experience of experts. If there analytic hierarchy process (F-AHP). Based on is a significant difference in their level of understanding, it experimental opinions in different situations, the will affect the fairness of the model output results; When accuracy and adaptability of this model are good, and it the actual application has special working conditions or has strong scalability and practical value, providing an newly added types, extremely low data volume, and intelligent evaluation method for electric energy extremely poor regularity, it may be limited by the model's 330 Informatica 49 (2025) 319–332 C. Xu et al. generalization ability, and it may be necessary to adjust three common types of measuring instruments, namely user the indicator weights or evaluation levels according to side smart meters, station side multifunctional meters, and the actual situation. To further ensure the universality of enterprise side measuring systems. the model, expansion experiments were conducted on 0,09 Measurement system for industrial and mining enterprises 0,62 0,07 Multi functional energy meter for station use 0,74 0,05 Household smart meter 0,82 0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 standard deviation Sample mean score Figure 4: Applicability performance of the model in different energy metering devices Figure 4 shows the mean state scores generated by some weights, resulting in the model being unstable. the model in different device types, and the standard Therefore, when evaluating systems such as power deviation range is indicated by error bars to reflect the measurement instruments that contain multiple sources of score fluctuations between different samples. errors and unknown states, a cautious fuzzy analytic hierarchy process (F-AHP) is preferred to increase the 6.2 Discussion on parameter sensitivity of model's tolerance for external factors. Thirdly, a change in the upper limit of the set consistency ratio (CR) threshold fuzzy analytic hierarchy process can indirectly lead to a change in the final conclusion. The In the comprehensive state evaluation model, the main general default value is set to 0.1 to draw a conclusion, but influencing factors that need to be considered are that the in the process of complex system evaluation, if the F-AHP results are greatly affected by the setting of a consistency requirements are artificially relaxed, it may series of important variables, especially the design of lead to internal conflicts, causing the weight system to attribute functions for the fuzzy decision matrix, the deviate from the initial judgment conditions and weakening selection of upper and lower boundary points for the explanatory power of the model. evaluating systems, triangular fuzziness, and the synthesis method of weights. controlling the strictness and number of indicators required These variables not only directly affect the ranking of for consistency testing is an important means to ensure the each important indicator, but also affect the stability and practicality of the model. discrimination of the final comprehensive score results. Therefore, exploring the sensitivity of these variables in depth can enhance the interpretability and adaptability of 6.3 Model's potential for promotion in the model. When establishing a fuzzy decision matrix, a smart grids triangular fuzzy representation is usually used, and the The fuzzy analytic hierarchy process proposed by our selection of fuzzy boundaries carries a considerable research institute as a feature for evaluating the overall degree of subjective color. Even if different experts' operation status of electric energy metering devices has evaluation values for indicators fall within the same good universality, scalability, and intelligent integration rating range, their corresponding triangle numbers may capabilities, and can be easily promoted to the smart grid have slight changes, and this change will be amplified in framework system. On the one hand, the overall analysis models with high levels and sensitive interactions. model constructed using this method includes multiple Therefore, it is necessary to plan a reasonable mapping indicators such as accurate and stable measurement, system between fuzzy words and three-dimensional adaptability to power quality, communication capability, numbers to accurately represent the meaning of experts' and adaptability to the working environment, which meet ratings. Secondly, synthesizing fuzzy weights can also the concept of power grid management of equipment have a significant impact on the final result. The lifecycle. The fuzzy theory principle is used to deal with the commonly used weighted average method and the impact of information uncertainty between various maximum minimum method have different strengths in indicators, and comprehensive stability analysis can be reflecting extremes. In the process of experimental carried out under diverse heterogeneous data, improving the verification, if a comprehensive method that is easily ability of power enterprises to identify equipment operation scored full marks by extreme situations is used, it may risks under operating conditions. On the other hand, it has lead to an increase in global rating due to high scores of good interface scalability and data compatibility, making it A Comprehensive Evaluation Model for the State of Electric Energy… Informatica 49 (2025) 319–332 331 easy to collaborate with information management metering devices has become a key issue in ensuring systems, online monitoring systems, and data centers. measurement accuracy and improving energy Whether it is a lightweight installation deployed at the management level. In response to the problems of edge or an application deployed in the control room of insufficient resolution and poor adaptability of existing the dispatch center for centralized processing, its state recognition modes, this paper designs and parameters can be adjusted according to functional needs implements a comprehensive evaluation model for the to adapt to various usage scenarios. It can also be state of electric energy metering devices based on fuzzy associated with distribution automation and connected to analytic hierarchy process (F-AHP), which considers the Internet of Things of electrical equipment to meet multiple indicators. This model is based on expert various business scenarios. At the same time, as experience and on-site data, constructing an indicator intelligent operation and maintenance are gradually system, fuzzy quantification through membership promoted, this model also has the possibility of functions, and combining hierarchical structure and integrating with artificial intelligence technologies such weight allocation to achieve comprehensive integration as machine learning, anomaly detection, fault prediction, of multiple factors influences, outputting quantitative etc. By analyzing the results of previous ratings, it is scores and state level results. possible to build a self-learning system that completes In experimental verification, the model outperforms the transition from "static evaluation" to "dynamic warning", thereby providing the ability to support a traditional weighted scoring and threshold methods in comprehensive state management process of perception, terms of recognition accuracy, scoring stability, and anti- intelligent decision-making, and cyclic control. To interference ability, demonstrating good practicality and upgrade from static evaluation to dynamic intelligent potential for promotion. Especially in the processing of operation and maintenance, this model can be integrated fuzzy boundary samples, the model exhibits stronger with AI modules to construct an intelligent monitoring robustness. However, this study still has certain system. For example, the F-AHP score result can be used limitations. For example, the weight judgment process as a "health label" for device operation, which can be relies heavily on expert experience, which may cause used to train lightweight classifiers (such as SVM, fluctuations due to subjective biases; At the same time, XGBoost) to achieve fast prediction of new device states; the sample size for verification is relatively limited and At the same time, time series anomaly detection has not fully covered diverse device scenarios. Future algorithms such as LSTM-AE and Isolation Forest can research can be further expanded from the following be combined to dynamically monitor the trend of device aspects: firstly, combining large-scale operation logs status score changes, achieving intelligent early warning with automatic data collection to further enhance the of phenomena such as score mutations and boundary objectivity and adaptability of scoring; The second is to fluctuations. This model structure can also be embedded explore a data-driven dynamic weight adjustment in IoT platforms, collecting real-time data through edge mechanism to weaken expert dependence; The third is to gateways and quickly scoring it as input for the operation integrate F-AHP with machine learning models to feedback indicators of digital twin systems, providing construct a state recognition framework with self- high timeliness decision support for scheduling systems. learning capabilities, achieving the transition from static It is worth noting that the F-AHP method will face evaluation to real-time intelligent monitoring. Overall, the problem of increased computational complexity in the state assessment model constructed in this article the construction of its judgment matrix and fuzzy weight provides a feasible path for the intelligent management of calculation process when facing large-scale device energy metering devices and lays a methodological clusters or significantly increased indicator dimensions foundation for the high reliability operation of future (such as n>30). In theory, the computational complexity smart grid measurement and control equipment. of its judgment process at each layer is about O (n ²), and as the number of indicator layers or expert groups expands, the computational time and difficulty of Funding consistency testing significantly increase. Therefore, in practical deployment, a distributed computing strategy The research is supported by: Science and technology can be adopted to modularize weight calculation and projects of State Grid Ningxia Electric Power Co., Ltd. process it in parallel; At the same time, an expert scoring template library is constructed through historical data to achieve automated filling of the judgment matrix. This References model is also suitable for encapsulation into operation [1] Wang Z, Han X, Wang Y. High-precision substation and maintenance platforms in a microservices manner, electric energy metering device based on an adaptive with good resource scheduling and load control algorithm[J]. Journal of Physics: Conference capabilities in large-scale device management scenarios, Series,2024,2782(1): https://dol:10.1088/1742- ensuring efficient and stable output of evaluation results. 6596/2782/1/012053. [2] Alvarez-Valera H H, Maurice A, Ravat F, et al. 7 Conclusion Energy Measurement System forData Lake: An With the continuous development of smart grids, how Initial Approach[C]//Asian Conference on Intelligent to efficiently identify the operating status of energy 332 Informatica 49 (2025) 319–332 C. Xu et al. Information and Database Systems.Springer, Sustainable Energy, Singapore, 2024.https://dol:10.1007/978-981-97- 2023.https://dol:10.1002/ep.14072. 4982-9_2. [13] Paunkov N , Popov R , Parushev A ,et [3] Zhang F ,Guo J ,Yuan F , et al.Research on al.MEASUREMENT OF ELECTRICAL POWER Intelligent Verification System of High Voltage AND ENERGY WITH A SMART Electric Energy Metering Device Based on Power METER[J].INTED2023 Proceedings, Cloud[J].Electronics,2023,12(11):https://dol:10.3 2023.https://dol:10.21125/inted.2023.1300. 390/electronics12112493. [14] Alrobaie A S , Krarti M .Measurement and [4] Zheng K , Zhou M , Zhang Y .Status evaluation of Verification Building Energy Prediction (MVBEP): electric energy metering device (EEMD) based on An interpretable data-driven model development and artificial intelligence technology[J].Proceedings analysis framework[J].Energy and Buildings, of SPIE, 2022, 2023.https://dol:10.1016/j.enbuild.2023.113321. 12500(000):7.https://dol:10.1117/12.2662643. [15] Qu Y , Wang J , Cheng X ,et al.Migratable Power [5] Natarajan N , Vasudevan M , Dineshkumar S K ,et System Transient Stability AssessmentMethod Based al.Comparison of Analytic Hierarchy Process on Improved XGBoost[J].Energy Engineering, 2024, (AHP) and Fuzzy Analytic Hierarchy Process (f- 121(7):1847- AHP) for the Sustainability Assessment of a Water 1863https://dol:10.32604/ee.2024.048300. Supply Project[J].Journal of The Institution of [16] Zhao J , Huang S , Cai Q ,et al.Research on Engineers (India), Series A. Civil, architectural, Distributed Renewable Energy Power Measurement environmental and agricultural engineering, and Operation Control Based on Cloud-Edge 2022.https://dol:10.1007/s40030-022-00665-x. Collaboration[J].EAI Endorsed Transactions on the [6] Li Y , Dong Y , Pan J ,et al.Black box fault Energy Web, 2024, diagnosis method of power metering device based 11(1).https://dol:10.4108/ew.5520. on digital filtering and EMD [17] Prastowo H , Pitana T , Artana K B ,et method[J].Proceedings of SPIE, 2023, al.Development of Measurement Tool for Energy 12788(000):8.https://dol:10.1117/12.3004699. Efficiency Operational Indicator (EEOI)[J].IOP [7] Yang W , Wang G , Wu W ,et al.Analysis of factors Publishing Ltd, 2024.https://dol:10.1088/1755- influencing electric energy measurement in power 1315/1423/1/012026. systems[J].AIP Advances, 2024, [18] Huang T , Zhiwu W U , Zhan W ,et al.ELECTRIC 14(2):6.https://dol:10.1063/5.0190948. ENERGY METERING ERROR EVALUATION [8] Yamamoto T , Tokura Y .Energy exchange and METHOD BASED ON DEEP fluctuations between a dissipative qubit and a LEARNING[J].Scalable Computing: Practice & monitor under continuous measurement and Experience, 2024, feedback[J]. 25(3).https://dol:10.12694/scpe.v25i3.2789. 2024.https://dol:10.21468/SciPostPhysCore.8.1.0 [19] Wang C .A method for identifying and evaluating 16. energy meter data based on big data analysis [9] Shah A M ,Yuan R ,Zheng S , et al.Modeling of technology[J].International Journal of Information Nonlinear Load Electric Energy Measurement and and Communication Technology, 2023, Evaluation System Based on Artificial Intelligence 23(4):22.https://dol:10.1504/IJICT.2023.134852. Algorithm[J].Recent Advances in Electrical & [20] Yue Z , Yao X , Chen Y .A New Fuzzy Analytic Electronic Engineering,2023,16(2):94- Hierarchy Process Method forSoftware 102..https://dol:10.2174/23520965156662205181 Trustworthiness Measurement[C]//International 21454 Conference on AI Logic and Applications.Springer, [10] Gerber D L , Ghatpande O A , Nazir M ,et Singapore, 2022.https://dol:10.1007/978-981-19- al.Energy and power quality measurement for 7510-3_18. electrical distribution in AC and DC microgrid buildings[J].Applied Energy, 2022, 308.https://dol:10.1016/j.apenergy.2021.118308. [11] Jintao C ,Binruo Z ,Fang Z , et al.Prediction Method of Electric Energy Metering Device Based on Software-Defined Networking[J].International Journal of Information Security and Privacy (IJISP),2022,16(2):1- 20.https://dol:10.4018/IJISP.308316. [12] Taherikhonakdar Z , Fazlollahtabar H .Proposing the method for Operating System energy consumption measurement with energy efficiency approach[J].Environmental Progress & https://doi.org/10.31449/inf.v49i12.8933 Informatica 49 (2025) 333–344 333 Models And Methods of Analysing Infrastructure Performance in Cloud Environments Based on Process Optimisation Methods Pavlo Kudrynskyi1, *, Oleksandr Zvenihorodskyi2, Yaroslav Bai2 1Department of Computer Science, State University of Information and Communication Technologies Kyiv, 03110, Ukraine 2Department of Artificial Intelligence, State University of Information and Communication Technologies Kyiv, 03110, Ukraine E-mail: pavlokudrynskyi@ukr.net, o.zvenihorodskyi@outlook.com, yar-bai@hotmail.com *Corresponding author Keywords: performance evaluation, workflow improvement, neural network technologies, resource management, dynamic workloads, cloud services Received: April 16, 2025 The study aimed to develop models and methods for analysing infrastructure performance in cloud environments that consider the complexity and dynamism of modern IT systems. The development of adaptive resource management models capable of responding to changing loads in real time was emphasised. New methods of process optimisation were developed, including the use of artificial neural networks for load forecasting and dynamic resource allocation. Solutions for efficient management of computing and storage capacities were modelled and simulated. The use of adaptive models based on neural network technologies has increased the accuracy of load forecasting by up to 95% and reduced costs by 20% through the automation of resource management. Practical experiments conducted in the Amazon Web Services (AWS) and Microsoft Azure environments confirmed the effectiveness of the approaches under various load conditions. These results help to improve the stability of cloud services, reducing the risk of overload, downtime and data loss. The proposed models are universal and can be applied in various industries, including the financial sector, e-commerce and healthcare, which allows them to effectively solve the problems faced by modern information systems. The findings of the study highlight the importance of integrating artificial intelligence into performance management, which ensures the flexibility and scalability of cloud environments. This creates new opportunities to optimise processes, improve service quality and reduce operating costs, creating the basis for further research and development in the field of cloud computing. Povzetek: Študija razvija adaptivne modele in metode za analizo delovanja infrastrukture v oblaku, ki temeljijo na globokem učenju (nevronske mreže) za dinamično upravljanje virov. To je omogočilo boljše napovedi obremenitve in zmanjšanje stroškov v okoljih AWS in Azure, kar povečuje stabilnost in učinkovitost storitev. 1 Introduction management. For example, M. Abdullah and M. Mohamed Surputheen [2] noted that static models often In the modern world, cloud computing has become an used for performance analysis do not consider the important element of IT infrastructure for enterprises and dynamic nature of loads inherent in modern cloud organisations of varying scales. Cloud computing enables infrastructures. These approaches do not facilitate optimal efficient use of resources, reduces infrastructure costs and resource allocation, particularly during fluctuations in user provides flexibility in working with data. However, the activity or when processing large datasets. The authors growing popularity of cloud services poses new advocate for the implementation of adaptive models, challenges, particularly in managing their performance, although their analysis remains largely conceptual. efficiency and security. One of the main challenges is to Similarly, H. Alrammah et al. [3], who addressed the ensure the high performance of cloud infrastructures under limited scalability of cloud platforms within static variable loads, as well as to optimise the cost of computing resource management models. The authors noted that such resources [1]. Consequently, investigating novel models approaches do not consider unpredictable changes in the and methodologies for analysing the performance of cloud load that often occur due to peak user activity. They environments is both a significant and pressing endeavour. proposed the use of adaptive algorithms, but their research According to numerous studies, existing approaches to is mostly limited to basic simulations, without a detailed assessing performance in cloud environments have analysis of performance in real-world conditions. significant limitations that affect the efficiency of resource 334 Informatica 49 (2025) 333–344 P. Kudrynskyi et al. The study by A. Tiwari and S. Yadaw [4] also confirms In this context, it is also worth noting the importance that static resource management approaches do not of adaptive systems in effective resource management in provide adequate efficiency in dynamic cloud cloud environments. Adaptive approaches can be used to environments. The authors analysed in detail the dynamically respond to changes in load, ensuring efficient shortcomings of such methods and noted that they are use of available resources and minimising their excessive particularly inefficient during peak loads. A. Tiwari and S. consumption. The implementation of such systems not Yadaw emphasise the importance of implementing only increases the stability and reliability of cloud services adaptive technologies that can predict load changes and but also contributes to economic efficiency, as it makes it adapt resources accordingly. Although their study is possible to reduce infrastructure costs without losing mainly focused on analysing existing approaches, it lays service quality [10]. This approach is especially important the theoretical foundation for the integration of smart in today's environment, when organisations face large systems. volumes of data and demands on the speed of information R. Anayat [5] explored the role of machine learning in processing, as well as a high level of flexibility and enhancing the performance management of cloud scalability in their systems. infrastructures. The author noted that the basic algorithms In general, based on the aforementioned that are often used do not consider the complexity and considerations, this study aims to develop novel models variability of the real-world conditions in which cloud and methods for analysing the performance of cloud platforms operate. R. Anayat recommends the use of deep infrastructures that combine adaptive and intelligent neural network models that can provide more accurate approaches. These models should operate efficiently forecasting and adaptation of resources, but the study amidst constant load changes, ensuring high performance, remains mostly theoretical and does not offer detailed resilience, and cost-effectiveness under varying practical implementations. operational conditions. This will not only improve system Despite advancements in cloud computing research, performance but also expand opportunities for the use of there is a dearth of comprehensive approaches that cloud technologies in various industries, such as financial integrate various technologies for optimising and services, healthcare, and e-commerce. managing resources under real-world workloads. The problem of adaptive management of cloud infrastructures 2 Materials and methods that can effectively respond to changing conditions remains unresolved in many scientific papers. Therefore, The research is based on two major cloud platforms: it is imperative to explore the potential of advanced Amazon Web Services (AWS) and Microsoft Azure. optimisation methods, including neural network Modelling and simulations were conducted on these technologies, to achieve high-performance management platforms to study the effectiveness of different efficiency in cloud environments. approaches to performance optimisation. Previous studies show that most existing models are The study was conducted on equipment located in unable to effectively account for load variability. Standard AWS and Microsoft Azure data centres. Each server had optimisation algorithms may prove ineffective under the resources ranging from 2 to 16 processor cores and 8 to 64 high dynamism and scalability of cloud systems [6,7]. GB of RAM, which provided the necessary capacity for Studies such as those by O.B. Johnson et al. [8] confirm conducting load tests and performance monitoring. that without the use of adaptive management methods, it Apache JMeter and Stress-ng tools were used to simulate is impossible to ensure stability and efficiency in the the load on the servers, which was used to simulate various operation of cloud infrastructures. Thus, there is a need to load scenarios in cloud environments. The performance of develop methods that can adjust resources in real-time and the systems was monitored using Amazon CloudWatch consider multifaceted changes through the integration of and Azure Monitor monitoring interfaces, which provide artificial intelligence. detailed information on resource usage. For the statistical A. Talha et al. [9] also discussed approaches to using analysis of the data obtained, the R environment was used machine learning for load forecasting and automatic to process and visualise the results, as well as the SPSS resource scaling in cloud platforms. However, contrary to software package to perform significance tests and their work, which focused on basic machine learning compare the results between different server methods, the study focuses on deep neural networks, configurations and cloud platforms. which allow for more accurate forecasting and adaptation Resource allocation adaptation models were of resources under highly dynamic loads. developed using recurrent neural networks and Long Machine learning methods, in particular neural Short-Term Memory networks. These models specialise in networks, have great potential to solve this problem, as processing time series of data, such as central processing they allow modelling complex relationships between unit (CPU) utilisation, memory, disc operations and various system parameters and predicting future load. network traffic. The developed models were integrated However, to date, there is very little research combining into a real-time dynamic resource scaling system. By these methods with cloud technologies. This study aims to predicting load peaks, the system adapted, adding or fill this gap and develop new approaches for integrating releasing resources as needed. machine learning into cloud infrastructure optimisation The sample for this study was formed based on the processes. characteristics of typical cloud environments that are Models And Methods of Analysing Infrastructure Performance in… Informatica 49 (2025) 333–344 335 widely used in real organisations to ensure reliability, the performance of different types of virtual machines was scalability and efficient resource management. These compared. The study determined that with optimal load environments were chosen to replicate a variety of balancing, even low-tier servers can achieve performance industry-standard cloud setups, with differing compute similar to high-end machines at significantly lower costs. and storage needs that represent actual cloud service The resilience of cloud platforms to failures and high deployments, in order to guarantee the models' loads was evaluated. To do this, error injection methods applicability. For AWS, three types of instances were (for example, Chaos Monkey) and load simulation using selected: standard, storage, and compute-intensive, which Kubernetes Stress Test Tools were employed. Chaos meet different performance and load requirements. For Monkey was applied by randomly terminating instances to Microsoft Azure, similar configurations were chosen to simulate system failures and assess recovery capabilities. provide a comparison between the two most popular cloud Kubernetes Stress Test Tools was employed to simulate platforms. Examples from both AWS and Azure were high traffic conditions, testing the platform's ability to chosen to provide a clear and equitable comparison, handle resource scaling and maintain stability under heavy encompassing a variety of workloads, such as content loads. The main criteria were the percentage of data loss delivery networks, high-performance computing and the average recovery time after a failure. The applications, and transactional databases. The choice of evaluation demonstrated that platforms with automatic configurations was based on real-world use cases, such as scaling and redundancy mechanisms provide high web application hosting, big data processing, and file resilience even in critical conditions. storage. The next stage included resource management and The study also determined the amount of data cost-effectiveness analysis. Dynamic scaling algorithms processed and the level of traffic, which ranged from reduced the cost of renting cloud resources by 25% and moderate (constant load on the servers) to highly dynamic reduced server downtime. This section compared the (with sharp traffic spikes at certain times). This diversity effectiveness of static and adaptive management by was used to evaluate the ability of the platforms to adapt evaluating key performance indicators such as system to changing conditions and ensure high performance under uptime, resource utilisation, and cost efficiency. It showed different loads. For each server configuration, several load a significant reduction in costs and improvement in scenarios that varied depending on the type and degree of performance when using the adaptive approach. user activity were created. These scenarios were created to At the final stage, the infrastructure performance was mimic the behaviour of real-world applications under optimised using multi-criteria algorithms, such as genetic various operating situations in addition to testing the algorithms and the particle swarm method. Simulation scalability of the system. They ranged from a stable load platforms (CloudSim, iFogSim) were used to test the (where the servers operate at an average level of developed models. They simulated cloud environments performance) to a highly dynamic load (where the load and evaluated resource allocation strategies under various increases sharply at certain times). load conditions. The main criteria were to reduce query The study was conducted in a real-world environment processing time and increase overall performance, where each platform used its typical performance considering energy consumption. The platforms were monitoring tools. Amazon CloudWatch was used for compared using static and adaptive resource management AWS and Azure. Azure Monitor was used for Azure, methods. The results showed that the optimisation which allowed for accurate monitoring of service improved performance by 18-22%. performance, including CPU Utilisation, Network This approach identified the most effective resource Throughput, Memory Usage and Disk I/O. The study was management strategies that automatically optimise their conducted on servers located in geographically dispersed use under high loads, minimising infrastructure costs and data centres, which was used to examine the performance ensuring stable system operation under changing of the platforms in different locations and physical conditions. In addition, adaptive management algorithms distances between servers. have reduced operating costs for computing power The performance of the cloud infrastructure was without losing data processing efficiency. assessed. The main criteria were system response time (ms), throughput (requests/sec), and resource utilisation 3 Results (CPU, memory, and disk space). Log files of real cloud platforms (AWS, Azure, Google Cloud) and synthetic 3.1 Comparative performance analysis of tests (for example, Apache Bench) were used. The results AWS and Microsoft Azure cloud platforms showed that performance significantly decreases at peak loads, which requires dynamic resource management. and development of resource allocation The efficiency of resource use, which was determined adaptation models by power consumption (W/request) and the efficiency of servicing requests per unit of equipment, was analysed in As part of the research, models for real-time adaptation of the study. This helps in understanding whether the cloud resource allocation based on intelligent algorithms, such infrastructure is over-provisioned or underutilised, leading as recurrent neural networks and Long Short-Term to potential cost savings or performance issues. Profilers Memory networks, were developed. These models such as Cloud Harmony and Prometheus were used, and specialised in processing time-series data, such as CPU, 336 Informatica 49 (2025) 333–344 P. Kudrynskyi et al. memory, disc operations, and network traffic. The When it comes to memory usage, both platforms can development process involved several key steps. First, the provide stable performance under a steady load, but data was prepared: it was normalised, cleaned of Azure's memory usage is less efficient when the demand anomalies and segmented to ensure the quality of training. for resources is variable. In the case of sudden load peaks, The models were trained with an emphasis on analysing static management on Azure does not efficiently limit long-term dependencies in time series, which was used to memory usage, which can cause overloading of certain identify hidden patterns in load changes. Model instances and degradation of overall system performance. optimisation included the use of genetic algorithms and Instead, AWS demonstrates better results in terms of particle swarming to tune hyperparameters and find memory allocation among instances. Based on the data optimal resource configurations that minimise response obtained, memory usage on the AWS platform is 6-8% time and energy consumption. Clustering algorithms were more efficient than Azure in static resource management. also used to group servers and resources based on This demonstrates AWS's superior ability to maintain load similarity in load, which contributed to their more efficient balance without critical overloads on certain nodes, even use. with static resource allocation. When using static management methods, which Network Throughput is a critical factor for the involve a fixed allocation of resources without the ability performance of cloud platforms, especially when there are to dynamically scale them in real-time, it is important to large volumes of data transfer between services [13,14]. assess how each platform handles loads under conditions With static resource management, AWS demonstrates of stable and variable demand. Static resource better results in providing stable and high-performance management does not allow for adaptation to load network performance. With more optimised data paths fluctuations, which can lead to inefficient use of and better geographical distribution of its data centres, computing power, memory, and other resources [11,12]. AWS can provide more stable and faster data transfer, However, to compare the performance of AWS and even at peak loads. Microsoft Azure in static management Microsoft Azure platforms in static resource management, conditions shows slightly lower network throughput in the four main indicators should be considered: CPU case of high volumes of data transfer between instances. Utilisation, Memory Usage, Network Throughput and The difference in throughput is 10-12% in favour of AWS, Disk I/O. which is the result of less efficient load balancing in the As shown in Table 1, both AWS and Microsoft Azure network on the Azure platform. deliver stable CPU utilisation results when running static Disk I/O is an important parameter for cloud resource management. However, when there are platforms, as it determines the speed of reading and significant load peaks, AWS is usually more efficient in writing data to the disc. Both platforms provide high managing CPU resources, as its default algorithms performance when using disk resources in static mode. provide more efficient load balancing between instances. However, with large volumes of disk operations, it turns In Microsoft Azure, the CPU utilisation situation may be out that AWS can better cope with high disk loads due to less optimised, as it does not have the same flexibility to more optimised caching and storage methods. Microsoft scale instances in real-time, which leads to the overloading Azure, although it demonstrates good results in terms of of certain instances while others remain underutilised. Disk I/O, has certain limitations under static management at high loads. Tests have shown that the efficiency of using Table 1: Comparison of AWS and Azure performance by disk resources on Azure at a stable load is 7-9% worse key metrics in static management than on AWS, which is the result of a less optimised CPU Network organisation of the disk subsystem under static Memor Disk Platform Utilisatio Throughpu y Usage I/O management. n t The static resource management on both platforms 50 shows certain limitations in the face of variable AWS 95% 89% 95 MB/s MB/ workloads. While both platforms perform similarly under s steady resource demand, AWS delivers better 48 Microsof 92% 90% 92 MB/s MB/ performance under dynamic workloads by making more t Azure s efficient use of its compute, memory, network, and disk resources. These differences can be associated with their Table 1 shows that in terms of static resource storage options, network architecture, and scaling and management, the performance of both platforms is similar, resource allocation strategies. Better dynamic scaling and but AWS demonstrates better performance in most key load balancing algorithms enable AWS to effectively metrics. CPU utilisation rates indicate a high load on the distribute resources in real-time based on varying demand, processors of these systems. This means that most of the which is why it performs better than Azure. AWS's computing resources are used to process requests, which extensive worldwide network of data centres and well- may indicate that the system is operating efficiently but designed storage solutions further improve its capacity to also indicates that delays or performance degradation may manage peak loads and large data volumes without occur if the load is increased further. experiencing performance issues. Azure's static resource management methodology, on the other hand, lacks real- time adaptability and results in less effective resource Models And Methods of Analysing Infrastructure Performance in… Informatica 49 (2025) 333–344 337 allocation, particularly during periods of changing Microsoft Static 83% 2100 W/hour demand, which causes instances to be underutilised or Azure overloaded. Because of this, AWS offers greater Source: compiled by the authors. flexibility, faster resource adjustments, and better overall performance during dynamic workloads, whereas Azure A comparison of adaptive and static resource functions well in stable environments but has trouble management shows significant advantages of adaptive handling variations in peak demand. methods: AWS scores demonstrated significant improvement 1. Uptime without downtime. Adaptive over Microsoft Azure in such areas as CPU Utilisation and management ensures 98% (AWS) and 97% (Azure) Network Throughput, which improves the platform's uptime, which is 10-15% higher than static management. scalability under highly dynamic workloads by 15%. This 2. Power consumption. Adaptive management can allows the AWS platform to handle variable workloads reduce energy costs by up to 1500 W/h for AWS and 1700 faster and more efficiently, reducing latency and W/h for Azure, which is 5-6% less than static methods. improving overall performance. Adaptive management significantly improves the At the same time, Microsoft Azure performs better efficiency and stability of cloud platforms by predicting under stable workloads, particularly in the Memory Usage load and automatically scaling [17]. The study results aspect, demonstrating a 10% improvement. This suggests showed that adaptive resource management based on deep that Azure is more efficient when resource demand is learning methods significantly improves server efficiency fixed, making it more attractive to organisations that have by reducing power consumption and reducing downtime. a stable infrastructure load. This is achieved by accurately predicting the load and automatically scaling resources in response to changes in 3.2 Assessing the effectiveness of adaptive the load. Compared to static management methods, adaptive technologies can reduce downtime by 10-15%. resource management This means that cloud services operate more stably, even Further experiments were aimed at evaluating the in cases of high or variable loads, providing uninterrupted impact of adaptive resource management on the overall access to resources for users. performance of cloud platforms, comparing adaptive and This is especially relevant for cloud infrastructures that static resource management. For this purpose, two main often face high dynamic loads, such as large volumes of scenarios were applied, where one used traditional static traffic, spikes in user activity, or sudden changes in management and the other adaptive management based on computing resource requirements. Static methods based deep learning methods. on fixed capacity reservations cannot effectively respond Table 2 presents a comparative analysis of cloud to such changes, which often leads to the overuse of platforms employing adaptive versus static resource resources at times of low load or system overload at high management, evaluated across two key metrics: uptime loads. At the same time, adaptive technologies that use and power consumption. Uptime, defined as the deep learning can adjust resources in real time, percentage of operational time without system anticipating changes in load and adjusting them interruptions, demonstrates a marked advantage in accordingly to ensure optimal system performance. adaptive management systems. In the case of adaptive Thus, the results demonstrate that the implementation management, the platform automatically scales in of adaptive technologies is critical to optimise the response to changes in load, which reduces the risk of performance of cloud infrastructures, particularly in downtime and ensures high stability, which explains the conditions of high load dynamics. This reduces costs, high score for this parameter. Static control, on the other minimises downtime and ensures more stable and efficient hand, does not respond to changes in load, which increases operation of cloud services, which is important for the probability of overloads and, consequently, downtime. businesses that depend on uninterrupted access to Energy consumption shows the percentage of costs for computing power. using cloud resources. Adaptive management can use resources efficiently, scaling them depending on the load, 3.3 Resistance to load changes and error which reduces costs [15,16]. Static management, which injection testing does not adapt resources to changes, leads to higher costs because resources are used less efficiently. To assess the resilience of cloud platforms, testing was conducted that included sudden changes in load, such Table 2: Performance results of cloud platforms with as traffic spikes and processing large amounts of data in a adaptive and static resource management short period. The results showed that adaptive resource Platform Type of Operating time Energy management provides significantly better platform control without consumption resilience to outages and changes in load. For instance, for downtime AWS with adaptive management, the percentage of data AWS Adaptive 98% 1500 W/hour loss was 0.5%, the average recovery time after a failure AWS Static 85% 2000 W/hour was 3 minutes, and the performance degradation during Microsoft Adaptive 97% 1700 W/hour peak loads was only 8%. In comparison, AWS with static Azure management showed 3.2% data loss, 12 minutes of 338 Informatica 49 (2025) 333–344 P. Kudrynskyi et al. recovery time, and an 18% performance degradation. For 3.4 Reduction of infrastructure costs Microsoft Azure with adaptive management, the percentage of data loss was 0.7%, the average recovery Adaptive resource management in cloud time was 4 minutes, and the performance was 7%. In infrastructures has proven to have significant cost-saving contrast, Azure with static management had 4.1% data benefits. Resource efficiency avoids situations when loss, 15 minutes of recovery time, and a 22% performance servers are running at low load or overloaded, which is degradation. Thus, adaptive resource management allows within normal parameters for static management methods. for better fault tolerance and high performance during Real-time optimisation of resource allocation minimises peak loads, while static management demonstrates the amount of unused computing capacity, thus reducing significantly worse results in terms of data loss, recovery the direct costs of renting or operating them. time, and performance. In the tests, both platforms In addition, resilience to changes in load provides demonstrated the ability to effectively handle these load flexible scaling that allows platforms to effectively handle changes, but AWS performed significantly better in terms peak loads without having to maintain excessive resource of rapid recovery and resource adaptation. At high peak reserves [18,19]. This is particularly relevant for loads, AWS proved to be more efficient in load balancing, businesses with irregular or seasonal operations, where which reduced response times and avoided delays in adaptive management can reduce the need for long-term request execution. This ensured high availability of leases or additional capacity, reducing costs by up to 20% services, even with significant load fluctuations. compared to static approaches. Thus, efficiency and Compared to Microsoft Azure, AWS has shown greater resilience to change not only reduce operating costs but flexibility in scaling resources, which has enabled faster also increase the cost-effectiveness of cloud infrastructure response to sudden changes in traffic and loads, increasing while ensuring stability and quality of service. the overall resilience of the platform. Table 3 shows a comparison of infrastructure costs for There are a number of reasons why AWS and Azure static and adaptive resource management methods on function differently, including variations in their designs, AWS and Microsoft Azure. approaches to resource management, and load balancing systems. AWS's superior load-balancing algorithms and Table 3: Reduced infrastructure costs when using capacity to effectively divide workloads among numerous adaptive management instances allow it to scale resources with greater Reduction of flexibility, particularly during periods of high peak load. Type of Infrastructure costs with Platform This guarantees faster response times and fewer execution control costs (%) adaptive management delays for requests. Azure, on the other hand, struggles with resource allocation during dynamic load variations, Adaptive 20% 20% leading to instances that are either underutilised or AWS overcrowded, even if it performs well under constant load Static 25% - levels. Additionally, AWS gains from a more strategically placed data centre network, which improves network Adaptive 18% 22% throughput and overall performance during periods of Microsoft Azure high traffic. Azure's performance, on the other hand, is Static 23% - typically more reliable but less effective at managing abrupt surges in traffic. Additionally, AWS's predictive Source: compiled by the authors. resource management and improved machine learning model integration allow for quicker adaptability to shifting For AWS, adaptive management can reduce costs by traffic patterns, which reduces data loss and speeds up 20% from 25% with a static approach to 20% with an recovery. In conclusion, because of its sophisticated adaptive approach. In Microsoft Azure, the adaptive resource scaling, better load balancing, and quick response approach reduces costs by 22% from 23% with static to abrupt traffic fluctuations, AWS performs better than management to 18%. This shows that adaptive Azure in dynamic situations. management, thanks to dynamic resource optimisation, AWS demonstrates greater flexibility and efficiency in provides significant cost savings compared to static adapting resources to peak loads. One of the key findings methods for both platforms. These differences can stem of the study was that adaptive resource management based from their resource management approaches. Real-time on predictive models can significantly reduce load forecasting and adaptive scaling provided by AWS infrastructure costs, increasing its cost-effectiveness. allow for more effective resource allocation, which lowers Predictive models based on neural networks can the need for overprovisioning and minimises idle accurately predict the future load on cloud resources and resources, ultimately saving more money. Azure is less automatically adapt the distribution of computing power cost-effective than AWS due to its less flexible static and memory to ensure optimal resource utilisation. This resource allocation, which leads to underutilisation during avoids overcapacity and reduces the need for excessive periods of low demand and overutilisation during periods use of infrastructure to handle peak loads, which is one of of high demand. As a result, AWS's dynamic resource the main causes of cost overruns in traditional static management strategy reduces costs more effectively, resource management models. particularly for workloads that fluctuate. Models And Methods of Analysing Infrastructure Performance in… Informatica 49 (2025) 333–344 339 In summary, adaptive management showed a and static management modes. Adaptive management on significant reduction in infrastructure costs compared to both platforms is highly efficient, reducing infrastructure static management. Although AWS costs are higher, costs and maintaining the required level of performance. adaptive management performed better for both platforms, Through the implementation of forecasting and reducing costs more than static management. Thanks to automatic scaling mechanisms, adaptive resource predictive methods and automatic scaling, the system management significantly optimises infrastructure adapts resources to real needs, which can reduce utilisation [20]. However, this approach may require infrastructure costs by 20% compared to static additional setup and monitoring costs. Static control, management, where costs can be significantly higher due although easier to implement, can lead to less efficient use to inefficient use of resources. of resources, especially when the load is variable, which increases costs or reduces productivity [21]. Thus, 3.5 Optimisation of the use of computing adaptive management is a better option for efficient use of power and memory computing power and memory, although it can be more difficult to implement and maintain. Table 4 demonstrates a comparison of key performance By leveraging forecasting and automatic scaling indicators (CPU Utilisation, Memory Usage, Network capabilities, adaptive resource management substantially Throughput and Disk I/O) when using static and adaptive optimises infrastructure utilisation [22, 23]. This resource management methods for AWS and Microsoft methodology effectively reduces the operational Azure cloud platforms. It also demonstrates that adaptive expenditures associated with cloud services while management allows for more efficient resource utilisation. ensuring sustained high performance and service Costs are reduced by automatically scaling resources, reliability. This approach is significantly more cost- which allows for high performance while significantly effective and efficient than traditional static management, reducing overconsumption. which cannot effectively respond to changing load conditions. Table 4: Comparison of cloud platform performance by For a more detailed comparison of the effectiveness of key parameters in static and adaptive resource adaptive and static resource management, it is important management to note that a key factor in reducing infrastructure costs is Networ to reduce the time during which resources are operating in Manage CPU Mem Platfo k Disk an elevated mode. In systems with static management, ment utilisat ory rm through I/O resources are often kept in reserve for possible peak loads, method ion usage put which leads to constant capacity costs even during quiet periods [24, 25]. In such systems, resources can be in an Static 60% 70% 65% 60% increased mode (e.g., 80% of capacity) for 70% of the AWS 85% 88% 85% 90% time, which creates significant additional costs. At the Adaptive (+15 (+28 (+25%) (+25%) same time, in systems with adaptive control, resources are %) %) added only when needed, and their use is adjusted Static 58% 68% 60% 58% depending on actual conditions. Therefore, resources are Micros in overdrive only 20% of the time, as the system oft 83% 84% Azure 80% 85% automatically optimises resource allocation according to Adaptive (+15 (+26 (+22%) (+25%) %) %) current needs. This adaptability can significantly reduce Source: compiled by the authors. infrastructure costs as resources are not over-utilised when they are not needed, resulting in greater efficiency and Percentages were calculated as the increase in resource savings. Through the use of predictive techniques, the system efficiency when moving from static to adaptive can not only reduce costs during low load phases but also management. For each indicator, the increase is ensure that additional resources are available when determined relative to the value recorded during static management. The initial values represent the effectiveness needed, which helps maintain high performance and of static methods. minimise the risk of downtime when resources are not The results show that adaptive resource management available to handle peak loads. This process also contributes to the stability of cloud platforms, as allows for more efficient use of computing power, anticipating changes in workload allows operations to memory, network bandwidth, and disk operations. AWS adapt to future changes before they occur, providing demonstrates slightly higher performance growth, especially in CPU Utilisation and Network Throughput. greater confidence in the continuity of services. At the same time, Microsoft Azure shows a steady These results also highlight the great potential of using adaptive methods for a variety of business processes and improvement in all parameters, which indicates the organisations where high efficiency in the use of cloud platform's high adaptability. resources is critical to reducing operating costs while The comparison of platform performance results demonstrates that AWS has overall higher resource ensuring the required performance. The use of such utilisation rates than Microsoft Azure, both in adaptive technologies is especially relevant for environments with high load variability, such as e-commerce, data 340 Informatica 49 (2025) 333–344 P. Kudrynskyi et al. processing, financial services and other industries where collection process in cloud environments. By combining load peaks can occur at unpredictable times. machine learning with data-gathering methods to increase Through the implementation of predictive models and the precision and effectiveness of investigations, it makes adaptive management, businesses can significantly a substantial contribution to the field of cloud forensics. improve their economic performance while ensuring Although in a different context, this confirms the competitiveness and cost reduction, which is a key factor importance of predictive accuracy and standardisation, for modern organisations seeking to make their operations which is also key to adaptive resource management. flexible and resilient in an ever-changing environment. P. Nawrocki et al. [32] addressed short-term and long- term resource reservations, emphasising the need to 4 Discussion respond quickly to sudden peak loads such as flash crowd workload effects. The study looked at machine learning- The results confirm that adaptive management is based adaptive resource planning for cloud-based much more effective than static approaches, especially applications, with an emphasis on how machine learning when the load on cloud infrastructures is dynamic. For models can improve resource planning in cloud instance, the study by N. Du et al. [26] explored the use of environments. The results complement this approach by convex hull triangle mesh-based static mapping in highly showing that adaptive systems can effectively respond to dynamic environments, providing a novel technique for unpredictable loads while minimising costs. Other studies, improving mapping accuracy in such environments. This such as one by S. Ivan et al. [33], have studied the demonstrated that traditional approaches to resource efficiency of different cloud platforms, including AWS management under variable load conditions have limited and Microsoft Azure. The study offered insights into effectiveness. The results of the study confirm this cloud-based data processing for big data applications by statement, demonstrating that predictive models based on highlighting the advantages and disadvantages of each neural networks not only reduce infrastructure costs but platform for doing sentiment analysis at scale. Although also provide high flexibility and adaptability to cloud the study compared platforms, the results support the systems. conclusion of this paper that adaptive models significantly Similar conclusions were made by A. Braafladt et al. improve efficiency regardless of the specific platform. [27] and S. Khan and A. Jillani [28]. A. Braafladt et al. Microsoft’s Azure cloud computing is a fully managed presented an unusual approach to improving defence computing service that was introduced at a conference in modelling and simulation by examining the use of AI- 2008 and became known as Windows Azure and later driven adaptive analysis to detect emergent behaviours in renamed Microsoft Azure. P. Narayanan [34] discussed military capabilities design. S. Khan and A. Jillani the key components and services of Azure, with a special employed search-based software engineering techniques focus on data engineering and machine learning, as well to investigate cloud resource allocation and optimisation, as its impact on various industries due to the availability showing how sophisticated algorithms can be applied to of data centres around the world. P. Borra [35] discussed increase the effectiveness of cloud computing. This the key networking solutions provided by Microsoft emphasised the need to implement adaptive algorithms to Azure, which are the basis for supporting digital ensure the scalability and flexibility of cloud platforms. operations in modern business. The author examines in This correlates with this approach, which has shown the detail Azure components such as Virtual Network, Load effectiveness of using deep learning methods for real-time Balancer, VPN Gateway, ExpressRoute, and Firewall, load forecasting. with a focus on their practical application to ensure B. Predić et al. [29] and I. Petrovska and H. Kuchuk uninterrupted connectivity and improve security. The [30] both aimed to improve cloud resource management study aims to provide organisations with in-depth but took different approaches. In order to improve cloud knowledge and insights to help them effectively leverage load predictions and resource allocation under varying Azure networking services to meet changing business demands, Predić et al. employed a machine learning needs, which can complement the findings of this study. approach. In order to maximise efficiency and guarantee A study by O. Rolik and S. Zhevakin [36] confirmed secure operations, Petrovska & Kuchuk concentrated on the results in terms of cost-effectiveness. The use of adaptive resource allocation for data processing and adaptive management can reduce the cost of cloud security. Both strategies emphasised dynamic resource services by up to 20%, which highlights the importance of management in comparison to the current study, with the results for reducing the financial costs of Petrovska & Kuchuk concentrating on security and Predić organisations. P. Lakhera [37] complements these et al. on prediction accuracy. These concepts are findings by suggesting strategies for cost optimisation supported by the current study, which shows that adaptive using artificial intelligence. Anomaly detection and management improves cost-effectiveness and robustness predictive scaling, which the authors investigated, are key under fluctuating loads. elements for improving cost efficiency. Studies on cloud forensics, such as the one by R. Al- Traditional methods of resource management, as noted Mugern et al. [31], analyse the integration of machine by S. Tendulkar [38], are less effective due to the lack of learning techniques for data standardisation. This work consideration of dynamic changes in the load. This study presents an improved machine learning method that confirms this by demonstrating that predictive models can applies a cloud forensic meta-model to enhance the data more accurately determine resource requirements and Models And Methods of Analysing Infrastructure Performance in… Informatica 49 (2025) 333–344 341 ensure efficient use of resources under variable load edge computing offer prospects for further improving the conditions. S. Jaber [39] also supports the claim that flexibility and performance of cloud platforms. adaptive systems significantly reduce infrastructure costs. Neural network-based models have proven to provide The use of predictive models can reduce costs and highly accurate predictions of load changes, enabling improve the performance of cloud systems. efficient real-time resource adaptation. This significantly AWS, as shown by L. Devane [40], provides a high improves both the performance of cloud platforms and the ability to adapt to peak loads, which is consistent with the stability of systems. The results of the study demonstrate results obtained. Similar conclusions were made by S. the benefits of using intelligent algorithms that can adapt Gong et al. [41], who noted that adaptive systems to changing operating conditions. effectively respond to sudden changes in load, ensuring The results obtained are important for practical the stability of platforms. The study complements these application. They open opportunities to significantly findings by emphasising the importance of reducing reduce business operating costs while ensuring high response times to peak loads. This is important for availability and stability of services. The use of machine organisations that work with large amounts of data and learning-based adaptive control technologies allows for need consistent access to resources in real-time. optimising resource utilisation and minimising downtime In general, the research findings are fully consistent and congestion. with current industry trends, in particular the importance However, the study has several limitations: only two of using adaptive systems to manage cloud resources and cloud platforms were used, which may limit the confirm the effectiveness of load forecasting methods to generalisability of the results, and the number of types of reduce costs and improve performance. At the same time, server configurations for testing is limited. For further it is worth analysing the further development and research, it is advisable to expand the number of cloud improvement of such models based on deep learning and platforms tested, explore the integration of adaptive integration with new technologies such as edge management with new technologies, such as edge computing, which will allow for even greater efficiency in computing, which will significantly improve the real-time management. efficiency of real-time resource management, and improve predictive models using more sophisticated machine learning algorithms to improve the accuracy of predictions 5 Conclusions and system adaptability. The study developed models for load forecasting and resource management, including a neural network model References for load forecasting and an adaptive resource management [1] Berestovenko, O. Virtualisation and network model that automatically adjusts resource use based on management: Best practices for improving forecasts. One of the main achievements was the efficiency. Technologies and Engineering, 2024, confirmation of the effectiveness of using intelligent 25(6): 41-52. https://doi.org/10.30857/2786- algorithms, in particular neural networks, for load 5371.2024.6.4 forecasting and automatic adaptation of resource [2] Abdullah, M., & Mohamed Surputheen, M. allocation in real-time. This reduced the cost of cloud Optimizing performance of cloud infrastructure services by an average of 20% compared to traditional through effective resource scheduling. Journal of static approaches, which confirms the cost-effectiveness Advanced Applied Scientific Research, 2024, 6(1): of the proposed methods. 1-14. https://doi.org/10.46947/joaasr612024748 The study also determined that AWS demonstrated [3] Alrammah, H., Gu, Y., Yun, D., & Zhang, N. Tri- better adaptability under highly dynamic workloads due to objective optimization for large-scale workflow faster resource scaling and more efficient load balancing. scheduling and execution in clouds. Journal of While Microsoft Azure showed a more even distribution Network and Systems Management, 2024, 32(4): of resources at a stable load, which is an advantage in the 89. https://doi.org/10.1007/s10922-024-09863-3 case of a constant load level. The results of the study [4] Tiwari, A.K., & Yadav, S. Algorithmic model for showed that adaptive resource management in cloud cloud performance optimization using connection platforms can achieve significant performance pooling technique. Journal of Statistics and improvements and cost savings. AWS demonstrated a Management Systems, 2024, 27(2): 489-499. 15% improvement in scalability and performance under https://doi.org/10.47974/jsms-1290 highly dynamic workloads, while Microsoft Azure [5] Anayat, R. Cloud-based reinforcement learning in showed a 10% increase in resource allocation efficiency resource-constrained environments: Real-time under stable workloads. The use of predictive models performance optimization in autonomous systems, based on neural networks ensures accurate forecasting of 2024. load changes and automatic adaptation of resources in https://doi.org/10.13140/RG.2.2.24832.24326 real-time. Adaptive algorithms have proven to be more [6] Varanitskyi, D., Rozkolodko, O., Liuta, M., efficient than traditional approaches, especially in the face Zakharova, M., & Hotunov, V. Analysis of data of variable workloads. Further developments in protection mechanisms in cloud environments. technologies such as deep learning and integration with 342 Informatica 49 (2025) 333–344 P. Kudrynskyi et al. Technologies and Engineering, 2024, 25(1): 9-16. Improving accuracy of the spectral-correlation https://doi.org/10.30857/2786-5371.2024.1.1 direction finding and delay estimation using [7] Demchyna, M., Styslo, T., & Vashchyshak, S. machine learning. Eastern European Journal of Optimisation of intelligent system algorithms for Enterprise Technologies, 2025, 2(5(134)): 15-24. poorly structured data analysis. Bulletin of https://doi.org/10.15587/1729-4061.2025.327021 Cherkasy State Technological University, 2024, [17] Porkodi, S., & Raman, A.M. Success of cloud 29(4): 21-31. computing adoption over an era in human resource https://doi.org/10.62660/bcstu/4.2024.21 management systems: a comprehensive meta- [8] Johnson, O.B., Olamijuwon, J., Cadet, E., analytic literature review. Management Review Osundare, O.S., & Samira, Z. Designing multi- Quarterly, 2025, 75(2): 1041-1075. cloud architecture models for enterprise scalability https://doi.org/10.1007/s11301-023-00401-0 and cost reduction. Open Access Research Journal [18] Sandhu, R., Faiz, M., Kaur, H., Srivastava, A., & of Engineering and Technology, 2024, 7(2): 101- Narayan, V. Enhancement in performance of cloud 113. https://doi.org/10.53022/oarjet.2024.7.2.0061 computing task scheduling using optimization [9] Talha, A., Bouayad, A., & Malki, M.O. An strategies. Cluster Computing, 2024, 27(5): 6265- improved pathfinder algorithm using opposition- 6288. https://doi.org/10.1007/s10586-023-04254- based learning for tasks scheduling in cloud w environment. Journal of Computational Science, [19] Soh, J., Copeland, M., Puca, A., & Harris, M. 2022, 64: 101873. Microsoft Azure, 2020. Berkeley: Apress. https://doi.org/10.1016/j.jocs.2022.101873 https://doi.org/10.1007/978-1-4842-5958-0 [10] Slivka, S. Microservices architecture for ERP [20] Singh, S., Ramkumar, K.R., & Kukkar, A. systems. Bulletin of Cherkasy State Technological Analysis and implementation of microsoft Azure University, 2024, 29(4): 32-42. https://bulletin- machine learning studio services with respect to chstu.com.ua/en/journals/tom-29-4- machine learning algorithms. In R. Agrawal, C.K. 2024/arkhitektura-mikroservisiv-dlya-erp-sistem Singh, A. Goyal, & D.K. Singh (Eds.), Modern [11] Destek, M.A., Hossain, M.R., Manga, M., & Electronics Devices and Communication Systems, Destek, G. Can digital government reduce the 2023, (pp. 91-106). Singapore: Springer. resource dependency? Evidence from method of https://doi.org/10.1007/978-981-19-6383-4_7 moments quantile technique. Resources Policy, [21] Kavaldzhieva, K. The Impact of Digitalization on 2024, 99: 105426. the Measurement of value in the production and https://doi.org/10.1016/j.resourpol.2024.105426 operation of industrial products. In 2019 [12] Smailov, N., Tsyporenko, V., Sabibolda, A., International Conference on High Technology for Tsyporenko, V., Abdykadyrov, A., Kabdoldina, Sustainable Development, HiTech, 2019, (Article A., Dosbayev, Z., Ualiyev, Z., & Kadyrova, R. number: 9128260). Sofia: Institute of Electrical Streamlining digital correlation-interferometric and Electronics Engineers. direction finding with spatial analytical signal. https://doi.org/10.1109/HiTech48507.2019.91282 Informatyka Automatyka Pomiary W Gospodarce 60 I Ochronie Srodowiska, 2024, 14(3): 43-48. [22] Kiurchev, S., Abdullo, M.A., Vlasenko, T., Prasol, https://doi.org/10.35784/iapgos.6177 S., & Verkholantseva, V. Automated Control of the [13] Makhazhanova, U., Omurtayeva, A., Kerimkhulle, Gear Profile for the Gerotor Hydraulic Machine. In S., Tokhmetov, A., Adalbek, A., & Taberkhan, R. F. Chaari, F. Gherardini, V. Ivanov, & M. Haddar Assessment of Investment Attractiveness of Small (Eds.), Lecture Notes in Mechanical Engineering, Enterprises in Agriculture Based on Fuzzy Logic. 2023, (pp. 32-43). Cham: Springer. Lecture Notes in Networks and Systems, 2024, 935 https://doi.org/10.1007/978-3-031-16651-8_4 LNNS: 411-419. [23] Bezshyyko, O., Dolinskii, A., Bezshyyko, K., [14] Azieva, G., Kerimkhulle, S., Turusbekova, U., Kadenko, I., Yermolenko, R., & Ziemann, V. Alimagambetova, A., & Niyazbekova, S. Analysis PETAG01: A program for the direct simulation of of access to the electricity transmission network a pellet target. Computer Physics using information technologies in some countries. Communications, 2008, 178(2): 144-155. E3S Web of Conferences, 2021, 258: 11003. https://doi.org/10.1016/j.cpc.2007.07.013 https://doi.org/10.1051/e3sconf/202125811003 [24] Orazbayev, B., Zhumadillayeva, A., Kabibullin, [15] Imamguluyev, R., & Umarova, N. Application of M., Crabbe, M.J.C., Orazbayeva, K., & Yue, X. A Fuzzy Logic Apparatus to Solve the Problem of Systematic Approach to the Model Development Spatial Selection in Architectural-Design Projects. of Reactors and Reforming Furnaces With Lecture Notes in Networks and Systems, 2022, Fuzziness and Optimization of Operating Modes. 307: 842-848. https://doi.org/10.1007/978-3-030- IEEE Access, 2023, 11: 74980-74996. 85626-7_98 https://doi.org/10.1109/ACCESS.2023.3294701 [16] Smailov, N., Tsyporenko, V., Ualiyev, Z., Issova, [25] Sasi, S., Subbu, S.B.V., Manoharan, P., & A., Dosbayev, Z., Tashtay, Y., Zhekambayeva, M., Abualigah, L. Design and implementation of Alimbekov, T., Kadyrova, R., & Sabibolda, A. secured file delivery protocol using enhanced Models And Methods of Analysing Infrastructure Performance in… Informatica 49 (2025) 333–344 343 elliptic curve cryptography for class I and class II [36] Rolik, O.I. & Zhevakin, S.D. Cost optimization transactions. Journal of Autonomous Intelligence, method for informational infrastructure 2023, 6(3). https://doi.org/10.32629/jai.v6i3.740 deployment in static multi-cloud environment. [26] Du, N., Xie, L., Zhou, M., Gao, W., Wang, Y., & Radio Electronics, Computer Science, Control, Hu, J. Convex hull triangle mesh-based static 2024, 3: 160-172. https://doi.org/10.15588/1607- mapping in highly dynamic environments. IEEE 3274-2024-3-14 Transactions on Instrumentation and [37] Lakhera, P. Leveraging large language models to Measurement, 2024, 73: 1-14. optimize costs in Amazon web service cloud. https://doi.org/10.1109/tim.2023.3348881 TechRxiv, 2024. [27] Braafladt, A., Sudol, A., & Mavris, D. AI-driven https://doi.org/10.36227/techrxiv.172684142.2396 adaptive analysis for finding emergent behavior in 6027/v1 military capability design. Journal of Defense [38] Tendulkar, S. Optimizing generative AI model Modeling and Simulation: Applications, performance through cloud resource management Methodology, Technology, 2024. in hybrid AI systems, 2024. https://doi.org/10.1177/15485129241289137 https://doi.org/10.13140/RG.2.2.34745.38246 [28] Khan, S.M. & Jillani, A. Cloud resource allocation [39] Jaber, S. Enhanced model performance in and optimization using search-based software generative AI: Cloud resource optimization for engineering methods, 2024. real-time adaptive autonomous systems, 2024. https://doi.org/10.13140/RG.2.2.17568.19207 https://doi.org/10.13140/RG.2.2.32857.94567 [29] Predić, B., Jovanovic, L., Simic, V., Bacanin, N., [40] Devane, L. Adaptive AI systems in autonomous Zivkovic, M., Spalevic, P., Budimirovic, N., & environments: Real-time decision making and Dobrojevic, M. Cloud-load forecasting via resource allocation through cloud-based decomposition-aided attention recurrent neural reinforcement learning, 2023. network tuned by modified particle swarm https://doi.org/10.13140/RG.2.2.21638.18241 optimization. Complex & Intelligent Systems, [41] Gong, S., Yin, B., Zheng, Z., & Cai, K.-Y. 2023, 10(2): 2249-2269. Adaptive multivariable control for multiple https://doi.org/10.1007/s40747-023-01265-3 resource allocation of service-based systems in [30] Petrovska, I. & Kuchuk, H. Adaptive resource cloud computing. IEEE Access, 2019, 7: 13817- allocation method for data processing and security 13831. in cloud environment. Advanced Information https://doi.org/10.1109/access.2019.2894188 Systems, 2023, 7(3): 67-73. https://doi.org/10.20998/2522-9052.2023.3.10 [31] Al-Mugern, R., Othman, S.H., & Al-Dhaqm, A. An improved machine learning method by applying cloud forensic meta-model to enhance the data collection process in cloud environments. Engineering, Technology & Applied Science Research, 2024, 14(1): 13017-13025. https://doi.org/10.48084/etasr.6609 [32] Nawrocki, P., Grzywacz, M., & Sniezynski, B. Adaptive resource planning for cloud-based services using machine learning. Journal of Parallel and Distributed Computing, 2021, 152: 88-97. https://doi.org/10.1016/j.jpdc.2021.02.018 [33] Ivan, S.C., Győrödi, R.Ş., & Győrödi, C.A. Sentiment analysis using Amazon web services and Microsoft Azure. Big Data and Cognitive Computing, 2024, 8(12): 166. https://doi.org/10.3390/bdcc8120166 [34] Narayanan, P.K. Engineering data pipelines using Microsoft Azure. In P.K. Narayanan (Ed.), Data Engineering for Machine Learning Pipelines, 2024, (pp. 571-616). Berkeley: Apress. https://doi.org/10.1007/979-8-8688-0602-5_17 [35] Borra, P. Microsoft Azure networking: Empowering cloud connectivity and security. International Journal of Advanced Research in Science, Communication and Technology, 2024, 4(3): 469-475. https://doi.org/10.48175/ijarsct- 18949 344 Informatica 49 (2025) 333–344 P. Kudrynskyi et al. https://doi.org/10.31449/inf.v49i12.9433 Informatica 49 (2025) 345–360 345 Deep Neural Network Architecture Optimization for Edge Computing Based on Evolutionary Algorithms Li Wang 1, Xiuming Cheng 2, * 1School of Information and Electronics Engineering, Jiangsu Vocational Institute of Architectural Technology, Xuzhou 221116, Jiangsu, China 2School of General Courses, Jiangsu Vocational Institute of Architectural Technology, Xuzhou 221116, Jiangsu, China E-mail: xiuming_cheng@hotmail.com *Corresponding author Keywords: vehicular edge computing (VEC), edge server placement, network condition adaptation, synergistic fibroblast optimized efficient deep neural network (SFO-Eff-DNN) Received: May 28, 2025 Vehicular Edge Computing (VEC) is a crucial component of Intelligent Transportation Systems (ITS), enabling low-latency and energy-efficient services by offloading computation to the network edge. However, optimizing system performance in such environments requires careful edge server placement, especially in dynamic vehicular contexts characterized by high mobility and unpredictability. Achieving optimal performance under the constraints of latency, energy consumption, and mobility remains a significant challenge. This research proposes a comprehensive framework for optimizing deep learning architectures in VEC, utilizing advanced evolutionary algorithms. Building on real-world vehicular mobility traces, the framework employs the Synergistic Fibroblast Optimized Efficient Deep Neural Network (SFO-Eff-DNN) to identify optimal configurations and edge server placements. The dataset includes details about task offloading under different mobility levels, the data was preprocessed using Min-Max normalization to ensure smooth learning. Among the algorithms evaluated, Synergistic Fibroblast Optimization (SFO) consistently produces well-distributed Pareto-optimal solutions and effectively handles trade-offs between competing objectives. The DNN is utilized to learn complex patterns in vehicular mobility and network conditions, which helps predict the best configurations for edge server placements. The proposed system efficiently minimizes latency and energy consumption while ensuring scalability and adaptability to real-world scenarios. Results demonstrate that SFO-Eff-DNN achieves superior convergence speed and energy efficiency, making it well-suited for time-sensitive deployments. Comparative simulations validate that this approach outperforms traditional methods, providing valuable insights for deploying efficient and robust edge intelligence architectures in next-generation intelligent transportation systems. Povzetek: Ta raziskava se osredotoča na področje robnega računalništva v vozilih (VEC), kar je ključno za zagotavljanje nizke zakasnitve v inteligentnih transportnih sistemih. Vsebina prispevka predstavlja hibridni okvir SFO-Eff-DNN, ki združuje globoko učenje in evolucijsko optimizacijo za reševanje kompleksnega problema postavitve robnih strežnikov in prilagajanja arhitekture nevronske mreže. Glavni dosežki vključujejo rešitev večciljne optimizacijske naloge, ki uspešno minimizira zakasnitev in porabo energije v dinamičnem voznem okolju. 1 Introduction et al., 2020). The exponential growth in ITS has resulted in an increased demand for responsive, energy-efficient, and An ITS enhances the safety of moving vehicles and intelligent processing solutions that can manage the hikers within the vicinity. In recent times, problems dynamic vehicular environment (Elassy et al., 2024). VEC regarding road traffic safety have increased and accidents is a pattern that brings the cloud computing capacities continue to occur regularly (Wan et al., 2020). Fortunately, closer to the network edge and is a likely solution to a growing number of related technologies have been service demands for low-latency services, such as auto- applied to the transportation industry as wireless corrective driving support, real-time traffic management, communication and sensor technologies have developed and location-based services (Alhilal et al., 2024). and matured in recent years. The increased need for road Connected vehicles benefit from VEC by shortening the efficiency and safety in intricately linked road systems has response time of their systems and helping them save drawn a lot of attention to ITS in recent years (Boukerche power by assigning tasks to local servers (Chougule et al., 346 Informatica 49 (2025) 345–360 L. Wang et al. 2024). Greater safety, dependability, efficiency in inability to handle the dynamism of vehicular mobility transportation, fast action and network reach make smart effectively in the process of optimizing latency and energy and sustainable driving networks possible (Talpur and consumption. Gurusamy, 2021). To minimize the time for data exchanges and energy used in vehicles, VEC allows vehicles to 2 Related work perform certain tasks on edge servers nearby. As a result, This section discusses the positioning of border connected vehicles receive a much better level of service servers within the VEC, including the traditional heuristics, (Zaki et al., 2024). Due to their speed and patches of deep learning (DL), evolutionary algorithms, the unpredictability, the movement of vehicles complicates challenges in dynamic vehicular environments, and the VEC systems (Zhao et al., 2023). The greatest aspect to recent data-driven and optimization-based developments focus on is the best locations and times for edge servers so of this space for better adaptability and performance. To that moving vehicles can be handled efficiently (Shen et fix the issue of resource assignment in cloud computing al., 2021). With many vehicles moving, topology shifts Infrastructure as a Service (IaaS), an Equilibrium taking place and numerous demands for services, generic Optimization (EO)-based evolutionary Recurrent Neural or manual placements are not usually enough. Similarly, Network (RNN) was presented (Ebrahimi Mood et al., managing various goals, including keeping reaction times 2025). This model was designed to give virtual machines quick, using as little energy as possible, maintaining an optimal number of physical machines by improving flexibility, and scaling up, remained prominent in network how they work in general and by reducing their complexity. research (Peyman et al., 2023). As simulation traces were The simulations were faster and more reliable than the used, working with many nodes and requiring some conventional ones. attention to used parameters, this approach might face The significance of edge computing topics such as issues when put to practical use. selecting the right tasks for offloading, allocating resources, Deep learning and evolutionary optimization are used and ensuring good Quality-of-Service and Quality-of- in the design to choose the best locations for edge servers. Experience (Vijayakumar et al., 2021). The challenges in Specifically, the SFO-Eff-DNN approach allows the optimizing and scheduling were solved with models and system to recognize patterns using a DNN and search DL techniques based on evolution. This approach helps to globally using an SFO algorithm. This framework make better decisions and effectively manage resources in processes actual data from vehicle movement to environments at the edges of a network. Yang et al., (2021) understand vehicle movements and the state of the network, introduced a method that can manage both accuracy and as well as select the best position for the servers. The key the speed of neural networks on edge devices. An estimate contribution of the research is as follows. of resource use latency created from the profiling model In extremely dynamic vehicle contexts, it was best to and the Pareto Bayesian search was driven by constraints formulate the edge server placement problem as a multi- on accuracy and latency. Without sacrificing accuracy, the objective optimization task that simultaneously reduces inference process was 94.71% faster and the search the latency and energy consumption. process became 18.18% more efficient. To create the SFO-Eff-DNN framework, which An energy-efficient DNN offloading was developed combines biologically inspired optimization with effective under deadline and budget constraints in edge-cloud deep learning to deliver scalable and flexible placement environments; this optimization modeling was performed solutions. using an Enabled Hybrid Chaotic Evolutionary Algorithm To compare the system against traditional techniques Dynamic Voltage Frequency Scaling (HCEA-DVFS) (Li et and perform comprehensive simulations using genuine al., 2024). The Archimedes Optimization and Simulated mobility datasets, showcasing notable advances in Annealing were applied for global exploration, and local placement accuracy, energy economy, and convergence search improvement based on the Genetic Algorithm (GA) speed. chaotic strategy. Experiments proved that HCEA-DVFS The remainder of this research is separated into the decreased energy consumption by 7.93% to 19.38% following sections: the literature review on edge server relative to baseline techniques on a variety of DNN-based placement and the intelligent optimization techniques in apps. A suitable deep learning model and a proper method VEC are reviewed. The phrasing of the problem and the for training the effective training scheme for the deep system model are then given in detail, as well as the neural network (ETS-DNN) were created to allow real- description of the proposed SFO-Eff-DNN framework. time monitoring in an Internet of Medical Things (IoMT) The next section will discuss the experimental settings and system that used edge computing (Pustokhina et al., 2020). performance evaluations, and the results and insights will Optimization of the neural network with autoencoders and then be discussed. Lastly, the research is concluded with softmax layers was achieved by using a Hybrid Modified directions for further research. Water Wave Optimization (HMWWO) algorithm. The introduction highlights the significant importance Examination of simulation results indicated that ETS- of edge servers' placement efficiency in the VEC for DNN performed better when processing prompts and improving ITS performance. The literature review reveals making accurate diagnoses. Table 1 demonstrates the the weaknesses of existing methods, especially their summary of the literature review. Deep Neural Network Architecture Optimization for Edge… Informatica 49 (2025) 345–360 347 Table 1: Related work VEC optimization methods and outcomes Methods Aim Outcome Challenge Author/Ref. DeepMaker Automatically design Achieved up to 26.4x compression on Designing efficient DNNs (Loni et al., 2020) Framework robust DNN architectures CIFAR-10 with only 4% accuracy that fit resource (Multi- for embedded devices loss; optimized network size and constraints while objective accuracy for limited resources maintaining accuracy Evolutionary Approach) Internet of To detect cyberattacks in Achieved higher accuracy, superior Addressing IoT security (Saheed et al., 2024) Things (IoT)- IoT networks using an detection rate, greater precision, false with limited resources, Defender efficient, lightweight alarm rate, mIoU, and training time class imbalance, and low (Modified edge-based IDS on BoT-IoT dataset; effective real- hardware security in edge GA)/ Deep time deployment on Raspberry Pi computing environments long-short- devices term memory (LSTM) Genetic To reduce latency and Achieved lower energy consumption Balancing limited (Bi et al., 2020) Simulated energy usage in smart and faster convergence compared to resources of SMDs with Annealing- mobile devices by three baseline methods using real-life high communication costs based Particle partially offloading data; provided joint optimization of and maintaining energy- Swarm offloading ratio, bandwidth, and efficient service Optimization transmission power allocation (GSP) Greedy Optimizing task Achieved near-optimal scheduling Reducing excessive (Chen et al., 2020) Algorithm scheduling in cloud-edge performance with reduced average delays during DNN task and GA for systems to reduce the response time; GA outperformed offloading to enhance the Task average response time of greedy in accuracy but required more vehicle experience Scheduling DNN-based apps computation time. Particle to efficiently and quickly Reduced MEC server delay, balanced Designing a low-delay (You et al., 2021) Swarm transfer activities from energy consumption, and enabled and energy-efficient Optimization resource-constrained edge effective resource allocation offloading technique in a (PSO) devices to MEC servers in compared to GA and SA methods system with several IIoT contexts vehicles and MECs Differential To maximize IoT edge Outperformed the Firefly Algorithm Clustering and scheduling (Yousif, et al., 2024) Evolution computing task clustering and PSO in reducing execution time tasks effectively in (DE) and scheduling and improving system efficiency and heterogeneous IoT edge stability under heavyweight environments workloads Greedy To minimize the worst- Achieved convergence and effective Heterogeneous (Xiao et al., 2021) Algorithm + case cost of FL in VEC by trade-off between cost and fairness capabilities and data Lagrangian optimizing computation, through dynamic vehicle selection quality among vehicles; Dual + transmission, and local and resource allocation optimization energy and time Adaptive model accuracy constraints in VEC Harmony Search in federated learning (FL) VECMAN To improve energy Achieved 7–18% energy savings vs. Uncertainty in future (Bahreini et al., 2021) (Resource efficiency in VEC local execution and ~13% vs. RSU vehicle locations; Selector + systems by managing offloading by selecting participating difficulty in determining Energy resource sharing among vehicles and optimizing sharing optimal resource sharing Manager EVs durations and energy management Algorithms) VaCo To enhance intelligent VaCo effectively utilizes vehicle real-time scheduling of (Jiang et al., 2025) (Vehicle- service deployment in resources, reducing the service failure vehicle storage; benefit assisted VEC by using vehicles' rate and cost. Real-world dataset evaluation under dynamic Collaborative storage for collaborative evaluation confirms its ability to load Caching caching balance benefits for all. System HSCoNAS Optimize DNN (Hardware- Achieved strong accuracy–latency High search overhead and architecture for accuracy aware trade-offs on ImageNet across CPU, runtime approximation (Luo et al., 2021) and latency on edge Evolutionary GPU, edge challenges NAS devices Framework) LENS Incorporate wireless Improved Pareto front performance Scalability issues and (Latency- communication into NAS by 76.47% (energy) and 75% (Odema et al., 2021) fixed-tier constraints aware NAS for hierarchical systems (latency) for Edge– 348 Informatica 49 (2025) 345–360 L. Wang et al. Cloud Systems) Federated Learning in Review implementation, Classified FL methods, hardware Synchronization delays, Edge taxonomy, and challenges constraints, and case studies; (Abreha et al., 2022) hardware resource limits Computing of FL in EC identified open issues (Survey) RL-Dynamic To optimize service Reduced delay and improved edge Model complexity and (Talpur and Gurusamy, (Reinforcement placement in vehicular server utilization compared to static vehicle mobility 2021) Learning networks by considering placement; fairness trade-offs unpredictability Framework) mobility and dynamic demonstrated service demands 2.1 Problem statement individuals move around and the network evolves, it is Optimizing resources and edge server placement in important to find these servers with practical jobs and VEC as a result of high mobility, variable networks, and make sure they supply energy. The problem is solved by few resources was hard. Usually, greedy algorithms and optimizing multiple objectives, with the main variables other traditional methods do not work well in being the location of servers and the way vehicles connect environments that change dynamically (Chen et al., 2020). to them throughout the day. PSO faces the issues of early convergence and fixation when working with multiple vehicles (You et al., 2021). A) Architectural components DE was not suitable for clustering tasks in real time on heterogeneous edge systems due to its issues with The architecture of the VEC system consists of three scalability and computation (Yousif et al., 2024). main layers, such cloud, VEC, and vehicle, Cloud storage Therefore, the proposed framework SFO-Eff-DNN was allows for convenient processing and provides a backup used to learn how devices move and decide on offloading. system. Figure 1 illustrates the architecture of VEC. The It minimizes delays and uses less power, all while offering VEC layer includes a network of Roadside Units (RSUs) adaptability, scalability, and fast convergence in changing with edge servers, allowing local computing and rapid VEC networks. exchanges of data. Intelligent vehicles make up the vehicle, layer and handle task generation and offloading depending on the current network and mobility issues. Environmental 3 Methods sensors like Global Positioning System (GPS) and cameras 3.1 Architectural overview and problem in vehicles provide live data that is key for improved traffic formulation management and safety. They enable Vehicle-to-Vehicle The VEC would feature wireless connection, (V2V) and Vehicle-to-RSU (V2R) communication and permanent edge servers, and mobility vehicles. The were able to process or offload tasks according to resource simulation's rise can be increased by using vehicles to availability. Vehicles also allow for caching of data in carry out new missions on surrounding servers. As memory, which makes the system work more responsively. Deep Neural Network Architecture Optimization for Edge… Informatica 49 (2025) 345–360 349 Figure 1: The architecture of the VEC Vehicle Definition: The vehicle 𝑉 defined as a six- Edge Server: An edge server 𝐹 is defined as a three- tuple is expressed in equation (1). tuple in equation (3). 𝑽 = {𝑽𝒋𝒅, 𝑽𝒔𝒕, 𝒗𝑲, 𝑮, 𝑱[ 𝒓]} (1) 𝑭 = {𝑭𝒋𝒄, 𝑫, 𝑲} (3) Each vehicle 𝑉 is identified by its 𝑉𝑗𝑑, can be activated The edge server is identified by a unique ID (𝐹𝑗𝑐) and or deactivated (𝑉𝑠𝑡), has a type of task (𝑣𝑗,), is located by characterized by its computational capacity (𝐷 ), which Simulation of Urban Mobility (SUMO’s) data 𝐾 = includes memory, processing speed, and storage modeled {𝑘 similarly to vehicle hardware specifications. Its 𝑤 , 𝑘𝑧, 𝑘𝑦,𝑠𝑡}, is equipped with certain hardware (𝐺), and geographical location (𝐾) is also a key attribute for optimal is running several active instances of applications 𝐽[ 𝑟]. placement within the VEC network. Vehicle Hardware Specifications and Role of RSU: a Properties of edge servers in VEC vehicle’s hardware specifications 𝐺are represented as a set Dynamic vehicle assignment: Vehicle assignments to in equation (2). clusters at any time 𝑠 were independent of previous 𝑮 = {𝑶,𝑵[ 𝒓], 𝑨, 𝑻, 𝒅, 𝒆} (2) assignments, allowing the system to adapt in real-time to the high mobility and changing network topology of Each vehicle’s hardware profile 𝐺 includes processor vehicular environments. specs ( 𝑂 ), memory configuration 𝑁[ 𝑟] distinguishing Dedicated edge server assignment: Each vehicular central processing unit (CPU)/Graphics processing unit cluster was mapped indirectly to a single edge server, (GPU usage, battery capacity (𝐴 ), installed sensors (𝑇 ), ensuring exclusive service per cluster. This approach communication interfaces (𝑑) such as Wi-Fi, Long Term minimizes resource conflicts and supports the demanding Evolution (LTE), or 5G New Radio (NR), and performance requirements of VEC applications. communication frequency range (𝑒 ). These parameters Many-to-one vehicle-to-server mapping: Multiple influence the vehicle’s ability to process or offload vehicles can offload computational tasks to the same edge computational tasks. server, enabling efficient resource utilization and RSUs were placed along roadways that help to process centralized task processing within the VEC framework. and store data close to the network. RSUs were better at Data from edge servers is uploaded to remote data processing and managing data than vehicles and at storing centers, known as cloud servers, which supply large and communicating with the internet whenever necessary. amounts of computing and storage services over a large It provides quick answers to requests in maps, and videos, area. Using information from vehicles and edge servers, and controls traffic while edge servers rely on them. 350 Informatica 49 (2025) 345–360 L. Wang et al. cloud services can manage the network from one central 1) Decision Variables: place and take the best actions. The combination of To model the edge server placement in the VEC vehicular terminals, edge servers, and cloud infrastructure network, define decision variables that indicate whether an makes the VEC system both strong and capable of edge server is deployed at a specific location and how handling the needs of intelligent transportation vehicles were assigned to these servers for optimal management. performance. 𝐴𝑣𝑓 is a binary decision variable that With the architectural components established, the indicates the connection status between vehicle 𝑣 and edge server placement strategy in the proposed VEC framework server 𝑒 in equation (6). can now be formally defined to optimize performance under dynamic vehicular conditions. 𝐴𝑣𝑓 = Edge server placement: In the VEC model, the 𝟏, 𝒊𝒇 𝒗𝒆𝒉𝒊𝒄𝒍𝒆 𝒖 𝒊𝒔 𝒄𝒐𝒏𝒏𝒆𝒄𝒕𝒆𝒅 𝒕𝒐 𝒆𝒅𝒈𝒆 𝒔𝒆𝒓𝒗𝒆𝒓 𝒇 placement of edge servers was modeled by a bipartite { graph with two sets: 𝐹is for edge servers, while 𝑉 is for 𝟎, 𝒐𝒕𝒉𝒆𝒓𝒘𝒊𝒔𝒆 client vehicles. Each server 𝑓 ∈ 𝐹 comes with a defined (6) 𝑊𝑚𝑎𝑥 , showing its maximum vehicle capacity. 𝑓 Communication cost indicates how well a vehicle v works 𝐴𝑣𝑓 is a binary decision variable indicating the with a server e due to the effects of latency 𝐾𝑉𝑓 and energy deployment status of an edge server at location  𝑗 in consumption 𝐹𝑉𝑓 . The objective is to determine a good equation (7). subset 𝐹1 out of 𝐹 and describe the mapping𝜙: 𝑉 → 𝐹1 :, assigning each vehicle to a server to minimize both the 𝐴𝑓𝑗 = total delay and the power used across the system. 1, 𝑖𝑓 𝑎𝑛 𝑒𝑑𝑔𝑒 𝑠𝑒𝑟𝑣𝑒𝑟 𝑖𝑠 𝑝𝑙𝑎𝑐𝑒𝑑 𝑎𝑡 𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛 𝑗 Average latency:𝐾ˉis used to mean the average time { 0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 taken for vehicles to communicate with edge servers while (7) offloading their tasks. It helps to measure the effectiveness of server placement and matching vehicles to servers in the VEC framework under changing mobility conditions. It is 2) Parameters computed as in equation (4). The parameters in the formulation define the system 𝟏 characteristics essential for optimizing edge server 𝑲 = ∑ 𝑲 (4) |𝑽| 𝒗∈𝑽 𝒗𝒇 placement in the VEC network. The energy consumption for a vehicle 𝑣 to offload computational tasks to an edge The ∣ 𝑉 ∣ denotes the total number of vehicles within server e is denoted as 𝐹𝑣𝑓. In equation (8). the VEC network. 𝐾𝑣𝑓 represents the communication latency encountered by vehicle 𝑣 during task offloading to 𝐹𝑢𝑓 = (𝑂𝑠𝑤 + 𝑂𝑞𝑤). 𝑆𝑐𝑜𝑚𝑚 (8) edge server 𝑓, defined as equation (5). Where 𝑂 𝑲 𝑠𝑤is the vehicle’s transmission power, 𝑂𝑞𝑤is 𝒗𝒇 = 𝑺𝒓𝒆𝒄𝒆𝒊𝒗𝒆 − 𝑺𝒔𝒆𝒏𝒅 (5) the reception power, and 𝑆𝑐𝑜𝑚𝑚 is the time taken for the communication exchange. This metric helps quantify In this context, 𝑆𝑠𝑒𝑛𝑑indicates the timestamp when a energy efficiency in task offloading scenarios within the vehicle initiates the task offloading request, while 𝑆𝑟𝑒𝑐𝑒𝑖𝑣𝑒 VEC environment. The latency experienced by a vehicle marks the moment the vehicle receives the processed 𝑣 when offloading tasks to an edge server e is denoted as response from the edge server. 𝐾𝑣𝑓 . Equation (9) defines it as the interval of time between The goal of the edge server placement was to the sending of the offloading request and the receiving of minimize the average latency 𝐾ˉ, ensuring efficient, low- the processed response. latency communication for all vehicles within the network. 𝑲𝒗𝒇 = 𝑹𝒆𝒄𝒆𝒊𝒗𝒆 𝑻𝒊𝒎𝒆 − 𝑺𝒆𝒏𝒅 𝑻𝒊𝒎𝒆 (9) B) Model formulation The edge server placement issue in a VEC network is In the VEC environment, key parameters include 𝑂𝑓, defined in this section to minimize overall energy usage the active power consumption of edge server  𝑓; 𝑐𝑣𝑓, the and delay through optimal edge server placement. The distance between vehicle  𝑣  and edge server 𝑓 ; 𝐷 , the decision variables, objective functions, and constraints maximum number of servers on the edge deployable in the involved in the problem formulation are detailed below. network; and  𝐶𝑎𝑝𝑎𝑐𝑖𝑡𝑦𝑖 , the maximum number of Consider a fixed of vehicles 𝑉 = {𝑣₁, 𝑣₂. . . 𝑣ₘ}, a set vehicles that a server on the edge  𝑖  can handle. These of edge servers 𝐹 = {𝑓1,𝑓2,…,𝑓𝑛} , anda list of possible factors guide optimal server placement. deployment sites 𝐽 = {𝑗1,𝑗2,…,𝑗𝑛} for placing edge servers 3) Objective Function within the network. To minimize overall energy consumption and reduce total latency in the VEC network. Deep Neural Network Architecture Optimization for Edge… Informatica 49 (2025) 345–360 351 Minimize Total Energy Consumption: Total energy ∑𝑚 ∑𝑚𝑓=1 𝑗=1𝐴𝑓𝑗 ≤ 𝐷 (14) consumption includes the energy used by vehicles to offload tasks (𝐹𝑣𝑓) and the power consumed by active edge Binary Constraints: The decision variables 𝐴𝑣𝑓 and servers (𝑂𝑓 ). The objective is to minimize the sum of 𝐴 vehicle offloading energy and edge server power across the 𝑓𝑗 are binary, reflecting the discrete nature of the problem. network in equation (10). Specifically, a vehicle 𝑣 is either connected to an edge server 𝑓 or not, and an edge server is either deployed at location 𝑗 or not. These binary constraints ensure clear and 𝑀𝑖𝑛𝑖𝑚𝑖𝑧𝑒 ∑𝑁 ∑𝑀𝑣=1 𝑓=1𝐴𝑣𝑓𝐹𝑣𝑓 + unambiguous decision-making in the edge server ∑𝑚 ∑𝑚 placement and vehicle assignment process within the VEC 𝑓=1 𝑗=1𝐴𝑓𝑗𝑜𝑓 (10) network in equations (15) and (16). Where 𝑜𝑓 and 𝐴𝑓𝑗 indicates vehicle-to-server 𝐴 connections and server placements, respectively. 𝑢𝑓 ∈ {0,1}∀𝑢,𝑓 (15) Minimize Total Cumulative Latency: To reduce the overall communication delay experienced by vehicles 𝐴𝑓𝑗 ∈ {0,1}∀𝑓,𝑗 (16) when offloading tasks to edge servers. This total latency is calculated as the sum of the individual latencies 𝐾𝑣𝑓 , defined by the time difference between sending the task 3.2 Dataset and receiving the response expressed in the following For a Vehicular Edge Computing scenario, this 5,811- equation (11). record task offloading event dataset is used to validate the effectiveness of the proposed SFO-Eff-DNN system. This 𝑴𝒊𝒏𝒊𝒎𝒊𝒛𝒆 ∑𝒎 ∑𝒏𝒇=𝟏 𝒖=𝟏𝑨𝒗𝒇𝑲𝒗𝒇 (11) dataset includes information on task arrival/completion time, processing time, network latency, energy consumption, and vehicle node mobility. The model can Where 𝐴𝑣𝑓 indicates if vehicle 𝑣 offloads to server 𝑓, learn intricate mobility and network behaviors because to and 𝐾𝑣𝑓 is the latency between them. this dataset's capture of dynamic, real-world vehicle 4) Constraints settings. This is in line with the framework's goal of The optimization problem includes constraints to guarantee efficient deployment of edge servers and proper optimizing edge server placements and deep neural assignment of vehicles, ensuring that server capacities network settings. It encourages scalability, responsiveness, were not exceeded and system resources were utilized and efficiency for real-time VEC and smart mobility by effectively. facilitating an equitable examination of latency vs. energy Server Capacity Constraint: Each edge server has a trade-offs. limited capacity, restricting the number of vehicles it can serve. The total vehicles assigned to server 𝑓 must not Source: exceed its capacity𝐷𝑓, ensuring balanced load distribution https://www.kaggle.com/datasets/programmer3/vec-edge- and preventing server overload in equation (12). server-offloading-dataset ∑𝑴𝒖=𝟏𝑨𝒗𝒇 ≤ 𝑫𝒇 ∀𝒇 (12) 3.3 Preprocessing Using Min-Max Normalization Vehicle Assignment Constraint: To ensure proper task To create an energy-efficient optimum structure of a deep offloading, each vehicle must be assigned to exactly one neural network for real-time VEC activities with enhanced edge server. This guarantees that every vehicle connects to energy economy, reduced latency, and scalable a single server for processing its tasks, expressed as performance, min-max normalization is applied in the preprocessing stage. The model's convergence is equation (13). enhanced and a uniformly distributed collection of features is made possible for efficient decision-making for real- ∑𝐹𝑓=1𝐴𝑣𝑓 = 1 ∀𝜈 (13) time VEC operations by normalizing the input parameters of delay, energy consumption, and vehicle speed between 0 and 1.The value of property 𝐵 is normalized from Restrictions on Edge Server Positioning: The [𝑚𝑖𝑛 deployment of edge servers within the network is restricted 𝐵 , 𝑚𝑎𝑥𝐵] to [𝑛𝑒𝑤𝑚𝑖𝑛 , 𝑛𝑒𝑤𝑚𝑎𝑥 ] using equation 𝐵 𝐵 by a maximum allowable number, denoted by 𝐷 . This (17), which maximizes data representation: constraint confirms that the total quantity of placed edge 𝑢−𝑚𝑖𝑛𝐵 servers does not exceed 𝐷, and is formulated as equation (𝑛𝑒𝑤 𝑚𝑎𝑥 𝑚𝑖𝑛 , 𝑛𝑒𝑤𝐵 𝑚𝑎𝑥 ) + 𝑛𝑒𝑤𝐵 𝑚𝑖𝑛 𝐵 𝐵−𝑚𝑖𝑛𝐵 (14). (17) 352 Informatica 49 (2025) 345–360 L. Wang et al. In addition to enhancing prediction reliability and while SFO acts like fibroblast cells in real healing to search preserving a consistent data distribution, this through lots of different solutions quickly. SFO helps set normalization facilitates effective implementation in real- up Eff-DNN weights, biases, and learning rates to ensure time automotive applications. good latency, energy consumption, and ability to scale up or down. As a result of hybridization, the system evades 3.4 Synergistic fibroblast optimized efficient local optima and gradually finds the best solution. The use deep neural network (SFO-Eff-DNN) of real-world data for vehicles confirms that the SFO-Eff- The research to improve the DL architecture, the SFO- DNN framework can quickly converge, lower the time Eff-DNN suggests a hybrid intelligence framework with needed for inference, and help with making energy- edge servers situated in VEC. It relies on the predictive efficient decisions in rapidly changing VEC environments. power of Eff-DNN and integrates the ability of the SFO Algorithm 1 represents the proposed SFO-Eff-DNN model algorithm to adjust itself. Eff-DNN is used to figure out working process. how vehicles move around and how the network changes, Algorithm 1: SFO-Eff-DNN 𝑆𝑡𝑒𝑝 1: 𝐼𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑎𝑡𝑖𝑜𝑛 𝑑𝑒𝑓 𝑠𝑒𝑡𝑢𝑝(): 𝑀 = 30 # 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑠𝑖𝑧𝑒 𝑁 = 𝑛𝑢𝑚_𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟𝑠() # 𝑇𝑜𝑡𝑎𝑙 𝐸𝑓𝑓 − 𝐷𝑁𝑁 𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟𝑠 𝑚𝑎𝑥_𝑖𝑡𝑒𝑟, 𝑟ℎ𝑜, 𝑡𝑎𝑢 = 100, 0.5, 5 𝑠, 𝑘_𝑝𝑞, 𝐿 = 1.0, 0.8, 10.0 𝑑𝑎𝑡𝑎 = 𝑙𝑜𝑎𝑑_𝑉𝐸𝐶_𝑑𝑎𝑡𝑎() # 𝑅𝑒𝑎𝑙 − 𝑤𝑜𝑟𝑙𝑑 𝑚𝑜𝑏𝑖𝑙𝑖𝑡𝑦/𝑛𝑒𝑡𝑤𝑜𝑟𝑘 𝑑𝑎𝑡𝑎 𝑟𝑒𝑡𝑢𝑟𝑛 𝑀,𝑁,𝑚𝑎𝑥_𝑖𝑡𝑒𝑟, 𝑟ℎ𝑜, 𝑡𝑎𝑢, 𝑠, 𝑘_𝑝𝑞, 𝐿, 𝑑𝑎𝑡𝑎 𝑆𝑡𝑒𝑝 2: 𝐼𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑒 𝑡ℎ𝑒 𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑠𝑜𝑙𝑢𝑡𝑖𝑜𝑛𝑠 𝑑𝑒𝑓 𝑖𝑛𝑖𝑡_𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛(𝑀,𝑁): 𝑟𝑒𝑡𝑢𝑟𝑛 [{′𝑝𝑎𝑟𝑎𝑚𝑠′: 𝑟𝑎𝑛𝑑_𝑣𝑒𝑐(𝑁), ′𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦′: 𝑟𝑎𝑛𝑑_𝑣𝑒𝑐(𝑁)} 𝑓𝑜𝑟 _ 𝑖𝑛 𝑟𝑎𝑛𝑔𝑒(𝑀)] 𝑆𝑡𝑒𝑝 3: 𝑇𝑟𝑎𝑖𝑛 𝑎𝑛𝑑 𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒 𝑡ℎ𝑒 𝐸𝑓𝑓 − 𝐷𝑁𝑁 𝑚𝑜𝑑𝑒𝑙 𝑑𝑒𝑓 𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒(𝑝𝑎𝑟𝑎𝑚𝑠, 𝑑𝑎𝑡𝑎): 𝑚𝑜𝑑𝑒𝑙 = 𝑏𝑢𝑖𝑙𝑑_𝐸𝑓𝑓𝐷𝑁𝑁(𝑝𝑎𝑟𝑎𝑚𝑠) 𝑡𝑟𝑎𝑖𝑛_𝐷𝑁𝑁(𝑚𝑜𝑑𝑒𝑙,∗ 𝑑𝑎𝑡𝑎) 𝑙𝑎𝑡𝑒𝑛𝑐𝑦, 𝑒𝑛𝑒𝑟𝑔𝑦 = 𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦_𝑒𝑛𝑒𝑟𝑔𝑦(𝑚𝑜𝑑𝑒𝑙) 𝑟𝑒𝑡𝑢𝑟𝑛 𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑒𝑛𝑒𝑟𝑔𝑦 # 𝑆𝑖𝑚𝑝𝑙𝑒 𝑓𝑖𝑡𝑛𝑒𝑠𝑠 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 (𝑙𝑜𝑤𝑒𝑟 𝑖𝑠 𝑏𝑒𝑡𝑡𝑒𝑟) 𝑆𝑡𝑒𝑝 4: 𝑉𝑒𝑙𝑜𝑐𝑖𝑡𝑦 𝑢𝑝𝑑𝑎𝑡𝑒 𝑤𝑖𝑡ℎ 𝑓𝑒𝑒𝑑𝑏𝑎𝑐𝑘 𝑎𝑛𝑑 𝑙𝑜𝑐𝑎𝑙 𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑖𝑜𝑛 𝑑𝑒𝑓 𝑢𝑝𝑑𝑎𝑡𝑒_𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦(𝑖𝑛𝑑, 𝑝𝑎𝑠𝑡_𝑝𝑜𝑠, 𝑟ℎ𝑜): 𝑐 = 𝑙𝑜𝑐𝑎𝑙_𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑖𝑜𝑛(𝑖𝑛𝑑[′𝑝𝑎𝑟𝑎𝑚𝑠′]) 𝑑 = 𝑣𝑒𝑐𝑡𝑜𝑟_𝑑𝑖𝑣(𝑝𝑎𝑠𝑡_𝑝𝑜𝑠, 𝑛𝑜𝑟𝑚(𝑝𝑎𝑠𝑡_𝑝𝑜𝑠)) 𝑟𝑒𝑡𝑢𝑟𝑛 𝑖𝑛𝑑[′𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦′] + (1 − 𝑟ℎ𝑜) ∗ 𝑐 + 𝑟ℎ𝑜 ∗ 𝑑 𝑆𝑡𝑒𝑝 5: 𝑃𝑜𝑠𝑖𝑡𝑖𝑜𝑛 𝑢𝑝𝑑𝑎𝑡𝑒 𝑑𝑒𝑓 𝑢𝑝𝑑𝑎𝑡𝑒_𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛(𝑖𝑛𝑑, 𝑣𝑒𝑙, 𝑠, 𝑘_𝑝𝑞, 𝐿): Deep Neural Network Architecture Optimization for Edge… Informatica 49 (2025) 345–360 353 𝑠𝑝𝑒𝑒𝑑 = 𝑠 / (𝑘_𝑝𝑞 ∗ 𝐿) 𝑑𝑖𝑟𝑒𝑐𝑡𝑖𝑜𝑛 = 𝑣𝑒𝑐𝑡𝑜𝑟_𝑑𝑖𝑣(𝑣𝑒𝑙, 𝑛𝑜𝑟𝑚(𝑣𝑒𝑙)) 𝑟𝑒𝑡𝑢𝑟𝑛 𝑖𝑛𝑑[′𝑝𝑎𝑟𝑎𝑚𝑠′] + 𝑠𝑝𝑒𝑒𝑑 ∗ 𝑑𝑖𝑟𝑒𝑐𝑡𝑖𝑜𝑛 𝑆𝑡𝑒𝑝 6: 𝑀𝑎𝑖𝑛 𝑆𝐹𝑂 − 𝐸𝑓𝑓𝐷𝑁𝑁 𝑂𝑝𝑡𝑖𝑚𝑖𝑧𝑎𝑡𝑖𝑜𝑛 𝑑𝑒𝑓 𝑜𝑝𝑡𝑖𝑚𝑖𝑧𝑒_𝑆𝐹𝑂_𝐸𝑓𝑓𝐷𝑁𝑁(): 𝑀, 𝑁, 𝑇, 𝑟ℎ𝑜, 𝑡𝑎𝑢, 𝑠, 𝑘_𝑝𝑞, 𝐿, 𝑑𝑎𝑡𝑎 = 𝑠𝑒𝑡𝑢𝑝() 𝑝𝑜𝑝, ℎ𝑖𝑠𝑡𝑜𝑟𝑦 = 𝑖𝑛𝑖𝑡_𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛(𝑀,𝑁), [] 𝑓𝑜𝑟 𝑡 𝑖𝑛 𝑟𝑎𝑛𝑔𝑒(𝑇): 𝑓𝑜𝑟 𝑖𝑛𝑑 𝑖𝑛 𝑝𝑜𝑝: 𝑖𝑛𝑑[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′] = 𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒(𝑖𝑛𝑑[′𝑝𝑎𝑟𝑎𝑚𝑠′], 𝑑𝑎𝑡𝑎) 𝑝𝑎𝑠𝑡 = 𝑝𝑜𝑝 𝑖𝑓 𝑡 < 𝑡𝑎𝑢 𝑒𝑙𝑠𝑒 𝑝𝑜𝑝. 𝑐𝑜𝑝𝑦() 𝐹𝑜𝑟 𝑖, 𝑖𝑛𝑑 𝑖𝑛 𝑒𝑛𝑢𝑚𝑒𝑟𝑎𝑡𝑒(𝑝𝑜𝑝): 𝑖𝑛𝑑[′𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦′] = 𝑢𝑝𝑑𝑎𝑡𝑒_𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦(𝑖𝑛𝑑, 𝑝𝑎𝑠𝑡[𝑖][′𝑝𝑎𝑟𝑎𝑚𝑠′], 𝑟ℎ𝑜) 𝑖𝑛𝑑[′𝑝𝑎𝑟𝑎𝑚𝑠′] = 𝑢𝑝𝑑𝑎𝑡𝑒_𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛(𝑖𝑛𝑑, 𝑖𝑛𝑑[′𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦′], 𝑠, 𝑘_𝑝𝑞, 𝐿) 𝑏𝑒𝑠𝑡 = 𝑚𝑖𝑛(𝑝𝑜𝑝, 𝑘𝑒𝑦 = 𝑙𝑎𝑚𝑏𝑑𝑎 𝑥: 𝑥[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′]) ℎ𝑖𝑠𝑡𝑜𝑟𝑦. 𝑎𝑝𝑝𝑒𝑛𝑑(𝑏𝑒𝑠𝑡[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′]) 𝑟𝑒𝑡𝑢𝑟𝑛 𝑏𝑒𝑠𝑡, ℎ𝑖𝑠𝑡𝑜𝑟𝑦 To improve performance in dynamic Vehicular Edge makes use of an Eff-DNN to represent how vehicles and Computing (VEC) settings, the SFO-Eff-DNN algorithm 1 networks interact. An Eff-DNN architecture has an input combines the strength of Efficient Deep Neural Networks layer, an output layer, and many hidden layers, as shown (Eff-DNN) with Synergistic Fibroblast Optimization in Figure 2. The network is set up with six input layers and (SFO), an optimization technique inspired by nature. seven hidden layers, all containing 64 neurons to avoid Using actual traffic and network data, the algorithm overfitting. Model complexity and generalization were initializes a population of solutions, each of which managed in TensorFlow by setting them as represents a set of Eff-DNN parameters, and assesses each hyperparameters within the layers. It uses input about how according to latency and energy consumption. The vehicles behave and interact to determine the best approach is perfect for real-time intelligent transportation positioning of the edge servers. The network uses the systems because it ensures quick convergence and Rectified Linear Unit (ReLU) function to make its improved flexibility by updating its location and velocity computations non-linear and adjust the weights it uses for depending on fitness input and historical experiences. learning through backpropagation. The Eff-DNN can provide quick and efficient decisions in ever-changing Efficient Deep Neural Network (Eff-DNN) The vehicular environments due to the backpropagation proposed optimized deep learning architecture in VEC process, which keeps the cost function low. Neuron outputs were computed as follows in equation (18). 354 Informatica 49 (2025) 345–360 L. Wang et al. Figure 2: Architecture of Eff-DNN 𝑧𝑚+1𝑟 = 𝜎(𝑦) = 𝜎(∑𝑛𝑗=1𝜔 𝑚 𝑗 𝑧𝑚 𝑚+ 𝑗 + 𝑎 1 𝑟 ) (18) 𝑛𝑠 = 𝛽1𝑛𝑠−1 + (1 − 𝛽1)ℎ𝑠 𝑟 𝑈 𝑠 = 𝛽2𝑢𝑠−1 + (1 − 𝛽 2 2)ℎ𝑠 Where σ(z) represents the activation function, and ℎ𝑠 = 𝛻𝜃𝐹(𝜃𝑠−1) 𝑧𝑚+1𝑟 is the output of the 𝑟 − 𝑡ℎneuron in the (𝑚 + 1)-th 𝑛 ?̂?𝑠 = 𝑠 1−𝛽𝑠 (20) layer. The weights among the 𝑗 − 𝑡ℎ neuron of layer n and 1 𝑢 the 𝑟 − 𝑡ℎ neuron of layer (𝑚 + 1 ) are labeled 𝜔𝑚𝑗 , and 𝑟 ?̂? 𝑠 𝑠 = 1−𝛽𝑠 𝑎𝑚+1𝑟 represents the bias term for linear transformations. 2 ?̂? While training, the loss function compares the predicted 𝜃𝑠 = 𝜃 𝑠 { 𝑠−1−∝ outcomes with the desired ones. The model finds the best √𝑢𝑠+𝜀 values for 𝜔 and 𝑎 by minimizing the loss, making the network predict more accurately. The Eff-DNN’s loss 𝑒𝑝𝑜𝑐ℎ−𝑛𝑢𝑚 𝑀 function is explained in equation (19). ∝=∝ 𝑎 0 𝛽 𝑏 𝑡𝑐ℎ−𝑠𝑖𝑧𝑒 3 (21) 1 𝑓(𝜃) = − ∑ ∑𝑟 𝑠 l g 𝑚 𝑚 𝑚𝑟 o 𝑧𝑚𝑟 (19) Where 𝑈𝑠represents the weighted average of exponentially the squared gradients, while ℎ𝑠 denotes the gradient of the parameters at time 𝑠 , 𝑛𝑠captures the Where𝑠𝑚𝑟 represents the actual value of the 𝑟 − 𝑡ℎ average movement of the gradient, and ∝0 is the initial sample's 𝑚 − 𝑡ℎelement, 𝑧𝑚𝑟 denotes the predicted value learning rate. The corrected versions of these estimates for the same element, and θ represents the collection of were denoted by ?̂?𝑠and ?̂?𝑠, which improve optimization parameters including weights 𝜔 and biases 𝑎 . Here, 𝑀 is accuracy. Exponential decay rates 𝛽1 , 𝛽2,𝑎𝑛𝑑 𝛽3 are used the total quantity of samples. To reduce overfitting, a dropout mechanism is employed that randomly disables to stabilize updates. Additionally, parameters such as batch neurons during training, effectively disrupting the network size (epbatch−size) and current training iterations (ochnum structure and promoting generalization. Furthermore, the ) influence convergence behavior. The improved DNN proposed method enhances the conventional gradient supports dual operational modes, RDL-1 for normal descent by dynamically adapting the learning rate for conditions and RDL-2 for power swing detection, ensuring improved convergence. The optimization of the parameter adaptive command generation aligned with dynamic set 𝜃 is formally defined as equations (20) and (21). vehicular network scenarios. Deep Neural Network Architecture Optimization for Edge… Informatica 49 (2025) 345–360 355 (𝑡+1) (𝑡+1) (𝑡) Synergistic fibroblast optimization (SFO) 𝑏 ∗ 𝑣𝑖 𝑖 = 𝑏𝑖 + 𝑠 (23) (𝑡+1) ||𝑣 SFO is modeled after migratory fibroblast cells that 𝑖 || heal tissue by responding to the extracellular matrix (ECM). Every solution searches the solution space by 𝑠 The movement speed 𝑡 is defined as 𝑠 = here varying its position and velocity about diffusion and fitness. 𝑘 ′, w 𝑝𝑞𝐿 This bio-inspired method allows for greater flexibility and "𝑘𝑝𝑞" represents the baseline movement rate and 𝐿 denotes avoids local minima, making it appropriate for optimizing the movement length. The SFO-Eff-DNN hybrid model neural networks and edge server placement in dynamic optimizes edge server placement in dynamic VEC VEC settings. environments by combining adaptive search with deep A model based on the adaptive actions of fibroblast learning. It efficiently predicts optimal configurations, cells used in repairing tissues. SFO works on tuning how improves convergence speed, and reduces latency and deep neural networks are set up and arranging edge servers energy use, making it ideal for real-time intelligent in dynamically changing virtual edge clouds. Much as transportation systems. fibroblasts respond to the extracellular matrix (ECM), SFO looks for solutions in many different ways. Ongoing 4 Results and discussion testing and evaluation of fitness ensure the best solutions The experimental setup uses an Intel i7 CPU. use both energy and time efficiently. For this reason, this Simulations were conducted in Python with TensorFlow approach ensures flexibility in the way transportation and the Veins platform using Cologne traffic traces. The systems are managed. dataset was split using an 80:20 ratio, where 80% was used The process of biomechanical analysis was for training the SFO-Eff-DNN model and 20% was strengthened each time by paying attention to interactions reserved for testing to evaluate performance and with the ECM. As it runs, the program tests different generalization. combinations of settings, much like fibroblasts, to improve The SFO-Eff-DNN model includes ReLU-activated its outcome. The simulated cells disperse and travel to the layers and dropout, optimized via SFO. Performance was most promising areas to avoid getting caught in local evaluated based on latency, energy use, and server minima. Depending on the speed and distribution of the placement accuracy. Key simulation parameters with particles, the algorithm updates its next action using the values aligned to realistic VEC scenarios are presented in information and trends it has gathered. As a result, the Table 2. process can handle the trade-offs between speed, Table 2: Key simulation parameters for the SFO-Eff- performance, and movement better in VEC networks. DNN VEC Framework Initialization: Within the 𝑁 -dimensional solution 𝑷𝒂𝒓𝒂𝒎𝒆𝒕𝒆𝒓 𝑽𝒂𝒍𝒖𝒆 space, initialize a population of physical activity 𝑆𝑖𝑚𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑎𝑟𝑒𝑎 1500 𝑚 × 1500 𝑚 movements 𝑓𝑖, , where𝑖 = 1,2, … ,𝑀, . Each movement is 𝑆𝑖𝑚𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑡𝑖𝑚𝑒 200𝑠, 300𝑠, 400𝑠 assigned a random position ( ) and velocity (𝑣𝑖). Key parameters such as the diffusion coefficient 𝜌 and 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑒𝑑𝑔𝑒 𝑠𝑒𝑟𝑣𝑒𝑟𝑠8 movement speed 𝑠are established. 𝑇𝑟𝑎𝑛𝑠𝑚𝑖𝑠𝑠𝑖𝑜𝑛 𝑝𝑜𝑤𝑒𝑟 25 𝑚𝑊, 30 𝑚𝑊, 35 𝑚𝑊 Fitness Evaluation: For each candidate solution 𝑓 𝑅𝑆𝑈 𝑎𝑛𝑡𝑒𝑛𝑛𝑎 ℎ𝑒𝑖𝑔ℎ𝑡 5 𝑚 𝑖 in the N-dimensional space, the fitness function 𝑒(𝑓 𝑅𝑒𝑐𝑒𝑖𝑣𝑒𝑟 𝑠𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦 −100 𝑑𝐵𝑚 𝑖) is evaluated iteratively to assess the quality of each 𝑀𝑒𝑠𝑠𝑎𝑔𝑒 𝑠𝑖𝑧𝑒 100 𝑏𝑖𝑡𝑠 movement. This process aims to identify the optimal 𝑀𝑒𝑠𝑠𝑎𝑔𝑒 𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 2 𝐻𝑧 solution (maximum or minimum) within the evolving 𝐷𝑎𝑡𝑎 𝑟𝑎𝑡𝑒 10 𝑀𝑏𝑝𝑠, 20 𝑀𝑏𝑝𝑠, 30 𝑀𝑏𝑝𝑠 search region. Based on the fitness outcomes, the position 𝑉𝑒ℎ𝑖𝑐𝑙𝑒 𝑠𝑝𝑒𝑒𝑑 𝑟𝑎𝑛𝑔𝑒 0 – 100 𝑘𝑚/ℎ (𝑏𝑖) and velocity (𝑣𝑖) of each movement were updated 𝐸𝑑𝑔𝑒 𝑠𝑒𝑟𝑣𝑒𝑟 𝐶𝑃𝑈 𝑐𝑎𝑝𝑎𝑐𝑖3𝑡𝑦.5 𝐺𝐻𝑧 accordingly using the update rules given by Equations (22) 𝐸𝑑𝑔𝑒 𝑠𝑒𝑟𝑣𝑒𝑟 𝑚𝑒𝑚𝑜𝑟𝑦 32 𝐺𝐵 and (23), enabling the algorithm to adaptively explore the solution space. 4.1 Offloading ratio (𝑡+1) (𝑡) ( ) Using time on the x plane and the percentage of tasks 𝑣 𝑖 𝑖 = 𝑣𝑖 + (1 − 𝜌)𝑐(𝑓 𝑡 ∗ 𝑓 (𝑡−𝜏) 𝑖 ) + 𝜌 ||𝑓 offloaded from the vehicle to edge servers on the y plane, 𝑖(𝑡−𝜏)|| Figure 3 shows the offloading ratio (%) in the VEC system (22) over 10 minutes. Starting at 75%, the offloading ratio steadily rises to 89%, reflecting an increasing reliance on Where 𝑡 is the current iteration, 𝜏 is the time delay, edge computation. This upward trend is attributed to and the diffusion coefficient 𝜌 is set to 0.5. enhanced network conditions, adaptive optimization by the SFO-Eff-DNN framework for energy efficiency, or the growing complexity of vehicular tasks that necessitate edge processing. Tracking this metric is crucial in the 356 Informatica 49 (2025) 345–360 L. Wang et al. research context, as a higher offloading ratio signifies accelerating task processing, thereby improving overall more efficient utilization of edge resources, which directly system performance in dynamic ITS environments. contributes to lowering vehicle energy consumption and Figure 3: Offloading ratio over time 4.2 SFO-Eff-DNN Pareto Front in VEC about 70 J to 40 J, showing an inverse relationship. All In VEC, the Pareto front for the suggested SFO-Eff- points on the curve are Pareto-optimal, as enhancing one DNN illustrates the relationship between latency and factor would cause a drop in the other. Because of the energy use. Figure 4 illustrates that with latency increasing model's diversity, it is possible to choose configurations for from 50 ms to 70 ms, the energy consumed decreases from specific needs, such as real-time applications or limited- power cases, proving its effectiveness and adaptability. Figure 4: Pareto front diversity of SFO-Eff-DNN in VEC 4.3 Convergence Behavior of SFO-Eff-DNN algorithm quickly identifies energy-efficient Figures 5 (a) and (b) illustrate the convergence configurations. The average energy consumption (green behavior of the SFO-Eff-DNN algorithm over 100 dashed line) also follows a similar decreasing trend, optimization iterations for energy consumption and latency. gradually converging toward the minimum, which reflects In Figure (a), the minimum energy consumption (blue line) the population's collective improvement. Similarly, in rapidly drops from approximately 0.34 to 0.29 within the Figure (b), during the first iterations, the latency drops first 10 iterations and then stabilizes, indicating that the rapidly and then becomes more stable at a much lower Deep Neural Network Architecture Optimization for Edge… Informatica 49 (2025) 345–360 357 level. The average latency also decreases and stabilizes Overall, these trends confirm that SFO-Eff-DNN achieves around the same value, highlighting consistent efficient and simultaneous convergence toward optimal performance improvement across the solution space. energy and latency trade-offs. Figure 5: Convergence Behavior of SFO-Eff-DNN (a) energy conception and (b) latency 4.4 Performance analysis A comparison of several optimization techniques based on their energy consumption and latency performance in vehicular edge computing scenarios is shown in Table 3. Among the evaluated techniques, Particle Swarm Optimization (PSO) (Surayya et al., 2025), Teaching–Learning-Based Optimization (TLBO) (Surayya et al., 2025), and Ant Colony Optimization (ACO) (Surayya et al., 2025), the proposed SFO-Eff-DNN method demonstrates the energy consumption and the latency. This highlights the superior efficiency and responsiveness of the SFO-Eff-DNN framework, making it highly suitable for real-time, energy-aware edge deployments in dynamic vehicular environments. Figure 6 demonstrates the results of the performance analysis. Table 3: Comparison of optimization methods by energy consumption and latency Methods Energy Latency (S) Consumption (J) PSO (Surayya 0.3535 40 μs et al., 2025) TLBO 0.3546 40 μs (Surayya et al., 2025) ACO 0.3517 60μs (Surayya et Figure 6: Comparison methods by energy al., 2025) consumption and latency SFO-Eff-DNN 0.3480 30 μs (Proposed) Analyzing different optimization methods for their energy consumption and latency when used in VEC. SFO- Eff-DNN shows better results than other models by using the least amount of energy (0.3480 J) and having the shortest latency (30 μs). Here, microseconds (μs) are used, 358 Informatica 49 (2025) 345–360 L. Wang et al. since 1 μs is a millionth of a second, which is needed to when ultra-low latency is necessary. ACO (Surayya et al., ensure fast response times vital in real-time VEC systems. 2025) can distribute solutions equally, but its slow For energy usage, PSO and TLBO lead with 0.3535 J and execution means it is not suitable when time is critical. A 0.3546 J, respectively, but both have a latency of 40 μs, PSO, TLBO, and ACO lead with low energy of 0.3535 J, while ACO uses 0.3517 J with the highest latency of 60 μs. 0.3546 J, and 0.3517 J. Using the SFO-Eff-DNN model, The results demonstrate that SFO-Eff-DNN offers better energy costs and latency can be cut down at the same time, results in real-time, energy-sensitive VEC applications. compared to older versions. Compared to the generic A comparison of task drop rates for various placement method's 2.90% dropped task rate (Khamari et al., 2022), techniques in dynamic VEC situations is shown in Table 4 the SFO-Eff-DNN's dropped task rate was only 1.83%, and Figure 7. In comparison to the generic method's 2.90% indicating its resilience in workload balancing and edge (Khamari et al., 2022) dropped task rate, the suggested resource utilisation in dynamic vehicular situations. Due to SFO-Eff-DNN model performs better, attaining a dropped advanced techniques and deep learning, the system reacts task rate of just 1.83% (Proposed). In latency-sensitive, to updates in vehicles and can quickly and accurately high-mobility edge computing systems, this research configure servers for VEC applications. demonstrates how well the SFO-Eff-DNN optimises server The computational load brought on by the workload allocation and lowers service denial. hybridization of deep learning and evolutionary optimization constitutes one of the key issues, especially Table 4: Comparison of task dropped rate between during the early phases of training and adaption. Despite placement strategies in VEC environments its potential for convergence efficiency, iterative optimization can be resource-hungry on edge nodes with Placement strategies Dropped Tasks (%) constrained computing capacity. Another problem is the system's scalability in high-density vehicle networks. generic method 2.90% While the model works well for simulations of (Khamari et al., intermediate scale, more study is needed to determine how 2022) it responds and operates in large, real-time vehicular systems with hundreds of nodes. These limitations SFO-Eff-DNN 1.83% highlight the significance of future studies that focus on (Proposed) distributed training practices and lightweight optimization versions that can sustain performance without increasing compute demands in practical applications. 5 Conclusion VEC is a pattern that encourages cloud computing capabilities closer to the network edge services needed for low-latency services, such as auto-corrective driving support, real-time traffic management, and location-based applications. The proposed SFO-Eff-DNN framework is used to optimize deep learning for VEC using modern evolutionary algorithms. To deal with the problem of placing servers at the edge of wireless networks in vehicles, both Synergistic Fibroblast Optimization and deep neural networks were used. It makes use of real travel data to Figure 7: Comparison of Dropped Task Rates for manage how quickly it responds and how much energy it Generic Method and SFO-Eff-DNN uses, adjusts to any changes in the network, and provides quick results. The data from experiments reveals that SFO- 4.5 Discussion Eff-DNN works with 30 μs latency, 0.3480 J energy consumption, and only 1.83% dropped tasks, making it By optimizing the placement of edge servers and DL well-suited for speedy and efficient smart transportation. It networks, the SFO-Eff-DNN in VEC reduces latency and strongly supports and adapts to the new directions being conserves energy. The technique has some problems with taken in VEC deployments. Using simulated movement responding to changes in vehicles and adapting to sudden and experimentation usually does not reflect real-world network changes in VEC settings (Bi et al., 2020). While events or problems, meaning their practical use may not be VECMAN saves energy by sharing resources among as effective. electric vehicles, it is difficult for it to accurately predict where vehicles are and to schedule them in situations that are constantly changing (Bahreini et al., 2021). As both Future scope PSO and TLBO (Surayya et al., 2025) prioritize low Future research should integrate real-time traffic incident energy over low latency, they may not respond fast enough data and 5G network slicing to further enhance adaptability. Extending the framework with federated learning for Deep Neural Network Architecture Optimization for Edge… Informatica 49 (2025) 345–360 359 privacy-preserving model updates across distributed [10] Peyman, M., Fletcher, T., Panadero, J., Serrat, C., vehicles, and exploring hybrid optimizers that combine Xhafa, F. and Juan, A.A., 2023. Optimization of SFO with reinforcement learning could improve vehicular networks in smart cities: from agile robustness against unforeseen network disruptions and optimization to learn heuristics and sim heuristics. accelerate convergence in large-scale, heterogeneous VEC Sensors, 23(1), deployments. p.499.https://doi.org/10.3390/s23010499 [11] Ebrahimi Mood, S., Rouhbakhsh, A. and Souri, A., 2025. Evolutionary recurrent neural network based References on equilibrium optimization method for cloud-edge [1] Wan, S., Xu, X., Wang, T., and Gu, Z., 2020. An resource management in Internet of Things. Neural intelligent video analysis method for abnormal event Computing and Applications, 37(6), pp.4957- detection in intelligent transportation systems. IEEE 4969.https://doi.org/10.1007/s00521-024-10929-1 Transactions on Intelligent Transportation Systems, [12] Vijayakumar, P., Rajalingam, P. and Rajeswari, 22(7), pp.4487-4495.DOI: S.V.K.R., 2021. Edge Computing Optimization 10.1109/TITS.2020.3017505 Using Mathematical Modeling, Deep Learning [2] Boukerche, A., Tao, Y. and Sun, P., 2020. Artificial Models, and Evolutionary Algorithms. Simulation intelligence-based vehicular traffic flow prediction and Analysis of Mathematical Methods in Real‐Time methods for supporting intelligent transportation Engineering Applications, pp.17- systems. Computer networks, 182, 44.https://doi.org/10.1002/9781119785521.ch2 p.107484.https://doi.org/10.1016/j.comnet.2020.107 [13] Yang, Z., Zhang, S., Li, R., Li, C., Wang, M., Wang, 484 D. and Zhang, M., 2021. Efficient resource-aware [3] Elassy, M., Al-Hattab, M., Takruri, M. and Badawi, convolutional neural architecture search for edge S., 2024. Intelligent transportation systems for computing with Pareto-bayesian optimization. sustainable smart cities. Transportation Engineering, Sensors, 21(2), p.100252.https://doi.org/10.1016/j.treng.2024.10025 p.444.https://doi.org/10.3390/s21020444 2 [14] Li, Z., Yu, H., Fan, G., Zhang, J. and Xu, J., 2024. [4] Alhilal, A.Y., Finley, B., Braud, T., Su, D. and Hui, P., Energy-efficient offloading for DNN-based 2022. Street smart in 5G: Vehicular applications, applications in edge-cloud computing: A hybrid communication, and computing. IEEE Access, 10, chaotic evolutionary approach. Journal of Parallel pp.105631- and Distributed Computing, 187, 105656.DOI: 10.1109/ACCESS.2022.3210985 p.104850.https://doi.org/10.1016/j.jpdc.2024.10485 [5] Chougule, S.B., Chaudhari, B.S., Ghorpade, S.N. and 0 Zennaro, M., 2024. Exploring computing paradigms [15] Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Khanna, for electric vehicles: from cloud to edge intelligence, A., Shankar, K. and Nguyen, G.N., 2020. An challenges and future directions. World Electric effective training scheme for deep neural networks in Vehicle Journal, 15(2), edge computing enabled Internet of Medical Things p.39.https://doi.org/10.3390/wevj15020039 (IoMT) systems. IEEE Access, 8, pp.107112- [6] Talpur, A. and Gurusamy, M., 2021. Drld-sp: A deep- 107123.DOI: 10.1109/ACCESS.2020.3000322 reinforcement-learning-based dynamic service [16] Loni, M., Sinaei, S., Zoljodi, A., Daneshtalab, M. and placement in edge-enabled internet of vehicles. IEEE Sjödin, M., 2020. DeepMaker: A multi-objective Internet of Things Journal, 9(8), pp.6239- optimization framework for deep neural networks in 6251.DOI: 10.1109/JIOT.2021.3110913 embedded systems. Microprocessors and [7] Zaki, A.M., Elsayed, S.A., Elgazzar, K. and Microsystems, 73, Hassanein, H.S., 2024. Quality-Aware Task p.102989.https://doi.org/10.1016/j.micpro.2020.102 Offloading for Cooperative Perception in Vehicular 989 Edge Computing. IEEE Transactions on Vehicular [17] Saheed, Y.K., Abdulganiyu, O.H. and Ait Tchakoucht, Technology.DOI: 10.1109/TVT.2024.3444591 T., 2024. Modified genetic algorithm and fine-tuned [8] Zhao, L., Li, T., Zhang, E., Lin, Y., Wan, S., Hawbani, long short-term memory network for intrusion A. and Guizani, M., 2023. Adaptive swarm detection in the Internet of Things networks with intelligent offloading based on digital twin-assisted edge capabilities. Applied Soft Computing, 155, prediction in VEC. IEEE Transactions on Mobile p.111434.https://doi.org/10.1016/j.asoc.2024.111434 Computing, 23(8), pp.8158- [18] Bi, J., Yuan, H., Duanmu, S., Zhou, M. and Abusorrah, 8174.DOI: 10.1109/TMC.2023.3344645 A., 2020. Energy-optimized partial computation [9] Shen, B., Xu, X., Qi, L., Zhang, X. and Srivastava, offloading in mobile-edge computing with genetic G., 2021. Dynamic server placement in edge simulated-annealing-based particle swarm computing toward the internet of vehicles. Computer optimization. IEEE Internet of Things Journal, 8(5), Communications, 178, pp.114- pp.3774-3785.DOI: 10.1109/JIOT.2020.3024223 123.https://doi.org/10.1016/j.comcom.2021.07.021 360 Informatica 49 (2025) 345–360 L. Wang et al. [19] Chen, Z., Hu, J., Chen, X., Hu, J., Zheng, X. and Min, (pp. 1-7). IEEE. https://doi.org/10.1109/VTC2021- G., 2020. Computation offloading and task Spring51267.2021.9448645 scheduling for DNN-based applications in cloud- [30] Khamari, S., Ahmed, T. and Mosbah, M., 2022, edge computing. IEEE Access, 8, pp.115537- December. Efficient edge server placement under 115547.DOI: 10.1109/ACCESS.2020.3004509 latency and load balancing constraints for vehicular [20] You, Q. and Tang, B., 2021. Efficient task offloading networks. In GLOBECOM 2022-2022 IEEE Global using particle swarm optimization algorithm in edge Communications Conference (pp. 4437-4442). computing for the industrial internet of things. IEEE.https://doi.org/10.1109/GLOBECOM48099.2 Journal of Cloud Computing, 10, pp.1- 022.10000721 11.https://doi.org/10.1186/s13677-021-00256-4 [21] Yousif, A., Bashir, M.B. and Ali, A., 2024. An evolutionary algorithm for task clustering and scheduling in IoT edge computing. Mathematics, 12(2), p.281.https://doi.org/10.3390/math12020281 [22] Xiao, H., Zhao, J., Pei, Q., Feng, J., Liu, L. and Shi, W., 2021. Vehicle selection and resource optimization for federated learning in vehicular edge computing. IEEE Transactions on Intelligent Transportation Systems, 23(8), pp.11073- 11087.DOI: 10.1109/TITS.2021.3099597 [23] Bahreini, T., Brocanelli, M. and Grosu, D., 2021. VECMAN: A framework for energy-aware resource management in vehicular edge computing systems. IEEE Transactions on Mobile Computing.DOI: 10.1109/TMC.2021.3089338 [24] Jiang, H., Cai, J., Xiao, Z., Yang, K., Chen, H. and Liu, J., 2025. Vehicle-Assisted Service Caching for Task Offloading in Vehicular Edge Computing. IEEE Transactions on Mobile Computing.DOI: 10.1109/TMC.2025.3545444 [25] Surayya, A., Hussain, M.M., Reddy, V.D., Abdul, A. and Gazi, F., 2025. Evolutionary Algorithms for Edge Server Placement in Vehicular Edge Computing. IEEEAccess.10.1109/ACCESS.2025.35 66172 [26] Luo, X., Liu, D., Huai, S. and Liu, W., 2021, February. HSCoNAS: Hardware-software co-design of efficient DNNs via neural architecture search. In 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE) (pp. 418-421). IEEE.https://doi.org/10.23919/DATE51398.2021.94 73937 [27] Odema, M., Rashid, N., Demirel, B.U. and Al Faruque, M.A., 2021, December. LENS: Layer distribution enabled neural architecture search in edge-cloud hierarchies. In 2021 58th ACM/IEEE Design Automation Conference (DAC) (pp. 403- 408). IEEE. https://doi.org/10.1109/DAC18074.2021.9586259 [28] Abreha, H.G., Hayajneh, M. and Serhani, M.A., 2022. Federated learning in edge computing: a systematic survey. Sensors, 22(2), p.450. https://doi.org/10.3390/s22020450 [29] Talpur, A. and Gurusamy, M., 2021, April. Reinforcement learning-based dynamic service placement in vehicular networks. In 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring) https://doi.org/10.31449/inf.v49i12.8792 Informatica 49 (2025) 361–376 361 CCR-LWECNN: A Lightweight CNN Framework for Chinese Calligraphy Recognition and Evaluation Xi Chen*,Jing Zhao Zhongyuan Institute of Science and Technology,Zhengzhou, Henan,461000,China *Corresponding author E-mail:chenxi19891028@126.com Keywords: character, CCR-LWECNN Chinese calligraphy recognition, deep learning, image processing system Received: April 3, 2025 This study presents a lightweight enhanced CNN architecture (CCR-LWECNN) for Chinese calligraphy recognition, addressing the challenges of multi-class classification across 12,152 labeled images spanning 960 Chinese characters in five calligraphic styles. Unlike previous studies limited to small character sets and single recognition approaches, this research integrates character recognition with image processing techniques. Data augmentation using TensorFlow’s Image Data Generator—applying rotation and zoom—was employed to improve class balance and variety. The proposed model, comprising five convolutional and three fully connected layers, processes 224×224-pixel images and leverages pretraining for robust feature extraction. CCR-LWECNN achieved superior performance with 96.5% accuracy, 95.6% precision, 95.2% recall, and 95.6% F1-score, outperforming baseline models such as traditional CNN (90.5%), SVM (85.2%), and Random Forest (75.4%). By effectively mitigating overfitting and underfitting through dropout layers and augmentation, this approach advances automated Chinese calligraphy recognition and provides a scalable solution for real-world applications. Povzetek: CCR-LWECNN je lahki izboljšani konvolucijski model za prepoznavanje kitajske kaligrafije, ki na 12.152 slikah dobre rezultate. Z združevanjem povečanja podatkov in učinkovite CNN-arhitekture izboljša prepoznavanje 960 znakov v petih slogih ter preseže klasične metode. 1 Introduction calligraphy offers advantages in addition to being a highly regarded art form [4]. Characters in Chinese calligraphy are made up of a lot Character recognition has emerged as a hotspot for more strokes than those in Western calligraphy [1]. A computer vision research as picture digitisation advances, single letter in Chinese calligraphy can be made up of as and it has significant applications in data entry for paper few as one stroke or as many as thirty. Before writing documents. Because handwriting characters have more begins, the ink is absorbed by dipping and then used to irregular shapes than printed documents, it is more difficult produce strokes with a soft hairbrush. Different styles are to recognise handwriting. Chinese calligraphy is a sort of produced as the calligrapher writes the character by handwriting art form that consists of five main font type varying the brush's pressure, speed, and direction [2]. [5]. Figure 1 shown by Chinese calligraphy different font Regular, clerical, cursive, semi-cursive, and seal are the type. most often used styles. These styles go under several names. For instance, referred to the semi-cursive style as the running style. The naming scheme employed by author will be applied in this study [3]. Beginning with a single style is beneficial for Chinese calligraphy students. The student might advance to another style after they are proficient at writing several characters in that style. An ancient art style that originated in China, Chinese calligraphy is also well-liked in a number of other nations, including South Korea, Japan, and Thailand. Using a brush and ink, Chinese calligraphy artists create visually appealing and well-composed characters. Chinese Figure 1: Chinese calligraphy different font type 362 Informatica 49 (2025) 361–376 X. Chen et al. However, many find it difficult to instantly identify the upto300 Chinese characters—roughly 8–12.4% of the content of calligraphy works since the shapes of the letters 2500 characters used every day—can be recognised by in Chinese calligraphy vary widely across calligraphers ReLU models. Furthermore, there aren't many examples and differ substantially from conventional fonts used in from old Chinese calligraphy masters, thus additional daily life. Therefore, by presenting the font and textual training sample photos are required. There is a need for content of the input calligraphy image, a real-time more research because calligraphy is only mentioned in calligraphy recognition system can aid amateur one empirical study on AI in education. calligraphers in understanding calligraphy works [6]. Instead of manually typing out the text, the method may 1.2 Contribution of this study also be used to digitise calligraphy by just entering the image of the piece. In this study, we developed and put into The three primary forms of Chinese calligraphy— use a convolutional neural network-based calligraphy character recognition, calligraphy production and recognition system. Compared to earlier research, the simulation, and calligraphy analysis—represent an system has higher accuracy rates for identifying both important field of study deep learning (DL). To enhance typeface and textual content. We created a dataset of Chinese character and image processing technology, this calligraphy characters to train the network, and we tested study blends dropout in CNN hidden layers, data the viability of the system using pictures of various augmentation methods, and CNN architecture. The calligraphy pieces [7]. suggested approach CCR-LWECNN allows for greater accuracy without requiring additional training photos by 1.1 Challenges in Chinese calligraphy recognising more than 960 Chinese characters in five recognition calligraphic forms. Other languages can also be added to the model. In order to assist in this paper to monitor their Chinese calligraphy is a difficult art form because of its progress during practice sessions. Related works, datasets, many Chinese characters, many styles, and intricacy [8]. methods, findings, implications, discussion, and Since art evaluation is subjective and can have a conclusions are all included in the parts that make up the detrimental effect on teacher-student relationships, it study. might be challenging to find qualified calligraphers and offer comments. Artificial intelligence (AI) can assist in 2 Literature review overcoming these obstacles by offering unbiased assessments and comments. But only tiny groups of Table 1 shows Summary of works Table1: Summary on related works Ref Methods Used Dataset Baseline & Proposed Key Findings Size Accuracy Method & Accuracy [9] CNN, TensorFlow Not Traditional OCR CNN + CNN significantly improves specified 80% TensorFlow recognition for handwritten 93.7% characters [10] Hybrid CNN + 20,000+ Basic CNN Proposed 91.8% Attention helps in Attention + images 87.5% distinguishing subtle Distillation calligraphic variations [11] MobileNet, CNN ~12,000 Tesseract OCR MobileNet 90.1% Suitable for lightweight 76.2% deployment in mobile/web [12] Deep CNN, CAI Not given Classic CNN Proposed hybrid Integration of CAI improves 84.6% 89.2% learning and recognition efficacy [13] CNN with Deep ~8,000 Hand-crafted Proposed 91.0% Deep stroke analysis Stroke Extraction stroke features provides structural and 78.4% aesthetic insight CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 361–376 363 [14] 5-layer CNN ~6,500 SVM 83.2% CNN 92.4% CNN better handles degraded or stylized historical samples [15] Traditional CNN + Not stated Template CNN 88.6% CNN adapts better to style Filters Matching 74.8% variance than traditional methods [16] Faster R-CNN, 10,000+ SSD 90.3%, Faster R-CNN Accurate segmentation and YOLOv3 YOLOv3 91.5% 95.1% detection for full-page manuscripts 3 Methodology dataset's photos were adequate for style recognition. However, a second technique was employed to enhance 3.1 Dataset the number of training photos since we required to increase the number of images per Chinese character for character In order to construct the style recognition model, we used recognition. Utilizing data augmentation was the second CCR-LWECNN models, which represent datasets and strategy. During the training phase, we rotated and zoomed image pre-processing. The character recognition model is in on the already example photos using TensorFlow's constructed via data augmentation and picture pre- Image Data Generator function to produce more sample processing. Kaggle's "Chinese calligraphy characters images. image set" serves as the training dataset for the image recognition model [17] we provide these resources will be The dataset was constructed by combining photos from the made available via a public GitHub Humanities & Social Sciences collection with the Kaggle repository:https://github.com/zhuojg/chinese-calligraphy- set. Hash-based comparison methods were used to find and dataset. 2890 calligraphy pictures totaling 960 characters eliminate duplicate photos in order to guarantee quality. were collected from various calligraphers and made Additionally, physical inspection and simple picture available to the public. These pictures are labeled as semi- quality checks (e.g., resolution thresholding and contrast cursive, regular, seal, cursive, or clerical. We employed the analysis) were used to filter out low-quality samples, such oversampling approach because of the dataset's label as blurred, low-resolution, or severely distorted images. A imbalance issue. Additionally, this analysis demonstrated clean and varied dataset for efficient model training was that overfitting would not result from oversampling. A far guaranteed by this preparation. larger dataset was required for the image processing model Data Augmentation: Random rotation (±15°), zoom (10- than for style recognition. This is due to the fact that each 20% scale variation), brightness modification (±20%), and character to be categorized belongs to a single output class horizontal flipping (50% probability) were used by the in this multiclass classification model. There would have data augmentation process to enhance sample variety. In been just 2890 training photos for 960 classes if we had order to replicate natural stroke fluctuations, we used 8x8 utilized the same dataset for style recognition. That would grid warping with σ=4 for elastic distortions. These imply that there would typically be no more than three parameters were chosen to increase the effective training pictures each word. We needed to figure out how to get dataset 5-fold without creating unreal artefacts, all the more training photos. To expand the dataset's picture count while maintaining calligraphic integrity. Bilinear for character recognition, we employed two strategies. interpolation was used to preserve stroke continuity Adding pictures from a public domain collection was the throughout the real-time implementation of all initial technique. An online database of the Humanities & transformations using TensorFlow's Image Data Social Sciences Database Catalogue contained the Generator. dataset's URL (Humanities & Social Sciences Database Catalogue, 2023). We crawled the page and gathered 3.2 Feature extraction photos using the Kaggle connection. However, the link In recent years, deep learning has been widely applied in was broken when this paper was written. Following the tracking, object identification, and other domains. By addition of pictures from this dataset, the final dataset integrating low-level characteristics to create high-level comprised 12,152 training photos, with at least 10 images features that represent the scattered aspects of data, it for each Chinese word. The train_test_split () function in simulates how the human brain functions. Usually, the the Scikit-learn package's data preparation module was Light Weight Enhanced traditional CNN is used directly used to divide these photos into training and testing sets. for image classification. Utilizing CNN's numerous The sorted Data folder included the training set. advantages in feature extraction is the aim of this work. Since there were just five styles in the output class, the 364 Informatica 49 (2025) 361–376 X. Chen et al. Compared to explicit feature extraction, digital feature completely linked layer that can identify the style of a extraction produces more detailed feature data for Chinese picture. The first convolutional layer of our suggested picture works. model has 32 filters, each of which has three channels and The CNN theoretical framework-based CCR-LWECNN is 3 x 3. We employ the same padding, which means that model was pretrained using the Kaggle dataset to extract the input images are zero-padded so that the filters overlap the visual attributes of Chinese calligraphy. The model is each pixel. We employed ReLU as the activation function a feed-forward neural network with the model has two in the convolution layer. Batch normalization is used to convolutional layers, not five, with one fully connected enhance model stability and performance. The maximum layer of 512 neurons and an output layer three fully pooling filter has a dimension of 2 × 2 and travels with a connected layers. The model resizes the input image to 224 stride of 2. It carries out the max pooling procedure on the by 224 pixels in order to produce a 4096-dimensional feature maps. To assist keep this model from overfitting to feature vector. Feature Extraction is presented as a the training data, we have implemented a dropout layer to pretrained feature extractor producing a 4096-dimensional shake off the neurons. The dropout value for the first vector for further classification, suggesting a two-step convolution layer is set at 0.20. The second convolutional pipeline. The proposed model, pretrained on real images, layer is created using 64 filters, each of which has three may be used to extract characteristics from Chinese channels and is 3 × 3. The same cushioning is employed calligraphy. First of all, Chinese calligraphy characters are here as well. The ReLU activation function is also applied an artistic reworking of natural surroundings and another to the feature maps. Once more, batch normalization is depiction of a natural image. Second, the deep structure of utilized in the second layer to enhance model performance. the CCR-LWECNN model may extract complex structures The max pooling filter is 2x2 in size, advances by a stride from rich perceptual input and generate intrinsic of 2, and performs the max pooling operation on the representations in the data. More than 10 million natural feature maps. To address the issue of overfitting, a dropout photos are being utilized for training in order to gain value of 0.25 has been chosen for the second convolution relevant information for Chinese calligraphy feature layer. A flatten layer has been employed after the second extraction. Chinese character-like feature information will convolutional layer's dropout value has been set. The be included in the recovered features either directly or outcome of the last pool layer is a victory type, and a fully indirectly. Last but not least, the study's training dataset linked layer with 512 neurons comes after it. The final does not contain enough Chinese writing pieces to features are then classified into many classes in the output adequately train the suggested model. Nevertheless, this layer using fully linked layers that were taken from the study uses it as a forerunner to the CNN model, which is previous pooling and convolution layers. The completely lightweight so that the components it extracts may better linked layers learn from features. Batch normalization has capture the artistic character of Chinese calligraphy once again been used. The dropout value of 0.5 was then recognition. applied. We once more employed ReLU for the activation function in the dense layer. Lastly, there are six nodes in 3.3 Model explanation with CNN the output layer, each of which represents six classes. Next, we classified the desired label in the output layer In Figure 2, the framework that extracts the key using the softmax activation function. Figure 2aand 2b characteristics of calligraphy recognition consists of two Showed in Architecture diagram with dimensionalflows. convolutional layers. The framework has a single, CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 361–376 365 Figure 2a: Image style recognition model Figure 2b: Architecture diagram with dimensional flow 3.4 Chinese calligraphy recognition based on thoroughly, and emphasize its contribution to image lightweight enhanced CNN algorithm recognition technology for Chinese calligraphy (CCR-LWECNN) recognition. Convolutional neural network (CNN) technology is a type CNN is an effective recognition method and a type of of neural network that is specifically designed to process neural network that mimics the visual structure of biology. images. Since its inception, the technology has seen Convolutional, pooling, and fully connected layers are the significant development. As a result, CNN has greatly primary components of this recognition system. One of aided people in processing visual information. However, CNN's primary functions is the convolution operation of this technology's computationally demanding approach the convolutional layer. The following illustrates the also restricts its use in a number of industries. Therefore, convolution computation of continuous functions: the primary research goals in the current image recognition 𝑠(𝑡) = ∫ 𝑥(𝑎)𝑤(𝑡 − 𝑎) ⅆ𝑎 (1) sector are to lower the computational cost of CNN and decrease the calculation time, optimise the technology 366 Informatica 49 (2025) 361–376 X. Chen et al. Equation (1) uses x and w to stand for integrable functions, f′(z(l)) is the activation function; μ is the learning rate; l is a and t for distinct computational components, and d for the level of neurones; i is neurones; T is a constant; δ is the the convolution operation. The following illustrates how difference between the network's true and predicted discrete functions are calculated using convolution: values; W is the weight; b is the bias of the neurone; z is the neuron's input; an is its output; and f′(z(l)) is the 𝑠(𝑛) = ∑𝑚 r[𝑚]𝑣 , (𝑛 − 𝑚) (2) activation function. The following formula is used to determine a sample's loss function: Discrete functions are represented by r and v in Equation (2), whereas calculation elements are represented by m and 𝐽(𝑊, 𝑏; 𝑥, 𝑦) = 1/2||𝑦 − ℎ𝑤1,𝑏(𝑥)||2 (10) n. Convolution can be thought of as a filtering procedure in computer vision tasks. Typically, the input data is a two- The following illustrates how the fully linked layer's dimensional picture. Convolution is performed using a output data is calculated: two-dimensional discrete convolution in the manner described below: 𝑦 = 𝑓(𝑊. 𝑥 + 𝑏) (11) 𝑚 𝐼(𝑥, 𝑦) ∗ 𝑘(𝑥, 𝑦) = ∑ ∑𝑛 𝑡=0. k(𝑠, 𝑡)𝐼(𝑥 − 𝑠, 𝑦 − 𝑡) 𝑦 = o u t p u t v e c t o r o f t h e f u l l y c o n n e c t e d l a y e r ( 5 12 or 6 𝑠=0 elements) (3) 𝑊 =weight matrix In Equation (3), I stand for the output feature, k for the convolution kernel, m and n for the convolution kernel's 𝑥 =input feature vector dimensions, x and y for the feature output point, and s and t for the feature extraction point. Pooling the image and 𝑏 =bias vector producing the result are the roles of the fully connected layer and the pooling layer, respectively. Both forward and 𝑓 =activation function (ReLU or Softmax) backward propagation are included in the CNN model's computation. Forward propagation is a sequence of Each output neurone in a dense layer computes a weighted computations that use input data to perform tasks like sum of all input characteristics plus a bias term, which is image recognition and feature extraction, then combine then passed through an activation function. This and output the results. Backpropagation is the process of representation faithfully depicts the behaviour of the layer. using the computation results as input to determine the This adjustment guarantees mathematical lucidity and error as the fundamental reference data for model conforms to the norms used in the literature on neural optimisation. The network optimises the parameters it networks. learns by ongoing iterative training and updating, with training ending when the predetermined thresholds are CNN is carried out using W, and following decomposition, fulfilled. Among these, backpropagation computation the first t significant eigenvalues are substituted for W's involves forwarding the input sample (x, y) in order to decomposition as follows: determine the output value of L1, L2, …, Ln, and the 𝑤 = ⋃∑𝑉𝑇 = ⋃ ∑ 𝑉𝑇 (12) output layer error in the manner described below: 𝑡 (𝑛𝑖) A diagonal matrix is denoted by ∑, a v × t-dimensional 𝛿𝑖 = −(𝑦 − 𝑎(𝑛𝑖) . ) ⋅ 𝑓′ (𝑛 ) (𝑧 𝑖 𝑖 ) (4) orthogonal matrix by V, and an u × t-dimensional orthogonal matrix by U. As a result, CCR-LWECNN is Each layer's error computation is displayed as follows: represented as follows: 𝛿(𝑙) 𝑇 = ((𝑤)(𝑙)) 𝛿(𝑙+1))𝑓′(𝑤)(𝑙) (5) Y = Wx = U(∑ 𝑣𝑇) ⋅ 𝑥 = 𝑈 ⋅ 𝑧 (13) 𝑡 The following formula is used to determine the relative The CNN technology can be broken down by the CCR- derivatives of weights and biases: LWECNNalgorithm, significantly lowering the network's 𝑇 computing load. In addition to being straightforward, this Δ𝑤(𝑙)𝐽(𝑊, 𝑏; 𝑥, 𝑦) = 𝛿(𝑙+1)(𝑎(𝑙)) (6) approach produces superior outcomes. This algorithm is designed to optimise the CCR-LWECNN. Simpler image Δ𝑏(𝑙)𝐽(𝑊, 𝑏; 𝑥, 𝑦) = 𝛿(1+1) (7) computational processes are outside the CNN algorithm's capabilities, and the CCR-LWECNNalgorithm excels at The following are the revised weight parameters: handling them. Its output feature map definition is displayed as follows: 𝑤′ = 𝑤′ − 𝜇∇𝑤(𝑙)𝐽(𝑊, 𝑏; 𝑥, 𝑦) (8) 𝑏′ = b − 𝜇∇𝑏(𝑙)𝐽(𝑊, 𝑏; 𝑥, 𝑦) (9) CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 361–376 367 𝐶 ~ 𝑥 𝑤𝐶 𝑛 is the approximated weight for the nnn-th filter and 𝑦 𝐹𝑛(𝑥, 𝑦) = ∑ ∑ ∑ 𝑧𝐶 ( 1 𝑦′ 𝑥 , 𝑦′)𝑤𝑐 𝑛(𝑥 − channel C. 1 =1 𝑥 =1 1=1 𝑥′, 𝑦 − 𝑦′) 𝐻k 𝑛 : Projection matrix or basis vector used to reduce the (14) dimensionality of the filters (e.g., a learned kernel basis). 𝐹𝑛(𝑥, 𝑦): Output feature map at position (x, y) for the n-th 𝑉𝐶 𝑘 : Coefficient vector or activation feature for the kkk-th filter. component in channel C. 𝐶: Number of input channels. 𝛤: A transformation operator (e.g., transpose or non-linear function like activation or power). 𝑧𝐶 (𝑥1, 𝑦′): Input feature map for channel c at position (𝑥 − 𝑥′, 𝑦 − 𝑦′). H stands for the horizontal filter, V for the vertical filter, and K for the hyperparameter that regulates the rank. CCR- 𝑤𝑐 𝑛 : Filter weights for the n-th filter applied to channel c. LWECNNdoes, however, have some drawbacks. In other words, even though CCR-LWECNNhas produced strong 𝑥, 𝑦: Spatial dimensions of the input. results for model acceleration and compression, this 𝑛: Index of the output filter approach is difficult to execute. CCR-LWECNNmust be carried out layer by layer since various layers contain 𝑤𝑐(𝑥 − 𝑥′, 𝑦 − 𝑦′ 𝑛 ): Convolution kernel applied with different information, making it impossible to construct spatial shift. CCR-LWECNNusing a global variable. Furthermore, the network must undergo extensive fine-tuning training The channel is denoted by W, the filter by n, and the following decomposition in order to converge and produce position of the channel by C. The primary goal is to the best result. Figure 3 shows at Proposed model flow approximate W in the manner described below: diagram. ~ 𝑤𝐶 𝑘 𝑛 = ∑ 𝐻k = 𝑛 (𝑉𝐶 1 𝑘 )𝛤 (15) 𝑘 Equation (15) represents a low-rank approximation of the convolutional weight tensor W, where: Figure 3: Proposed model flow diagram Since its inception, machine learning has evolved human design in order to progressively enhance their own throughout time and has developed a number of flaws. learning process. As a result, the operator's basic technical Conventional machine learning methods need constant competence is pretty high and its dependence is 368 Informatica 49 (2025) 361–376 X. Chen et al. particularly big throughout the calculating process. width and height, as well as the center's horizontal and Additionally, machine learning has not advanced very far. vertical coordinates, are denoted by the letters x, y, w, and In the meantime, the algorithm cannot rapidly achieve h. The adjusted value is denoted by t. accurate image identification and has very low image recognition accuracy. The most significant of these is that Indeed, the use of Convolutional Neural Networks (CNNs) conventional machine learning technology is unable to is standard and well-justified for image recognition tasks precisely distinguish different aspects of the image, which due to their ability to capture spatial hierarchies in visual typically results in significant application failures. The data. In the CCR-LWECNN model, integrating dropout most significant is that machine learning is unable to layers helps prevent overfitting by randomly deactivating recognise the primary information in a picture and neurons during training, enhancing generalization. distinguish between the image's background and major Additionally, data augmentation (e.g., rotations, scaling, portion. The drawbacks of conventional machine learning flipping) increases training diversity, especially important technology in image recognition are addressed by when working with limited samples per class, improving optimised deep learning technology. Optimising the the model’s robustness across varied calligraphy styles. calculation process is essential to lowering the computing Together, these techniques contribute to the model’s cost and increasing the computational efficiency of deep strong performance. learning image recognition technology if it is to be used to a larger field. Consequently, the model calculation method Algorithm 1: CCR-LWECNN Core Steps is made simpler and the calculation effect is somewhat Data Acquisition & Preprocessing enhanced by optimising CNN technology and creating the Collect images of Chinese characters across multiple Faster-CNN model. Figure 3 illustrates the fundamental styles (e.g., seal, cursive). concept of the lightweight Faster-CNN model. Normalize image sizes (e.g., 64×64 pixels). In comparison to traditional CNNs like VGG16 (~138 million parameters, >15 billion FLOPs), the CCR- Apply data augmentation (rotation, flipping, noise LWECNN model has around 1.2 million parameters and addition) to expand limited samples (≤15 per class). needs 150 million FLOPs per forward pass. Because of its shallow architecture—just two convolutional layers, Model Architecture smaller (3x3) filter sizes, fewer fully connected neurones, and use of effective procedures like batch normalisation Use a lightweight enhanced CNN with: and dropout—it is regarded as lightweight. Because of its ability to lower memory and compute requirements, this convolutional layers (ReLU activation, batch architecture is appropriate for real-time and resource- normalization). constrained applications, including embedded or mobile systems. Max-pooling layers to reduce spatial dimensions. When the Faster-CNN model is applied to image feature Dropout layers to prevent overfitting. recognition in Figure 3, it can not only significantly speed up the process and increase its effectiveness, but it can also Flatten layer followed by 3 fully connected layers maximise the model's recognition effect and assist users in (e.g., 512-neuron layer + output layer with softmax). completing the style transfer of painting images. The Training region proposal network is the method used to optimise the Faster-CNN model. Using anchor points, it modifies and Train the model using cross-entropy loss and Adam enhances the Faster-CNN model's image recognition optimizer. domain in the following ways: Batch size, learning rate, and dropout rate should be 𝑋 = 𝑤𝑎𝑡𝑥 + 𝑥𝑎 (16) tuned via validation. 𝑦 = ℎ𝑎𝑡𝑦 + 𝑦𝑎 (17) Perform training over multiple epochs with early stopping if necessary. 𝑤 = 𝑤𝑎(𝑡𝑤) (18) Evaluation ℎ = ℎ𝑎(𝑡ℎ) (19) Use 10-fold cross-validation to compute average The abscissa and ordinate of the anchor point's centre accuracy, precision, recall, F1-score, and ± standard point, as well as its breadth and height, are denoted by the deviation. letters xa, ya, wa, and ha, respectively. The model's chosen CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 361–376 369 Report statistical significance using p-values 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 → 𝑇𝑃/ 𝑇𝑃 + 𝐹𝑃 compared to baseline models (CNN, SVM, RF, etc.). 𝐹1 − 𝑠𝑐𝑜𝑟𝑒 → 2 ∗ 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 ∗ 𝑟𝑒𝑐𝑎𝑙𝑙/ 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + Prediction 𝑟𝑒𝑐𝑎𝑙𝑙 For new input images, feed them through the trained Each classifier was assessed using the balanced F1 Score, CCR-LWECNN to output probabilities across target recall, and precision of popular techniques including classes (character+style or style only). support vector machine (SVM), Random Forest (RF) [18], Bonferroni Mean Fuzzy K-Nearest Neighbors (BM- FKNN) [19], and CNN [20]. In order to evaluate the efficacy of the suggested approach, a classifier model was 4 Results and discussion constructed by importing Dataset. Seal font, cursive font, semi-cursive font, clerical font, and standard font are 4.1 Experimental setup among the features. It was discovered that 10% of the samples were test samples, while 90% of the samples were The Intel(R) Core (TM) i74700 HQ CPU run at 2.40 GHz training samples. The likelihood that the lightweight CNN was the PC used in this experiment. It has 16.00 GB of will provide correct predictions is known as the accuracy RAM. The OS is Windows 8, 64-bit. Weka Ver 3.8.1 rate. The Adam optimiser was used to train the CCR- utilised to create and evaluate DL models, and Python LWECNN model because of its effective convergence and Anaconda Ver 2020.20 with the Seaborn library Ver 0.10.0 adjustable learning rate. During training, a batch size of 32 is used for correlation analysis. was used, and the initial learning rate was set at 0.001. Validation loss was recorded across epochs to keep an eye To ensure replicability and fair evaluation, the dataset of on overfitting, and training was stopped early after five 12,152 samples was divided as follows: Training set as epochs if there was no discernible improvement in 70% (8,506 samples), Validation set as 15% (1,823 validation loss. Data augmentation and dropout layers also samples), Test set as 15% (1,823 samples). Splitting was assisted in lowering the danger of overfitting. performed stratified by character class and style, ensuring balanced representation of each character–style Accuracy is defined as the proportion of image processing combination across all splits. Data augmentation was techniques that are reliably and accurately identified. applied only to the training set, preserving the integrity of Table 2a and Figure 4 present the accuracy findings. The the validation and test sets. current CNN (90.5%), Random Forest (75.4%), SVM (85.2%), and BM-FKNN (88.7%) algorithms were all 4.2 Performance analysis surpassed by our suggested CCR-LWECNN (96.5%). The baseline "CNN" refers to a conventional, standard CNN A variety of indicators are needed in order to compare the architecture commonly used in calligraphy or handwritten experiment's outcomes. The accuracy rate is the character recognition tasks. This baseline employs two probability that the classifier will produce accurate convolutional layers with ReLU activations and max predictions. The recall rate is the percentage of a Chinese pooling, followed by a fully connected layer for calligraphy image that are accurate for all 5 Fonts in that classification—essentially a straightforward feature within the dataset. We assess performance using implementation without the architectural refinements (e.g., various metrics, including F1-score, accuracy, recall, and optimized dropout rates and enhanced feature extraction) precision. Accuracy is defined as the percentage of total that distinguish our CCR-LWECNN model. We will revise samples properly identified by the classifier in (1). The the manuscript to provide a detailed description of this total number of samples that the classifier found to be baseline architecture, ensuring that the comparative positive accurately identified as positives in (2) is known evaluation is transparent and that readers understand the as the recall. Precision, which appears in (3), is the total specific differences between the baseline CNN and the number of classifier-predicted positive samples that are proposed CCR-LWECNN model. While Figure 6 shows true positives. By combining the precision and recall found that the CNN with uneven margins produced a greater F1 in (4), the F1-score calculates a balanced average result. and balanced Precision and Recall, Figure 6 shows that the True positive (TP), false positive (FP), true negative (TN), recall suggested by the proposed technique was 95.2%. and false negative (FN) are the many metrices that can be The precision result for our recommended approach, calculated using the equations below. which has the maximum precision at 95.6%, is shown in 𝑇𝑃+𝑇𝑁 Figure 5. With uneven margins, the CCR-LWECNN 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 → 𝑇𝑃+𝐹𝑃+𝐹𝑁+𝑇𝑁 produced a 95.6% statistically significant better F1 than the conventional CNN. Comparing the various situations, 𝑅𝑒𝑐𝑎𝑙𝑙 → 𝑇𝑃/ 𝑇𝑃 + 𝐹𝑁 CCR-LWECNN, and novel tactics, Chinese calligraphy 370 Informatica 49 (2025) 361–376 X. Chen et al. has significantly improved in effectiveness. Table 2 shows Values of Accuracy, precision, Recall, F1-score. Table 2: Values of Accuracy, precision, Recall, F1-score Training set Methods Test-set Methods Accuracy (%) Precision Recall F1-Score (%) (%) (%) CNN 90.5 89.1 90.2 90 RF 75.4 73.5 72.2 73.1 SVM 85.2 84.7 84.1 83.7 BM-FKNN 88.7 85.5 88.3 87.4 CCR-LWECNN [Proposed] 96.5 95.6 95.2 95.6 Table 3: Model performance per calligraphy style (averaged across test folds): Style Accuracy Precision Recall F1 (%) (%) (%) (%) Regular ( 98.2 97.8 98.1 98.0 楷书) Semi- 95.4 94.9 95.1 95.0 cursive ( 行书) Cursive ( 93.1 91.7 92.3 92.0 草书) Figure 4: Result of accuracy outcome Seal (篆 94.5 93.2 94.0 93.6 书) Clerical ( 96.0 95.4 95.8 95.6 隶书) CCR-LWECNN generalizes well across all styles, with particularly strong performance on Regular and Seal scripts, and maintains high F1-scores across more complex styles like Cursive and Clerical. Other models showed more variance and lower scores, especially on cursive scripts. CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 361–376 371 Figure 5: Result of precision outcome Figure 7: Result of F1-score outcome Figure 6: Result of recall outcome Figure 8: Overall performance of existing and proposed method outcome 372 Informatica 49 (2025) 361–376 X. Chen et al. Figure 4 shows Result of Accuracy. Evaluation of BM- 88.7 ± [88.0, 89.4] 87.4 performance across a range of input variables, including FKNN 0.8 ± 0.7 variations in picture quality, subtleties in style, and noise, is crucial to determining the model's resilience in real- CCR- 96.5 ± [96.0, 97.0] 95.6 world situations. Figure 5 shows Result of Precision LWECNN 0.6 ± 0.4 Outcome. This entails evaluating the model over a range of handwriting styles and on calligraphy pictures that are noisy, low-resolution, or blurry. By demonstrating actual Table 4 Shows Performance Summary with Statistical application dependability, this assessment helps guarantee Rigor. The CCR-LWECNN model's lightweight design that the CCR-LWECNN model generalises effectively and makes it ideal for embedded systems and mobile devices, retains high accuracy even in less controlled or degraded even if system implementation is not extensively covered. conditions. Figure 6 shows Result of Recall Outcome. Without requiring sophisticated servers, its modest model Even while we anticipate that current techniques will size and low computational burden allow for effective improve the base classifier's performance, in several inference on devices with limited resources, enabling real- instances, a single classifier has produced identical or time calligraphy detection in mobile applications. superior outcomes. Decision trees, for instance, outperformed the BM-FKNN in this instance, with 88.7% accuracy and 85.6% precision, respectively. Figure 7 shows Result of F1-score Outcome. The same best results were also obtained by CCR-LWECNN, with 96.5% accuracy, 95.6% precision, 95.2% recall, and 95.6% F1 score. Overall, out of the ten classifier features using current techniques, CCR-LWECNN produced the best results shows in figure 8. A detailed performance comparison was conducted between the proposed CCR-LWECNN model and several baseline models—including a standard CNN trained from scratch and transfer learning models using pre-trained networks like MobileNetV2 and EfficientNet-B0—on the same dataset. CCR-LWECNN consistently outperformed these baselines, achieving higher accuracy, precision, recall, and F1-scores while maintaining a smaller model size and faster inference. This demonstrates that CCR- LWECNN’s lightweight architecture and tailored Figure 9: Outcome of Learning rate enhancements effectively improve Chinese calligraphy recognition over conventional and transfer learning The learning rate visualisation figure 9 illustrates how the approaches. loss of the model reacts to varying learning rates. The ideal learning rate range is indicated by a sharp decline in loss Table 4: Performance summary with statistical rigor that is followed by instability or a plateau. Here, the graph shows that the CCR-LWECNN model converges most Model Accuracy 95% F1- quickly and steadily when the learning rate is adjusted (%) ± SD Confidence Score between 0.001 and 0.005, preventing divergence (from too Interval (%) ± high a rate) or sluggish training (from too low a rate). The SD generalisation and efficiency of the model are enhanced by this adjustment. CNN 90.5 ± [89.8, 91.2] 90.0 0.9 ± 0.7 RF 75.4 ± [74.3, 76.5] 73.1 1.1 ± 1.2 SVM 85.2 ± [84.3, 86.1] 83.7 1.0 ± 0.8 CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 361–376 373 Confusion Matrix: Table 5: Outcome with comparison of recent methods Model Accur Precisi Rec F1- Para acy on (%) all Sco ms (%) (%) re (M) (%) MobileNe 94.2 93.7 93.1 93.4 3.4 tV2 EfficientN 95.3 94.8 94.1 94.4 5.3 et-B0 ViT-Tiny 92.5 91.6 91.2 91.4 5.7 CCR- 96.5 95.6 95.2 95.6 1.2 LWECN N Figure 10: Confusion matrix for the proposed model The provided confusion matrix in figure 10 evaluates the ROC Curve performance of the CCR-LWECNN model in classifying five Chinese calligraphy styles. (Regular/楷书, Semi- Cursive/行书, Cursive/草书, Seal/篆书, and Clerical/隶书 ). The diagonal values (ranging from 0.93 to 0.98) demonstrate strong classification accuracy, with Regular script (楷书) achieving the highest accuracy at 98%. The most notable misclassifications occur between Cursive ( 草书) and Semi-Cursive (行书), with 4% of Cursive samples incorrectly predicted as Semi-Cursive, likely due to their stylistic similarities in stroke connectivity. Other errors are minimal (≤3%), such as Seal (篆书) occasionally confused with Cursive (3%) or Clerical (隶书) with Semi- Cursive (1%). The numerical gradient (1 to 0) implies a visual color scale for interpretation, where higher values (closer to 1) represent correct predictions and lower values (closer to 0) indicate errors. This analysis confirms the model’s robustness in distinguishing calligraphy styles Figure 11: ROC curve for the Suggested method while highlighting expected challenges in discriminating fluid, connected scripts like Cursive and Semi-Cursive. Figure 11 shows the CCR-LWECNN model's ROC curve how well it can differentiate between binary classes, is Evaluate with recent methods: shown below. With an AUC of around 0.72 in this For contemporary picture classification problems, models simulated example, the curve illustrates the trade-off such as BM-FKNN, Random Forest, and SVM are less between the True Positive Rate (sensitivity) and the False appropriate, particularly when dealing with high- Positive Rate. Better model performance is indicated by a dimensional data like calligraphy images. We contrasted larger AUC, and this visualisation aids in evaluating the suggested CCR-LWECNN with lightweight deep classification efficacy over a range of thresholds. learning models designed for low-resource settings in order to give a more relevant benchmark. Table 5 given by Outcome with comparison of recent methods 374 Informatica 49 (2025) 361–376 X. Chen et al. 4.3 Discussion which makes it simple to submit images and shows identification results with unambiguous visual feedback. The suggested CCR-LWECNN model is better at Low latency, usually less than one second, is guaranteed recognising Chinese calligraphy since it is more by its lightweight design, allowing for quick and seamless computationally efficient than deeper architectures like interaction. By enabling users to rapidly explore DenseNet and BiConvExtractNet. DenseNet is great at calligraphy styles and characters, this promotes real-time reusing features via dense connections, while usability in workshops, classrooms, and museum kiosks, BiConvExtractNet uses bidirectional convolutional improving learning experiences and engagement. extraction for jobs with a lot of complexity. However, both models frequently consume a lot of resources and are likely to overfit on small artistic datasets. CCR-LWECNN, on the other hand, has a lightweight structure with well 5 Conclusion adjusted convolutional layers and dropout regularisation. It gets 96.5% accuracy on a calligraphy dataset with much The goal of this research is to identify Deep Learning less complexity and training cost. The CCR-LWECNN models that can accurately identify and assess image model successfully captures the geometric regularity in processing technologies on a bigger dataset that includes seal script and the fluid stroke dynamics in cursive script, the majority of commonly used Chinese characters. This it works well on both seal and cursive styles. Both high- goal was accomplished as our models, which were level stylistic elements and low-level texture are extracted constructed using CCR-LWECNN, obtained an image by its layered design, and data augmentation guarantees recognition accuracy of 96.5% for a 960-character set, resilience to handwriting variances. which is more than three times larger than previous research of a comparable kind. Thus, we demonstrated CCR-LWECNN's decreased generalisation between that, with a very short dataset, it is possible to construct a calligraphers is a major drawback since intra-style lightweight CNN with excellent accuracies in character discrepancies might result from differences in individual and picture recognition models by combining the ReLU, stroke patterns, pressure, and spacing. When applied to dropout, and data augmentation. For users to better fewer-represented calligraphers or unexplored writing understand how they might do better in the future, the styles, the model may become less successful due to comparison tool could show which aspects of the overfitting to prevalent patterns in the training data.CCR- calligraphy work are problematic. Lastly, style and image LWECNN makes it possible to accurately and recognition models in non-printed calligraphy works in automatically classify calligraphy styles and characters, it other languages may benefit from the techniques shown in facilitates the digitisation, cataloguing, and analysis of this study. historical works at scale, hence supporting heritage CCR-LWECNN is utilized to increase the system's preservation and digital archiving. This makes it easier to efficacy. Using pictures of various calligraphy pieces, the do cultural study, teach, and preserve traditional Chinese system's ability to recognize Chinese calligraphy has been calligraphy in digital form across time. The CCR- demonstrated. Additional features, such a dictionary LWECNN-based system is perfect for educational and function, will be added to the system in the future by cultural applications because of its user-friendly interface, linking it to other databases. References [1] Zeng, W. (2021, January). The Influence and directional diffusion equation. Applied Mathematics Communication of Chinese Calligraphy in South and Nonlinear Sciences, 8(1), 1509-1518. doi: Korea. In The 6th International Conference on Arts, https://doi.org/10.2478/amns.2022.2.0139 Design and Contemporary Education (ICADCE 2020) (pp. 720-723). Atlantis Press. doi: [4] Wong, A., So, J., & Ng, Z. T. B. (2024). Developing https://doi.org/10.2991/assehr.k.210106.137 a web application for Chinese calligraphy learners using convolutional neural network and scale [2] Lee, C. H., & Lee, Y. C. (2021). Effects of different invariant feature transform. Computers and finger grips and arm positions on the performance of Education: Artificial Intelligence, 6, 100200. doi: manipulating the Chinese brush in Chinese https://doi.org/10.1016/j.caeai.2024.100200 adolescents. International Journal of Environmental Research and Public Health, 18(19), 10291. doi: [5] Khan, A., Sohail, A., Zahoora, U., & Qureshi, A. S. https://doi.org/10.3390/ijerph181910291 (2020). A survey of the recent architectures of deep convolutional neural networks. Artificial intelligence [3] Cai, W. (2022). Chinese painting and calligraphy review, 53, 5455-5516. doi: image recognition technology based on pseudo linear https://doi.org/10.1007/s10462-020-09825-6 CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 361–376 375 [6] Zhang, X., Li, Y., Zhang, Z., Konno, K., & Hu, S. In 2021 IEEE International Conference on Advances (2019). Intelligent Chinese calligraphy beautification in Electrical Engineering and Computer Applications from handwritten characters for robotic writing. The (AEECA) (pp. 405-410). IEEE. doi: Visual Computer, 35, 1193-1205.doi: https://doi.org/10.1109/aeeca52519.2021.9574199 https://doi.org/10.1007/s00371-019-01675-w [16] Peng, X., Kang, J., Wu, Y., & Feng, X. (2022). [7] Wu, X., Chen, Q., Xiao, Y., Li, W., Liu, X., & Hu, B. Calligraphy Character Detection Based on Deep (2020). LCSegNet: An efficient semantic Convolutional Neural Network. Applied segmentation network for large-scale complex Sciences, 12(19), 9488. doi: Chinese character recognition. IEEE Transactions on https://doi.org/10.3390/app12199488 Multimedia, 23, 3427-3440. doi: https://doi.org/10.1109/tmm.2020.3025696 [17] B. Bing, 2022. “Chinese Calligraphy Characters Image Set,” Kaggle.com, Available: [8] Liu, X., Hu, B., Chen, Q., Wu, X., & You, J. (2020). https://www.kaggle.com/datasets/bai224/chinese- Stroke sequence-dependent deep convolutional neural calligraphy-characters-image-set. network for online handwritten Chinese character recognition. IEEE transactions on neural networks [18] Yuan, S., Wang, Y., Wang, X., Deng, H., Sun, S., and learning systems, 31(11), 4637-4648. doi: Wang, H., ... & Li, G. (2020, June). Chinese sign https://doi.org/10.1109/tnnls.2019.2956965 language alphabet recognition based on random forest algorithm. In 2020 IEEE International Workshop on [9] Li, Y., & Li, Y. (2021, June). Design and Metrology for Industry 4.0 & IoT (pp. 340-344). implementation of handwritten Chinese character IEEE. doi: recognition method based on CNN and TensorFlow. https://doi.org/10.1109/metroind4.0iot48571.2020.91 In 2021 IEEE International Conference on Artificial 38285 Intelligence and Computer Applications (ICAICA) (pp. 878-882). IEEE. doi: [19] Eko, Y. P. (2021, October). Bonferroni Mean https://doi.org/10.1109/icaica52286.2021.9498146 Fuzzy K-Nearest Neighbors Based Handwritten Chinese Character Recognition. In 2021 International [10] Yang, L., Wu, Z., Xu, T., Du, J., & Wu, E. (2023). Conference on Data Science and Its Applications Easy recognition of artistic Chinese calligraphic (ICoDSA) (pp. 118-123). IEEE. doi: characters. The Visual Computer, 39(8), 3755-3766. https://doi.org/10.1109/icodsa53588.2021.9617488 doi: https://doi.org/10.1007/s00371-023-03026-2 [20] Cui, W., & Inoue, K. (2021). Chinese calligraphy [11] Pang, B., & Wu, J. (2020, August). Chinese recognition system based on convolutional neural calligraphy character image recognition and its network. ICIC Express Letters, 15(11), 1187-1195. applications in Web and Wechat applet platform. doi: https://doi.org/10.24507/icicel.15.11.1187 In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (pp. 253-260). doi: https://doi.org/10.1145/3383583.3398516 [12] Si, H. (2024). Analysis of calligraphy Chinese character recognition technology based on deep learning and computer-aided technology. Soft Computing, 28(1), 721-736. doi: https://doi.org/10.1007/s00500-023-09423-y [13] Li, M., & Ren, G. (2023). Intelligent Evaluation Method of Calligraphy Characters Based on Deep Stroke Extraction. Adv. Comput., Signals Syst., 7, 99- 106.doi: https://doi.org/10.23977/acss.2023.071014 [14] Huang, Q., Li, M., Agustin, D., Li, L., & Jha, M. (2023). A novel CNN model for classification of Chinese historical calligraphy styles in regular script font. Sensors, 24(1), 197.doi: https://doi.org/10.3390/s24010197 [15] Chen, L. (2021, August). Research and application of chinese calligraphy character recognition algorithm based on image analysis. 376 Informatica 49 (2025) 361–376 X. Chen et al. https://doi.org/10.31449/inf.v49i12.9916 Informatica 49 (2025) 377–388 377 Enhancing Machine Translation of English Complex Sentences Using Refined Gradient CNN on Large-Scale Corpora SiYuan Li Department of Business and Foreign Languages, Hunan International Business Vocational College, Changsha, Hunan, 410200, China E-mail: lisiyuan198084@gmail.com Keywords: optimization, machine translation algorithm, english complex long sentence, refined gradient - convolutional neural network Received: June 30, 2025 Optimization of long and complicated sentences in English. Translating complex, lengthy statements from one language to another is the job of computer systems called machine translation algorithms (MTAs). A machine translation assistant (MTA) that trains on a big data corpus is one that makes use of a diverse and extensive collection of textual resources to improve translation quality. Translating complex and lengthy English sentences poses significant challenges for machine translation (MT) systems, especially when preserving semantic accuracy. It introduces the Refined Gradient-CNN (RG-CNN) model as a post- processing refinement mechanism to enhance phrase-level translation accuracy. The model is trained on a specially curated "Parallel Corpus" dataset comprising 1,563 English sentence pairs, including complex originals and their simplified counterparts. The RG-CNN employs gradient-enhanced convolution and bidirectional recurrent layers to capture and refine syntactic structures. The model is implemented using Python 3.11. Experimental results demonstrate the model’s superior performance. It achieved BLEU scores of 73.1% (corpus) and 70.1% (local), significantly outperforming. Likewise, RG-CNN reported a reduced WER of 0.3% (corpus) and 0.10% (local) compared to baseline models. Accuracy and recall were also improved to 97.51% and 98.43%, respectively, outperforming the baseline model. These results affirm RG-CNN's ability to optimize complex sentence translation, reduce ambiguities, and advance MT systems across diverse linguistic domains Povzetek: Model Refined Gradient-CNN (RG-CNN) je predlagan za izboljšanje strojnega prevajanja dolgih in kompleksnih angleških stavkov, zlasti za fraze. Model je treniran na obsežnem korpusu (1.563 parov) in optimira prevode kompleksnih besednih struktur. 1 Introduction processes and thus can translate difficult, lengthy texts with ease and efficiency [2]. With large language models, The English complex long sentence machine translation machine learning, high-performance NLP techniques, method is designed to be more efficient and accurate with context and semantics considered, efficiency boost, and a a lot of important parameters. The method ought to be human subject-matter expert feedback loop, all are context-aware and aware of the original sentence's included in the improvement of the English complex long meaning so that it can generate translations that are very sentence machine translation. All these in consideration, faithful to the original text. It should be capable of dealing the algorithm's translation capability for complex, long with idiomatic expressions, cultural references, and texts may be significantly enhanced [3]. The English metaphors to generate translations that are faithful to the complex sentence machine translation technique from a original text. In order to effectively address the large corpus of data needs to be trained to improve the computational requirements for analyzing compound, quality, accuracy, and fluency of the translation. The lengthy words, the approach can take advantage of parallel following are a few key considerations: Choose a large, processing methods and distributed computation models to broad-based, and comprehensive corpus that includes a offer optimal efficiency [1]. The research ensures that even range of topics, genres, and styles [4]. Align the source and sentences that are long and complex get translated outputs target sentences in the corpus to produce aligned sentence within a reasonable timeframe. Optimization and analysis pairs for generating the translation. It can learn English of the translated output can be facilitated by incorporating phrase-to-phrase mappings and their respective a human translator or linguist feedback loop. The computer translations through a critical phase while training a can learn from human experience through repeated supervised machine translation system. Apply parallel 378 Informatica 49 (2025) 377–388 S. Li processing techniques and distributed computing and subject matters. Data collection and cleaning must be platforms for efficient management of the enormous performed cautiously when creating a large corpus of data. processing involved in training over an enormous data It is ensured that text data covers a wide range of language corpus. It is easy to scale up and accelerate the training [5]. and context variations by obtaining it from various sources Employ the most advanced neural network architectures, [10]. Preprocessing methods, such as tokenization, phrase for instance, transformer architectures, which have made breaking, and part-of-speech tagging, are employed in significant jumps in machine translation tasks. The models attempting to process data to train and analyze. The are better at handling complex sentence structure and long- various techniques are able to train the machine translation distance relations. For pre-training the machine translation models, such as statistical machine translation (SMT) and model, employ the pre-trained language models like BERT neural machine translation (NMT), upon the creation of the or GPT. Fine-tune the model on the huge data corpus [6], big data corpus. The models can learn more complex particularly for the translation of very hard, long sentences. phrase structures, language patterns, and correspondences Transfer learning helps the model learn to identify because they have larger training data. Big data corpora universal language use patterns, and fine-tuning helps it have helped machine translation advance [11]. Big data learn to follow the specific translation task that is being analysis enables researchers and developers to train and performed. Generate artificial sentence variations or better develop models, which enhance translation quality, paraphrases to improve the training set. The algorithm manage complex sentence structures better, are natural- becomes stronger and more immune to complex, long sounding, and possess greater awareness of context. sentences by being exposed to more varied sentence forms MTAS should work much better when there is a massive and language variations. Employ the automated translation amount of data that is high in quality. To ensure the corpus system and collect user ratings of translations. Employ the is representative, precise, and unbiased, or noise-free, and feedback to improve the quality of the translations over does not adversely affect the translation outcome, time, using repeated usage in the training process. The meticulous data selection, preprocessing, and curation are research enables the algorithm to capture user preferences needed. Lastly, a large quantity of data as a corpus allows and personal translation challenges from complex, lengthy machine translation systems to learn from heterogeneous sentences [7]. Use standard measuring devices at regular linguistic data, leading to more efficient and robust intervals to test how well the improved algorithm is translation abilities for practically any level of sentence performing. Compare the algorithm with other cutting- complexity and linguistic diversity [12]. edge machine translation systems to evaluate its performance and what it needs to improve on. The English Key contributions: compound long sentence MTA can be translated better, more fluently, and contextually using the power of big data • The application of a certain machine translation corpora and implementing the optimization techniques. It algorithm to process lengthy, complicated words, generates more accurate, efficient, and effective designing the Refined Gradient-CNN model, applying a huge training set, and optimization translations of long compound sentences [8]. Large and methods used to enhance word translation diverse quantities of text data referred to as a "big data accuracy. corpus" undergo processing in machine translation and • This Research aims to overcome the difficulties other NLP tasks, training, testing, and also updating the in translating complex phrase patterns and processes. "Big data" thus addresses the sheer volume, enhance the general effectiveness of machine variety, and velocity of data that can be processed and translation systems. analyzed [9]. The size and variation of the corpus are also significant factors that decide the level at which it can be 2 Related work successfully used for training. With a large corpus, there can be complete utilization of different grammatical Research in many areas must include a literature survey, patterns, lexis, idiomatic expressions, and language commonly called a literature review or systematic review. variation. The variation of the corpus ensures that the It entails a thorough review and analysis of the research algorithm is trained in multiple domains and styles, body, shown in Table 1. thereby ensuring that it is capable of handling mixed styles Table 1: Literature survey References Objectives Summary of Findings Limitations [13] Research suggests that the segmentation of lengthy The study evaluated the features of professional Limited focus on phrases is made possible by the hierarchical literature and discussed a translation optimization structural network of ideas technique, which has been approach for professional literature, combining segmentation; lacks enhanced. statistics, which significantly increases. handling of deep syntactic variations in English CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 377–388 379 [14] Research improved to build researchers used the The study focused on the neural machine translation Lacks real-time multi-objective optimization technique. The study model's probabilistic structure, which allows learning adaptation; also employs parallel corpora and monolingual researchers to draw conclusions about data-related limited performance corpora routes with an emphasis on node regularization items and apply them. on unseen sentence distribution and data flow analysis. structures [15] The study improved the framework to optimize a It showed that the newly suggested computer- Relies on memory- computer-assisted translation system to increase assisted translation system can improve translation based context, the accuracy and reliability of automatic quality and intelligently translate memory-assisted which is insufficient translation of long-character English with long-character English with high data recall rates, for unseen phrase memory-assisted English. accuracy, and dependability. structures [16] The Research introduced by employing a word The overview objective of the goal showed that, in Focused only on corpus, the word alignment optimization approach comparison to the earlier methods, the suggested word alignment; enhances word alignment performance in the technique lowers the average alignment error rate. doesn't address transformer system. phrase-level semantic coherence. [17] The Research suggested that a model for The study evaluated the process of fuzzy semantic Limited calculating language-semantic correlation that selection achieved using a machine learning neural grammatical uses the best fuzzy semantics for English lengthy network adaptive learning technique. handling; doesn't sentences should be developed. scale well for professional or complex sentence contexts. [18] The study suggested language combinations and The Research focused on creating human and Data-centric collected and cleaned texts from diverse sources to automated assessments of the resulting models. approach; lacks form four parallel corpora, which were used to structural model build the translation system. improvements for long or technical texts [19] The overall objective of the goal was to explore the The Research investigated to show the viable Focused on a two different NMT algorithms, Bidirectional Long direction for Research to improve Bangla-English specific language Short-Term Memory (LSTM) and Transformer- NMT. pair; not based NMT, used for the Bangla-to-English generalizable to language pair. English complex phrase structure. [20] The study analyzed that well-known translator like The Research examines English, which has been the Not optimized for Google Translate do quite well when translating base or source language for the vast majority of NLP general MT between English, French, or Spanish. Still, studies research projects that have been discovered so far. performance; lacks make trivial mistakes when translating recently The study had several regularly spoken potential contextual learning introduced languages like Bengali, Arabic, etc. languages that still need to be explored. layers [21] The overall objective of the goal is to briefly The study showed that the backpropagation (BP) present the voice recognition neural network neural network recognized speech more quickly than Focused on speech technique. The machine translation method was artificial recognition and with a reduced word input; not optimized then put through simulation studies and contrasted mistake rate. for written complex with two additional machine translation language translation techniques. [22] To enhance Punjabi-to-English NMT translation Incorporating MWEs and word embeddings Limited to Punjabi- by addressing out-of-vocabulary (OOV) words improved translation fluency and adequacy, English pair; does and multi-word expressions (MWEs). achieving BLEU scores of 15.45, 43.32, and 34.5 on not generalize to small, medium, and large test sets, respectively. other low-resource or morphologically rich languages. [23] BLEUₙ-based evaluation, residual comparison, NMT showed higher translation quality than SMT Focus limited to Google Translate and European Commission’s across all BLEUₙ scores English–Slovak; no Translation tool (EC) tool deep feature extraction or semantic ranking; lacks domain- independent generalization 3 Methodology was a challenge for Transformer-based models [19] and Bidirectional LSTM models [19]. Furthermore, 3.1 Problem statement conventional approaches have significant word error rates and are ineffective at managing contextual memory [15]. To improve the quality of translation of lengthy and The proposed Refined Gradient-CNN model overcomes complicated English sentences, especially for low- all of the above limitations by incorporating contextual resource language pairings, while existing methods like memory encoding and layered semantic mapping, which fuzzy semantic selection approaches [17] encourage improves translation accuracy for complex phrase phrase segmentation and semantic correlation, but cannot structures. be particularly effective at learning deep contextual relationships. Similarly, structural alignment for long- sentence translation, especially minority languages [20], 380 Informatica 49 (2025) 377–388 S. Li 3.2 Experimental procedure Sentences that are distinguished by their length and complicated structure are referred to as being optimized This section provided a thorough explanation of how for English complex, lengthy sentences. Many clauses, the steps of the suggested design in Figure 1 were created, sub-clauses, phrases, modifiers, and dependent covered its creation process, and covered its key connections may be found in the sentences. Complex, components. This analysis has four parts: Information lengthy phrases may be difficult to understand, gathering is generally the main goal of the first stage. The comprehend, and translate because of their complex syntax second part included MTA for long English sentences. The and potential for ambiguity. The rearranging module most significant information is found in the third part, intends to fit more closely the short phrases' translation which describes the work performed to develop the with the language order following the combination by Refined Gradient-CNN model and compile the essential rearranging the short phrases generated through knowledge. The efficiency of each current and previous segmentation. Figure 2 depicts the upgraded intelligent design is presented in the fourth section. It is judged by MTA's flow. contrasting the pertinent factors. Figure 1: Methodological design A. Data collection Figure 2: Translation process of MTA for long English sentences To optimize sentences and enhance translation was validated it using the 1,563 English sentence pairings from The sentence segmentation module is designed to the "Parallel Corpus" dataset. Each record includes meta- split lengthy English sentences into shorter, manageable data such as readability ratings, difficulty levels, and segments. This is accomplished by predicting the domains, as well as the original complicated text and its likelihood of each word being a segmentation point using simplified or optimized translation. The aim of decreasing a maximum entropy (MaxEnt) classifier. The MaxEnt language complexity, enhancing semantic retention in AI, approach is particularly suitable here because it models education, and NLP applications, and enhancing phrase conditional probabilities flexibly without assuming translation quality at the phrase level are all complemented independence among features. by this dataset. For model development and evaluation, the exp⁡(∑𝑗 𝑧𝑗ℎ𝑗(𝑣,𝑢(𝑧))) dataset was partitioned into 70% for training (1,094 𝑜(𝑣|𝑢(𝑧)) = (∑𝑣 𝑒𝑥𝑝(∑𝑗 𝑧𝑗ℎ𝑗(𝑣′,𝑢(𝑧))) samples), 15% for validation (235 samples), and 15% for (1) testing (234 samples).This structured split enables robust model performance assessment across varied difficulty Where 𝑜(𝑣|𝑢(𝑧)),⁡the likelihood that a word levels and domain contexts, making it a reliable will𝑧 is a segmenting term in the lengthy statement, 𝑢(𝑧) benchmarK due to its rich linguistic annotations and is the background knowledge. domain diversity. It can serve as a thorough benchmark for model performance testing on a range of difficulty and After segmentation, the reordering module context aspects due to its extensive domain coverage and rearranges the segmented short sentences to reflect the strong language annotation. original logical flow. This is again modeled using a maximum entropy classifier, which estimates the Source: https://www.kaggle.com/datasets/ziya07/parallel- likelihood of a correct sequence based on context and corpus-data/data neighboring sentence features. Equation (2) is the B. MTA for english complex long sentence appropriate computation and reads as follows: CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 377–388 381 𝑡 𝑜(𝑝|𝐷𝑡 , 𝐷𝑛 exp⁡(∑𝑗 𝑧𝑗ℎ (𝑝 𝐷 𝑠 ) 𝑗 , 𝑛,𝐷 𝑛 𝑠 )) phrases, adverbial phrases, and adjectival words. Chunks 𝑛 = (∑𝑝.𝑒𝑥𝑝(∑𝑗 𝑧𝑗ℎ𝑗(𝐷 𝑡 𝑛 𝑛,𝐷𝑠 )) help identify interrelations and roles among sentence (2) constituents. Establish relationships and interdependence between the various constituents. Establish the verb-object The encoder receives the short English sentences relationships, subject-verb concordances, and other that have been segmented and re-ordered. The original text is encoded by the encoder using an LSTM model, and the syntactic relations that contribute to the general sentence resulting computation is given by Equation (3-7): form. Large sentences become richer in analysis and interpretation for linguists, NLP programmers, and 𝑒𝑠 = 𝜎(𝑎𝑒 + 𝑋𝑒𝑢 + 𝑍𝑒𝑔 ) 𝑠 𝑠−1 machine translation algorithms when they are divided into (3) smaller constituents. The proposed approach provides a better understanding of the syntactic form and semantic 𝑡𝑠 = 𝑒𝑠𝑡𝑠−1 + ℎ𝑠𝜎(𝑎 + 𝑋𝑢 + 𝑍 𝑠 𝑒𝑔 ) 𝑠−1 connections of the text and is more convenient for (4) translation or further study with greater accuracy. Long- ℎ distance segmentation is particularly useful in complex 𝑠 = 𝜎(𝑎ℎ + 𝑋ℎ𝑢𝑠 + 𝑍𝑒𝑔 ) 𝑠−1 (5) languages such as English, which possess complicated sentence forms with multiple clauses and modifiers. The ℎ𝑠 = 𝑡𝑎𝑛𝑔(𝑋ℎ)𝑟𝑠) lengthy phrases can be broken down into smaller segments (6) to lessen confusion and enhance the interpretation and comprehension of the entire message. By the 𝑟𝑠 = 𝜎(𝑎𝑟 + 𝑋𝑟𝑢𝑠 + 𝑍𝑟𝑔𝑠−1) (7) decomposition of huge words into smaller pieces or clauses, long-distance segmentation is important in the C. English long-distance segmentation study of language, NLP processing, and machine translation. It ensures that sentence structures can be English long-distance segmentation refers to breaking examined more systematically and explicitly, allowing for up a long sentence into sub-clauses or segments in order to free, correct comprehension, translation, and analysis of better understand and analyze it. It is commonly applied in complex language phenomena. linguistics, machine translation, and natural language processing (NLP) for splitting complicated sentences and D. Refined gradient-convolutional neural network grasping the syntactic structure of the sentence. The /9RG- CNN) design for big data dataset was chosen due to its relevance for tasks involving the reduction of linguistic complexity and the preservation There are several factors to consider and methods to use when developing huge amounts of data. To improve of semantic meaning, particularly within applications RG - -CNN design, particularly for large data situations, related to artificial intelligence, education, and natural there are differences between a regular CNN and the language processing (NLP). Its diverse domain proposed Refined Gradient-CNN (RG-CNN). To handle representation and robust linguistic annotations make it a complicated phrase structures and huge datasets more reliable benchmark for evaluating model performance effectively, the RG-CNN combines gradient-based across varying levels of sentence complexity and refinement, batch normalization, dropout, ReLU contextual nuance. Various clauses, words, and sub- variations, enhanced memory handling for big data, and sophisticated pooling methods. Take into consideration the clauses within a sentence are detected and separated from following key strategies: Big data typically encompasses a each other according to their grammatical relationship and very large number of input samples. The task is to develop dependencies via long-distance segmentation. The an RG - CNN model that is scalable to attack it. This might intention is to produce substantial sentences that can be involve employing parallel computing platforms, splitting learned separately or in conjunction with other, longer the job between numerous computers, and enhancing sentences. Find the sentence's main clauses or independent memory management to deal with large datasets. Big data typically must split the training task between numerous sections. The foundation elements convey complete ideas computer nodes or clusters. The training process can be and may be utilized alone as independent sentences. segmented based on techniques such as model parallelism Identify any subordinate or dependent sentences that and data parallelism, thereby providing faster convergence provide the direct clauses with explanation, background, and efficient use of resources. Batch normalization is one and information. The fines typically begin with relative of the techniques used to surmount the challenge of pronouns such as "who," "which," or "that" and training RG-CNN and other deep neural networks with subordinating conjunctions such as "although," "because," large sets of data. It scales and normalizes the activations of every layer of a network to help with faster and or "if." The sentence needs to be dissected into modifiers convergent training. Both the overall performance and the and relevant phrases. These include noun phrases, verb generalization of the RG-CNN model can be improved by 382 Informatica 49 (2025) 377–388 S. Li batch normalization. Overfitting of massive data must be ReduceLROnPlate avoided using regularization methods. The task can au employ regularization and dropout in order to control the complexity of the model and induce more generalization. Regularization improves the ability of the model to deal 3.3 Convolutional Neural Network with natural noise and variance present in large data. The performance of CNN on extremely large data might be A CNN is made up of five parts: input data, a significantly affected by choosing the appropriate convolutional layer, a pooling layer, FC overlay, and an activation functions. Rectified Linear Units (ReLU), output vector. CNNs come in a variety of layer which reduce the vanishing gradient problem and the combinations. The CNN's structure, which was used in this training speed, have proven to be useful. In order to detect experiment, is shown in Figure 3. more intricate patterns, their variations, such as Leaky ReLU or Parametric ReLU, can be used. For handling enormous data, transfer learning can be used. Starting points include pre-trained CNN models on huge datasets like ImageNet. The huge data may be fine-tuned better to fit the CNN to the particular job using the information gained from these models. The problem of little labeled data may be solved by transfer learning, which will enhance RG-CNN performance. Table 2 displays the RG- CNN model hyperparameters. Table 2: RG-CNN model hyperparameters and configurations for effective training and convergence. Paramete Value / Range Description r Figure 3: Structure of Convolutional Neural Network Adaptive learning Optimizer Adam / AdamW for sparse data Finding intriguing patterns in the data is the goal Tuned using of the convolutional gradient task. Each of the many layers' learning rate convolutional kernels has a frequency and a divergence Learning 1e-4 to 5e-4 scheduler (e.g., coefficient. It is assumed that 𝑢𝑗 is the weight parameter, Rate ReduceLROnPlatea 𝑎𝑗 is the divergence amount, and 𝑉𝑗−1 is the input to u) convolution layers 𝑗 while the inversion kernel 𝑗 is active. Batch Based on GPU 64–128 One such expression for the convolution operation of Size memory Equation (8) is: Early stopping Epochs 10–30 based on BLEU 𝑉𝑗 = 𝑒(𝑢𝑗 ⊗𝑉𝑗−1 + 𝑎𝑗) validation (8) Dropout To prevent 0.3–0.5 Rate overfitting Where 𝑉𝑗the output result of the convolution Activatio kernel 𝑗represents the convolution operation, and 𝑒(𝑥) ReLU / For non-linearity in n represents the activation function. LeakyReLU CNN layers Function Max The input data is swept repeatedly by the For padding and Sequence 100–150 tokens convolutional network, which then extracts the distinctive positional encoding Length information. In addition, the multilayer's operational Embeddin amplifier is changed to𝑅𝑒𝐿𝑈. The Linear transfer function 512 / 768 (or match g Use with pre- is simpler to derive than the exponential transfer function transformer Dimensio trained embeddings and other training algorithms, allowing for faster model encoder) n training and better protection against gradient Kernel disappearing. It is possible to write 𝑅𝑒𝐿𝑈, which is For capturing n- Size 3 × 3 / 5 × 5 represented in Equation (9) as: gram features (CNN) To retain the most 𝑉𝑗(𝑉𝑗 > 0) Pooling MaxPooling 𝑅𝑒𝐿𝑈(𝑉𝑗) = { relevant features 0⁡(𝑉𝑗 > 0) Gradient Prevent exploding (9) 1 Clipping gradients Schedule Warm-up + Cosine Smoother The pooling layer's main function is to reduce r Annealing / convergence down-sampling data redundancy, which also aids in CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 377–388 383 achieving invariance and reducing CNN complexity. The The input maps are produced as down-sampled two most popular ways to finish pooling are pooling layer copies using a sub-sampling layer. There will be exactly 𝑁 and max pooling. If the study uses averaged pooling, the export maps if there are 𝑁 intake maps, albeit the final outcome is the computing area's arithmetic mean, but if the maps will be smaller. More formally, they are calculated study uses max pooling, the outcome is the computation as Equations (16), area's highest value. Max pooling was used for this investigation because it preserves important data better 𝑢𝑘 = 𝑒(𝛽𝑘𝑖 𝑑𝑜𝑤𝑛(𝑢 𝑘−1 𝑘 𝑖 𝑗 ) + 𝑎𝑗 than average pooling. Equation (10) in mathematics (16) provides the maximum pooling: To identify that the patch in the input map 𝑅𝑖 = 𝑚𝑎𝑥(𝑂0 𝑖 , 𝑂 1 2 𝑖 , 𝑂𝑖 , 𝑂 3 𝑖 , … . 𝑂𝑠 𝑖 , ) is related to a specific pixel in the output map, and to (10) calculate the gradient of a kernel. Applying a delta recursion that resembles Equation (17-20) in this case Where⁡𝑅𝑖 ⁡𝑖𝑠⁡the return outcome of the pooled requires determining which area in the sensitivity map of region 𝑖, Max is the maximum pooling procedure, and 𝑂0 𝑖 the present layer corresponds to a particular pixel in the is the pooling area 𝑖 is the element 𝑠. sensitivity map of the following layer. The weights, being the weights of the (rotated) convolution kernel, are A CNN's "classifiers" are its layers. Its main increased by the relationship between the input patch and objective is to reorganize the data that the convolutional the output pixel. Convolution is once again used to do this and pooling layers retrieved and weighted from the effectively: hidden-layer space. A similar dropout method is implemented in the layer to randomly eliminate neurons to 𝛿𝑘𝑗 = 𝑒′(𝑢𝑘 1 𝑗 )°𝑐𝑜𝑛𝑣(𝛿 𝑘+1 𝑗 , 𝑟𝑜𝑡180(𝑙𝑘+ ), ′𝑓𝑢𝑙𝑙′𝑖 ) prevent over-fitting. (17) Let's determine the backpropagation updates for 𝜕𝐹 = ∑ (𝛿𝑘 𝜕𝑎 𝑥,𝑦 𝑗 )𝑥𝑦 a network's convolutional layers. The output feature map 𝑖 is created by convolving the feature maps from the (18) preceding layer using learnable kernels and then 𝑐𝑘 1 processing them via the activation function. Convolutions 𝑖 = 𝑑𝑜𝑤𝑛(𝑢𝑘−𝑖 ) with numerous input maps may be combined in each output map. Equation (11) is often shown as, (19) 𝑈𝑘 = 𝑓(∑ 𝑈𝑘 𝑗∈𝑁 ) 𝑖 𝑖 ∗ 𝐾𝑘 𝑘 𝑖 𝑗𝑖 + 𝑎 𝜕𝐹 𝑖 = ∑𝑥,𝑦(𝛿 𝑘° 𝑘) 𝜕𝑎 𝑖 𝑥𝑖 𝑥𝑦 𝑖 (11) (20) 3.4 Computing the gradients 3.5 CNN algorithm A down-sampling layer's map weights are all set to the same value𝛽, to determine the value of, we only The network weights are updated by the CNN scaled the result of the prior procedure by β. For each map, algorithm (1) by using a method known as we may do the same δ. Calculation again 𝑖Equation (12– backpropagation, depending on the error between the 15) represents the pairing of the map from the layer of predicted and actual results. The CNN can learn and convolution and the associated map from the sub- develop its capacity to identify patterns and objects in samples layer: pictures due to this iterative process of forward 𝑘 propagation (feeding data through the network), 𝛿 = ⁡𝛽𝑘+1 𝑘 𝑖 𝑖 ((𝑒′(𝑥𝑖 )°𝑢𝑝(𝛿 𝑘+1 𝑖 )) backpropagation, and object detection. In various (12) computer vision applications, such as picture classification, object recognition, and image segmentation, 𝑢𝑝(𝑢) ≡ 𝑢⨂1𝑚×𝑚 CNN methods have shown outstanding performance. We have been used to various tasks, including autonomous (13) driving, picture analysis in medicine, and face 𝜕𝐹 identification. = ∑ (𝛿𝑘 𝜕𝑎 𝑥,𝑦 𝑖 )𝑥𝑦 𝑖 (14) Algorithm 1: Convolutional Neural Network Function CNN (input_data); 𝜕𝐹 = // Convolutional layers 𝜕𝑙𝑘𝑗𝑖 For each convolutional layer: 𝑟𝑜𝑡180(𝑐𝑜𝑛𝑣2(𝑢𝑘−1𝑗 , 𝑟𝑜𝑡180(𝛿𝑘−1𝑗 , 𝑟𝑜𝑡180(𝛿𝑘𝑗 ), 𝑣𝑎𝑙𝑖𝑑 ′)) Convolution=apply convolution (input_data, (15) weights)// Apply convo: 384 Informatica 49 (2025) 377–388 S. Li Activation=apply_activation (convolution) // Apply through enhanced entity prediction using a Multilayer activation function Perceptron (MLP) layer. The MLP receives inputs from Pooling=apply_pooling (activation)// Apply pooling the TransE pre-trained embeddings and refines them, operation (e.g) enhancing the expressive capability of the overall model. // fully connected layers Table 4 and Figure 4 illustrate the outcomes. The proposed Flattened=flatten (pooling) //Flatten the pooled feature RG-CNN model, which incorporates both bidirectional maps into a training and the TransE+MLP architecture, achieves For each fully connected layer: significantly higher precision values: P@1 = 84%, P@5 = Weights=initialize_weights () //Flatten the polled 88%, and P@10 = 98%. These metrics indicate that the feature maps into a model reliably ranks the correct simplified sentence within Bias =initialize bias () // Initialize bias for the fully ( the top predicted candidates. Furthermore, the use of a Linear_transform = apply_linear_transform (flattened, Recurrent Neural Network (RNN) within RG-CNN weights, bias) effectively captures bidirectional dependencies in the data, Activation=apply_activation (linear_transform) contributing to performance that surpasses even the //Apply activation TransE+MLP configuration. This validates the //Output layer architectural choice and demonstrates the robustness of the Out_weights -= initialize_output_weights () //Initialize proposed model. weight for Output_bias = initialize-output-bias () // Initialize bias for Table 4: Numerical outcomes of the Training strategy the output based on the algorithm Output= apply_linear_transform (activation, output_weights, output_bias) Test sets Percentage (%) Predicated_class = classify_output (output)// Classify the P@1 P@5 P@10 output to dete TeansE 25 30 35 Return predicted_class TeansE+ MLP 40 45 50 FB15K 82 86 95 4 Result and discussion PP1 70 75 80 Results are always advised to reference the most Refined Gradient - CNN 84 88 98 recent literature survey for the most up-to-date information [Proposed] on these themes, since the exact outcomes and improvements in the translation of complicated, lengthy phrases might differ based on the research and development efforts in the area. To improve the accuracy of translating complex English sentences in real time, the proposed RG-CNN model was built using Python 3.11. Table 3 demonstrates the simulation setup. Table 3: Simulation setup Component Recommended Specification NVIDIA A100 / RTX 3090 / GPU Tesla V100 RAM 32–64 GB SSD (1 TB recommended for Storage large corpora like WMT) Framework PyTorch / TensorFlow Distributed Optional with Horovod / DDP Training (for WMT-scale corpora) Figure 4: Comparison of Training strategy based on the algorithm 4.1 English translation design using big data The suggested Refined Gradient-CNN model The results further emphasize the efficiency and outperforms the Improved Long Short-Term Memory rationale of the two training approaches, particularly the (LSTM) [24] and Hierarchical Network of Concepts bidirectional training strategy. This strategy, when (HNC) models [25]. It successfully enhances machine combined with TransE, demonstrates its effectiveness CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 377–388 385 translation of complex language patterns by capturing machine translation quality by showcasing the model's complex phrase structures. enhanced capacity to manage complex phrase structure. The proposed Refined Gradient-CNN model outperformed Table 6: Comparative word error rate (%) analysis for the Improved LSTM [24], 31.3% and 3.9%, respectively, english phrase translation accuracy in terms of BLEU scores, as shown in Table 5 and Figure 5, 73.1% for the corpus dataset and 70.1% for the local Methods Word Error Rate (%) dataset. The outcomes demonstrate how well the proposed Corpus Corpus dataset model works to translate complex phrase structures with dataset greater n-gram overlap. This aligns with the goal of the study, which is to improve machine translation systems by raising overall translation quality and semantic integrity Improved LSTM [24] 0.9 1.1 across a variety of datasets, especially for complex language structures. Refined Gradient- 0.3 0.10 CNN [Proposed] Table 5: Comparison of BLUE score on corpus and local dataset Methods BLEU (%) Corpus Local dataset dataset Improved LSTM [24] 31.3 3.9 Refined Gradient- 73.1 70.1 CNN [Proposed] Figure 6: Word Error Rate (%) comparison of various models Improving machine translation accuracy, the proposed Refined Gradient-CNN model significantly enhances the translation of the challenging English phrase structures. With 97.51% accuracy and 98.43% recall, the Refined Gradient-CNN outperformed the HNC technique [25], which achieved 93.38% accuracy and 94.51% recall, as shown in Table 7 and Figure 7. These improvements demonstrate the model's improved capacity to recognize and translate intricate phrase patterns more accurately. The results demonstrate that Refined Gradient-CNN improves machine translation systems' overall performance and efficiency in authentic language situations while also reducing translation ambiguities. Table 7: Comparison of accuracy and recall between HNC and the proposed refined gradient-CNN model Figure 5: BLEU comparison of various models Methods Accuracy Recall (%) (%) The translation accuracy metric known as WER, HNC [25] 93.38 94.51 which counts the number of insertions, deletions, and substitutions required to arrive at a reference translation, is Refined Gradient- 97.51 98.43 shown in Table 6 and Figure 6. Higher translation CNN [Proposed] precision is indicated by a lower WER. Compared to the Improved LSTM [24] 0.9% and 1.1%, the WERs recorded by the Refined Gradient-CNN were significantly lower 0.3% for the corpus and 0.10% for the local data. These findings align with the study goal of improving overall 386 Informatica 49 (2025) 377–388 S. Li Refined Gradient-CNN (RG-CNN), which was trained on a specially constructed parallel corpus of 1,563 sentence pairs enhanced with readability scores, complexity labels, and domain-specific metadata. RG-CNN effectively models structural intricacy, idiomatically controlled use, and long-distance relationships in English by fusing deep convolutional neural architecture with gradient-based smooth optimisation. The model is better able to generalise across contexts and adjust to complex language patterns when it is exposed to a heterogeneous, metadata- annotated corpus. Empirical evaluation confirms the model's excellent performance capability. With BLEU scores of 70.1% (local) and 73.1% (corpus). WER was also reduced to 0.3% (corpus) and 0.10% (local) compared to the improved LSTM [24] model. RG-CNN beat traditional models like HNC [25] in terms of classification performance, achieving 97.51% accuracy and 98.43% Figure 7: Performance metrics of various models in recall. Hyperparameter tweaking was also used to increase translating complex phrase patterns the model's parameter efficiency in order to get optimal convergence and significantly reduce overfitting. The RG- 5 Discussion CNN model improved its translation ranking performance by using the Parallel Corpus dataset for bidirectional Enhancing the quality of machine translation at training with TransE + MLP. The model converted simple the phrase level, particularly when translating difficult and and complicated words into vectors and rated them based syntactically complicated English formulations. For on their proximity to the right response. The model lengthier phrases, existing models such as Improved achieved 84 percent accuracy (P@1), 88 percent (P@5), LSTM [24] have not been adequate in terms of structural and 98 percent (P@10), proving its effectiveness in NLP equivalence and semantic coherence. Similarly, because and text simplification tasks related to education. These there are fewer contextual generalizations, the HNC model outcomes collectively support the objective of building an [25] is unable to interpret nested and specialized phrase efficient, high-performance model for simplifying patterns. complex English sentences across educational and NLP applications. The suggested Refined Gradient-CNN model effectively overcomes these drawbacks by integrating Future scope gradient-driven refinement into the convolution process for improved recognition of complex language structure. The new strategy is utilizing the power of big By reducing ambiguity in translation and enhancing linguistic data and optimization methods in solving the contextual knowledge, the approach increases the issues of translating intricate, long sentences, which will dependability of machine translation systems for technical end up improving the overall performance of the machine and professional communication procedures. The design is translation system. a significant improvement over earlier models in terms of structure, recall, and generalization. Acknowledgement: The Research is supported by: Construction of Training Modes for Business English Limitation Majors: A Perspective of Business Needs( 2021HXXM175) It requires an enormous amount of processing power to train models to handle complicated, lengthy words. Large-scale models could require a lot of memory and take References a long time to learn, rendering them unavailable to people or organizations with limited resources. There are many [1] Li, G., 2024. Research on Automatic Identification possible valid translations for a complicated statement, and of Machine English Translation Errors Based on the context or specific use conditions determine the Improved GLR Algorithm. Informatica, 48(6). preferred translation. This flexibility is hard to replicate https://doi.org/10.31449/inf.v48i6.5249 with machine translation processes and to reasonably [2] Ruan, Yuexiang. 2022. “Design of Intelligent reflect the learning. Recognition English Translation Model Based on Deep Learning.” Journal of Mathematics 2022: 1– 6 Conclusion 10. https://doi.org/10.1155/2023/9893016. [3] Wang, Xi. 2021. “Translation Correction of To achieve notable gains in semantic retention and English Phrases Based on Optimized GLR translation quality, the English complex long sentence Algorithm.” Journal of Intelligent Systems 30 (1): machine translation architecture (MTA) is optimised using 868–80. https://doi.org/10.1515/jisys-2020-0132. CCR-LWECNN: A Lightweight CNN Framework for Chinese… Informatica 49 (2025) 377–388 387 [4] Zhang, Qiang. 2022. “Cross-Context Accurate Singapore: Springer Nature Singapore. DOI English Translation Method Based on the Machine 10.21203/rs.3.rs-5734365/v1 Learning Model.” Mathematical Problems in [15] Song, Xin. 2021. “Intelligent English Translation Engineering 2022: 1–11. System Based on Evolutionary Multi-Objective https://doi.org/10.1155/2022/9396650. Optimization Algorithm.” Journal of Intelligent & [5] Quoc, T.N., Le Thanh, H. and Van, H.P., 2023. Fuzzy Systems 40 (4): 6327–37. Khmer-Vietnamese Neural Machine Translation https://doi.org/10.3233/jifs-189469.s Improvement Using Data Augmentation [16] Cao, Qianyu, and Hanmei Hao. 2021. “A Chaotic Strategies. Informatica, 47(3). Neural Network Model for English Machine https://doi.org/10.31449/inf.v47i3.4761 Translation Based on Big Data Analysis.” [6] Liang, J., and M. Du. 2022. “Two-Way Neural Computational Intelligence and Neuroscience 2021: Network Chinese-English Machine Translation 3274326. https://doi.org/10.1155/2021/3274326. Model Fused with Attention Mechanism. Ding B, [17] Yu, Jinlin, and Xiuli Ma. 2022. “English Translation Editor. Scientific Programming” 2022: 1–11. Model Based on Intelligent Recognition and Deep https://doi.10.1155/2022/9143845 Learning.” Wireless Communications and Mobile [7] Shen, Xiaoping, and Runjuan Qin. 2021. “Searching Computing 2022: 1–9. and Learning English Translation Long Text https://doi.org/10.1155/2022/3079775. Information Based on Heterogeneous [18] Li, H., and W. Xiong. 2022. “Analysis of the Multiprocessors and Data Mining.” Microprocessors Drawbacks of English-Chinese Intelligent Machine and Microsystems 82 (103895): 103895. Translation Based on Deep Learning.” In The 2021 https://doi.org/10.1016/j.micpro.2021.103895. International Conference on Machine Learning and [8] Li, Xiaoyu. 2022. “The Impact of Big Data Big Data Analytics for IoT Security and Privacy: Technology on Phrase and Syntactic Coherence in SPIoT-2021, 1:104 11. Springer International English Translation.” Mathematical Problems in Publishing. DOI:10.2478/amns-2025-0565 Engineering 2022: 1–11. [19] Dong, Z. 2022. Research on Machine Translation https://doi.org/10.1155/2022/1428748. Method of English-Chinese Long Sentences Based on [9] Suleiman, D., Etaiwi, W. and Awajan, A., 2021. Fuzzy Semantic Optimization. Mobile Information Recurrent neural network techniques: Emphasis on Systems. DOI:10.1155/2022/4863623 use in neural machine translation. Informatica, 45(7). [20] Shao, D., and R. Ma. 2022. English Long Sentence https://doi.org/10.31449/inf.v45i7.5267 Segmentation and Translation Optimization of [10] Rawat, R., Raj, A.S.A., Chakrawarti, R.K., Sankaran, Professional Literature Based on Hierarchical K.S., Sarangi, S.K., Rawat, H. and Rawat, A., 2024. Network of Concepts. Mobile Information Systems. Enhanced Cybercrime Detection on Twitter Using DOI:10.1155/2022/3090115 Aho-Corasick Algorithm and Machine Learning [21] Guo, Xiaohua. 2022. “Optimization of English Techniques. Informatica, 48(18). Machine Translation by Deep Neural Network under http://dx.doi.org/10.31449/inf.v48i18.6272 Artificial Intelligence.” Computational Intelligence [11] Farooq, Uzma, Mohd Shafry Mohd Rahim, and and Neuroscience 2022: 2003411. Adnan Abid. 2023. “A Multi-Stack RNN-Based https://doi.org/10.1155/2022/2003411 Neural Machine Translation Model for English to [22] Garg, K.D., Shekhar, S., Kumar, A., Goyal, V., Pakistan Sign Language Translation.” Neural Sharma, B., Chengoden, R. and Srivastava, G., 2022. Computing & Applications 35 (18): 13225–38. Framework for handling rare word problems in https://doi.org/10.1007/s00521-023-08424-0. neural machine translation system using multi-word [12] Mahata, Sainik Kumar, Avishek Garain, Dipankar expressions. Applied Sciences, 12(21), p.11038. Das, and Sivaji Bandyopadhyay. 2022. https://doi.org/10.3390/app122111038 “Simplification of English and Bengali Sentences for [23] Benkova, L., Munkova, D., Benko, Ľ. and Munk, M., Improving Quality of Machine Translation.” Neural 2021. Evaluation of English–Slovak neural and Processing Letters 54 (4): 3115–39. statistical machine translation. Applied https://doi.org/10.1007/s11063-022-10755-3. Sciences, 11(7), p.2948. [13] Bensalah, Nouhaila, Habib Ayad, Abdellah Adib, https://doi.org/10.3390/app11072948 and Abdelhamid Ibn El Farouk. 2022. “Transformer [24] He, H., 2023. An intelligent algorithm for fast Model and Convolutional Neural Networks (CNNs) machine translation of long English for Arabic to English Machine Translation.” In sentences. Journal of Intelligent Systems, 32(1), Proceedings of the 5th International Conference on p.20220257. https://doi.org/10.1515/jisys-2022- Big Data and Internet of Things, 399–410. Cham: 0257 Springer International Publishing. DOI:10.1007/978- [25] Shao, D. and Ma, R., 2022. English long sentence 3-031-07969-6_30 segmentation and translation optimization of [14] Yang, Jing, and Lina Fan. 2023. “Optimization professional literature based on a hierarchical Strategy of Machine Translation Algorithm for network of concepts. Mobile Information English Long Sentences Based on Semantic Systems, 2022(1), p.3090115. Relations.” In Lecture Notes on Data Engineering https://doi.org/10.1155/2022/3090115 and Communications Technologies, 572–79. 388 Informatica 49 (2025) 377–388 S. Li https://doi.org/10.31449/inf.v49i12.8951 Informatica 49 (2025) 389–402 389 AECO-SC StyleGAN: A Cross-Platform GAN Framework for Dynamic Advertising Creative Generation Yanan Zhang Yantai University of Science and Technology, School of Culture and Media, Penglai, Shandong, 265600, China E-mail: 15753572255@163.com Keywords: adaptive elephant clan optimizer (AECO), dynamic advertising creative, spatially conditioned StyleGAN, visual style consistency Received: April 18, 2025 Dynamic advertising (ad) requires personalized, engaging content across multiple platforms. Traditional approaches struggle with scalability and cross-platform adaptation. Leveraging deep learning (DL), particularly Generative Adversarial Networks (GANs), offers the potential to automate and optimize ad creative generation with higher precision and contextual adaptability. This research aims to develop a DL framework that dynamically generates and optimizes advertising creatives—leveraging Adaptive Elephant Clan Optimizer with a Spatially Conditioned StyleGAN (AECO-SC StyleGAN) for dynamic cross-platform advertisement creative generation. Adaptive Elephant Clan Optimizer (AECO) dynamically adjusts training hyperparameters to improve model convergence, while Spatially Conditioned StyleGAN (SC-StyleGAN) generates platform-specific ad creatives by incorporating spatial constraints for contextual alignment. Our system is trained on the Ad ImageNet dataset, which includes 9,003 ad samples with paired images and promotional text from platforms like Facebook and Instagram. All data were resized to 256×256, normalized, and tokenized for training. Using Python, the model demonstrates superior performance in creative generation and engagement prediction. The proposed AECO-SC StyleGAN model achieved an NDCG of 0.61, an accuracy of 98.48%, and a weighted F1-score of 98.5%, outperforming prior approaches such as VGG + Layout + NIMA (NDCG 0.22) and XCEPTION (accuracy 98.27%, F1-score 98.2%). These results highlight the effectiveness of integrating adaptive optimization and spatial conditioning in generating high-quality, context-aware advertising creatives, offer a scalable and automated solution for cross-platform digital marketing. Povzetek: Okvir AECO-SC StyleGAN uporablja globoko učenje (GAN in StyleGAN s prostorskim pogojevanjem) in adaptivno optimizacijo (AECO) za dinamično generiranje in optimizacijo visoko kakovostnih oglasnih vsebin na več platformah. Sistem dosega kvalitetno napovedovanju angažiranosti oglasov, kar avtomatizira prilagojeno digitalno trženje. 1 Introduction purposes to achieve continuous output improvement through adversarial learning processes [5]. Such adaptive Digital marketing leaders consider the capacity to send creativity improves advertising diversity and allows for customized advertising content across numerous platforms better metrics engagement, including the click-through as their main competitive advantage [1]. Advertising rate (CTR) and conversion rate (CVR). through traditional methods faces difficulties when The system processes real-time performance records in adjusting to user preference changes across multiple conjunction with user adjustments to automatically modify platform formats that include social media and mobile its creative elements and maintain platform usage between applications, together with websites [2]. DL techniques various platforms [6]. Digital marketing and artificial appear as groundbreaking solutions to address the intelligence maintain an expanding relationship that gives marketing challenges that face the optimization of ad marketers data-driven creative solutions to overcome their creativity. GANs have become prominent among these DL creative limitations [7]. This GAN-based framework techniques because they enable the production of high- provides automated design solutions for advertising quality, realistic, adaptable content [3]. The procedure content, which enable more efficient personalized enables machine automation to produce relevant visual advertising in an online environment dominated by advertisements with appropriate platform parameters for competition. The aim was to develop a DL framework that distinct user groups [4]. The model uses dual-network utilizes the AECO-SC StyleGAN to dynamically generate, training, which combines a generator for designing ad refine, and optimize advertising creatives across multiple variants together with a discriminator for evaluation 390 Informatica 49 (2025) 389–402 Y. Zhang platforms for enhanced personalization and performance The Automated Creative Optimization (AutoCO) in digital marketing campaigns. framework outperformed baselines, achieving lower The proposed AECO algorithm is a derivative-free cumulative regret and a 7% CTR increase in online A/B population-based metaheuristic. It simulates the social testing. behavior of elephants in clans to adaptively adjust The two-stage dynamic and creative optimization hyperparameters by exploring the solution space through framework, combining AutoCo and a transformer-based stochastic position updates, without relying on the gradient rerank model to improve CTR prediction and creative of the loss function. This enables AECO to dynamically ranking under ambiguous data conditions, was developed optimize hyperparameters such as learning rate, style by [10]. Experimental and online testing showed a 10% weights, and batch size during GAN training, improving CTR improvement over baselines, demonstrating superior convergence stability and creative output quality. performance. The integration of a Particle Swarm The structure presents the development and evaluation of Optimization-based Recurrent Neural Network (PSO- an intelligent advertising creative generation system using based RNN) algorithm with Computer-Aided Design AECO and SC-StyleGAN. Section 2 reviews related (CAD) tools to automate the generation and optimization works, highlighting recent advancements in GAN-based of advertising artistic designs, enhancing design efficiency advertising optimization and cross-platform creative and creativity, was explored by [11]. generation. Section 3 outlines the proposed methodology, The Dynamic Creative Optimization (DCO) problem, by detailing the preprocessing of advertising data, AECO- determining the optimal product and creative ad based hyperparameter tuning, and the architecture of the combination under constraints like ad fatigue and user SC-StyleGAN for spatially contextual ad generation. diversity, was examined [12]. The advertising design, Section 4 discusses the experimental setup, performance creativity, and efficiency were enhanced by integrating evaluation, and comparative results with baseline models. CAD technology and data-driven automation [13]. The Section 5 concludes with future directions to enhance developed model enabled the automated generation of scalability, personalization, and real-time adaptability in diverse advertising designs, successfully reflecting advertising technologies. creative schemes and allowing quantitative evaluation, thus validating its effectiveness in promoting innovative 2 Related works advertising solutions. The integration of artificial intelligence in advertising with an emphasis on content Automating the generation of ad creatives from landing production, targeting, personalization, and ad optimization pages using abstractive text summarization, enabling rapid was explored by [14]. Table 1 provides a comparative experimentation in large-scale marketing campaigns, was overview of recent GAN-based approaches. While examined by [8]. The advertising creative optimization by previous studies have focused on general image synthesis modeling complex interactions between creative elements or aesthetic enhancement, none leverage spatial and improving Click-Through Rate (CTR) prediction conditioning and adaptive optimization specifically for using an Auto ML-inspired framework was enhanced [9]. advertising creative generation. Table 1: Conventional approaches of GAN for dynamic advertising creative optimization Study Model Dataset Metrics Used Key Results Limitation Jiang et al. StyleGAN Proprietary Ad CTR, Qualitative CTR Limited [15] (AdSEE) Dataset Feedback improvement: generalizability +12% Shilova et al. Diffusion + User Behavior Personalization +15% relevance High computational [16] Outpainting + Ad Images Score cost Xu et al. [17] PDA-GAN PubLayNet, Layout Accuracy, Improved layout Focused on layout Rico Realism realism generation Aghazadeh Various (CAP Generated Ad CAP (Creativity, Structured ad No generative et al. [18] Evaluation) Images Alignment, quality eval model proposed Persuasion) Ma and Zhao Enhanced Logo Design FID, User Rating FID: 23.4, User Focused only on [19] DCGAN Dataset preference logos, not full ads 2.1 Problem statement traditional methods which results in reduced engagement and inconsistency [15] [16]. Although GANs make Digital marketing teams need to use personalized and automation possible, current models usually do not take platform-specific ads, but it is difficult to scale them using context into account [17], run on computers for long AECO-SC StyleGAN: A Cross-Platform GAN Framework for… Informatica 49 (2025) 389–402 391 periods [16] or mostly generate logos [18]. To overcome The Ad ImageNet dataset contains image-text these problems, the study offers a framework that uses advertisement samples, with images typically sized and Spatially Conditioned GANs, AECO for hyperparameter text averaging. For data preprocessing, images are resized adjustment and combines various input types to improve to uniform dimensions, normalized for intensity the quality and performance of automatic ad creation. consistency, and tokenized for textual elements. The AECO-SC StyleGAN framework uses spatial conditioning and adaptive optimization to dynamically generate and 3 Methodology optimize advertising creatives for enhanced cross-platform engagement and performance. Figure 1 shows the general outline of the methodological approach. Figure 1: General outline of the methodological approach 3.1 Data collection pixel image size was used to simplify the original image size. This input data type was created by resizing the image The Ad ImageNet dataset, sourced from the Peter Brendan using bicubic interpolation. Because the outcome was repository, consists of 9,003 image-text advertisement smoother at the edges than with bilinear interpolation, this samples totaling approximately 682 MB in size. Each entry approach was chosen. Bicubic was the perfect balance includes a banner-style advertisement image along with between process time and high-quality results. The bicubic associated promotional text. The dataset captures a variety interpolation estimates the pixels in the (𝑗, 𝑖) positions of standard ad dimensions, with the most frequent being using a sampling (𝑆) distance of 16 nearby pixels (4𝑥4) in 254 × 254 pixels, commonly used in digital marketing. The equations (1)-(5). Figure 2 illustrates (a) before resizing textual content varies in length, with an average of around and (b) after resizing. 525 characters, covering diverse product and event 𝑔𝑗,𝑖 = [𝑋−1(𝑇𝑧)𝑋0(𝑇𝑧)𝑋1(𝑇𝑧)𝑋2(𝑇𝑧)] advertisements. The dataset was split into 70% training, 𝑔 𝑔 𝑗−1,𝑖−1 𝑔𝑗,𝑖−1 𝑗+1,𝑖−1 𝑔𝑗+2,𝑖−1 15% validation, and 15% testing sets to ensure robust 𝑒 𝑔 𝑔 𝑗−1,𝑖 𝑔𝑗,𝑖 𝑗+1,𝑖 𝑗+2,𝑖 performance evaluation across unseen data. 𝑔 𝑔 𝑗−1,𝑖+1 𝑔𝑗,𝑖+1 𝑗+1,𝑖+1 𝑔𝑗+2,𝑖+1 Source: [𝑔 𝑔 𝑗−1,𝑖+2 𝑔𝑗,𝑖+2 𝑗+1,𝑖+2 𝑔𝑗+2,𝑖+2] https://huggingface.co/datasets/PeterBrendan/AdImageNe 𝑋−1 (𝑇𝑤) t 𝑋0 (𝑇[ 𝑤) ] 𝑋1 (𝑇𝑤) 3.2 Data preprocessing using image resizing 𝑋2 (𝑇𝑤) Image resizing standardizes input dimensions, enabling (1) consistent data processing for DL models and optimizing Where: 𝑇𝑧 = 𝑖 ′ − 𝑖, 𝑇𝑤 = 𝑗 ′ − 𝑗 and the generation of dynamic advertising creative. Although 𝑔𝑗,𝑖 = 𝑝𝑖𝑥𝑒𝑙 𝑣𝑎𝑙𝑢𝑒 𝑎𝑡 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛 (𝑗, 𝑖) the original dataset images varied in size most frequently −𝑇3+2𝑇2−𝑇 256 × 256 pixels the images were uniformly resized to 224 𝑋−1(𝑇) = (2) 2 × 224 × 3 pixels for this research. The 244 × 224 𝑥 3- −3𝑇3+5𝑇2+2 𝑋0(𝑇) = (3) 2 392 Informatica 49 (2025) 389–402 Y. Zhang −3𝑇3+4𝑇2+2 𝑋1(𝑇) = (4) 2 𝑇3−𝑇2 𝑋2(𝑇) = (5) 2 Figure 3: Tokenization outcome (a) Positive Worlds Cloud and (b) Ad Image Net Words Cloud 3.3 AECO-SC StyleGAN The hybrid deep learning framework is called the AECO- SC StyleGAN and is meant to dynamically develop and Figure 2: (a) Before resizing and (b) After resizing improve advertisement creative work. The method implements AECO, a metaheuristic that studies elephant 3.2.1 Tokenization behavior, together with SC-StyleGAN, a modified GAN The raw advertising text was first processed with that adds spatial and contextual inputs. The adaptive tokenization so that the model could efficiently generate approach of AECO to changing hyperparameters leads to and improve dynamic ad content. The text for promotion faster learning and better exploration of various solutions, was separated into words, phrases and sentences using compared to the Adam optimizer. SC-StyleGAN makes natural language processing. Turning unstructured text use of semantic maps, sketches and embeddings from into a structured form made it simpler to examine texts and different sources to produce images that look good when connect models. Keeping the important connections and used in ads. When combined, this integration improves order in each sentence, tokenization protected the key how creative works on ads, how it is predicted to be meaning needed to make an ad relevant. Because the text received by the target audience and how well it adapts to in ads is generally brief, simple tokenization and various digital platforms, giving a solid, effective system embedding were adequate. The fact that it is lightweight for today’s data-driven advertising. The AECO-SC helped the system express meaning with little cost which StyleGAN for Ad Creative Generation in algorithm 1. improved the performance of the AECO-SC StyleGAN framework. Figure 3 shows (a) Positive Worlds Cloud and (b) Ad Image Net Words Cloud. Algorithm 1: AECO-SC StyleGAN for Ad Creative Generation 𝑆𝑡𝑒𝑝 1: 𝑆𝑒𝑡𝑢𝑝 𝑑𝑒𝑓 𝑠𝑒𝑡𝑢𝑝(): 𝑁,𝑀, 𝐺, 𝑇 = 𝑛𝑢𝑚_ℎ𝑦𝑝𝑒𝑟𝑝𝑎𝑟𝑎𝑚𝑠(), 40, 5, 100 𝑃_𝑚, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀 = 0.3, 1.0, 0.8, 0.7, 0.5 𝑑𝑎𝑡𝑎 = 𝑙𝑜𝑎𝑑_𝑎𝑑𝑣𝑒𝑟𝑡_𝑑𝑎𝑡𝑎𝑠𝑒𝑡() 𝑟𝑒𝑡𝑢𝑟𝑛 𝑁,𝑀, 𝐺, 𝑇, 𝑃_𝑚, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀, 𝑑𝑎𝑡𝑎 𝑆𝑡𝑒𝑝 2: 𝐼𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑒 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑑𝑒𝑓 𝑖𝑛𝑖𝑡_𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛(𝑀,𝑁): 𝑟𝑒𝑡𝑢𝑟𝑛 [{′𝑝𝑎𝑟𝑎𝑚𝑠′: 𝑟𝑎𝑛𝑑_𝑣𝑒𝑐(𝑁), ′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′: 𝑁𝑜𝑛𝑒} 𝑓𝑜𝑟 _ 𝑖𝑛 𝑟𝑎𝑛𝑔𝑒(𝑀)] 𝑆𝑡𝑒𝑝 3: 𝐸𝑣𝑎𝑙𝑢𝑎𝑡𝑒 𝐹𝑖𝑡𝑛𝑒𝑠𝑠 𝑑𝑒𝑓 𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒(𝑖𝑛𝑑, 𝑑𝑎𝑡𝑎, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀): 𝑚𝑜𝑑𝑒𝑙 = 𝑡𝑟𝑎𝑖𝑛_𝑆𝐶_𝑆𝑡𝑦𝑙𝑒𝐺𝐴𝑁(𝑖𝑛𝑑[′𝑝𝑎𝑟𝑎𝑚𝑠′], 𝑑𝑎𝑡𝑎, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀) 𝑟𝑒𝑡𝑢𝑟𝑛 𝑐𝑜𝑚𝑝𝑢𝑡𝑒_𝑙𝑜𝑠𝑠(𝑚𝑜𝑑𝑒𝑙, 𝑑𝑎𝑡𝑎) 𝑆𝑡𝑒𝑝 4: 𝐶𝑙𝑎𝑛 𝑈𝑝𝑑𝑎𝑡𝑒 𝑑𝑒𝑓 𝑐𝑙𝑎𝑛_𝑢𝑝𝑑𝑎𝑡𝑒(𝑝𝑜𝑝, 𝑔𝑏𝑒𝑠𝑡): 𝑓𝑜𝑟 𝑐𝑙𝑎𝑛 𝑖𝑛 𝑓𝑜𝑟𝑚_𝑐𝑙𝑎𝑛𝑠(𝑝𝑜𝑝): 𝑚𝑎𝑡𝑟𝑖𝑎𝑟𝑐ℎ = 𝑚𝑖𝑛(𝑐𝑙𝑎𝑛, 𝑘𝑒𝑦 = 𝑙𝑎𝑚𝑏𝑑𝑎 𝑥: 𝑥[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′]) 𝑓𝑜𝑟 𝑒 𝑖𝑛 𝑐𝑙𝑎𝑛: 𝑖𝑓 𝑒 ! = 𝑚𝑎𝑡𝑟𝑖𝑎𝑟𝑐ℎ: AECO-SC StyleGAN: A Cross-Platform GAN Framework for… Informatica 49 (2025) 389–402 393 𝑒[′𝑝𝑎𝑟𝑎𝑚𝑠′] += 𝑟𝑎𝑛𝑑() ∗ (𝑚𝑎𝑡𝑟𝑖𝑎𝑟𝑐ℎ[′𝑝𝑎𝑟𝑎𝑚𝑠′] − 𝑒[′𝑝𝑎𝑟𝑎𝑚𝑠′]) 𝑚𝑎𝑡𝑟𝑖𝑎𝑟𝑐ℎ[′𝑝𝑎𝑟𝑎𝑚𝑠′] += 𝑟𝑎𝑛𝑑() ∗ (𝑔𝑏𝑒𝑠𝑡[′𝑝𝑎𝑟𝑎𝑚𝑠′] − 𝑚𝑎𝑡𝑟𝑖𝑎𝑟𝑐ℎ[′𝑝𝑎𝑟𝑎𝑚𝑠′]) 𝑆𝑡𝑒𝑝 5: 𝑀𝑎𝑙𝑒 𝑈𝑝𝑑𝑎𝑡𝑒 & 𝐸𝑣𝑜𝑙𝑢𝑡𝑖𝑜𝑛 𝑑𝑒𝑓 𝑚𝑎𝑙𝑒_𝑎𝑛𝑑_𝑒𝑣𝑜𝑙𝑢𝑡𝑖𝑜𝑛(𝑝𝑜𝑝, 𝑃_𝑚): 𝑚𝑎𝑙𝑒𝑠 = 𝑠𝑒𝑙𝑒𝑐𝑡_𝑚𝑎𝑙𝑒𝑠(𝑝𝑜𝑝, 𝑃_𝑚) 𝑐𝑒𝑛𝑡𝑒𝑟 = 𝑚𝑒𝑎𝑛_𝑣𝑒𝑐([𝑒[′𝑝𝑎𝑟𝑎𝑚𝑠′] 𝑓𝑜𝑟 𝑒 𝑖𝑛 𝑝𝑜𝑝]) 𝑓𝑜𝑟 𝑚 𝑖𝑛 𝑚𝑎𝑙𝑒𝑠: 𝑚[′𝑝𝑎𝑟𝑎𝑚𝑠′] += 𝑟𝑎𝑛𝑑() ∗ (𝑐𝑒𝑛𝑡𝑒𝑟 − 𝑚[′𝑝𝑎𝑟𝑎𝑚𝑠′]) 𝑟𝑒𝑝𝑙𝑎𝑐𝑒_𝑤𝑒𝑎𝑘𝑒𝑠𝑡(𝑝𝑜𝑝) 𝑝𝑜𝑝. 𝑎𝑝𝑝𝑒𝑛𝑑(𝑔𝑒𝑛𝑒𝑟𝑎𝑡𝑒_𝑐𝑎𝑙𝑓(𝑝𝑜𝑝)) 𝑟𝑎𝑛𝑑𝑜𝑚_𝑟𝑒𝑠𝑒𝑡_𝑏𝑜𝑡𝑡𝑜𝑚(𝑝𝑜𝑝, 𝑝𝑐𝑡 = 0.3) 𝑆𝑡𝑒𝑝 6: 𝑀𝑎𝑖𝑛 𝑂𝑝𝑡𝑖𝑚𝑖𝑧𝑎𝑡𝑖𝑜𝑛 𝑑𝑒𝑓 𝑜𝑝𝑡𝑖𝑚𝑖𝑧𝑒_𝐴𝐸𝐶𝑂_𝑆𝐶𝑆𝑡𝑦𝑙𝑒𝐺𝐴𝑁(): 𝑁,𝑀, 𝐺, 𝑇, 𝑃_𝑚, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀, 𝑑𝑎𝑡𝑎 = 𝑠𝑒𝑡𝑢𝑝() 𝑝𝑜𝑝 = 𝑖𝑛𝑖𝑡_𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛(𝑀,𝑁) 𝑓𝑜𝑟 _ 𝑖𝑛 𝑟𝑎𝑛𝑔𝑒(𝑇): 𝑓𝑜𝑟 𝑒 𝑖𝑛 𝑝𝑜𝑝: 𝑒[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′] = 𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒(𝑒, 𝑑𝑎𝑡𝑎, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀) 𝑔𝑏𝑒𝑠𝑡 = 𝑚𝑖𝑛(𝑝𝑜𝑝, 𝑘𝑒𝑦 = 𝑙𝑎𝑚𝑏𝑑𝑎 𝑥: 𝑥[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′]) 𝑐𝑙𝑎𝑛_𝑢𝑝𝑑𝑎𝑡𝑒(𝑝𝑜𝑝, 𝑔𝑏𝑒𝑠𝑡) 𝑚𝑎𝑙𝑒_𝑎𝑛𝑑_𝑒𝑣𝑜𝑙𝑢𝑡𝑖𝑜𝑛(𝑝𝑜𝑝, 𝑃_𝑚) 𝑟𝑒𝑡𝑢𝑟𝑛 𝑔𝑏𝑒𝑠𝑡 3.3.1 SC StyleGAN influence microstructure and facial features, and fine styles SC-StyleGAN enables location-specific control over regulate high-frequency details and textures. Non-visual visual features, enhancing the deep learning framework’s data like captions, CTR, and demographics are encoded ability to dynamically generate and optimize personalized, into embeddings using text encoders visually consistent advertising creatives across different contexts. The StyleGAN network generates high-quality and fully connected layers. These embeddings are fused images by applying an 18 × 512 style code to 18 layers with spatial inputs (semantic maps and sketches) through of the network. It starts with a constant 4 × 4 feature map modulation layers that adjust the style codes, allowing SC- and progressively grows by a factor of 2 at each stage, StyleGAN to generate creatives tailored to both visual ultimately producing images of up to 1024 × features and user/context data. For training and evaluation 1024 pixels. Each style block receives a 1 × 512 style in this study, input images were uniformly resized to code that modulates convolution operations, enabling fine 224 × 224 pixels, serving as the initial resolution before control over visual attributes. These style codes the progressive growth to higher resolutions during correspond to different levels of detail: coarse styles affect generation. Figure 4 illustrates the network architecture of the overall layout and color schemes, middle styles SC-StyleGAN. Figure 4: Network architecture of SC-StyleGAN 394 Informatica 49 (2025) 389–402 Y. Zhang SC-StyleGAN was a conditional generation system that local perceptual loss (𝐾𝐿𝑃) improves detail by comparing uses a semantic map and sketches to identify spatial image patches, and the feature matching loss (𝐾𝐹𝑀) features for coarse and intermediate styles. It consists of stabilizes training by aligning intermediate features. two sub-networks: the production network, which uses Together, these losses guide the network toward realistic layers, and the spatial encoding network, which maps input and context-aware ad generation. The perceptual metrics conditions to intermediates. Two encoding modules are (LPIPS) measure the overall perceptual loss after shrinking suggested for the spatial encoding network, which the target and synthesized images to 64 x 64. The local individually translates the semantic map and 512 × perceptive loss (LLP) and the global perceptive loss (LGP) 512 sketches into 64 × 256 × 256 spatial feature are expressed mathematically as follows in equations (7)- maps. With a spatial dimension of 32 𝑥 32, the combined (8). map of features was encoded to correspond with the coarse-moderated style in the StyleGAN synthesizing 𝐾 𝑟𝑒 𝐺𝑃(𝐽𝑔𝑡 , 𝐽𝑠𝑦𝑛) = 𝐿𝑃𝐼𝑃𝑆(𝐽𝑔𝑡 , 𝐽 𝑟𝑒 𝑠𝑦𝑛) (7) module. To create a 32 𝑥 32 intermediate image, the same 1 𝐾𝐿𝑃(𝐽𝑔𝑡 , 𝐽𝑠𝑦𝑛) = ∑𝐿 𝐿𝑃𝐼𝑃𝑆(𝐽𝑙 , 𝐽𝑙 𝐿 𝑙=1 𝑔𝑡 𝑠𝑦𝑛) (8) steps are followed for the spatial intermediate feature map. Table 2 represents the architectural and computational Where 𝐽𝑟𝑒 𝑟 𝑔𝑡and 𝐽 𝑒 𝑠𝑦𝑛are the resized reality and, synthesizing footprint of the SC-StyleGAN model. the image, respectively, and 𝐿𝑃𝐼𝑃𝑆(. , . ) was the Table 2: SC-StyleGAN architecture details: input perceptual measuring function. In each phase, 𝐽𝑙𝑔𝑡and dimensions, layer-wise parameters, and computational 𝐽𝑙𝑠𝑦𝑛stand for the 𝑘 − 𝑡ℎ randomly clipped ground truth complexity and synthetic patches, respectively, in equation (9). Comp Layer Input Output Par FL 1 𝐾𝐹𝑀 = ∑𝑘||𝐻 𝑘(𝑔𝑡) − 𝐻𝑘(𝑠𝑦𝑛)|| (9) 𝑀 1 onent Type Shape Shape ams OPs Semant Conv2 512×51 64×256 ~3.1 ~2.5 Where 𝐻𝑘(. ) was the output map of features of the pre- ic D + 2×3 ×256 M B trained StyleGAN synthesizing network's 𝑘 − 𝑡ℎ Encode ReLU resolution block (with a matching spatial resolution of 2𝑘). r (E_s) × 4 The number of computed blocks was 𝑀. Following the Sketch Conv2 512×51 64×256 ~2.8 ~2.2 replacement resolution block, it computes the 𝐾1 norm Encode D + 2×1 ×256 M B between the ground truth generation and the synthesized r (E_k) ReLU processes(𝑘 ∈ {6,7,8,9}𝑎𝑛𝑑 𝑀 = 4). × 4 Spatial Add/C 64×256 64×32× ~0.6 ~0.3 3.3.2 AECO Combi oncat + ×256 32 M B To improve model convergence, stability, and learning ner Down efficiency, AECO dynamically modifies training Conv hyperparameters. This raises the caliber and efficacy of StyleG StyleB 1×512 1024×1 ~30 ~75 engagement of generated ad creatives spanning platforms. AN lock × 024×3 M B The AECO enhances the DL framework by efficiently (Synth 18 optimizing parameters, enabling dynamic generation of esis personalized advertising creative through adaptive search, Net) exploration, and convergence strategies. The ECO was enhanced into an improved version to support a DL Objective Function: The SC-StyleGAN was to precisely framework that dynamically generates and optimizes map the determined conditions to their equivalents in the advertising creatives. This AECO algorithm addresses the synthesis process while encoding the spatial constraint for limitations of the original by improving convergence speed the StyleGAN synthesizing procedure while preserving the and solution quality, enabling more effective, real-time invention value of the StyleGAN. Equation (6) fulfills the content creation and personalization in advertising through training process's goal in the following ways. intelligent data-driven optimization. Many tasks using Adam and RMSprop are successful, but K(Jgt, Jsyn) = λK K1(Jgt, J1 syn) + λK KGP + λK K GP LP LP + they have issues with GANs, including convergence λK K FM FM (6) issues, collapsing to single modes, and being sensitive to changes in learning rates. To resolve these issues in The SC-StyleGAN training uses a composite loss to creating ads for different channels, the new AECO strategy enhance image quality and consistency. The 𝐿1 loss (𝐾1) adapts by using evolutionary methods to tune ensures pixel-level accuracy, while the global perceptual hyperparameters, which boosts the stability and resilience loss (𝐾𝐺𝑃) maintains semantic alignment at full scale. The AECO-SC StyleGAN: A Cross-Platform GAN Framework for… Informatica 49 (2025) 389–402 395 of the model when facing different spatial and contextual 𝑊𝑖𝑡+1 𝐹𝐶𝑗,𝑛 = situations. 𝑍𝑋 𝑖𝑡+1 𝐹𝐶𝑗,𝑛(𝑖) + 𝑞 × 𝛼 × [𝑊 𝑖𝑡 𝐹𝐶 (𝑖 𝑁𝐶−𝑗,𝑁 ) −𝑊𝑖𝑡 Elephant migration under the direction of each clan 𝐹𝐶𝑗,𝑛(𝑖)] +𝑞 × 𝛼 × [𝑊𝑖𝑡 𝑖 principal was simulated using the ECO algorithm. This 𝑀𝐶,𝑅𝑚(𝑖) − 𝑊 𝑡 𝐹𝐶𝑗,𝑛(𝑖)], 𝑖𝑓 𝑛 > part provides an autonomous movement range and an 𝑁𝑒 2 autonomous movement position for each elephant to keep 𝑍𝑀𝑖𝑡+1 𝐹𝐶𝑗,𝑛(𝑖) + 𝑞 × 𝛼 × [𝑊 𝑖𝑡 𝐹𝐶𝑗,𝑁(𝑖) − 𝑊 𝑖𝑡 𝐹𝐶𝑗,𝑛(𝑖)]the algorithm from reaching a local optimum, enhance image variety, and replicate the aforementioned behaviors +𝑞 × 𝛼 × based on the initial elephant position to generate creative { [𝑊𝑖𝑡 𝑀𝐶,𝑅𝑚(𝑖) − 𝑊 𝑖𝑡 𝐹𝐶𝑗,𝑛(𝑖)], 𝑒𝑙𝑠𝑒 advertising, as illustrated in equations (10) and (11). (13) ∆𝑊0(𝑖) = ∆𝑊𝑚𝑖𝑛(𝑖) + 𝑞 × (∆𝑊𝑚𝑎𝑥(𝑖) − ∆𝑊𝑚𝑖𝑛 𝑗 (𝑖)) In the 𝑁𝑐 − 𝑗family group 𝑊 𝐹𝐶𝑀 at the 𝑗 − 𝑡ℎ 𝑑=𝑗, (10) iteration, the matriarch of the female elephant was represented by 𝑊𝑖𝑡 𝐹𝐶𝑀𝑑=𝑗,𝑁 . After sorting at 𝑖𝑡 + 1 iteration Where 𝑞 was a random number in the interval that is 𝑍𝑋𝑖𝑡+1𝐹𝐶𝑗,𝑛 was the autonomous position of the 𝑛(𝑛 = uniformly distributed [0, 1], and ∆𝑊0 𝑗 (𝑖)(𝑗 = 1,2, … . ,𝑀𝑓) clan member, and equation (14) shows that 𝑞 1,2, … . ,𝑀, 𝑖 = 1,2, … . , 𝐶) represents the range of was a uniformly spread range form [0, 1]. 𝛼 was the independent movement of the 𝑗 − 𝑡ℎ elephant in the 𝑖 − improved adaptable scaling factor. 𝑡ℎ dimension at the starting time. Both ∆𝑊𝑚𝑖𝑛(𝑖) = −∆𝑊𝑚𝑎𝑥(𝑖) and ∆𝑊𝑚𝑎𝑥(𝑖) = 𝐸 × (𝑊𝑚𝑎𝑥(𝑖) − 𝑖𝑡 𝛼 = 2 − (𝑑 × ) (14) 𝑖𝑡𝑚𝑎𝑥 𝑊𝑚𝑖𝑛(𝑖)) represent the lower and upper bounds of the 𝑖 − 𝑡ℎ dimensional autonomous motion space. Generally Where 𝑑 was a fixed value that was typically set to 0.5 to speaking, 𝐸 can be seen as 0.005 for improved outcomes, get the best results, while the optimization issue itself requires in different values. 𝑌𝑊0(𝑖) = 𝑊0 0 𝑗 𝑗 (𝑖) + ∆𝑊𝑗 (𝑖) (2) An autonomous location traction-based individual (11) update method for matriarchs is employed; as previously stated, the globally optimal individuals swiftly approach As evolution advances, the autonomous range of mobility the globally optimal region after traversing each of the of each elephant should likewise diminish as individuals matriarchs in the ECO algorithm. This section suggests an become closer to one another. Therefore, the independent autonomous position, traction-based matriarch updating moving range updates technique in equation (12) is used. approach, as illustrated in equation (15). 𝑖𝑡+1 𝑖𝑡 ∆𝑊𝑗 (𝑖) = [0.9 − (0.8 × )] × ∆𝑊𝑖𝑡 𝑖𝑡 𝑗 (𝑖) (12) 𝑖𝑡+1 (𝑖) = 𝑍𝑋𝑖𝑡+1 𝑡 𝐹𝐶𝑗,𝑛(𝑖) + 𝑞 × 𝛽 × [𝑊 𝑖 𝑚𝑎𝑥 𝑊𝐹𝐶𝑗,𝑛 𝐵𝑒𝑠𝑡(𝑖) − 𝑊𝑖𝑡 𝐹𝐶𝑗,𝑁(𝑖)] (15) Enhancement of the family clan's renewal technique: The mother elephant was the best person in each family clan, The scaling factor 𝛽 was determined using equation (14), and all other clan members learn from the generative while 𝑍𝑋𝑖𝑡+1𝐹𝐶𝑗,𝑁 was the independent movement location of image. While clan members are responsible for the matriarch in this clan at the 𝑖𝑡 + 1 iteration, acquired maintaining population diversity to provide the mother similarly to equation (16). elephant with superior evolutionary information for quick convergence, the mother elephant was primarily 𝑖𝑡 responsible for swiftly investigating the area where the 𝛽 = 3 − (𝑑 × ) (16) 𝑖𝑡𝑚𝑎𝑥 hypothesized optimal location was located. (1) A method of updating each clan member individually Improvement of the individual renewal method of the male based on the autonomous location traction equation (13), elephant clan: According to the ECO algorithm, the male this section proposes a way to update the individual elephant clan was essential in creating globally ideal population members based on autonomous location locations for female clan leaders and substituting certain traction to better maintain species variety and avoid family members to supply evolutionary data. Despite this, significantly slowing down the algorithm's rate of the number of male elephants might add to the diversity of convergence. the family clan. In light of this, equation (17) provides the male elephant individual renewal formula to guarantee that the male elephant clan has particular population diversity and generates as much evolutionary information as possible. 396 Informatica 49 (2025) 389–402 Y. Zhang 𝑊𝑖𝑡+1 AECO’s search positions and StyleGAN’s exact 𝑀𝐶,𝑚(𝑖) = 𝑍𝑋 𝑖𝑡+1 𝑀𝐶,𝑚(𝑖) + 𝑞 × 𝑜 × (𝑊 𝑖𝑡 𝐶𝑒𝑛𝑡𝑒𝑟 − hyperparameters (like learning rate or noise scale) should 𝑊𝑖𝑡 𝑀𝐶,𝑚(𝑖)) (17) be clarified. In the 𝑖𝑡 + 1 iteration of the male elephant clan 𝑍𝑋𝑖𝑡+1𝑀𝐶,𝑚 4 Results and discussion represents the autonomous movement location of the 𝑚 − 𝑡ℎ(𝑚 = 1,2, … ,𝑀 All experiments used Python 3.10.1 on an NVIDIA Tesla 𝑓) elephant. Equation (18) illustrates that the 𝑜 was determined using 𝑊𝑖𝑡 V100 GPU. AECO-SC StyleGAN trained for 50 epochs 𝐶𝑒𝑛𝑡𝑒𝑟 , which was the location of the maternal elephant patriarch in each family (1,000 iterations each) in approximately 12 GPU hours, outperforming Adam (15 GPU hours) in efficiency. The clan in the 𝑖𝑡 − 𝑡ℎ iteration. proposed strategy was assessed and its effectiveness was 1 determined using the following indicators: Normalized 𝑊𝑖𝑡 𝑡 𝐶𝑒𝑛𝑡𝑒𝑟 = × ∑𝑁𝑐−1𝑊𝑖 (18) 𝑁𝑐−1 𝑗=1 𝐹𝐶𝑗,𝑁 Discounted Cumulative Gain (NDCG), accuracy, and weighted F1, The Fréchet Inception Distance (FID), Improvement of individual replacement strategy for part of Structural Similarity Index Measure (SSIM) and Peak the family clan: Enhance the plan for replacing adult Signal-to-Noise Ratio (PSNR). Although AECO-SC elephants. The following adult elephant replacement was StyleGAN is a generative framework, its output creatives suggested to guarantee the algorithm's speed of are evaluated using a downstream binary classification convergence and boost the variety of the creative ad task predicting ad engagement (high vs. low CTR). All images, as adult elephants are not the superior elephants models, including baselines like VGG + Layout + NIMA within this clan. Equation (19) indicates the central and XCEPTION, are evaluated on this task for a fair position of all clan members. Otherwise, as indicated by comparison, also implementing the baseline method to this equation (20), the superior person was chosen to replace research. Table 3 represents hyperparameters for AECO- the real adult elephant from both the new individual and SC StyleGAN-based framework used in dynamic the original adult elephant. advertising creative optimization. 𝑊𝑖𝑡+1 1 𝐹𝐶𝑗,𝐺𝑚 = ∑𝑁𝑒1𝑊 𝑖𝑡+1 (19) Table 3: Hyperparameter Settings for AECO-SC 𝑁𝑒 𝑗= 𝐹𝐶𝑗 (𝑖) StyleGAN Framework 𝑊𝑖𝑡+1 Hyperparameter Value 𝑀𝐶,𝑅𝑚(𝑖) 𝑊𝑖𝑡+1 𝑖𝑡+1 𝑊𝑖𝑡+1 (𝑖)+ 𝐹𝐶𝑗,𝐶𝑎𝑙 𝑒(𝑖) = 𝑊𝐹𝐶𝑗,𝐺𝑚(𝑖) + 𝑞 × [ 𝐹𝐶𝑗,𝑅𝑓 − 2 Batch Size 32 𝑗+1 Learning Rate (Generator) 0.0001 𝑊𝐹𝐶𝑗,𝐺𝑚(𝑖)] (20) Learning Rate 0.0004 (Discriminator) Improve the inferior small elephant replacement strategy: Epochs 200 The ECO algorithm replaces the poor individuals in family Image Size 256 × 256 × 3 clans to maintain population diversity, but reduces Latent Vector Dimension (z) 512 convergence speed. Early iterations have significant Dropout Rate 0.3 differences, while late iterations focus on population Activation Function Leaky ReLU (α=0.2) diversity. The worst 0.3Ne tiny elephants in each family Normalization Instance clan are replaced with new individuals in the evolutionary Normalization stage. Equation (21) generates new individuals during the AECO Population Size 30 pre-evolutionary period, where 𝑖𝑡 < 𝑖𝑡𝑚𝑎𝑥. The 𝑖𝑡 + AECO Max Iterations 100 1 iteration of the family clan 𝑊𝑖𝑡+1 𝐹𝐶𝑗 , where 𝑊𝑖𝑡+1 𝐹𝐶𝑗,𝑥(𝑖) was the worst one to be replaced. 4.1 Evaluation task 𝑖𝑡+1 𝑖𝑡+1 𝑊𝑖𝑡+1 𝑖𝑡+1 𝑊 𝐶𝑗,𝑅𝑓(𝑖)+𝑊𝑀𝐶,𝑅𝑚(𝑖) The primary task is a binary classification of ad creatives 𝐹𝐶𝑗,𝐶𝑎𝑙 𝑒(𝑖) = 𝑊𝐹𝐶𝑗,𝑥(𝑖) + 𝑞 × [ 𝐹 − 2 into 'high engagement' vs. 'low engagement' based on 𝑊𝑖𝑡+1 historical CTR data. Ads with CTR above the 75th 𝐹𝐶𝑗,𝑥(𝑖)] (21) percentile were labeled as high engagement (1), and others as low engagement (0). This classification target enables AECO enhances SC-StyleGAN by tuning the model to learn aesthetic and contextual cues that align hyperparameters using elephant-inspired population with user interaction patterns. The dataset was split dynamics. Each elephant’s position represents a candidate 70/15/15 for training, validation, and testing. Evaluation solution, evolving through clan-based exploration and was conducted on the unseen 15% test set. Performance adaptive updates. This improves convergence and avoids metrics included NDCG, classification accuracy, and common GAN issues. However, the mapping between weighted F1-score FID, SSIM, and PSNR. Baseline AECO-SC StyleGAN: A Cross-Platform GAN Framework for… Informatica 49 (2025) 389–402 397 models included Visual Geometry Group Layout feature The training accuracy and loss over 50 epochs for the Neural Image Assessment (VGG + Layout features + proposed AECO-SC StyleGAN, XCEPTION, and VGG + NIMA) [20], XCEPTION [21], AdvAE-GAN [22] , Layout + NIMA are displayed in Figure 5(a, b). The BicycleGAN [22], V-GAN [23], and Vanilla GAN [23]. AECO-SC StyleGAN consistently achieves superior All models were trained under similar hardware and accuracy and inferior loss, demonstrating better learning optimization conditions to ensure a fair comparison. Table efficiency, faster convergence, and more stable training. 4 shows the comparison of classifiers and their Accuracy values were mentioned in percentage. Both performance evaluation results. training and validation accuracy curves show steady The evaluation pipeline begins with AECO-SC StyleGAN growth and eventual stabilization, indicating effective generating advertising creatives. These outputs are labeled learning with minimal overfitting. Similarly, the loss based on a CTR threshold to indicate high or low curves for training and validation exhibit a clear downward engagement. A classifier then predicts engagement levels, trend, reflecting successful convergence. These results allowing metrics like NDCG, Accuracy, and Weighted F1- highlight the model's ability to efficiently capture cross- score to assess how well the generated creatives align with platform advertising dynamics, generate high-quality user interaction patterns. creatives, and maintain strong generalization across datasets, ultimately improving engagement prediction 4.2 Accuracy and loss performance. Figure 5: Accuracy and Loss Comparison of Models 4.3 Ad image dimension distribution The GAN ad image dimension distribution operates as a cross-platform deep learning framework that generates multiple platform-optimized sizes for presented images. Standard display and mobile ad dimensions are the format choices for most images, which guarantee visual performance while ensuring cross-platform compatibility. Figure 6 displays the Ad image dimensions across digital platforms. 4.4 Clicks-through rate (CTR) by platform The CTR performance stands tested across different platforms through the use of a GAN framework for dynamic ad optimization. Results indicate that performance metrics vary between platforms since mobile achieves better CTR than desktop. Through its creative adaptation, the GAN model demonstrates high Figure 6: Ad image dimensions across digital platforms engagement while showing the power of deep learning as a means to improve cross-platform digital advertisement results. Figure 7 shows the CTR distribution across four social platforms. 398 Informatica 49 (2025) 389–402 Y. Zhang 0.61, much better than the NDCG of 0.22 for the baseline VGG + Layout + NIMA model. From this, we can see that our model helps us better find and rank the strongest ad creatives first. With this accuracy, marketers are better equipped to promote content that has a significant effect. Figure 9 illustrates the NDCG scores of all the evaluated models. Figure 7: CTR distribution across four social platforms 4.5 Convergence and runtime analysis To compare the stability of the training between AECO and Adam, by run both for 100 iterations shown in Figure 8. AECO demonstrated a quicker and smoother convergence, as seen by its early near-zero loss. Ad creative generation Figure 9: Illustrates NDCG performance results requires a stable and fast system, as it works with many constraints in quick optimization. By using adaptive 4.7 Accuracy learning, AECO avoids the problems of local minima and maintains consistency, which makes it effective than Adam Accuracy indicates how effectively the classifier predicts and RMSprop for cross-platform advertising. whether generated advertisements will result in high or low user engagement (based on CTR), thus assessing the effectiveness of the generated creatives. The high accuracy in generating platform-specific ad creatives consistently aligns with user engagement metrics, outperforming baseline models in aesthetic coherence, contextual relevance, and predictive performance across platforms. The results demonstrate that XCEPTION achieved an accuracy of 98.27%, while AECO-SC StyleGAN performed slightly better with an accuracy of 98.48%, showcasing their effectiveness in the given task. 4.8 Weighted F1 The F1-score balances precision and recall, crucial for imbalanced engagement data, indicating how well the model generates relevant, high-performing ads while Figure 8: Outcomes of convergence and runtime analysis minimizing misclassification. The Weighted F1 score was a metric used to evaluate the performance of a GAN in 4.6 NDCG dynamic advertising creative optimization, emphasizing precision and recall across various platforms. Table 4 gives The NDCG score was applied to measure how relevant and Evaluation of Ad Engagement Prediction Based on well-arranged the ad creatives were for users. Because of Generated Ad Creatives. The results show that AECO-SC this, NDCG is better suited for this task, as it rewards StyleGAN outperforms XCEPTION, achieving a higher higher positions for predicting relevant content. A better Weighted F1 score of 98.5% compared to 98.2%, NDCG means the model ranks the most engaging and demonstrating superior performance in dynamic appropriate content first, which helps in dynamic advertising creative optimization. Figure 10 displays the advertising situations where space and time are both accuracy and weighted F1 evaluation results. limited. The NDCG for the AECO-SC StyleGAN was AECO-SC StyleGAN: A Cross-Platform GAN Framework for… Informatica 49 (2025) 389–402 399 Table 5: Performance comparison of creative generation models on the Ad ImageNet dataset Method NDCG Accuracy Weighted (Mean (%) (Mean F1 (%) ± SD) ± SD) (Mean ± SD) VGG + 0.22 ± 94.62 ± 94.3 ± 0.38 Layout + 0.015 0.40 NIMA [20] XCEPTION 0.45 ± 98.27 ± 98.2 ± 0.21 [21] 0.020 0.25 AECO-SC 0.61 ± 98.48 ± 98.5 ± 0.19 StyleGAN 0.018 0.22 [Proposed] In addition to reporting the mean ± SD, we performed paired t-tests to evaluate whether the improvements over baseline models are statistically significant. The results confirm that the performance gains of AECO-SC Figure 10: Accuracy and Weighted F1 Evaluation Results StyleGAN over XCEPTION and VGG+NIMA are statistically significant with p < 0.01 for all three metrics. Table 4: Evaluation of ad engagement prediction based on generated Ad Creatives 4.9 Performance comparison of generative Accuracy Weighted models Method NDCG (%) F1 (%) FID scores of the suggested AECO-SC StyleGAN are VGG + contrasted with those of other GAN-based baselines in Layout 0.22 - - Figure 11. Among the tested methods, the proposed features + AECO-SC StyleGAN delivered the best quality, with an NIMA [20] FID score of 38.4752, compared to 42.3256 for AdvAE- XCEPTION - 98.27 % 98.2 % GAN [22] and 45.0208 for BicycleGAN [22]. [21] AECO-SC StyleGAN 0.61 98.48 % 98.5 % [Proposed] 4.8 Statistical evaluation of model performance To address this, we have conducted additional experiments using five different random seeds. For each seed, the model was trained and evaluated independently using the same data split. We now report the mean ± standard deviations for the key evaluation metrics, including Normalized Discounted Cumulative Gain (NDCG), Accuracy, and Weighted F1-score. Performance Comparison of Creative Generation Models on the Ad ImageNet Dataset given below Table 5. Figure 11: Generative quality evaluation model comparison results. 400 Informatica 49 (2025) 389–402 Y. Zhang The SSIM and PSNR metrics for AECO-SC StyleGAN and other GAN variations are shown in Figure 12. Higher PSNR and SSIM values indicate better image fidelity and structural similarity to the original images. The AECO-SC StyleGAN shows the highest PSNR of 35.8 dB and an SSIM of 0.95 which is better than the PSNR of 33.5 dB and SSIM of 0.92 for V-GAN [23] and the PSNR of 28.4 dB and SSIM of 0.85 for Vanilla GAN [23]. It means that AECO-SC StyleGAN is able to create images that are more clearly detailed and accurate than other styles. Figure 13: Generated Ad Creative using AECO-SC StyleGAN 4.11 Discussion Dynamic advertising creative optimization across multiple platforms aims to enhance user engagement and conversions by generating context-aware, personalized ad content. Traditional models such as VGG combined with Figure 12: Model comparison results of image quality Layout features and NIMA [20] rely on fixed image assessment. features, limiting their capacity to capture the full spectrum of complex, interactive visual and contextual patterns 4.10 Visual results and assessment of visual inherent in cross-platform environments. As a result, the fidelity creatives they generate often lack adaptability and personalization, making them less effective in varied user To evaluate the visual fidelity of the proposed AECO-SC scenarios. Meanwhile, XCEPTION-based GAN [21] StyleGAN, we generated advertisement creatives using the models, although capable of deeper feature extraction, are Ad ImageNet dataset. Figure 13 illustrates side-by-side hindered by their high computational and memory examples of generated ads, showcasing a variety of demands. Their complex operations limit scalability and product categories including fashion, electronics, and pose challenges for deployment on lightweight or real-time skincare. The generated ads closely match real ones in advertising platforms, reducing practicality in widespread layout, color schemes, and promotional text, reflecting commercial use. platform-specific design aesthetics. While maintaining In contrast, the proposed AECO-SC StyleGAN framework coherence, the model introduces subtle variations that add addresses these limitations by integrating adaptive diversity and creativity. These results demonstrate that hyperparameter tuning and spatial conditioning to generate AECO-SC StyleGAN effectively replicates real ad high-fidelity, semantically consistent creatives tailored to characteristics, providing a scalable and automated specific platform requirements. AECO enhances approach for cross-platform ad generation. convergence and training efficiency, while SC-StyleGAN ensures visual and contextual alignment across formats. This leads to better performance and improved resource utilization, offering a scalable and intelligent solution for dynamic advertising creative generation in diverse deployment environments. AECO-SC StyleGAN: A Cross-Platform GAN Framework for… Informatica 49 (2025) 389–402 401 While AECO-SC StyleGAN improves convergence speed compression and distillation techniques to reduce training and reduces memory consumption relative to baseline time and memory consumption without sacrificing output GANs during training, it still requires substantial quality. computational resources overall, particularly due to its large model size and high-resolution output generation. References However, once trained, the model supports relatively efficient inference, making it suitable for real-time or near- [1] Geng, T., Sun, F., Wu, D., Zhou, W., Nair, H., & Lin, real-time deployment scenarios. Z. (2021). Automated bidding and budget optimization for performance advertising campaigns. SSRN. 5 Conclusions https://doi.org/10.2139/ssrn.3913039 [2] Leow, K. R., Leow, M. C., & Ong, L. Y. (2021). The DL framework uses GAN for dynamic advertising Online roadshow: A new model for the next- creative optimization, enabling effective cross-platform generation digital marketing. In Proceedings of the strategies to enhance ad personalization and performance Future Technologies Conference (pp. 994–1005). in real time. Data collection involved the Ad ImageNet Springer, Cham. https://doi.org/10.1007/978-3-030- dataset, consisting of multimodal ad samples. 89906-6_64 Preprocessing included image resizing, tokenization, and [3] Ameen, N., Sharma, G. D., Tarba, S., Rao, A., & intensity normalization. This approach demonstrates a Chopra, R. (2022). Toward advancing theory on scalable, efficient method for cross-platform ad creative creativity in marketing and artificial intelligence. optimization, ensuring higher engagement and visual Psychology & Marketing, 39(9), 1802–1825. coherence. The results show that the AECO-SC StyleGAN https://doi.org/10.1002/mar.21699 method achieved an NDCG of 0.61, an accuracy of [4] Gharibshah, Z., & Zhu, X. (2021). User response 98.48%, and a weighted F1 score of 98.5%. These metrics prediction in online advertising. ACM Computing highlight the method’s high performance in optimizing Surveys (CSUR), 54(3), 1–43. dynamic advertising creatives with excellent precision and https://doi.org/10.1145/3446662 relevance. Although AECO-SC StyleGAN shows [5] Ouyang, X., Chen, Y., Zhu, K., & Agam, G. (2024). promising results in generating optimized ad creatives Image restoration refinement with Uformer GAN. with high quality, the training process remains In Proceedings of the IEEE/CVF Conference on computationally intensive due to the high-resolution Computer Vision and Pattern Recognition (pp. 5919- outputs and multiple conditioning layers. The model may 5928). face challenges in ensuring consistency across diverse https://doi.org/10.1109/cvprw63382.2024.00599 platforms, handling large-scale real-time data, and [6] Liang, Y., Deng, R., Lin, W., Deng, R., Zhu, X., & optimizing for varying audience preferences. It also Yu, L. (2025). Modeling and Reinforcement Learning requires significant computational resources for training. Assessment System for Quality Improvement of Future scope could focus on improving real-time Advertising Design. Computer-Aided Design & adaptability, cross-platform integration, and reducing Applications, 21, 188-200. computational costs for broader adoption in dynamic https://doi.org/10.14733/cadaps.2025.S7.188-200 advertising. [7] Patil, D. (2024). Generative Artificial Intelligence In Marketing And Advertising: Advancing 5.1 Limitations and future work Personalization And Optimizing Consumer Engagement Strategies. Available at SSRN 5057404. While AECO-SC StyleGAN shows promising results, it https://dx.doi.org/10.2139/ssrn.5057404 presents notable limitations. First, training the model [8] Terzioğlu, S., Çoğalmış, K. N., & Bulut, A. (2024). requires significant computational resources, with 30+ Ad creative generation using reinforced generative hours of training time on high-memory GPUs, limiting adversarial network. Electronic Commerce Research, accessibility for smaller teams. Second, generalization 24(3), 1491–1507. https://doi.org/10.1007/s10660- across domains remains a challenge. Early tests on ad 022-09564-6 categories like automotive and electronics suggest reduced [9] Chen, J., Xu, J., Jiang, G., Ge, T., Zhang, Z., Lian, D., performance, warranting domain-adaptive retraining. & Zheng, K. (2021). Automated creative optimization Third, although AECO-SC generates high-quality for e-commerce advertising. In Proceedings of the creatives, its deployment in real-time ad systems is Web Conference 2021 (pp. 2304–2313). untested. Future work will explore integration with ad https://doi.org/10.1145/nnnnnnn.nnnnnnn delivery platforms and A/B testing frameworks to assess live performance metrics such as CTR and Return on Ad Spend (ROAS), moving toward a fully automated ad generation and evaluation pipeline. Focus on model 402 Informatica 49 (2025) 389–402 Y. Zhang [10] Li, G., & Yang, X. (2024). Two-stage dynamic [20] Vempati, S., Malayil, K. T., Sruthi, V., & Sandeep, creative optimization under sparse ambiguous R. (2020). Enabling hyper-personalisation: samples for e-commerce advertising. SN Computer Automated ad creative generation and ranking for Science, 5(8), 1–16. fashion e-commerce. In Fashion Recommender https://doi.org/10.1007/s42979-024-03332-z Systems (pp. 25–48). Springer International [11] Li, Q., & Zhou, E. (2024). Design and Publishing. https://doi.org/10.1007/978-3-030- implementation of automatic generation algorithm 55218-3_2 for advertising artistic design based on neural [21] Moreno-Armendáriz, M. A., Calvo, H., Faustinos, networks. Computer-Aided Design & Applications, J., & Duchanoy, C. A. (2023). Personalized 21, 114–127. advertising design based on automatic analysis of https://doi.org/10.14733/cadaps.2024.S18.114-127 an individual’s appearance. Applied Sciences, [12] Baardman, L., Fata, E., Pani, A., & Perakis, G. 13(17), 9765. (2021). Dynamic creative optimization in online https://doi.org/10.3390/app13179765 display advertising. SSRN. [22] Kong, M. (2025). A study on optimizing deep https://doi.org/10.2139/ssrn.3863663 learning models for creative generation of animated [13] Meng, Q., & Wei, R. (2024). Creative advertising new media advertisements: an application based on design combining CAD and generating adversarial improved generative adversarial networks (GANs) networks. Computer-Aided Design & Applications, and variational autocoders (VAEs). J. COMBIN. 21, 102–116. MATH. COMBIN. COMPUT, 127, 7227-7248. https://doi.org/10.14733/cadaps.2024.S27.102-116 DOI: 10.61091/jcmcc127a-401 [14] Gao, B., Wang, Y., Xie, H., Hu, Y., & Hu, Y. [23] Kong, M. (2025). Deep Learning Model (2023). Artificial intelligence in advertising: Optimization in Creative Generation for New Advancements, challenges, and ethical Media Animated Ads. considerations in targeting, personalization, content https://doi.org/10.21203/rs.3.rs-5879017/v1 creation, and ad optimization. Sage Open, 13(4), 21582440231210759. https://doi.org/10.1177/21582440231210759 [15] Jiang, L., Li, C., Chen, H., Gao, X., Zhong, X., Qiu, Y., ... & Niu, D. (2023, August). AdSEE: Investigating the Impact of Image Style Editing on Advertisement Attractiveness. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 4239-4251). https://doi.org/10.1145/3580305.3599770 [16] Shilova, V., Santos, L. D., Vasile, F., Racic, G., & Tanielian, U. (2023, September). Adbooster: Personalized ad creative generation using stable diffusion outpainting. In Workshop on Recommender Systems in Fashion and Retail (pp. 73-93). Cham: Springer Nature Switzerland. DOI: https://doi.org/10.1007/978-3-031-76878-1_5 [17] Xu, C., Zhou, M., Ge, T., Jiang, Y., & Xu, W. (2023). Unsupervised domain adaption with pixel- level discriminator for image-aware layout generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10114-10123). [18] Aghazadeh, A. and Kovashka, A., 2024. CAP: Evaluation of Persuasive and Creative Image Generation. arXiv preprint arXiv:2412.10426. https://doi.org/10.48550/arXiv.2412.10426 [19] Ma, M., & Zhao, W. (2024). Computer-Aided Brand Logo Design Based on Generative Adversarial Networks. https://doi.org/10.14733/cadaps.2024.S25.60-75 https://doi.org/10.31449/inf.v49i12.9455 Informatica 49 (2025) 403–418 403 Adaptive Control of PV-Integrated Power Grids Using KNN-Smote- GCN And Mpc Techniques Kun Zhang*, Xiaogang Wu, Zhizhong Li1, Yaotang Lv, Shiqi Liu China Southern Power Grid Power Dispatching Control Center, Guangzhou, Guangdong,510000, China E-mail: KunZhang654@outlook.com, YananZhang46@outlook.com, WenjingSi999@outlook.com, GuanghuaYang5426@outlook.com, JingpingGuo243@outlook.com *Corresponding author Keywords: auxiliary control technology, artificial intelligence, KNN-SMOTE-GCN model, large power grid, MPPT control Received: May 29, 2025 As the global energy crisis intensifies, the integration of renewable energy—particularly photovoltaic (PV) systems—has become vital for achieving a sustainable and resilient power infrastructure. This study focuses on dynamic modeling and efficient control of grid-connected PV systems to enhance power quality and system reliability. An adaptive PI controller is employed for voltage regulation, with a maximum power point tracking (MPPT) method ensuring optimal energy harvesting. A DC-DC boost converter and a three-phase PWM inverter are incorporated, with MATLAB used for simulation. The proposed approach integrates Model Predictive Control (MPC) with Graph Convolutional Networks (GCN) to manage grid instability and improve energy efficiency. A novel KNN-SMOTE-GCN algorithm is developed to mitigate voltage distortion, harmonic currents, and power fluctuations. The system replicates the behavior of traditional generators under disturbances, promoting renewable integration without compromising stability. Key performance metrics such as voltage deviation, reactive power fluctuation, power loss, and total harmonic distortion (THD) are analyzed. Povzetek: Integrirani KNN-SMOTE-GCN in MPC izboljšata stabilnost PV-omrežij z natančnim MPPT, učinkovitim nadzorom napetosti ter zmanjšanjem izgub, nihanja jalove moči in THD. Metoda poveča kakovost energije in zanesljivost šibkih omrežij z visoko penetracijo PV. 1 Introduction that produce more complex harmonics and demand reactive power are electronic power equipment. This The reckless use of hydrocarbons and nuclear power action causes voltage distortion, which impacts all threatens environmental safety and causes significant subsequent loads linked to the identical PCC. Optimal pollution. The truth of this energy source is prompting a performance of solar photovoltaic inverters is hindered by global movement toward renewable energy sources that the unpredictability of sun irradiation [6], [7]. Two are less harmful to the environment, including as wind examples of supplementary services that the inverter's power, PV, and others. Distributed power generating extra capacity may offer are reducing source current systems that employ renewable energy sources have harmonics while adjusting reactive load power. When it garnered significant interest due to the current focus on comes to PV-integrated systems, MPPT is a go-to for clean power generation [1], [2], [3]. Recent advances in reducing harmonics. One method for reducing PV system photovoltaic technology have led to the rapid adoption of grid current harmonics is the adaptive P&O (perturb and renewable energy production based on solar PV by both observation) MPPT algorithm, which incorporates sliding commercial and residential sectors. Reduced main power mode control [8], [9]. The goal of auxiliary regulation is to system load, maximum savings, and reactive power maintain grid stability by modifying power system support are just a few of the benefits that the distribution characteristics in response to imbalances, fluctuations, and grid may reap from integrating distributed solar PV disruptions. The grid, however, functions within generating plants [4], [5]. electricity quality and reasonable bounds and adapts efficiently to shifts in both dependability are both enhanced by solar PV electricity, generation and demand [10], [11]. Controlling the grid which lessens the strain on the central grid. The energy frequency entails modifying either electricity production quality usually drops as the use of non-linear loads or consumption to keep it within predetermined increases. It is also well known that most non-linear loads boundaries. Ensures that electrical equipment continue to 404 Informatica 49 (2025) 403–418 K. Zhang et al. function correctly by keeping voltage levels within certain sporadic balance of supply. Also, traditional power sources limits. Optimizes system performance by balancing the aren't practical for such a difficult job, and they drive up production and consumption of both reactive and active energy prices. electricity. More conventional approaches, such as DL, The next step was to improve electrical distribution Machine Learning, etc [12]. These systems are often networks' power quality by using an optimization studied for their possible use in power system approach. It employs a hybrid design that incorporates optimization, control performance, and forecasting. Due to shunt and series compensators to address voltage drops, a lack of sophisticated automation infrastructure, many harmonics, and imbalance, among other power quality system operations are being performed with modest concerns. Afterwards, MPPT was used to derive the degrees of automation at the time. AI is expected to play a greatest amount of power from the grid system. Controller significant role in the future power system, according to for MPC to ascertain the system's overall stability and several studies, technical papers, and case studies [13], performance. In addition, the model was tested on the [14]. This is because AI will introduce state-of-the-art MATLAB platform and its reliability was assessed by techniques of system optimization while simultaneously measuring voltage variation, reactive power fluctuations, decreasing the need for human participation. Research on grid current, and THD. AI for grid system power flow optimization is now at a premium. The auxiliary services that help to reduce 1.2 Motivation frequency variations are crucial to the reliability of ac power networks. Large synchronous generators' Many issues, including power quality, stability, electromechanical inertia is the only available resource for dependability, and supply management, may arise as a absorbing frequencies disturbances on subsecond time result of the increasing need for big power grid-connected scales at the moment. This means that switching from systems. In addition, the total system performance might traditional thermal power plants to NREs, which are be negatively impacted by power quality concerns as a inertialess, puts grid stability at risk from things like result of variations. It is possible for there to be an unexpected power production outages. Grids with high imbalance in the power demand and generation frequency penetrations of NREs may suffer from electromechanical fluctuations. Next, problems with the power factor, such as inertia, which may disrupt system stability. To address this, a low power factor, might cause the power distribution virtual synchronous generators have been suggested, system to lose more power and increase energy usage. which mimic traditional generators. In this paper, we Voltage instability is the root cause of both linear and non- provide a new method of controlling virtual synchronous linear problems. Voltage regulation may be subpar due to generators that uses a configurable time scale to reduce the the persistent use of insufficient control mechanisms in supplied inertia, which is large at short intervals to absorb power grid systems. Ensuring the stability and operation of faults as effectively as traditional generators but sets in big power networks also relies heavily on rules and norms motion coherent frequency oscillations when it doesn't that specify acceptable power quality values. As a result, [15], [16]. We test how well our adaptive-inertia approach grid systems need an intelligent auxiliary regulatory handles large-scale transmission networks that experience technology that can effectively lessen the burdens on them. unexpected power outages. It is more stable than earlier proposed methods and consistently outperforms traditional 1.3 Contributions electromechanical inertia. The numerical simulations demonstrate that the quasioptimal placement of adaptive- Despite the paper's focus on intelligent real-time power inertia devices enhances the damping of interarea grid regulation and control, no mention of research into oscillations and effectively absorbs local faults. In future building the comprehensive functional foundation of a low-inertia power grids that have significant penetrations dispatching intelligent assistant driving network is made. of NREs, our findings demonstrate that the suggested The study and evaluation of real-time regulation and adaptive-inertia control system is a great way to improve control business aims to explore fresh artificial intelligence grid stability [17], [18], [19], [20]. application methods for various business processes, as well as the principle and implementation characteristics of 1.1 Problem statement a grid-assisted control system based on AI thinking and decision-making in regulation and control operations. In In today's world, contemporary power systems are order to achieve the shift from empirical to intelligent complemented with large-scale renewable energy systems, control and enhance the degree of control over the power allowing for more efficient operations. Accurate energy grid, we provide solutions to raise the bar for artificial production and efficient control systems to manage while intelligence in terms of both interaction and performance. guarantee a reliable power supply are also necessary for In order to achieve maximum power generation, it is optimum power systems. However, there is a degree of necessary to control the working point of photovoltaic uncertainty due to the high electrical consumption and the panels. For this regulation procedure to be successful, Adaptive Control of PV-Integrated Power Grids Using KNN… Informatica 49 (2025) 403–418 405 there are two primary components that are required: an technology that generates and verifies strategies for multi- MPPT algorithm that serves as the reference for the MPP, dimensional scheduling agents using deep reinforcement and a voltage controller that guarantees a steady learning. In addition to providing solid technical support functioning at the MPP. for power grid operation, that research may enhance the One of the most significant benefits of adopting MPC is accuracy and effectiveness of section dispatching that it has the ability to simplify the process of developing decision-making, optimize the section control strategy a variety of controllers while also working to continually, and more. accommodate system limits within its formulation. In According to [23], when a problem occurs, the generator addition, the introduction of KNN-SMOTE-GCN as a network determines the unit output plan using the user-friendly optimization approach is suggested in this combined wind, light, and electrical demand data from a study as a means of enhancing the cost function of the northwest area of China. A specialized system generation MPC controller. fault recovery strategy is developed for that grid fault using This research work is structured as follows: Section 2 data on actual power load while actual renewable energy describes the research articles that were relevant to the output before and after the fault. The strategy aims to framework that was developed; Section 3 describes the minimize the cost of system power generation while problem statements; Section 4 explains the proposed considering the constraints of secure operation of the hybrid framework; Section 5 analyzes the results of the system. It turns out that the expert system's fault recovery methodology that was proposed; and Section 6 describes method is much different from the one used in the early the research conclusion. stages of training, and that the error value is very high. After a generative adversarial network is fully trained, it 2 Related work can approach the fault recovery expert system with an auxiliary decision-making scheme that works in different Experts from [21] grid operators use neuro-fuzzy logic for situations with different loads and new energy outputs, and dynamic reactive power adjustment. The energy storage it can keep the error between the two schemes to less than system may also be effectively managed using that logic. 5%. Results from studies examining power grid fault After that, SP UPQC was used to improve the electrical recovery strategies using models of generative adversarial distribution networks' power quality. It employs a hybrid imitation learning networks demonstrate the force control design that incorporates shunt and series compensators to system's capacity for autonomous and secure fault address voltage drops, harmonics, and imbalance, among recovery. other power quality concerns. Afterwards, maximum With the goal of conducting real-time tracking on the power point tracking was used to derive the greatest operating state of the power grid, eliminating potential amount of power from the electricity network. Controller safety hazards, and upgrading the power grid from for Model Predictive Control to ascertain the system's "manual analysis" scheduling to "intelligent analysis" overall stability and performance. In addition, the model scheduling, the authors of [24] propose an integrated was tested on the MATLAB platform and its reliability was framework to aid decision-making of online accident assessed by measuring voltage variation, grid current, processing using large power grids. The study covers five reactive power fluctuations, and Total Harmonic aspects: integrated information support system, aid Distortion. decision-making afterwards, risk perception in grow, Enhancing the effectiveness of section control of large online fault diagnosis, and visual display. power grid, altering the traditional experience-led The writers of the cited work, [25] an online trend analysis dispatching mode, and improving the intrinsic safety level technology with a functioning mode arrangement for large of the power grid are all goals of the experimental team in power grids is suggested, drawing on references to the [22]. They study intelligent section auxiliary decision- growth of intelligent dispatching support systems and their making algorithms in depth and build a new intelligent dynamic security assessment technologies, in light of the dispatching structure framework of the power grid using growing importance of grid dispatching operations in deep learning and simulation environments. To build a understanding future state security changes. Estimated more realistic simulation of the power grid's dynamic power flow in the future is based on the power grid's characteristics under varied operating circumstances, an present operating mode, online stability conclusion, data environment that is suited for the upcoming AC-DC hybrid from fresh energy and load forecasts, dispatch scheduling, big power grid is first built. Secondly, a scheduling agent and dispatch operation adjustment. The auxiliary decision- that takes into account the power grid's characteristics and making approach for control allows for fast assessment of the dispatcher's behavior is researched using the power future security situations and trends. With the use of that grid's historical operation data and the dispatcher's real technology, the power grids of Heilongjiang and Central control data. Finally, to address the issues of poor China have been able to transition from empirical to regulation speed, complex regulation decision-making, intelligent control, and precontrol techniques for and inadequate technical support ability, authors study the 406 Informatica 49 (2025) 403–418 K. Zhang et al. complicated power grid dispatching operations have power grid static stability was made possible with the received technological support. introduction of technology for knowledge graph The tiny sensor sample unit, energy metering device, automation engines. To demonstrate the efficacy of the communication unit, protection control device, suggested approach, an example using a real-world performance evaluation unit, etc. were all combined by the electricity system is provided. Regulatory and control experimenters of [26]. In conjunction with the transformer, operators may benefit from the study's findings by better keeping its original dimensions and construction. It is understanding the current state of operations and making possible to analyze the measured data locally, allowing for more informed decisions about the power system. an intelligent and transparent observation of the Explorationally, it may be useful for enhancing the performance indicators of the transformer. building of online intelligent active security defense Simultaneously, it can accomplish intelligent monitoring, structures on big power grids. reduce energy consumption and save energy, and aid in the creation of new power systems without uploading a 3 PV generated system integrated to mountain of normal and abnormal data. weak grid Using deep reinforcement learning, the authors of [27] provides an auxiliary control method for large-scale power The basic architecture of a three-phase grid-connected grid segments. An intelligent agent for power grid section double-stage solar power plant is shown in Figure 1. The control is built using the Deep Deterministic Policy integration of solar electricity into the electrical grid is Gradient algorithm. That agent provides real-time control achieved via the employment of this sort of technology, methods in complex power grid settings, taking into which guarantees effective power conversion and account both the safety and economics of power grid maintains grid stability. To generate and transmit operations. That justifies the proposal of a two-stage electricity from the solar PV array to the unreliable utility optimization approach that takes sensitivity into account. grid, the system relies on a number of moving parts, all of When operators are unable to remove the section which contribute in different ways. The PV array generates restriction via real-time control, they offer them with the the majority of the system's renewable electricity. It relies optimum market intervention strategy. At last, the efficacy on a network of solar panels to generate DC power from of market intervention plans and real-time control sunlight. The PV array's power production is directly mechanisms are tested via case studies. The methodology related to the amount of solar irradiation and temperature presented in that study improves the system's economy by that it can operate at. Maximizing the conversion of solar lowering the clearing price through an average of 1.2% energy into grid-ready alternating current electricity is the while the average adjustment amount through 37.6% under system's primary objective. To get the most power out of various section limits resulting from power generation the solar PV system, the DC-DC boost converter is an components participating in the market, as compared to the absolute must. For maximum efficiency in power current rules. conversion, it raises the DC voltage produced by the PV The authors of [28] looking at the power grid from a array until it is equal to or greater than the DC-link voltage. knowledge graph perspective, researchers were able to In order to keep the PV array running at its optimum power develop a functional framework for an intelligent evaluator point no matter what happens to the weather or irradiance, that could assess static stability, make decisions based on the boost converter works using a MPPT algorithm. An that evaluation, and be an all-around smart algorithm. That MPPT method known as Perturb along with Observe is evaluator took into account the stability state evaluation used to optimize the amount of energy harvested by the PV index while optimization control strategy data from array. One of the most popular ways to increase the output various power grid operation scenarios. The of solar PV systems is by using this algorithm. implementation of a visual evaluation tool for large-scale Adaptive Control of PV-Integrated Power Grids Using KNN… Informatica 49 (2025) 403–418 407 DC bus Inverter DC Power grid DC ESS Control Algorithm Virtual Inertia Figure 1: Auxiliary power control in large power grid It works by monitoring the change in power output and as well as temperature under typical conditions: 𝐼𝑠_ref , 𝐺ref making adjustments to the operating voltage of the PV and 𝑇ref . The current changes with irradiation and array at regular intervals. When the power goes up, the temperature change, as shown in Eq. (1); yet, the 𝐼sat adjustment stays the same; when it goes down, it goes in fluctuation in temperature is the only determinant of the other way. The technology is able to maintain optimal current. In accordance with Kirchhoff's law, the PV panel's performance regardless of environmental changes because output current ( 𝑣𝑝𝑣 ) is given through: of this iterative procedure that continually monitors the PV 𝐼𝑝𝑣 = 𝐼𝑠 − 𝐼𝑑 − 𝐼𝑠ℎ𝑢 (2) array's MPP. By responding in real-time to variations in Yes, it means we can: temperature and irradiance, the P&O MPPT algorithm 𝑞(𝑣𝑝𝑣+(𝐼𝑝𝑣∗𝑅𝑆𝑒𝑟)) keeps the boost converter operating at the ideal voltage 𝐼𝑝𝑣 = 𝐼𝑠 − 𝐼𝑠𝑎𝑡 [𝑒𝑥𝑝⁡ ( ) − 1] − 𝑛𝑘𝑇 input from the PV array. In areas where the amount of 𝑉𝑝𝑣+(𝐼𝑝𝑣∗𝑅𝑆𝑒𝑟) sunshine varies throughout the day, the efficiency of the (3) 𝑅𝑠ℎ𝑢 solar PV system depends on this capability to monitor the With: MPP under changing circumstances. 𝑞(𝑣𝑝𝑣+(𝐼𝑝𝑣∗𝑅𝑆𝑒𝑟)) 𝐼𝑑 = 𝐼𝑠𝑎𝑡 [𝑒𝑥𝑝⁡ ( ) − 1] (4) 𝑛𝑘𝑇 3.1 PV array modelling And: 𝑉 To enhance the voltage or current level, the PV panel uses 𝐼 𝑝𝑣+(𝐼𝑝𝑣∗𝑅 𝑠ℎ𝑢 = 𝑆𝑒𝑟) (5) 𝑅𝑠ℎ𝑢 numerous modules linked in series or parallel, accordingly. A current source, two types of resistance (series and shunt), 3.2 DC-DC converter with an antiparallel diode make up the equivalent circuit of a PV cell, as shown in Figure 2. The current source ( 𝐼𝑠 ) is Here is one way to express the transfer function of the expressed by de following equation: boost converter: 𝐺 𝐼𝑠 = ( ) (𝐼 1 𝐺 𝑠_𝑟𝑒𝑓 + 𝐾𝑠𝑐 ⋅ (𝑇 − 𝑇𝑟𝑒𝑓)) (1) 𝑣 𝑟𝑒𝑓 𝑚 = 𝑣 1−𝐷 𝑝𝑣 (6) where irradiance (G) and ambient temperature (T) are the The relationship between the average currents flowing into two variables. The coefficient of short-circuiting current is and out of an electrical device may be expressed as denoted as 𝐾𝑠𝑐 . The following are the current, irradiation, follows: 408 Informatica 49 (2025) 403–418 K. Zhang et al. 1 𝐼𝑝𝑣 = 𝐼 (7 where ?̂? stands for the value of the normalized property. 1−𝐷 𝑑𝑐 ) The function min(x) finds the lowest value in the values of The equation for the DC bus may be written as: 𝑑𝑣 the attributes while max(x) finds the highest value. 𝑑𝑐 1 = (𝐼 𝐼 𝑑 𝐶 𝑑𝑐 − 𝑖𝑛𝑣) (8) 𝑡 3.5 Missing value completion 3.3 DC-AC inverter One approach that uses nearby data points is KNN (K- It is possible to transform DC electricity into AC voltage Nearest Neighbors) interpolation. The goal of this with the frequency and amplitude of our choice thanks to technique is to estimate the target point's value by the inverter, the adaptation step. The inverter control comparing it to the values of the K data points that are makes it possible to inject higher-quality currents and known to be the closest to it. For KNN interpolation, the powers (P,Q) into the grid. The input/output inverter fundamental procedures are these: voltage relationship is defined as: Choose the K-value: Choose the optimal K-size by determining its value, often using cross-validation. 𝑣𝑎𝑛 = (𝑆1 − 𝑆2)𝑣𝑑𝑐 Determine Distance: Find the total distance in geometric {𝑣𝑏𝑛 = (𝑆2 − 𝑆3)𝑣𝑑𝑐 units between the current location and all other known 𝑣𝑐𝑛 = (𝑆3 − 𝑆1)𝑣𝑑𝑐 ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡(9) locations. This is the formula for the distance in geometric 𝑣𝑎 𝑣 2 −1 −1 𝑆1 units: 𝑣 𝑑𝑐 [ 𝑏] = [−1 2 −1] [𝑆2] 𝑣 3 𝑐 −1 −1 2 𝑆3 𝑙(𝑥 𝑀 2 𝑙 , 𝑥𝑓) = √∑𝑚=1     (𝑥𝑙,𝑚 − 𝑥𝑙,𝑚) (14) where 𝑣𝑑𝑐 is the DC voltage, 𝑣𝑖𝑛(𝑖 = 𝑎, 𝑏, 𝑐) and 𝑆𝑗(𝑗 = 1,2,3) consist of alternating current voltages and signals where 𝑥𝑖 and 𝑥𝑗 constitute data points, with M serving as indicating the current state of the switches. Here is the the data dimension. equation for grid voltages: How to Determine the K-Nearest Neighbours: choose the K known points of data that are most closely located to the 𝑣𝑔𝑎 𝑣𝑎 𝐼𝑔𝑎 𝐼𝑔𝑎 desired location. 𝑣 𝑑 [ 𝑔𝑏] = [𝑣𝑏] + 𝑅 [𝐼𝑔𝑏] + 𝐿 [𝐼𝑔𝑏] (10) Weighted averaging: Give each of your K neighbors a 𝑣 𝑑𝑡 𝑔𝑐 𝑣𝑐 𝐼𝑔𝑐 𝐼𝑔𝑐 weight that is inversely proportionate to their distance from you. The formula for the weighted average The goal of studying and realizing the decoupling among interpolation for the K closest neighbors is the active (P) with reactive (Q) capabilities was to regulate ∑𝐾 ?̂? = 𝑘=1   𝑤𝑘𝑦𝑘 (15) ∑𝐾 them independently. If we want a fair system, we can just 𝑘=1   𝑤𝑘 put down the powers 𝑃𝑔 and 𝑄𝑔 as follows: where 𝑦𝑘 𝑤𝑘 is the weight that defines the distance from the location to be interpolated, and is often specified as the 3 opposite proportion of the distance, and is the value of the 𝑃𝑔 = (𝑣 + 2 𝑔𝑑𝐼𝑔𝑑 𝑣𝑔𝑞𝐼𝑔𝑞) { 3 (11) k-th neighbour: 𝑄𝑔 = (v I 2 gq gd − vgdIgq) 1 Indeed, we can write: 𝑤𝑘 = (16) 𝑑𝑘 3 Pg = v 2 gdIgd 3.6 Deal with unbalanced data { 3 (12) Qg = − v ) 2 gdIgq In classification tasks, when minority samples are oversampled, an interpolation approach called Synthetic where vgdq as well as Igdq, which stands for grid current. Minority Oversampling Technique (SMOTE) is used to address imbalanced datasets. By augmenting the dataset's 3.4 Normalization diversity via the synthesis of fresh minority samples, SMOTE boosts the classifier's performance. The detailed The data were standardized to ensure that the model's procedures are these: accuracy was unaffected by dimensions. The min-max scaling approach was used for normalization in this Pick a Representative Sample: Pick a representative research. sample at random from the minority group. Determine the sample's k closest neighbors by using a 𝑥−𝑚𝑖𝑛(𝑥) ?̂? = (13) distance measure. 𝑚𝑎𝑥(𝑥)−𝑚𝑖𝑛(𝑥) Adaptive Control of PV-Integrated Power Grids Using KNN… Informatica 49 (2025) 403–418 409 Create a fresh sample by using the following formula to shown by the increased energy wasted at the grid system. synthesize a neighbor from among these k neighbors at As a result, the grid power plant facilitates degradation random: cost in order to fulfill the particular demand at the grid system's discharge point. Equation (6) also allows for the new_sample = 𝑥1 + 𝜆(𝑥1 − 𝑥1) (17) modeling of the deterioration cost using a quadratic function. 3.7 Maximum power point tracking (MPPT) d′j[c ′ j(n)] = δ ′ 2 jcj(n) + μ ′ jcj(n) + λi (20) The DC-DC boost converter controls the output of PV cells, which is one of its dual functions. As a result, MPPT Where, δj, μj and λi represents the degradation cost is simplified and the output voltage is reliably controlled. function and is represented by operational cost parameters This study combines a DC-to-DC converter with the d′j ∣ c ′ j(n)]. Because of the limited integration between the widely known MPPT algorithm to optimize power operating cost parameters and the grid system's extraction from PV panels. The operating point must be discharging rate, the constant value here must be dynamically changed to the Maximum Power Point in associated with the grid system's discharge rate. However, order to accommodate changing weather conditions. The using eqn. (21) in the following context, the cost function's low cost and user-friendliness of the MPC algorithm led to simplicity is related: its selection for MPPT. The MPC algorithm tracks the PV array's current and voltage down to the microsecond in order to foretell how a voltage modification will play out. f ′[c′j (n), p′(n)] = d′j[c ′(n)] + o′j − U ′ j ∣ Cj(n)] (21) This approach may be more resource intensive, but it can adapt to new conditions very fast. A little amount of energy Where, o′j is the formula for the lumped cost. Here, power can be saved in that gadget for use in seconds, and its is supplied by the grid at a net cost rate according to the performance is assessed by comparing the discharged and electricity pricing unit with either an off-peak or peak-time charged powers of the device. At all times, the following tariff. Not to mention that the fixed price unit diverges equation (4) describes how the charging and discharge from the original cost function. rates of the constraints are combined with the battery efficiency. 3.9 Design of MPC 1 W′ ess(n) = W′ ess(n − 1) + αcp ′ cΔn − p′ Δn α d An extensive evaluation of the reference grid currents is d W′ carried out, taking into consideration various factors such ess ≤ W′ ess(n) ≤ W′ ess ⋅ max { p′c(n) ≤ p′ as the presence of nonlinear loads at the Point of Common c⋅max p′d ≤ p′ Coupling, regulation of the DC link voltage, and dynamic d⋅max variations in PV power. This reference current is fed into the MPC controller, which then calculates the quantity of (18) switching pulses required for optimum functioning. Considering the dynamic changes in PV power, ensuring Where, W′ ess The energy storage limits are represented by p′ stable control of the DC link voltage, while tolerating c, the charging power is p′d, and the battery efficiency nonlinear loads at the PCC allow the system to efficiently while charging and discharging is αc. supply reference grid currents that sustain efficient operation. The MPC controller enhances the system's 3.8 Cost function general efficiency and stability by using these currents to Using three crucial factors, including 1) the energy and identify the optimum switching pulses. As a result, the following equations (8), (9), (10), (11), define the key discharging rate of each grid system, 2) the degradation cost of the battery and the discharge rate, and 3) the function of the charging station's net cost function in regards to multi-objective optimization problems. operation cost of other activities such service chargers and cable wear, we need to build the net cost function of the j^"th" grid system. First, use the following equation (5) to minCj(n) = ∑j∈T(n)     sj ∣ Cj(n) + G(n)] express the grid system discharge rate. Cj(n) = Cj(n)∀j ≠ i ∈ T(n) j j (22) Cmin ≤ Cj(n) ≤ Cmin∀j ∈ T(n) Uj⌊C ′ j(n)⌋ = p′(n)C′j(n) (19) j SOCmin ≤ SOCj(n) ≤ 100%∀j ∈ T(n) where⁡p′(n) represents the unit pricing with the grid Where, Cj(n) represents the cost function of a grid system aggregator at time n, and c′j(n stands for the discharge rate charging station, and is depicted as the minimization of the of each network grid at that specific time n. In this case, net cost function for every grid system charging stations, the degree of the generated aggregator grid system is 410 Informatica 49 (2025) 403–418 K. Zhang et al. ∑j∈T(n)   appears as the energy cost function for the jth user Where the Fourier transform is denoted by F. Equation (1) over the time interval t. Additionally, the suggested KNN- may be simplified to describe the convolution process f*x SMOTE-GCN model's flowchart is shown in figure 2. in the spatial domain through the use the inverse Fourier transform 𝐹−1 to both sides. 𝑓 ∗ 𝑥⁡= 𝐹−1(𝐹(𝑓) ⊙ 𝐹(𝑥)) Proposed KNN-SMOTE-GCN with MPPT algorithm (24) ⁡= 𝑈((𝑈𝑇𝑓) ⊙ (𝑈𝑇𝑥)) Where U stands for the Fourier basis while ⊙ means Power loss multiplication element-wise. The goal of the GCN was to Voltage deviation provide a way for neural networks to use the association graph. The GCN does this by obtaining the Fourier basis Reactive power fluctuation from the graph's Laplacian matrix. What if 𝐿𝑚 = 𝐷 − 𝐴 is a graph's Laplacian matrix. One way to standardize it is as Verifcation THD 𝐿𝑚 = 𝐼𝑁 − 𝐷1/2𝐴𝐷1/2 ∈ ℝ𝑁×𝑁, where 𝐼𝑁 is the Figure 3: Typical Model Diagram for KNN-SMOTE- neighboring matrix and denotes a unit matrix. For the GCN degree matrix, D stands for 𝐷𝑢 ∈ ∑ ⁡ , 𝐴𝑢𝑓. Then, using the eigenvalue decomposition, one may derive the Fourier 3.10 Graph convolutional network (GCN) basis, U, and the eigenvalue matrix ∧. Building the association graph: A collection of nodes V 𝑈 and edges E may be characterized as a graph G(V, E). The ∧𝑈 𝑇 = 𝐿𝑚 , 𝜆 = 𝑑𝑖𝑎𝑔([𝜆0, … , 𝜆𝑁−1]) (25) connection between individual nodes 𝑣𝑓 and 𝑣𝑓 is signified U is a set of orthogonal matrices satisfying the Fourier by an edge 𝑒𝑓 ∈ 𝐸. In order to make it easier to aggregate 𝑔 transform's mathematical constraints, based on the information in the graph framework, an adjacency matrix Laplacian matrix's properties. The diagonal matrix, 𝐴 is built 𝐴[𝑖, 𝑗] = 1 if the edge 𝑒𝑓𝑗 exists, besides 𝐴[𝑖, 𝑗] = denoted as 𝑔𝑒 = 𝑑𝑖𝑎𝑔(𝑈𝑇𝑓). Next, we may simplify 𝟎 then. Equation (2) by following these steps: The convolution theorem states that, in terms of forward propagation of the GCN, the Fourier transform of a 𝑓 ∗ 𝑥 = 𝑈((𝑈𝑇𝑓) ⊙ (𝑈𝑇𝑥)) = 𝑈𝑔𝜚𝑈 𝑇𝑥 (26) convolution between two signals is the same as the pointwise multiplication of their individual Fourier Graphic convolution relies heavily on the eigenvalue transforms. Let 𝑓 ∗ 𝑥 Introduce the spatial domain decomposition of the Laplacian matrix. There is a convolution operation, which 𝑥 = {𝑥1, 𝑥2, … , 𝑥𝑛} ∈ 𝑅𝑛 quadratic relationship between the total amount of nodes stands for a dataset that has n pieces of data and 𝑓 = and the computing complexity when the graph size is big. {𝑓1, 𝑓2, … , 𝑓𝑛} are the neural network's trainable Graph convolution methods are mostly useful for small- parameters. Using the Fourier transform, this procedure scale networks due to the high cost of eigenvalue may be converted to the frequency domain. decomposition. Figure 3 showed in GCN model. In order to tackle this problem, Krizhevsky et al. suggested a 𝐹(𝑓 ∗ 𝑥) = 𝐹(𝑓) ⋅ 𝐹(𝑥) (23) method for approximating g_ş via Chebyshev polynomials T_k, that may be stated in the following way: Adaptive Control of PV-Integrated Power Grids Using KNN… Informatica 49 (2025) 403–418 411 Offline training Online application Feature selection and extraction Data loading Distribution network real-time measurement data Historical data of Original features and distribution network labels Random matrix Extract input features strategy coder-decoder Sample dataset Data preprocessing Training set Test set Reactive power optimization model based on GCN GCN model Determine the optimal Adjust training parameters structure of GCN Real-time optimization strategy Train GCN model Test and verify Evaluate performance Figure 3: Proposed GCN model 𝑔𝜚(𝛬) = ∑𝑘−1 𝑘=0   𝜃𝑘𝑇𝑘(?̃?) (27) neighboring the central node. This leads us to the following simplification of (6): in where θ stands for the Chebyshev coefficient while 𝑇𝑘 for the k-th element of the Chebyshev polynomial. To be 2𝐿𝑚 𝑓 ∗ 𝑥 ≈ 𝜃0𝑥 + 𝜃1 ( − 𝐼 ) more precise, it is 𝑇𝑘(𝑥) = 2𝑥𝑇𝑘−1(𝑥) − 𝜆 𝑁 𝑥 𝑚𝑎𝑥 ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡(29) 𝑇𝑘−2(𝑥), 𝑇𝑜(𝑥) = 1, and 𝑇1(𝑥) = 1.∧̃= 2 ∧/𝜆max − 𝐼 1 1 𝑁 ⁡≈ 𝜃0𝑥 − 𝜃 − 1 (𝐷 − 2𝐴𝐷 2) 𝑥 contains the eigenvalues of scale in a diagonal matrix. Then, we may write (4) as: 𝑓 ∗ 𝑥 = 𝑈𝑔 ∘ 𝑈𝑇𝑥 ≈ ∑𝑘−1 𝑘=0   𝜃𝑘𝑇𝑘(𝑈 ∧̃ 𝑈𝑇)𝑥 = By setting the parameter 𝜃 = 𝜃0 = −𝜃1, (7) more ∑𝑘−1 information about: 𝑘=0   𝜃𝑘𝑇𝑘(𝐿?̃?)𝑥 (28) 1 1 𝑓 ∗ 𝑥 ≈ 𝜃0 (𝐼𝑁 + 𝐷− 2𝐴𝐷− 2) 𝑥 (30) where ?̃? = 2𝐿/𝜆max − 𝐼𝑁 and 𝜆max stand for the highest eigenvalue of the Laplacian matrix. A more simplified Additionally, the settings allow the network to be trained version of the Chebyshev polynomials was developed by using backpropagation " 𝑊,𝐷c often undergo Xiao et al. 𝜆max = 0 and 𝑘 = 2, that is, the data is only renormalization via ?̃? = 𝑊 + 𝐼𝑁 and ?̃?𝑡𝑙 = ∑𝑓   ?̃?𝑡𝑙, that aggregated from nodes that are in the first order is, in turn. Lastly, the spectral domain convolution operation is defined as: 412 Informatica 49 (2025) 403–418 K. Zhang et al. 1 1 1 1 𝑓 ∗ 𝑥 ≈ 𝜃 (𝐼𝑁 + 𝐷− 2𝐴𝐷− − 2) 𝑥 = 𝜃 (?̃? 2?̃??̃?− 2) 𝑥 i r radiation, all while maintaining a constant temperature of 25 degrees Celsius for the photovoltaic array. Standard test (31) conditions (STC) were used in order to determine the output of the solar panels while the temperature was set to 4 Result and discussion 25 degrees Celsius. The Simulink model of the photovoltaic (PV) system, which illustrates the linked 4.1 Configuration of PV system components and the interactions between them. Additionally, the mathematical model that is used to This study's suggested PV system is composed of a great explain the solar panel's electrical characteristics is deal of different components. A first step involves the use included into the PV module block, which serves as a of a solar panel to convert solar energy into electrical representation of the solar panel. It takes into account the energy. Through the use of a boost converter, the output input solar radiation as well as temperature in order to voltage of the array is thus increased while simultaneously generate the matching current–voltage (I–V) and power– maintaining the appropriate voltage level. A DC–AC voltage (P–V) curves. It is the responsibility of the booster converter is provided in order to maintain a power factor converter block to monitor the drop in the output voltage of one while converting DC to AC. In addition, a of the PV array. The control algorithm that is used to guide transformer is used in order to raise the output voltage to the functioning of the boost converter via the utilization of the amount that is necessary for a common connection. In the MPPT approach is included inside it. The Maximum order to optimize power extraction, maintain a power Power Point Tracking (MPPT) algorithm continually factor of one, and modify junction voltage, the control analyzes and adjusts the PV system's operating point in group of the system is comprised of a number of different order to achieve maximum power extraction. Through the strategies that have gone through extensive study. In this use of the DC–AC inverter block, the DC power generated part, the primary issues that will be discussed are the by the PV array is converted into AC energy that is modeling of a solar power system and the performance of compatible with the grid. Furthermore, a power factor of the system. It begins by providing an overview of the one is assured, in addition to the maintenance of the quality characteristics of the PV module. It covers how the and interoperability of the AC power that is produced with photovoltaic module reacts to variations in temperature the utility grid. Transformers and grid connections are two and the amount of sunlight that it receives. Another examples of extra model construction parts that might be component that is included is the boost converter, which is used to depict the photographvoltaic (PV) system as a responsible for monitoring the reduction in the output whole as well as its connection to the conventional voltage of the PV array. An exhaustive amount of electrical grid. information is provided on the operation of the boost Results from a two-stage PV system with a three-level converter as well as its control mechanisms, which include inverter and a DC/DC converter that is linked to a weak the MPPT approach. With the help of the MPPT grid are shown below. Results show that the control technology, the photovoltaic (PV) system is able to run at method and inverter configuration were executed when the its maximum power output regardless of the changing system was evaluated under different dynamic situations. environmental conditions. A DC–AC inverter is also The PV array, DC/DC converter, and three-level inverter discussed in this section. This device converts DC energy that interface with the grid is all shown Table 1, which is generated by a photovoltaic array into AC power for grid the system schematic. In Table 1 we see the system's integration. While discussing the operation and parameters. Grid voltage sag, Grid voltage swell, management of the DC–AC converter, a power factor of irradiance change, and a comparison between two-levels one is maintained throughout the discussion. Within the with three-level inverters are among the operational context of this section's treatment of the modeling, situations that the system is evaluated under. Voltage on performance, and control elements of the PV system, the the grid, current via the grid, current through the VSC, PV module, boost converter, MPPT method, and DC–AC current through the PV array, and the weighted positive inverter are all dissected in great detail. sequence are the critical metrics studied. The stability, power quality, as well as transient responsiveness of the 4.2 Simulation system under dynamic situations may be understood by examining these factors. During this section, the performance of the system was examined at a number of different levels of direct sunlight Adaptive Control of PV-Integrated Power Grids Using KNN… Informatica 49 (2025) 403–418 413 Table 1: System parameters Parameters Value PV Array 55 Power Rating 35 kW Maximum Power (W) 211.802 Short-circuit current Isc (A) 9.03 Voltage at maximum power point Vmp (V) 27.9 Cells per module (Ncell) 70 Open circuit voltage Voc (V) 39.17 Shunt resistance Rsh (ohms) 312.6345 Temperature coefficient of Voc (%/deg.C) -0.36044 Temperature coefficient of Isc (%/deg.C) 0.112 Parallel strings 7 Series-connected modules per string 23 Boost Converter Inductor 𝐿cc (𝑚𝐻) 4 Capacitor 𝐶𝑎𝑐(𝜇 𝐹) 100 Voltage Source Converter Interfacing Inductor 𝐿𝑓(𝑚𝐻) 75 𝑅𝐶𝑅𝑓(𝛺) 0.4 𝑅𝐶𝐶𝑓(𝜇 𝐹) 100 Grid Voltage and Frequency, (V) and (Hz) 433, 70 DC link capacitor 2200 µF PV array current Ipv 3.46 A Inductance L 2 mH Resistor R 0.1 Ω PV array voltage Vdc 540 V Grid Frequency 50 Hz Grid Voltage rms 120 V The experimental environment and the recommended order to combine the objective function, constraints, and technique's effectiveness are described in this section. suitable optimization solvers, such as linear programming. Several metrics, including power loss, grid current, voltage or mixed-integer planning. After this, the control deviation, along with grid voltage, are used to assess the algorithms for the first, second, and third control levels are system's performance via the use of the innovative KNN- designed and implemented inside a hierarchical manage SMOTE-GCN algorithm. By redistributing loads and structure. This is done in order to further govern the arranging generating units, KNN-SMOTE- system. For the purpose of testing these control algorithms and verifying that they are stable and effective, the system GCN systems improve the efficiency of power grids. To is then simulated under a variety of market condition optimize power quality, KNN-SMOTE-GCN controllers scenarios. With the last step, the control algorithms are regulate the grid's reactive power, voltage, and harmonic implemented for real-time operation. This means that the correction. The system is constantly adjusting the control system constantly checks and changes the distributed settings using fuzzy rules with real-time data to maximize power resources (DPRs) based on the data that is being power quality. collected in real time. Implementation Steps There are various essential phases involved in the 4.2 Comparative analysis implementation process. Before anything else, it is necessary to gather historical data on demand, generation, Table 2 illustrates the existing techniques with their and market pricing. Additionally, forecasting models description. should be used in order to make predictions about future demand, renewable generation, and market prices. In the next step, the optimization problem is stated and used in 414 Informatica 49 (2025) 403–418 K. Zhang et al. Table 2: Comparison techniques When power grid voltages deviate from their nominal or Technique Description ideal values, this is known as voltage deviation. Nominal Active Filters To reduce harmonic voltage standards could vary by region while kind of distortion and enhance electrical system, although they often range from 230V to power quality, active filters 400V and beyond. Voltage must be maintained constant are a useful tool. and under control for grid-connected electrical gadgets and Wavelet Neural • These may be machinery to work reliably. A number of factors contribute Networks (WNN) used in the creation of to voltage fluctuations' potential effects on the controllers for auxiliary performance and longevity of electrical devices. When the damping. real voltage exceeds the nominal voltage, overvoltage Artificial Neural • Power systems occurs. As seen in figure 5, the term "under voltage" is Networks may have their dynamic used when the real voltage is less than the nominal voltage. (ANNs): responsiveness improved Active filter 70 with the help of ANNs. WNN Virtual The grid may benefit from 60 ANNs Synchronous the inertia and damping 50 Generator (VSG) provided by VSGs. Deep • Damping 40 Deterministic controllers may be 30 Policy Gradient designed with the help of (DDPG) DDPG. 20 10 Power loss occurs in a grid system when electrical energy, in the process of transmission and distribution, dissipates 0 as heat. Transmission or distribution losses are other 0 1 2 3 4 5 6 7 8 9 10 names for this occurrence. A lower current density per unit Time (s) of power is a common result of increasing voltage. When the voltage or current in a three-phase circuit is not Figure 5: Voltage deviation balanced between the phases. Voltage or load imbalances cause an uneven distribution of electricity, which in turn The efficiency, reliability, and performance of an electrical causes losses. Low power factor happens when the network are all impacted by fluctuations in reactive power voltage-current relationship is not ideal. Figure 4 show that in a grid system. Maintaining safe voltage levels and when the power factor is low, the reactive power increases, powering inductive loads both need reactive power. There leading to higher losses in the transmission and are a lot of potential sources of reactive power fluctuations, distribution systems. which might lead to undesirable outcomes. Reactive power is a component of electrical power that does nothing useful while it sways between the generator and the consumer. 8 "Reactive volt-amperes" is the standard measuring unit. 7 When inductive loads are included or excluded, changes to the load profile may cause variations in reactive power. As 6 seen in figure 6, fluctuations in generator output, 5 Neural especially in synchronous generators, may affect reactive network power. 4 3 KNN- SMOTE- 2 GCN 1 0 0 1 2 3 4 5 6 7 8 9 10 Time (s) Figure 4: Power loss and time analysis Power loss (Watts) Voltage deviation (volts) Adaptive Control of PV-Integrated Power Grids Using KNN… Informatica 49 (2025) 403–418 415 Figure 6: Reactive power fluctuation Active filter 700 WNN ANNs A grid system experiences THD when harmonic 600 components are present in the voltage or current waveform in relation to the fundamental frequency. In power 500 systems, the fundamental frequency is typically 50 or 60 Hz, and harmonics are multiples of that. Harmonics may 400 be caused by a variety of sources, including non-linear loads and switching operations. 300 200 100 0 0 1 2 3 4 5 6 7 8 9 10 Time (s) 0,18 Fundamental (50Hz) V THD= 0.33% 0,16 0,14 0,12 0,1 0,08 0,06 0,04 0,02 0 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 Harmonic order Figure 7: THD analysis We get the THD by Figure 7 by dividing the root mean reaches a high enough voltage to meet the load's square (RMS) value of the harmonic content by the RMS requirements. Whenever power is needed, it is transferred value of the fundamental frequency. Typically, total from the stored energy in the inductor to the load. The duty harmonic distortion is expressed as a proportion of the cycle, or gate pulse input, is responsible for carrying out fundamental frequency. Additionally, figure 6 shows THD the whole operation. It is vital to manage the duty cycle. in action. The performance comparison yielded better After then, it's a matter of getting the most electricity out findings from the proposed work's assessment of of the PV array in any weather. To maximize the voltage performance. The comparative assessment has shown that and power output of a photovoltaic array, irradiance and the proposed model has successfully minimized the THD temperature are the two most critical elements. Therefore, as much as possible. Consequently, PV systems that are it is necessary to monitor the maximum power stage, linked to the grid may use it. In this study, we maximize which is near the PV array's maximum power. The MPPT the produced output power of the PV panel by using a DC- was created to provide a standardized, efficient tracking DC converter using MPPT. Step one involves regulating system. Prior research has explored a wide variety of the boost converter's duty cycle. It is necessary to MPPT methods for peak power tracking. Some MPPT gradually raise the DC voltage of the PV array until it methods, like P&O, which uses step-size control as well as Reactive power (Var) Mag (% of fundamental) 416 Informatica 49 (2025) 403–418 K. Zhang et al. oscillates around steady state in response to dynamically were very promising for the proposed system, obtained changing environmental variables, have been shown to after an exhaustive series of simulations meticulously have significant drawbacks, however. The incremental executed on the MATLAB/SIMULINK platform. The conductance method is more complicated and expensive predictive control systems utilized demonstrated [11], [19], but it responds quickly to changing conditions. remarkable robustness in the face of dynamic variations in The controllers utilized in this investigation yielded solar radiation levels, allowing for a constant energy promising outcomes since they were based on production profile relative to the energy production mathematical principles. There has been an astonishing profile. Additionally, the suggested system's adaptability level of consistency throughout the whole energy output, to rapidly changing weather conditions ensures continuous leading to a steady supply of 27 MW of pumped electricity and dependable energy generation, thus establishing its to the grid. This is true even if the amount of sunlight status as a robust and resilient energy solution alternative. reaching Earth has changed during the course of the day. As we navigate into the future of photovoltaic (PV) Many researchers and professionals in the field have taken systems, it is wise to direct research efforts on an interest in photovoltaic (PV) systems. The incremental investigating and perfecting innovative control techniques. conductance + integral regulator strategy is one of the The overarching goal is to make the system far more methods proposed for training the MPPT controller; it is efficient and productive, with an unwavering commitment referenced. The goal of developing this method was to to producing even more remarkable and dependable ensure that the photovoltaic (PV) system operates at its results. Furthermore, the research plan includes a maximum power point in all weather conditions, thereby comprehensive comparison study, an exhaustive endeavor optimizing its performance. Also, a Proportional-Integral aimed at methodically contrasting the effectiveness of (PI) controller was recommended as a method for these novel control methods with the performance metrics controlling the DC-AC converter in the study. The of the current systems. The whole capability of conversion of direct current (DC) from solar panels to sophisticated control techniques is expected to be exposed alternating current (AC) for grid integration relies on this by using this methodical approach. By streamlining grid converter. It should be noted that various control connectivity, these methods are poised to change the techniques become unstable when exposed to large course of renewable energy generation. Ultimately, this fluctuations in solar irradiation. Keeping energy output research adds to the growing body of knowledge on steady is made more difficult by the fact that solar radiation renewable energy sources by introducing a new is inherently unpredictable, especially when clouds are photovoltaic (PV) system and demonstrating the system's present or when the sun's beams are changing. A change in inherent capacity to address major energy and the amount of power supplied into the system could be environmental issues. This contribution demonstrates the discernible if sun irradiation decreases. Concerns about the potential of state-of-the-art control systems and practical applications of PV systems, particularly those optimization methodologies, building a foundation for a connected to the electricity grid, are highlighted by this future that is sustainable, energy-efficient, and kind to the phenomenon. Although mathematically-based controllers environment. have performed well in conditions of relatively constant solar radiation, they may require additional tuning to 5 Conclusion account for the challenges posed by sudden and unexpected changes in solar radiation. These findings are In order to improve grid-connected PV systems, this study important because they show how important it is to have presented a new KNN-SMOTE-GCN method. In this case, adaptive control systems that can adjust to new conditions the UPQC model is used to enhance power quality. This and maintain a steady power supply and stable grid. model controls voltage and current concerns to assure Research in this area may focus on creating more resilient better power quality. Beyond that, the MPPT algorithm, and flexible solar controllers in the future by combining which controls the grid system dynamically, extracts the real-time weather forecasts with sensor-based feedback maximum power from solar panels. By using GCN, the systems. To further improve the reliability of grid- grid system's MMPT and UPQC operations may be connected photovoltaic (PV) system [17]s, research into coordinated to ensure optimal power quality. Hence, power energy storage alternatives like batteries may also provide loss, voltage deviation, total harmonic distortion, and a means of reducing the impact of variations in sun reactive power variations make up the assessment criteria. irradiation. Solar energy consumption might be maximized In addition, we compare the resultant parameter with these upgrades, which would be a huge step toward considerations to those of more traditional models. creating sustainable energy and integrating systems. According to the results, the created KNN-SMOTE-GCN A cleaner and more sustainable energy landscape may be paradigm reduced power loss by 4% compared to the other achieved via the total performance and efficiency of models. The voltage deviation is 26.42V and the total photovoltaic (PV) systems, which can be enhanced harmonic distortion is 0.56THD. When applied to hybrid through this synthesis of current approaches. Study results Adaptive Control of PV-Integrated Power Grids Using KNN… Informatica 49 (2025) 403–418 417 renewable energy systems, DL models and optimization based on multi-modal information. In 2023 6th algorithms will improve BESS in the future. International Conference on Mechatronics, Robotics and Automation (ICMRA)( (pp. 6-11). References IEEE. https://doi.org/10.1109/icmra59796.2023.107 08213 [1] Zhao, M., Li, Y., Ding, P., & Li, F. (2025, January). [10] Zhang, Y., Han, F., Jiao, Y., Li, S., & Zhang, Z. (2024, Research and Development of Hybrid Intelligent December). Identification of model parameter and Enhancement Analysis and Control System for measurement error in large-scale power grid base on Stability of Large Power Grid. In 2025 International graph attention network. In 2024 4th International Conference on Electrical Automation and Artificial Conference on Intelligent Power and Systems Intelligence (ICEAAI) (pp. 274-277). IEEE. (ICIPS) (pp. 422-426). IEEE. https://doi.org/10.1109/iceaai64185.2025.10956751 https://doi.org/10.1109/icips64173.2024.10900016 [2] Lian, H., Hu, S., & Meng, Y. (2021, October). [11] Lu, W., Xu, W., Li, T., Luo, F., Yuan, Z., & Zha, X. Research on Supporting Control Technology of Wind (2023, July). Research on Online Auxiliary Decision- driven generator Auxiliary Power Grid Based on Making Technology for New Energy Off-Grid Energy Storage DC Access. In 2021 International Systems. In 2023 5th International Conference on Conference on Advanced Electrical Equipment and Power and Energy Technology (ICPET) (pp. 867- Reliable Operation (AEERO) (pp. 1-4). 873). IEEE. IEEE. https://doi.org/10.1109/aeero52475.2021.9708 https://doi.org/10.1109/icpet59380.2023.10367587 401 [12] Hu, Y., Liu, H., Zhao, J., He, X., Wei, X., Guo, X., ... [3] Dong, J., Qin, J., Ling, L., Lin, X., Chen, P., & Meng, & Zhou, N. (2024, October). Intelligent Flexible F. (2024, July). Research on Frequency Control and Control Device and Technology for Distributed PV. Optimization of Power Grid Auxiliary Load for In 2024 21st International Conference on Harmonics Diversified Vehicle Network Interaction. In 2024 3rd and Quality of Power (ICHQP) (pp. 283-287). IEEE. International Conference on Energy and Electrical https://doi.org/10.1109/ichqp61174.2024.10768714 Power Systems (ICEEPS) (pp. 669-675). IEEE. [13] Lv, L., Fang, X., Zhang, S., Ma, X., & Liu, Y. (2024). [4] https://doi.org/10.1109/iceeps62542.2024.10693157 Optimization of grid-connected voltage support [5] Bin, D., Ling, Z., Yingqi, H., Wei, Z., Zhenhao, Y., & technology and intelligent control strategies for new Jun, Z. (2021). Research on key technologies of energy stations based on deep learning. Energy intelligent operation control of super-large urban Informatics, 7(1), 73. https://doi.org/10.1186/s42162- power grid based on multi-center structure. In Journal 024-00382-8 of Physics: Conference Series (Vol. 1738, No. 1, p. [14] Hu, C., Wu, X., Cai, H., Cheng, L., & Huang, J. (2024, 012048). IOP September). Research on Application and Control Publishing. https://doi.org/10.1088/1742- Technologies of the Embedded HVDC in a Provincial 6596/1738/1/012048 Power Grid. In 2024 11th International Conference on [6] Xu, J. (2024, August). The Application of Internet of Power and Energy Systems Engineering (CPESE) (pp. Things Technology in Intelligent Wind Power Grid. 336-340). IEEE. In 2024 4th International Conference on Energy https://doi.org/10.1109/cpese62584.2024.10840670 Engineering and Power Systems (EEPS) (pp. 555- [15] Qiu, J., Xia, S., Zhang, J., Ren, Z., Xu, G., & Zhang, 561). J. (2020, September). Research on key technologies of IEEE. https://doi.org/10.1109/eeps63402.2024.1080 communication for large-scale stability control system 4370 in modern power grid. In 2020 12th IEEE PES Asia- [7] Sujatha, G. (2025). A Solar PV Integrated UPQC to Pacific Power and Energy Engineering Conference Enhance Power Quality using SEA (APPEEC) (pp. 1-5). IEEE. GullANFISAlgorithm. Informatica, 49(8). https://do https://doi.org/10.1109/appeec48164.2020.9220391 i.org/10.31449/inf.v49i8.6158 [16] Zhang, N., & Zhu, L. (2024, May). Research on [8] Liu, Z., Yao, N., Fan, Q., Zhu, X., & Xue, H. (2023, Intelligent Power Grid Attack Detection System August). Reasoning simulation of substation power Based on Machine Learning. In Proceedings of the grid fault events based on knowledge map technology. 2024 International Conference on Machine In 5th International Conference on Information Intelligence and Digital Applications (pp. 480- Science, Electrical, and Automation Engineering 486). https://doi.org/10.1145/3662739.3671374 (ISEAE 2023) (Vol. 12748, pp. 918-923). SPIE. [17] Syafiqah, M.N., Azzirah, M.R., Noor, S.Z., & https://doi.org/10.1117/12.2690053 Suleiman, M. (2024). Artificial Intelligent Maximum [9] Lu, D., Liu, Y., Qiu, Z., & Huang, X. (2023, Power Point Tracking (MPPT) for Three Phase November). Intelligent decision technology for safety Transformerless Grid Inverter Technology. risk management and control of power operation site International Journal of Academic Research in 418 Informatica 49 (2025) 403–418 K. Zhang et al. Economics and Management Sciences. integration of main and auxiliary functions. In 2024 https://doi.org/10.6007/ijarems/v13-i4/23085 3rd International Conference on Energy, Power and [18] Yang, R., Qian, J., Ji, Y., & Wang, M. (2024, August). Electrical Technology (ICEPET) (pp. 1436-1441). Construction of Intelligent Grid Automation IEEE. Dispatching System Based on SVG Technology. https://doi.org/10.1109/icepet61938.2024.10626047 In 2024 International Conference on Power, [28] Liu, S., Wang, J., Qiu, S., Li, Z., Zhang, K., & Lou, N. Electrical Engineering, Electronics and Control (2024, October). Research on Auxiliary Control (PEEEC) (pp. 65-70). Strategy for Large-scale Power Grid Based on Deep IEEE. https://doi.org/10.1109/peeec63877.2024.000 Reinforcement Learning. In 2024 IEEE 4th 18 International Conference on Digital Twins and [19] Sun, X. (2025). A Review of Vehicle - to - Grid (V2G) Parallel Intelligence (DTPI) (pp. 206-210). IEEE. Technology with Low Power - grid Impact. Academic https://doi.org/10.1109/dtpi61353.2024.10778859 Journal of Science and Technology. [29] Yang, H., Zhao, G., Xianjin, L., Li, Z., & Liu, D. https://doi.org/10.54097/spvbz820 (2024, September). Construction and application of [20] Zhou, X., Zhou, M., Chen, Y., Wang, J., Luo, X., & static stability intelligent evaluator for large power Ma, S. (2024, May). Exploration and Practice of grid from the perspective of knowledge graph. Virtual Power Plant Under Mega City Power Grid. In Journal of Physics: Conference Series (Vol. 2846, In 2024 IEEE 2nd International Conference on Power No. 1, p. 012026). IOP Publishing. Science and Technology (ICPST) (pp. 1411-1416). https://doi.org/10.1088/1742-6596/2846/1/012026 IEEE. https://doi.org/10.1109/icpst61417.2024.10602362 [21] Yue, T. (2025). Sensor-based life detection of solar cells. Informatica, 49(9). https://doi.org/10.31449/inf.v49i9.5586 [22] Zhang, K., Wu, X., Li, Z., Lv, Y., & Liu, S. (2024). Research on intelligent auxiliary regulation technology of large power grid section based on artificial intelligence. Journal of Electrical Systems. https://doi.org/10.21203/rs.3.rs-6774959/v1 [23] Li, Z., Zhang, K., Qiu, S., Wu, X., & Chen, X. (2024, April). Key Technologies of Power Grid Auxiliary Decision-Making Based on Artificial Intelligence. In 2024 7th International Conference on Energy, Electrical and Power Engineering (CEEPE) (pp. 1028-1032). IEEE. https://doi.org/10.1109/ceepe62022.2024.105 86585 [24] Zhang, X., Zhou, D., Zhou, G., Cao, W., Wang, M., Wang, C., & Li, H. (2022, December). Research on auxiliary decision-making of power grid fault recovery based on generative adversarial imitation learning. In Proceedings of the 2022 4th International Conference on Robotics, Intelligent Control and Artificial Intelligence (pp. 1140-1145). https://doi.org/10.1145/3584376.3584578 [25] Pan, F. (2025). Forecasting Solar Energy Generation Using Machine Learning Techniques and Hybrid Models Optimized by War SO. Informatica, 49(2). https://doi.org/10.31449/inf.v49i2.7554 [26] Jianfeng, Y., Changyou, F., & Guangming, L. (2015). On-line trend analysis technology of large power grid considering operation mode arrangement. J. Automation of Electric Power Sytems39-(1), 111-116. https://doi.org/10.1049/ic.2015.0233 [27] Liu, T., Zhang, S., Chang, J., Sui, H., Li, H., & Yu, H. (2024, May). Digital transformer with deep https://doi.org/10.31449/inf.v49i12.9094 Informatica 49 (2025) 419-432 419 MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced Language Model for Power Audit Text Understanding Jia Xiao-liang1, Li Sen1, Cui Xia1, Li Jing2, Sun Chang-peng1, Liu Dong-hua2, Chen Zheng-long2 1State Grid Tianjin Electric Power Company, Tianjin, 300010, China 2State Grid Tianjin Electric Power Company Chengdong power supply branch, Tianjin, 300250, China E-mail: jxlseuee@163.com *Corresponding author Keywords: Power audit text, multi-dimensional information retrieval, large language model (LLM), audit category classification, multi-dimensional information retrieval-based bidirectional encoder representations from transformers (MDIR-BERT) Received: May 1, 2025 In the rapidly evolving energy sector, efficient access to relevant information from power audit reports is crucial for informed decision-making, regulatory compliance, and operational improvements. However, the intricate language, complex vocabulary, and unstructured format of power audit texts present significant challenges for conventional information retrieval techniques. To address these issues, the research proposes a novel power audit text understanding technology that combines multi-dimensional information retrieval enhancement with a domain-adapted Large Language Model (LLM) to enhance the performance of power audit text processing. The Multi-Dimensional Information Retrieval-based Bidirectional Encoder Representations from Transformers (MDIR-BERT) method captures electric- power-specific morphology, domain-specific vocabulary, and intricate entity relationships more effectively. MDIR-BERT is pre-trained on a huge quantity of electric power audit transcripts utilizing both word-level and entity-level covered language modeling tasks. The model is trained on a curated dataset of annotated electric power audit documents sourced from regulatory and industrial environments. MDIR- BERT integrates domain-specific pre-training with both word-level and entity-level masked language modeling, capturing electric power-specific morphology, terminology, and complex entity relationships. The data preprocessing steps include comprehensive text cleaning, normalization, and tokenization to ensure high-quality input for method training. Experimental results show that MDIR-BERT achieves a classification accuracy of 98.82%, representing a +16.86% improvement over the baseline EPAT-BERT model (81.96%), along with notable gains in precision, recall, and F1-score. These findings highlight the effectiveness of integrating enhanced information retrieval techniques with specialized language modeling for the intelligent understanding of power audit documentation, paving the way for more accurate, scalable, and interpretable audit methods. Povzetek: MDIR-BERT, izboljšan jezikovni model s večdimenzionalnim iskanjem informacij (MDIR), je razvit za razumevanje revizijskega besedila elektroenergetike. S predhodnim usposabljanjem na besedni in entitetni ravni dosega kvalitetno klasifikacijo revizijskih kategorij. 1 Introduction collection of resources [2]. However, user intentions were more complex than simply retrieving information The development of Information Retrieval (IR) based on similarity [3]. This audit is conducted by a technology has been intimately linked to the human need qualified firm with the necessary competencies in line for information access. In recent years, IR and associated with the requirements established by the Ministry of product systems have expanded significantly as a critical Energy and Mineral Resources. These criteria apply to constituent of smart data dispensation tools. The basis of businesses or industries that utilize a significant amount IR technology is the identification of documents related to of energy. A complete audit evaluates all areas of energy the customer's search from a big and unorganized usage, from fuel consumption to the use of generated collection, which usually leads to a graded catalog of the electrical energy [4]. Lowering electricity costs and documents by significance and user requirements [1]. IR cutting down energy waste requires an energy audit. plays an essential role in numerous real-world functions, Efforts have to be initiated by governments to require like expert finding, digital libraries, and Web search. IR periodic energy audits for industrial buildings. An essentially refers to the task of retrieving information energy audit is a great way to find the best solution and resources related to information required from a large 420 Informatica 49 (2025) 419-432 J. Xiao-liang et al. assess how much energy a building uses [5]. The encode domain-specific terminologies and generative probability of word sequences, or more intricate entity relationships. generally, the ability to predict forthcoming words ➢ Obtained a classification accuracy of 98.82%, conditional on prior words, is a crucial function of representing a +16.86% relative improvement language models (LM). LMs were first created for text compared to the baseline EPAT-BERT model, in creation, but they are also being studied for addition to significant boosts in precision, reformulating recall, and F1-score. a range of NLP issues into different text-to-text challenges in the text of electric power audits [6]. 1.2 Research questions The implementation of Large Language Models (LLMs) RQ1: Can a domain-adapted BERT model (MDIR-BERT) marks the most important change in the technical enhanced with multi-dimensional information retrieval development of electric power audit text [7]. LLMs mark outperform general-purpose BERT (EPAT-BERT) in power a substantial advancement in Artificial Intelligence (AI) audit text classification? as it makes breakthroughs in generalization and RQ2: How does multi-dimensional information retrieval adaptability across tasks, but LLMs generate inaccurate improve entity recognition and contextual understanding information, misalign with temporal information, in regulatory audit texts? struggle to keep context, and struggle to fine-tune each RQ3: What impact does domain-specific pretraining have response, leading to serious issues regarding reliability on the performance of language models in complex, when applied to electric power audit text [8]. In the unstructured audit document processing? continually changing energy industry, timely access to The research outline is organized as follows: Section 2 essential information from power audit reports is critical reviews related research, while Section 3 outlines the for making informed decisions, conforming to research methodology. Section 4 presents the results and regulations, and improving operations. Conventional discussion, and Section 5 concludes the research. BERT-based models are not effective in encoding the sophisticated, domain-specific semantics in electric2 Related work power audit reports. There is a demand for models The transformational effects of LLMs on IR research were incorporating domain knowledge and sophisticated investigated in the research [9]. The method comprised retrieval methods to enhance classification and synthesizing findings from a strategy workshop organized information extraction accuracy. This research explores by the Chinese IR community. It suggested a new IR a new technology for understanding power audit reports technological paradigm involving IR models, LLMs, and that improves multi-dimensional IR and domain-adapted humans, but faces computational trade-offs, LLM performance by extracting morphology specific to trustworthiness concerns, domain boundaries, and electric power, domain-specific language, and implications. An analysis of e-commerce customer complexities of entities to use the Multi-Dimensional reviews on drum washing machines using Robotic Process Information Retrieval-based Bidirectional Encoder Automation (RPA) was demonstrated [10]. It combined Representations from Transformers (MDIR-BERT) ROST Content Mining System 6 (ROSTCM6) and method. LOGCONTROL-BLOCK systems to extract sentiment and correct audit robot paths. While effective in revealing 1.1 Key contributions customer sentiments and guiding e-commerce strategies, ➢ This research aims to develop a multi- limitations include reliance on predefined keywords and dimensional information retrieval for improved the need for improved automated sentiment analysis classification and understanding of electric accuracy. power audit texts. The Mistral 8x7B LLM's current Mixture of Experts ➢ Initially, Electric power audit reports from (MoE) architecture was combined with Retrieval energy-intensive sectors, which are obtained Augmented Generation (RAG) to improve on challenging from publicly accessible databases from IR and reasoning tasks, which were investigated in [11]. In Kaggle, represent various regulatory and the quantitative and qualitative evaluation of the model operational contexts. using the Google BIG-Bench dataset, notable gains were ➢ Utilized preprocessing steps such as stop word observed in F1 score, accuracy, precision, and recall. elimination, lemmatization, and tokenization to Limitations include computing needs and dataset breadth. preprocess and normalize intricate technical Integrating LLMs with Knowledge Graphs (KGs) jargon for optimal model input. enhanced intelligent fault detection and IR for new energy ➢ MDIR-BERT by pre-training on the electric vehicles (NEVs) [12]. It developed an intelligent fault power audit dataset with word-level as well as retrieval system, a structured knowledge graph, and an entity-level masked language modeling to optimized BERT model for fault classification, MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced… Informatica 49 (2025) 419-432 421 demonstrating exceptional performance in Q&A robust sequences that allow for more successful situations for NEVs, but facing scalability issues. membership inference assaults under realistic threat To evaluate the Word to Vector (Word2Vec) model for models. It demonstrated a significant improvement in document compliance detection by comparing it with detection and True Positive Rate (TPR) with optimal Term Frequency–Inverse Document Frequency (TFIDF), sequences, achieving a 49.6% TPR on Qwen2.5-0.5B, Latent Dirichlet Allocation (LDA), and Bidirectional compared to 4.2% earlier, but has drawbacks due to Encoder Representations from Transformers (BERT) as reliance on model access without shadow models or described [13]. Results showed that Word2Vec gradient insertion. To compare forecasting MASI trends effectively captures semantic similarity with higher with ARDL with trend and seasonality (Long short-term efficiency and simplicity. However, it performs slightly memory (LSTM), and extreme gradient boosting lower than BERT in handling complex semantics and (XGBOOST) was determined [19]. ARDL, with trend and domain-specific terminology. A self-retrieval framework seasonality, returns the lowest MAPE, at 26.7%. [14] that leverages self-supervised learning was Limitations include LSTM and XGBOOST executing developed to improve retrieval efficiency and model higher error rates and taking longer to process. The Two simplicity. It internalized a retrieval corpus, improved Sliding Windows Graph Neural Network (TSW-GNN) downstream LLM applications, and outperformed architecture for text classification was introduced, which conventional IR systems. However, it faced high works around limitations of corpus-level graph approaches computing costs and scaling challenges, despite that suffer from continuous memory usage and are maintaining real-time efficiency and cross-domain completely contextually agnostic, was introduced [20]. generalizability. Predictive Analytics (PA) in Current The TSW-GNN model addresses this issue by introducing Research Information Systems (CRIS) to predict research TSW into the GNN architecture with a new dynamic trends through machine learning is used [15]. In this global sliding window and a new dynamic local sliding research, k-Nearest Neighbor had the best performance. window, increasing contextual memory and representation Limitations include moderate AUC scores and of semantics. Tests from the seven datasets reveal that the dependence on historical metadata to generate classification accuracies were improved, though at predictions. The Financial BERT (FinBERT) model, increased complexity of the two sliding windows and their specialized in the finance industry, was developed to associated GNN parameters. To explore the independent enhance sentiment analysis in financial writings [16]. role of internal auditors at the Swedish Police Authority FinBERT model outperformed traditional dictionaries in and to illustrate their relational struggles within the classifying context-dependent sentiment and organization was described [21]. The research adopts a Environmental, Social, and Governance (ESG)-related narrative framework in the study of auditor independence talks with minimal training data, but faced limitations in and introduces stories of auditors highlighting domain-specificity and potential decreased psychological distress, ambiguity in legitimacy, and generalizability. The use of LLMs in auditing was attempts to negotiate competing demands. The picture investigated in [17], with an emphasis on compliance painted by these narratives can be viewed as a tragedy checks and report production. LLMs effectively handle where auditors were unable to resolve tensions that unstructured data, address compliance concerns, and manifested themselves as professional dilemmas. Results provide excellent audit reports, despite challenges like showed LLMs perform well in noise handling but struggle data security and model interpretability. The research with falsehood management. An overview of the related [18] enhanced LLM privacy audits by creating more work is given in Table 1. Table 1: Overview of the related works Ref. Objective Task Type Domain Model Used Method Limitations No. Ai et Investigate Information General Not specified Strategic Computational al., [9] the role of Retrieval workshop trade-offs, trust LLMs in proposing IR- concerns, and IR LLM-human ethical issues research paradigm Sun Analyze e- Sentiment E-commerce RPA, ROSTCM6, Keyword Relies on and commerce Analysis LOGCONTROL- extraction, predefined Huo, reviews BLOCK path keywords, [10] using correction, limited sentiment automation sentiment accuracy classification 422 Informatica 49 (2025) 419-432 J. Xiao-liang et al. Xiong Improve IR + Reasoning General Mistral 8x7B with RAG + High computing and IR and RAG Mixture of needs, limited Zheng, reasoning Experts dataset [11] evaluated on BIG-Bench Zhang Enable Classification + New Energy Optimized BERT + Fault Scalability issues et al., intelligent Retrieval Vehicles KG classification [12] IR for using KG- NEVs enhanced BERT Wen et Evaluate Document Legal, Audit Word2Vec, TFIDF, Semantic Slightly lower al., [13] Word2Vec Similarity LDA, BERT similarity via performance for vector models than BERT in document complex complianc semantics e detection Tang et Merge IR IR General Self-Retrieval LLM Self- High al., [14] functionali supervised computational ty within a corpus- cost, scaling single internal IR complexity LLM Azerou Predict Trend Research kNN, SVM, Predictive Moderate AUC, al et al., research Forecasting Managemen Random Forest analytics with dependent on [15] trends in t machine historical CRIS learning metadata Huang Domain- Sentiment Finance FinBERT Domain- Limited et al., specific Classification adapted BERT generalizability [16] sentiment for financial analysis sentiment Gan, Automate Report Auditing LLM-based Process Data security, [17] audit Generation + unstructured interpretability complianc Classification audit data for e reporting Panda Enhance Membership General Qwen2.5-0.5B Robust Requires model et al., privacy Inference canaries for access, no [18] auditing audit testing shadow models Oukho Compare Financial Stock ARDL, LSTM, Time series Higher error and uya et forecasting Forecasting Market XGBOOST modeling with processing time al., [19] models for trend and in LSTM and MASI seasonality XGBOOST trends Li et Improve Text NLP TSW-GNN Local and Increased model al., [20] text Classification global sliding complexity and classificati window graph parameter tuning on with construction sliding windows Nordin Explore Organizational Public Narrative Story-based Unresolved et al., internal Behavior Sector / Framework analysis of tensions, [21] auditors' Analysis Audit auditor roles emotional strain, independe and ambiguous nce legitimacy challenges MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced… Informatica 49 (2025) 419-432 423 Existing approaches have various difficulties, including audit report entries collected from Kaggle. Each entry computational complexity, ethical problems, scaling includes an audit report ID, audit text, a list of extracted challenges, decreased accuracy, a limited dataset scope, named entities, and a category label. The technical power low generalizability, data security threats, and reliance on audit reports cover equipment, energy systems, and embedding quality. These constraints impede real-time, compliance, supporting tasks like entity recognition and domain-specific, and reliable information retrieval in classification across categories such as safety, efficiency, specialist sectors, such as electric power auditing. To and regulation. Audit texts range from 15–40 tokens, address these issues, the research explores a new averaging around 25 tokens. Entities cover standard technology for understanding power audit reports that equipment (e.g., Load Balancer) and locations (e.g., improves multi-dimensional IR and domain-adapted Control Room), enabling comprehensive analysis of LLM performance by extracting morphology specific to energy systems. The obtained dataset is split in 80:20 ratios electric power, domain-specific language, and for training and testing performance. complexities of entities to use the MDIR-BERT Model. 3.2 Data preprocessing 3 Multi-dimensional information Data preprocessing is the procedure of converting fresh retrieval (MDIR) data into an organized and cleansed form to improve model performance. It cleans, normalizes, and tokenizes electric MDIR is an advanced retrieval technique that expands power audit texts to provide high-quality input for model the traditional keyword-based search into a training while also improving classification and multidimensional framework, incorporating semantic information retrieval accuracy. It includes approaches, meaning, contextual relevance, domain-specific lexicon, such as stop word removal, lemmatization, and relationships between entities, and user intent, thereby tokenization, to arrange power audit texts in a structure that enabling retrieval that is precise and comprehensive. reduces noise while providing high-quality information. It MDIR increases the power of a text understanding allows for efficient and accurate IR, entity recognition, and process for audit text, allowing for the MDIR-BERT classification operations in the system. model to better capture the complex, technical, unstructured nature of power audit documents and their 3.2.1 Data cleaning using stop words removal underpinning. This section gathers the electric power Data cleaning is the process of removing unnecessary or audit text data and preprocesses the data using noisy elements from raw data to make it more accurate. It techniques, such as data cleaning using Stop Words is utilized to eliminate extraneous or noisy information, Removal and data normalization using lemmatization resulting in high-quality input for training and improved and tokenization. Finally, classification and information overall performance of electric power audit classification. retrieval were performed using BERT. Figure 1 depicts In the research, stop word removal reduces frequent, the System Design of the MDIR-BERT Model. unnecessary words from audit text so that the model focuses on important content for better IR and classification. This is produced by establishing a frequency threshold. This threshold was simply set as the average frequency of all terms gathered for the language in Equation (1). 𝛼 𝜎 = ∑𝑛 𝑗 1 𝑡 ( ) 𝑛 = 𝑗 1 Where 𝑡𝑗 is the frequency of the 𝑗th term, Equation (1), 𝛼 is defined as a smoothing adjustment factor to 1.25, empirically validated in validation experiments to moderately increase the average threshold and dampen noise from low-frequency terms. This value 𝛼 ∑𝑛 𝑛 𝑗=1 𝑡𝑗selected to optimize in entity recognition by preventing inclusion of excessively rare or excessively Figure 1: System design of the multi-dimensional common terms. information retrieval-enhanced BERT model 3.2.2 Data normalization using lemmatization 3.1 Data collection Normalization refers to the process of converting text to a The data is obtained from the Kaggle link: uniform state, often by reducing words to their standard https://www.kaggle.com/datasets/zoya77/power-audit- forms or original structures. The normalization process report-and-entities-dataset. The dataset comprises 1,001 allows the different variations of words to be standardized, 424 Informatica 49 (2025) 419-432 J. Xiao-liang et al. which permits the method to better process and classification and information retrieval. While general comprehend province-precise language in power audit BERT is pre-trained with the Masked Language Modeling texts. This process of normalization helps to standardize (MLM) and Next Sentence Prediction (NSP) tasks on variations of words to allow for treating different general corpora, MDIR-BERT takes this further by adding versions of a word as equivalent terms. Lemmatization domain adaptation for the electric power audit domain. In helps to determine the organizational meanings of words, particular, MDIR-BERT is additionally pre-trained on a which assists in the analysis of text, and naturally, the massive dataset of electric power audit transcripts to processing of this text. It is valuable in many text analysis capture more domain-specific vocabulary, morphological projects, especially those focusing on IR, sentiment, and forms, and intricate named entity relations. To facilitate text classification. this domain-specific adaptation, two domain-specific pre- training tasks are utilized: Word-Level Masked Language 3.2.3 Tokenization Modeling (W-MLM), likewise the standard MLM, but with It is the procedure in which input text is divided into modifications to focus on domain tokens that usually minor units of meaningful units (tokens), which can be appear within audit texts, including audit procedures, meaningful individual words, phrases, or sentences. voltage types, compliance, and equipment-related terms. Tokenization is a key step towards breaking down the Entity-Level Masked Language Modeling (E-MLM): This raw power of the audit text into portions that will task entails masking named entities determined through a ultimately be meaningfully analyzed by the model. By domain-tuned NER system and having the model predict tokenizing text, the model can better interpret the them in their respective contextual environments. This relationships, structure, and contextualization of words. assists MDIR-BERT in capturing hierarchical and Tokenization will be used to confirm the conducting of relational dependencies between domain-specific entities tasks, like IR and entity recognition, where tokens are more effectively. With these enrichments, MDIR-BERT identified and labeled. It enhances the ability of the gains a better grasp of electric-power-specific semantics model to yield valuable and informative information and structure for more accurate and context-sensitive from sophisticated and complex unstructured audit classification and retrieval. documents. 3.3.2 BERT for classification 3.3 Classification and information retrieval BERT processes input text using its transformer layers using bidirectional encoder representations while performing categorization jobs. After the text has been analyzed, the output representation is sent through a from transformers (BERT) classification head to forecast the text's proper category, as Classification is the process of assigning text data into shown by Equation (2). predefined categories based on its content, and IR is the task of finding and extracting significant data from a 𝑂𝑢𝑡𝑝𝑢𝑡𝑐𝑙𝑎𝑠𝑠 = 𝑆𝑜𝑓𝑡𝑚𝑎𝑥(𝐷𝑒𝑛𝑠𝑒(𝐵𝐸𝑅𝑇(𝐼𝑛𝑝𝑢𝑡))) (2) huge collection of structured or unstructured information. In the research, classification helps to organize and label Where 𝐵𝐸𝑅𝑇(𝐼𝑛𝑝𝑢𝑡) represents the BERT model power audit texts into specific, meaningful categories for processing the input text, 𝐷𝑒𝑛𝑠𝑒 is the classification layer, easier analysis, while IR enables quick and accurate and 𝑆𝑜𝑓𝑡𝑚𝑎𝑥 is used to transform logits into probabilities extraction of relevant insights from large volumes of for classification. audit documents to support informed decision-making. These methods are boosted by BERT, which captures 3.3.3 BERT for information retrieval deep contextual power and meaning of the text to BERT is used in IR to discover the documents that are enhance classification accuracy as well as retrieval relevant to a given query. BERT recognizes the context of precision. After tokenization, BERT uses the electric a query and a group of documents, which improves power audit text to gather contextual relations for retrieval accuracy. BERT's bidirectional nature assists in accurate classification. It further enhances IR through identifying more semantically relevant documents even accurate detection and retrieval of audit-specific features when keywords fail to match perfectly, as shown by and patterns. Equation (3). 3.3.1 Overview of MDIR-BERT 𝑅𝑒𝑙𝑒𝑣𝑎𝑛𝑐𝑒 𝑆𝑐𝑜𝑟𝑒 = MDIR-BERT is based on the basic architecture of BERT 𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦(𝐵𝐸𝑅𝑇(𝑄𝑢𝑒𝑟𝑦), 𝐵𝐸𝑅𝑇(𝐷𝑜𝑐𝑢𝑚𝑒𝑛𝑡)) (3) (Bidirectional Encoder Representations from Transformers), which encodes the bidirectional context Where 𝐵𝐸𝑅𝑇(𝑄𝑈𝐸𝑅𝑌) and 𝐵𝐸𝑅𝑇(𝐷𝑜𝑐𝑢𝑚𝑒𝑛𝑡) are the of words in a sentence through self-attention query and document's context-aware embeddings, mechanisms. This helps the BERT model comprehend respectively. word semantics concerning the previous and next words, and hence, BERT is very effective for tasks like MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced… Informatica 49 (2025) 419-432 425 3.3.4 Fine-Tuning BERT Warmup Steps 500 Max Sequence Length 128 Fine-tuning is the method of modifying the pre-trained Gradient Clipping 1.0 BERT model to a precise goal, such as classification or Weight Decay 0.01 IR, by training it on a labeled dataset. This involves Random Seed 42 adapting BERT's weights by the task requirements, Dropout Rate 0.1 enabling it to learn domain-specific jargon and nuances represented by Equation (4). 4.3 Performance metrics 𝐿𝑜𝑠𝑠 = ∑𝑁 𝑗=1 𝐶𝑟𝑜𝑠𝑠 − 𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑇𝑟𝑢𝑒𝑗 , 𝑃𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑𝑗) The Performance Metrics, including Execution time, (4) Energy consumption, and speed of convergence, are utilized to enhance the performance of electric power audit Where 𝑇𝑟𝑢𝑒𝑗is the definite tag for the 𝑗th model, text classification. 𝑃𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑𝑗 is the forecasted tag for the 𝑗thsample, and 𝐶𝑟𝑜𝑠𝑠 − 𝐸𝑛𝑡𝑟𝑜𝑝𝑦 is the loss function used during ➢ Energy consumption training. BERT has the advantage of considering the Energy consumption refers to the energy needed by a complete background of words in a phrase, which model to execute inputs and generate outputs. It is an significantly enhances classification and IR efficiency. important metric in energy-limited systems, like internet- This two-way context enables BERT to recognize enabled edge devices or mobile devices. Lower energy subtle semantic links, making it particularly useful for usage renders the system more efficient and sustainable, processing complex and specialized language in power particularly for large-scale AI deployments. Figure 2 audit reports. Furthermore, due to its pre-training on depicts the Energy Usage seen in the MDIR-BERT Model great corpora and fine-tuning capabilities, BERT could Execution. be trained to perform specific tasks with less data. 4 Results and discussion The research objective is to improve electric power audit text categorization and IR performance by introducing a new MDIR-BERT model. The experimental setting and performance assessment measures used in the research improve the electric power audit text categorization and IR performance. The experiments were run on a machine with an Intel Core i7 processor, 32GB RAM, and an NVIDIA RTX 3080 graphics card. The models were run in Python 3.9 with PyTorch as the base, with the BERT model developed atop this framework. Figure 2: Energy Consumption Observed in the MDIR- The model proposed took around 4.2 hours for training, BERT Model Execution using 4.2 GPU hours. It comprises about 110 million parameters and occupies a storage size of 420 MB. All MDIR-BERT model energy consumption or moderate the models were trained under the same settings to ensure consumption rates varied between 10 and 15kWh over five fairness when comparing the process. test repetitions, which converts to moderate consumption of resources. In the central sets of repetition, there was an 4.2 Hyper-parameters increase in utilization, attributed to the complexity in processing or the size of the data. The model's average Table 2 represents the hyperparameters utilized in the utilization was more uniform and efficient, demonstrating power audit text understanding research. its potential for real-world applications. For epoch 1, the model reaches 10kWh, 12kWh in epoch 2, 11kWh in epoch Table 2: Hyperparameters 3, 15kWh in epoch 4, and 13kWh in epoch 5. The proposed Hyperparameter Value MDIR-BERT method shows extreme performance in Learning Rate 2𝑒 − 5 epoch 4 with 15kWh. Batch Size 32 Number of Epochs 30 Optimizer AdamW 426 Informatica 49 (2025) 419-432 J. Xiao-liang et al. Execution time Execution time refers to the number of times it takes a model to consume an input, process it, and produce an output. It is a significant metric for real-time or time- sensitive applications, like autonomous systems or internet applications. Lower execution times are preferable so that users can experience the best, and the system's efficiency is enhanced overall. Figure 3 illustrates the visualization of the MDIR-BERT's Figure 4: Graphical outcome of (a) loss and (b) accuracy execution time. Statistical Significance The confidence interval for model accuracy is the normal distribution curve, where the shaded region highlights the most likely accuracy range. It visually represents the reliability and accuracy of the model's performance estimate. Figure 5 shows the Graphical outcome of a 95% confidence interval for accuracy (MDIR-BERT). Figure 3: Visualization of the execution time of the MDIR-BERT The execution time of the MDIR-BERT model replicates its performance over 20 epochs with moderate variances due to computation and environmental conditions. The execution duration varies around an average of 0.8 seconds, with peaks reaching roughly 0.89 seconds and troughs around 0.72 seconds, driven by a sinusoidal pattern and small random noise. Accuracy and loss Figure 5: Graphical outcome of 95% confidence interval for accuracy (MDIR-BERT) Accuracy is the number of correct predictions made by a classical model to the total number of predictions, The image shows a 95% confidence interval for the whereas loss is the difference between expected and accuracy of MDIR-BERT, represented as a normal actual values, which measures how well the model distribution curve. The x-axis indicates accuracy performs throughout training. The loss curve shows how percentages ranging from 97.0% to 101.0%. The shaded the model converged during training, with lower values area under the curve represents the 95% confidence representing better performance, while the accuracy interval, meaning there is a 95% probability that the true curve shows how well it captures electric-power-specific accuracy of the model lies within this range. The 2.5% tails morphology, domain power, and intricate entity on either side of the distribution are excluded, highlighting relationships more effectively. The accuracy and loss the central 95% region. The peak of the curve corresponds characteristics of the training for the MDIR-BERT to the most probable accuracy value, with the density technique are shown in Figure 4. decreasing as values move away from the center. The resulting MDIR-BERT model demonstrates good performance: training loss goes down from 0.95 to Confusion Matrix almost 0.01 after 30 epochs, and training accuracy increases steeply from 0.1 to about 0.97, which indicates A confusion matrix compares the expected and actual good convergence and high learning efficiency. values for a dataset to show the effectiveness of a classification model (Figure 6). The confusion matrix shows how well MDIR-BERT classified data in five different power audit categories. Considering its high overall prediction accuracy, the model occasionally MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced… Informatica 49 (2025) 419-432 427 misclassifies objects, especially between closely The research evaluates the macro/micro F1. Evaluating all similar classes like "Energy Efficiency" and "System classes equally, Macro F1 computes the F1 score for each Upgrade." Through multi-dimensional information class separately before the results. To create a single F1 retrieval and domain-adapted language modeling, this score, Micro F1 provides all the real positives, incorrect demonstrates the domain complexity and the model's positives, and incorrect negatives for each class, which efficacy in comprehending electric power audit gives each occurrence equal weight. The proposed model material. demonstrates 0.964 of macro F1 and 0.963 of micro F1. 4.4 Comparison phase The performance metrics used to compare the performance of electric power audit text classification are accuracy, F1- score, recall, and precision. The MDIR-BERT was compared with the existing methods like Text Convolutional Neural Networks (Text CNN) [22], BERT [22], and Electric Power Audit Text–BERT (EPAT-BERT) [22]. Accuracy Accuracy: Accuracy indicates how well the model accurately recognizes relevant and irrelevant information in the power audit text. To indicates the ratio of correct to incorrect predictions across all cases, giving a total view of classification performance across different audit Figure 6: Confusion matrix of MDIR-BERT documents. Accuracy measures the proportion of all performance correct power audit text classifications performed by a model. It can be helpful in assessing overall MDIR- Precision-recall curves BERT’s performance. Table 3 depicts the accuracy of the A binary classification model's effectiveness is MDIR-BERT. represented graphically by a Precision-Recall curve, which is particularly beneficial for unbalanced datasets Table 3: Performance summary of MDIR-BERT (Figure 7). The precision-recall curve validates MDIR- BERT's efficacy in domain-dependent audit text F1- Accuracy Recall Precision understanding by demonstrating its superior Methods score (%) (%) (%) classification performance in power audit categories, (%) with an average accuracy. Whereas the energy efficiency is 0.89, the maintenance recommendation is Text CNN 0.91, the regulatory violation is 0.96, safety compliance 71.65 69.01 74.27 71.56 [22] is 0.97, and the system upgrade is 0.89. BERT [22] 77.91 77.94 78.23 78.08 EPAT- 81.96 81.62 80.79 81.20 BERT [22] MDIR- BERT 98.82 97.81 96.48 97.34 [Proposed] Figure 7: Efficiency of MDIR-BERT with precision- recall curves 428 Informatica 49 (2025) 419-432 J. Xiao-liang et al. Figure 9 indicates the extent to which each model identifies all relevant content. Text CNN (69.01%) and BERT (77.94%) demonstrate moderate ability to identify relevant content, while EPAT-BERT shows a refined ability (81.62%), and the proposed method achieved 97.81%. Precision Precision assesses the extent to which each piece of text identified as relevant contains useful audit content. That is, it signifies the degree to which the model is able to avoid false positives and is a matter of importance for limiting irrelevant or misleading content through audit analysis. Precision measures the number of true positives (TP divided by the total number of TP, with the False Positives (FP). Precision is important if the cost of an FP is high, for Figure 8: Graphical representation of accuracy for the example, misclassifying a legitimate user as a spammer or MDIR-BERT a fraudster. Figure 8 demonstrates consistent improvement in accuracy, which improves to 71.65% for Text CNN, 77.91% for BERT, and 81.96% with EPAT-BERT. The proposed method receives a significant increase to 98.82%, suggesting that power audit texts are classified very well overall. Recall Recall represents how well the model collects all the relevant audit information in the documents. Recall fits with a focus of reducing missed important content, which is key to holistic regulatory compliance and decision support in power audit. Recall is the ratio of True Positives (TP) to TP with the False Negatives (FN). Recall is important if the cost of misclassifying a Figure 10: Precision analysis chart of MDIR-BERT model positive instance is high, as in the case of a diagnostic method. Figure 10, indicating the correctness of predicted relevant pieces of information, is highest for the proposed method at 96.48%. This demonstrates a low false-positive rate. In the study, a measure of the precision could be performed with Text CNN, showing a decent 74.27%, BERT achieving a better performance of 78.23%, and EPAT- BERT showing 80.79%. F1-Score The F1-score balances precision and recall to deliver a single metric of model performance at comprehending an audit text. It can be especially helpful when it is as important to avoid false alarms as it is to capture every detail necessary. The F1-score is the harmonic average of recall and precision, which balances 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 and 𝑟𝑒𝑐𝑎𝑙𝑙. Figure 9: Visual depiction of classification recall It is especially useful in problems involving imbalanced achieved by MDIR-BERT data, especially when FP and FN are equally important. MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced… Informatica 49 (2025) 419-432 429 multi-dimensional way. Thus, Text CNN [22] will return lower recall and precision for this corpus. BERT [22] improved contextual understanding but did not adapt to the language structures and entities specific to power audits, which ultimately limited its performance on complicated auditor narratives. While EPAT-BERT [22] is adapted for domain use, it does not sufficiently model multi- dimensional relationships and detailed audit semantics. MDIR-BERT is superior to EPAT-BERT by the pretraining within a domain and multi-dimensional information retrieval (IR) boost, allowing for more in-depth electric power audit language comprehension and enhanced entity/context identification. It's a +16.86% accuracy improvement, indicating enhanced classification and retrieval. Whereas the success of the model is domain- Figure 11: Performance visualization of MDIR-BERT specific and will not generalize to other audit types unless in terms of F1-score the model is retrained. In contrast to domain-specific transformers (such as FinBERT) and RAG-based models, Figure 11 summarizes overall performance, balancing MDIR-BERT has better structured text understanding but recall and precision. It ranges from 71.56% (Text CNN) without the generative ability. Future research will focus on to 78.08% (BERT) and 81.20% (EPAT-BERT). In RAG integration for summarization and improving cross- contrast, the proposed method is 97.34%, and affirms domain adaptability by transfer learning or model our approach's clear superiority in accurately and compression. The proposed MDIR-BERT method in the consistently extracting meaningful audit content. research makes it possible for researchers, utilities, and regulatory organizations to modify and evaluate models for certain auditing conditions while providing innovation. 4.5 Training and testing splits High-stakes audit environments are secured by compliance The training and testing validation of the proposed with energy regulations and data management systems. MDIR-BERT method’s performance in 80:20 The proposed MDIR-BERT method in the research validations is compared with 70:30 splits to determine makes it possible for researchers, utilities, and regulatory the efficiency of the proposed model in the field of organizations to modify and evaluate models for certain power audit text understanding research. Table 4 auditing conditions while providing innovation. High- explores the training and testing validation of the stakes audit environments are secured by compliance with proposed model with 80:20 and 70:30 splits. energy regulations and data management systems. Table 4: Performance of proposed MDIR-BERT model 5 Conclusions with training and testing splits The aim was to build a multi-dimensional information retrieval for enhanced classification and comprehension of Training and Testing splits Metrics electric power audit texts. MDIR power audit text 80:20 70:30 comprehension technology using the integration of multi- Accuracy (%) 98.82 97.6 dimensional enhancement and a domain-adapted LLM. An Precision (%) 96.48 95.5 end-to-end data preprocessing method was utilized, which Recall (%) 97.81 96.72 involved data cleaning to eliminate unwanted symbols and F1-score (%) 97.34 96.13 noise, normalization via lemmatization to normalize word forms, and tokenization to split text into useful units Based on the performance of various training and appropriate for model input. MDIR-BERT model, being testing validations, the proposed MDIR-BERT model pre-trained on electric power audit texts, efficiently learned shows more significance in 80:20 validations than in domain-specific terms, morphological phenomena, and 70:30 validation assessments. entity relationships. These preprocessing operations The comparative results showed notable weaknesses in considerably enhanced the textual data quality and Text CNN [22], BERT [22], and EPAT-BERT [22] in uniformity utilized for training and fine-tuning. The model their suitability to power audit text classification. Text achieved significant accuracy (98.82%), recall, precision, CNN struggles with long-distance/contextual and F1-score improvements, signifying robust knowledge and domain-specific vocabulary and performance. It also exhibited very high efficiency through language due to its inability to layer information in a lower energy expenditure, a quicker execution time, and improved convergence rate. 430 Informatica 49 (2025) 419-432 J. Xiao-liang et al. and Medium Commercial Buildings to Identify 5.1 Limitations and future scopes: In uncertain Energy Retrofit Opportunities. Energies, 16(17), circumstances, MDIR-BERT gets biased or 6191. https://doi.org/10.3390/en16176191 hallucinatory findings with its performance, which [8] Gunasegaran MK, Hasanuzzaman M, Tan C, Bakar leads to incorrect regulatory decisions. The integrity of AHA & Ponniah V (2023). Energy Consumption, an audit could be affected by misuse or a misconception Energy Analysis, and Solar Energy Integration for that lacks domain expertise. These limitations show the Commercial Building Restaurants. Energies, significance of management and the consistency of 16(20), 7145. https://doi.org/10.3390/en16207145 power sector control regulations. The quality of [9] Ai Q, Bai T, Cao Z, Chang Y, Chen J, Chen Z ... & domain-specific information is a potential factor that Zhu X (2023). Information retrieval meets large could be further investigated. Additional research language models: a strategic report from chinese ir intends to develop reasoning capabilities and extend the community. AI Open, 4, 80-90. model's ability to process a broader range of document https://doi.org/10.1016/j.aiopen.2023.08.001 types. Future research should focus on the [10] Sun B & Huo F (2025). Analysis of Customer generalization of the model to various sectors. Comment Data on E-commerce Platforms Based on RPA Robots. Informatica, 49(10). Funding: https://doi.org/10.31449/inf.v49i10.5908 This work was supported by Technology Project of [11] Xiong X & Zheng, M. (2024). Merging mixture of State Grid Tianjin Electric Power Company (Grant no. experts and retrieval augmented generation for Chengdong Yanfa 2024-05). enhanced information retrieval and reasoning. https://doi.org/10.21203/rs.3.rs-3978298/v1 References [12] Zhang H, Zhao Y, Sun B, Wu Y, Fu Z & Xiao X (2025). Large Language Model Based Intelligent [1] Pan M, Liu Y, Chen J, Huang EA, & Huang JX Fault Information Retrieval System for New Energy (2024). A multi-dimensional semantic pseudo- Vehicles. Applied Sciences, 15(7), 4034. relevance feedback framework for information https://doi.org/10.3390/app15074034 retrieval. Scientific Reports, 14(1), 31806. [13] Wen B, Wang T, Xu J, Liu Y, Li J & Lin S (2025). https://doi.org/10.1038/s41598-024-82871-0 File Compliance Detection Using a Word2Vec- [2] Guo J, Fan Y, Pang L, Yang L, Ai Q, Zamani H & Based Semantic Similarity Cheng X (2020). A deep look into neural ranking Framework. Informatica, 49(18). models for information retrieval. Information https://doi.org/10.31449/inf.v49i18.7421 Processing & Management, 57(6), 102067. [14] Tang Q, Chen J, Yu B, Lu Y, Fu C, Yu H ... & Li Y https://doi.org/10.1016/j.ipm.2019.102067 (2024). Self-retrieval: Building an information [3] Wang X, Wang J, Cao W, Wang K, Paturi R, & retrieval system with one large language Bergen L (2024). Birco: A benchmark of model. arXiv e-prints, arXiv-2403. information retrieval tasks with complex https://doi:10.48550/arXiv.2403.00801 objectives. arXiv preprint arXiv:2402.14151. [15] Azeroual O, Nacheva R, Nikiforova A & Störl U https://doi.org/10.48550/arXiv.2402.14151 (2025). A CRISP-DM and Predictive Analytics [4] Hambarde KA, & Proenca H (2023). Information Framework for Enhanced Decision-Making in retrieval: recent advances and beyond. IEEE Research Information Management Systems. Access, 11, 76581- Informatica, 49(18). 76604.https://doi.org/10.3390/1010000 https://doi.org/10.31449/inf.v49i18.5613 [5] Taherzadeh-Shalmaei N, Rafiee M, Kaab A, [16] Huang AH, Wang H & Yang Y (2023). FinBERT: A Khanali M, Rad MAV and Kasaeian A (2023). large language model for extracting information Energy audit and management of environmental from financial text. Contemporary Accounting GHG emissions based on multi-objective genetic Research, 40(2), 806-841. 10.1111/1911- algorithm and data envelopment analysis: An 3846.12832 DOI:10.1111/1911-3846.12832 agriculture case. Energy Reports, 10, pp.1507- [17] Gan Z (2024). LARGE LANGUAGE MODELS 1520. EMPOWERING COMPLIANCE CHECKS AND [6] Quispe EC, Viveros Mira M, Chamorro Díaz M, REPORT GENERATION IN AUDITING. World Castrillón Mendoza R & Vidal Medina JR (2025). Journal of Information Technology, Energy Management Systems in Higher 35.10.61784/wjit3003 Education Institutions’ Buildings. Energies, 18(7), 1810. https://doi.org/10.3390/en18071810 [7] Rios FC, Al Sultan S, Chong O & Parrish K (2023). Empowering Owner-Operators of Small MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced… Informatica 49 (2025) 419-432 431 [18] Panda A, Tang X, Nasr M, Choquette-Choo CA & Mittal P (2025). Privacy auditing of large language models. arXiv preprint arXiv:2503.06808. https://doi.org/10.48550/arXiv.2503.06808 [19] Oukhouya MH, Angour N, Aboutabit N & Hafidi I (2025). Comparative Analysis of ARDL, LSTM, and XGBoost Models For Forecasting The Moroccan Stock Market During The COVID-19 Pandemic. Informatica, 49(14). https://doi.org/10.31449/inf.v49i14.5751 [20] Li X, Wu X, Luo Z, Du Z, Wang Z & Gao C, (2023). Integration of global and local information for text classification. Neural Computing and Applications, 35(3), pp.2471-2486. https://doi.org/10.1007/s00521-022-07727-y [21] Nordin IG (2023). Narratives of internal audit: The Sisyphean work of becoming “independent”. Critical Perspectives on Accounting, 94, p.102448. DOI:10.1108/MEDAR-01-2022-1584 [22] Meng Q, Song Y, Mu J, Lv Y, Yang J, Xu L ... & Meng Q (2023). Electric power audit text classification with multi-grained pre-trained language model. IEEE Access, 11, 13510-13518. https://doi.org/10.1109/ACCESS.2023.3240162 432 Informatica 49 (2025) 419-432 J. Xiao-liang et al.